summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-11Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - MD pull requests via Song: - Cleanup redundant checks (Yu Kuai) - Remove deprecated headers (Marc Zyngier, Song Liu) - Concurrency fixes (Li Lingfeng) - Memory leak fix (Li Nan) - Refactor raid1 read_balance (Yu Kuai, Paul Luse) - Clean up and fix for md_ioctl (Li Nan) - Other small fixes (Gui-Dong Han, Heming Zhao) - MD atomic limits (Christoph) - NVMe pull request via Keith: - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith) - Improvement and cleanup for cached request handling (Christoph) - Moving towards atomic queue limits. Core changes and driver bits so far (Christoph) - Fix UAF issues in aoeblk (Chun-Yi) - Zoned fix and cleanups (Damien) - s390 dasd cleanups and fixes (Jan, Miroslav) - Block issue timestamp caching (me) - noio scope guarding for zoned IO (Johannes) - block/nvme PI improvements (Kanchan) - Ability to terminate long running discard loop (Keith) - bdev revalidation fix (Li) - Get rid of old nr_queues hack for kdump kernels (Ming) - Support for async deletion of ublk (Ming) - Improve IRQ bio recycling (Pavel) - Factor in CPU capacity for remote vs local completion (Qais) - Add shared_tags configfs entry for null_blk (Shin'ichiro - Fix for a regression in page refcounts introduced by the folio unification (Tony) - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid, Ricardo, Roman, Tang, Uwe) * tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits) block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC block/swim: Convert to platform remove callback returning void cdrom: gdrom: Convert to platform remove callback returning void block: remove disk_stack_limits md: remove mddev->queue md: don't initialize queue limits md/raid10: use the atomic queue limit update APIs md/raid5: use the atomic queue limit update APIs md/raid1: use the atomic queue limit update APIs md/raid0: use the atomic queue limit update APIs md: add queue limit helpers md: add a mddev_is_dm helper md: add a mddev_add_trace_msg helper md: add a mddev_trace_remap helper bcache: move calculation of stripe_size and io_opt into bcache_device_init virtio_blk: Do not use disk_set_max_open/active_zones() aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts block: move capacity validation to blkpg_do_ioctl() block: prevent division by zero in blk_rq_stat_sum() drbd: atomically update queue limits in drbd_reconsider_queue_parameters ...
2024-03-11Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"Catalin Marinas
This reverts commit 0499a78369adacec1af29340b71ff8dd375b4697. Enabling CPUMASK_OFFSTACK on arm64 triggers a warning in the dev_pm_opp_set_config() function followed by a failure to set the regulators and cpufreq-dt probing error. There is no apparent reason why this happens, so revert this commit until further investigation. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/c1f2902d-cefc-4122-9b86-d1d32911f590@samsung.com
2024-03-11Merge tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: - Make running of task_work internal loops more fair, and unify how the different methods deal with them (me) - Support for per-ring NAPI. The two minor networking patches are in a shared branch with netdev (Stefan) - Add support for truncate (Tony) - Export SQPOLL utilization stats (Xiaobing) - Multishot fixes (Pavel) - Fix for a race in manipulating the request flags via poll (Pavel) - Cleanup the multishot checking by making it generic, moving it out of opcode handlers (Pavel) - Various tweaks and cleanups (me, Kunwu, Alexander) * tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linux: (53 commits) io_uring: Fix sqpoll utilization check racing with dying sqpoll io_uring/net: dedup io_recv_finish req completion io_uring: refactor DEFER_TASKRUN multishot checks io_uring: fix mshot io-wq checks io_uring/net: add io_req_msg_cleanup() helper io_uring/net: simplify msghd->msg_inq checking io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io io_uring/net: correctly handle multishot recvmsg retry setup io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler io_uring: fix io_queue_proc modifying req->flags io_uring: fix mshot read defer taskrun cqe posting io_uring/net: fix overflow check in io_recvmsg_mshot_prep() io_uring/net: correct the type of variable io_uring/sqpoll: statistics of the true utilization of sq threads io_uring/net: move recv/recvmsg flags out of retry loop io_uring/kbuf: flag request if buffer pool is empty after buffer pick io_uring/net: improve the usercopy for sendmsg/recvmsg io_uring/net: move receive multishot out of the generic msghdr path io_uring/net: unify how recvmsg and sendmsg copy in the msghdr ...
2024-03-11vfio/pds: Refactor/simplify reset logicBrett Creeley
The current logic for handling resets is more complicated than it needs to be. The deferred_reset flag is used to indicate a reset is needed and the deferred_reset_state is the requested, post-reset, state. Also, the deferred_reset logic was added to vfio migration drivers to prevent a circular locking dependency with respect to mm_lock and state mutex. This is mainly because of the copy_to/from_user() functions(which takes mm_lock) invoked under state mutex. Remove all of the deferred reset logic and just pass the requested next state to pds_vfio_reset() so it can be used for VMM and DSC initiated resets. This removes the need for pds_vfio_state_mutex_lock(), so remove that and replace its use with a simple mutex_unlock(). Also, remove the reset_mutex as it's no longer needed since the state_mutex can be the driver's primary protector. Suggested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Link: https://lore.kernel.org/r/20240308182149.22036-3-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-03-11vfio/pds: Make sure migration file isn't accessed after resetBrett Creeley
It's possible the migration file is accessed after reset when it has been cleaned up, especially when it's initiated by the device. This is because the driver doesn't rip out the filep when cleaning up it only frees the related page structures and sets its local struct pds_vfio_lm_file pointer to NULL. This can cause a NULL pointer dereference, which is shown in the example below during a restore after a device initiated reset: BUG: kernel NULL pointer dereference, address: 000000000000000c PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:pds_vfio_get_file_page+0x5d/0xf0 [pds_vfio_pci] [...] Call Trace: <TASK> pds_vfio_restore_write+0xf6/0x160 [pds_vfio_pci] vfs_write+0xc9/0x3f0 ? __fget_light+0xc9/0x110 ksys_write+0xb5/0xf0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] Add a disabled flag to the driver's struct pds_vfio_lm_file that gets set during cleanup. Then make sure to check the flag when the migration file is accessed via its file_operations. By default this flag will be false as the memory for struct pds_vfio_lm_file is kzalloc'd, which means the struct pds_vfio_lm_file is enabled and accessible. Also, since the file_operations and driver's migration file cleanup happen under the protection of the same pds_vfio_lm_file.lock, using this flag is thread safe. Fixes: 8512ed256334 ("vfio/pds: Always clear the save/restore FDs on reset") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Link: https://lore.kernel.org/r/20240308182149.22036-2-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-03-11vfio/platform: Convert to platform remove callback returning voidUwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/79d3df42fe5b359a05b8061631e72e5ed249b234.1709886922.git.u.kleine-koenig@pengutronix.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-03-11vfio/mlx5: Enforce PRE_COPY supportYishai Hadas
Enable live migration only once the firmware supports PRE_COPY. PRE_COPY has been supported by the firmware for a long time already [1] and is required to achieve a low downtime upon live migration. This lets us clean up some old code that is not applicable those days while PRE_COPY is fully supported by the firmware. [1] The minimum firmware version that supports PRE_COPY is 28.36.1010, it was released in January 2023. No firmware without PRE_COPY support ever available to users. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20240306105624.114830-1-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-03-11Merge tag 'vfs-6.9.uuid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()
2024-03-11Merge tag 'vfs-6.9.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull block handle updates from Christian Brauner: "Last cycle we changed opening of block devices, and opening a block device would return a bdev_handle. This allowed us to implement support for restricting and forbidding writes to mounted block devices. It was accompanied by converting and adding helpers to operate on bdev_handles instead of plain block devices. That was already a good step forward but ultimately it isn't necessary to have special purpose helpers for opening block devices internally that return a bdev_handle. Fundamentally, opening a block device internally should just be equivalent to opening files. So now all internal opens of block devices return files just as a userspace open would. Instead of introducing a separate indirection into bdev_open_by_*() via struct bdev_handle bdev_file_open_by_*() is made to just return a struct file. Opening and closing a block device just becomes equivalent to opening and closing a file. This all works well because internally we already have a pseudo fs for block devices and so opening block devices is simple. There's a few places where we needed to be careful such as during boot when the kernel is supposed to mount the rootfs directly without init doing it. Here we need to take care to ensure that we flush out any asynchronous file close. That's what we already do for opening, unpacking, and closing the initramfs. So nothing new here. The equivalence of opening and closing block devices to regular files is a win in and of itself. But it also has various other advantages. We can remove struct bdev_handle completely. Various low-level helpers are now private to the block layer. Other helpers were simply removable completely. A follow-up series that is already reviewed build on this and makes it possible to remove bdev->bd_inode and allows various clean ups of the buffer head code as well. All places where we stashed a bdev_handle now just stash a file and use simple accessors to get to the actual block device which was already the case for bdev_handle" * tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) block: remove bdev_handle completely block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access bdev: remove bdev pointer from struct bdev_handle bdev: make struct bdev_handle private to the block layer bdev: make bdev_{release, open_by_dev}() private to block layer bdev: remove bdev_open_by_path() reiserfs: port block device access to file ocfs2: port block device access to file nfs: port block device access to files jfs: port block device access to file f2fs: port block device access to files ext4: port block device access to file erofs: port device access to file btrfs: port device access to file bcachefs: port block device access to file target: port block device access to file s390: port block device access to file nvme: port block device access to file block2mtd: port device access to files bcache: port block device access to files ...
2024-03-11Merge tag 'vfs-6.9.file' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull file locking updates from Christian Brauner: "A few years ago struct file_lock_context was added to allow for separate lists to track different types of file locks instead of using a singly-linked list for all of them. Now leases no longer need to be tracked using struct file_lock. However, a lot of the infrastructure is identical for leases and locks so separating them isn't trivial. This splits a group of fields used by both file locks and leases into a new struct file_lock_core. The new core struct is embedded in struct file_lock. Coccinelle was used to convert a lot of the callers to deal with the move, with the remaining 25% or so converted by hand. Afterwards several internal functions in fs/locks.c are made to work with struct file_lock_core. Ultimately this allows to split struct file_lock into struct file_lock and struct file_lease. The file lease APIs are then converted to take struct file_lease" * tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (51 commits) filelock: fix deadlock detection in POSIX locking filelock: always define for_each_file_lock() smb: remove redundant check filelock: don't do security checks on nfsd setlease calls filelock: split leases out of struct file_lock filelock: remove temporary compatibility macros smb/server: adapt to breakup of struct file_lock smb/client: adapt to breakup of struct file_lock ocfs2: adapt to breakup of struct file_lock nfsd: adapt to breakup of struct file_lock nfs: adapt to breakup of struct file_lock lockd: adapt to breakup of struct file_lock fuse: adapt to breakup of struct file_lock gfs2: adapt to breakup of struct file_lock dlm: adapt to breakup of struct file_lock ceph: adapt to breakup of struct file_lock afs: adapt to breakup of struct file_lock 9p: adapt to breakup of struct file_lock filelock: convert seqfile handling to use file_lock_core filelock: convert locks_translate_pid to take file_lock_core ...
2024-03-11Merge tag 'vfs-6.9.pidfd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pdfd updates from Christian Brauner: - Until now pidfds could only be created for thread-group leaders but not for threads. There was no technical reason for this. We simply had no users that needed support for this. Now we do have users that need support for this. This introduces a new PIDFD_THREAD flag for pidfd_open(). If that flag is set pidfd_open() creates a pidfd that refers to a specific thread. In addition, we now allow clone() and clone3() to be called with CLONE_PIDFD | CLONE_THREAD which wasn't possible before. A pidfd that refers to an individual thread differs from a pidfd that refers to a thread-group leader: (1) Pidfds are pollable. A task may poll a pidfd and get notified when the task has exited. For thread-group leader pidfds the polling task is woken if the thread-group is empty. In other words, if the thread-group leader task exits when there are still threads alive in its thread-group the polling task will not be woken when the thread-group leader exits but rather when the last thread in the thread-group exits. For thread-specific pidfds the polling task is woken if the thread exits. (2) Passing a thread-group leader pidfd to pidfd_send_signal() will generate thread-group directed signals like kill(2) does. Passing a thread-specific pidfd to pidfd_send_signal() will generate thread-specific signals like tgkill(2) does. The default scope of the signal is thus determined by the type of the pidfd. Since use-cases exist where the default scope of the provided pidfd needs to be overriden the following flags are added to pidfd_send_signal(): - PIDFD_SIGNAL_THREAD Send a thread-specific signal. - PIDFD_SIGNAL_THREAD_GROUP Send a thread-group directed signal. - PIDFD_SIGNAL_PROCESS_GROUP Send a process-group directed signal. The scope change will only work if the struct pid is actually used for this scope. For example, in order to send a thread-group directed signal the provided pidfd must be used as a thread-group leader and similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be used as a process group leader. - Move pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds to become useful for the first time. They can now be compared by inode number which are unique for the system lifetime. Instead of stashing struct pid in file->private_data we can now stash it in inode->i_private. This makes it possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. Another side-effect is that file->private_data is now freed up for per-file options for pidfds. Now, each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode and dentry for each struct pid and we reuse that inode and dentry for all pidfds that refer to the same struct pid. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs. The dentry and inode allocation mechanism is moved into generic infrastructure that is now shared between nsfs and pidfs. The path_from_stashed() helper must be provided with a stashing location, an inode number, a mount, and the private data that is supposed to be used and it will provide a path that can be passed to dentry_open(). The helper will try retrieve an existing dentry from the provided stashing location. If a valid dentry is found it is reused. If not a new one is allocated and we try to stash it in the provided location. If this fails we retry until we either find an existing dentry or the newly allocated dentry could be stashed. Subsequent openers of the same namespace or task are then able to reuse it. - Currently it is only possible to get notified when a task has exited, i.e., become a zombie and userspace gets notified with EPOLLIN. We now also support waiting until the task has been reaped, notifying userspace with EPOLLHUP. - Ensure that ESRCH is reported for getfd if a task is exiting instead of the confusing EBADF. - Various smaller cleanups to pidfd functions. * tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits) libfs: improve path_from_stashed() libfs: add stashed_dentry_prune() libfs: improve path_from_stashed() helper pidfs: convert to path_from_stashed() helper nsfs: convert to path_from_stashed() helper libfs: add path_from_stashed() pidfd: add pidfs pidfd: move struct pidfd_fops pidfd: allow to override signal scope in pidfd_send_signal() pidfd: change pidfd_send_signal() to respect PIDFD_THREAD signal: fill in si_code in prepare_kill_siginfo() selftests: add ESRCH tests for pidfd_getfd() pidfd: getfd should always report ESRCH if a task is exiting pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together pidfd: exit: kill the no longer used thread_group_exited() pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN)) pid: kill the obsolete PIDTYPE_PID code in transfer_pid() pidfd: kill the no longer needed do_notify_pidfd() in de_thread() pidfd_poll: report POLLHUP when pid_task() == NULL pidfd: implement PIDFD_THREAD flag for pidfd_open() ...
2024-03-11Merge tag 'vfs-6.9.iomap' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: - Restore read-write hints in struct bio through the bi_write_hint member for the sake of UFS devices in mobile applications. This can result in up to 40% lower write amplification in UFS devices. The patch series that builds on this will be coming in via the SCSI maintainers (Bart) - Overhaul the iomap writeback code. Afterwards ->map_blocks() is able to map multiple blocks at once as long as they're in the same folio. This reduces CPU usage for buffered write workloads on e.g., xfs on systems with lots of cores (Christoph) - Record processed bytes in iomap_iter() trace event (Kassey) - Extend iomap_writepage_map() trace event after Christoph's ->map_block() changes to map mutliple blocks at once (Zhang) * tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) iomap: Add processed for iomap_iter iomap: add pos and dirty_len into trace_iomap_writepage_map block, fs: Restore the per-bio/request data lifetime fields fs: Propagate write hints to the struct block_device inode fs: Move enum rw_hint into a new header file fs: Split fcntl_rw_hint() fs: Verify write lifetime constants at compile time fs: Fix rw_hint validation iomap: pass the length of the dirty region to ->map_blocks iomap: map multiple blocks at a time iomap: submit ioends immediately iomap: factor out a iomap_writepage_map_block helper iomap: only call mapping_set_error once for each failed bio iomap: don't chain bios iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend iomap: clean up the iomap_alloc_ioend calling convention iomap: move all remaining per-folio logic into iomap_writepage_map iomap: factor out a iomap_writepage_handle_eof helper iomap: move the PF_MEMALLOC check to iomap_writepages iomap: move the io_folios field out of struct iomap_ioend ...
2024-03-11Merge tag 'vfs-6.9.ntfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull ntfs update from Christian Brauner: "This removes the old ntfs driver. The new ntfs3 driver is a full replacement that was merged over two years ago. We've went through various userspace and either they use ntfs3 or they use the fuse version of ntfs and thus build neither ntfs nor ntfs3. I think that's a clear sign that we should risk removing the legacy ntfs driver. Quoting from Arch Linux and Debian: - Debian does neither build the legacy ntfs nor the new ntfs3: "Not currently built with Debian's kernel packages, 'ntfs' has been symlinked to 'ntfs-3g' as it relates to fstab and mount commands. Debian kernels are built without support of the ntfs3 driver developed by Paragon Software." (cf. [2]) - Archlinux provides ntfs3 as their default since 5.15: "All officially supported kernels with versions 5.15 or newer are built with CONFIG_NTFS3_FS=m and thus support it. Before 5.15, NTFS read and write support is provided by the NTFS-3G FUSE file system." (cf. [1]). It's unmaintained apart from various odd fixes as well. Worst case we have to reintroduce it if someone really has a valid dependency on it. But it's worth trying to see whether we can remove it" Link: https://wiki.archlinux.org/title/NTFS [1] Link: https://wiki.debian.org/NTFS [2] * tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: remove NTFS classic from docum. index fs: Remove NTFS classic
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-03-11Merge tag 'linux_kselftest-kunit-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - fix to make kunit_bus_type const - kunit tool change to Print UML command - DRM device creation helpers are now using the new kunit device creation helpers. This change resulted in DRM helpers switching from using a platform_device, to a dedicated bus and device type used by kunit. kunit devices don't set DMA mask and this caused regression on some drm tests as they can't allocate DMA buffers. Fix this problem by setting DMA masks on the kunit device during initialization. - KUnit has several macros which accept a log message, which can contain printf format specifiers. Some of these (the explicit log macros) already use the __printf() gcc attribute to ensure the format specifiers are valid, but those which could fail the test, and hence used __kunit_do_failed_assertion() behind the scenes, did not. These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and KUNIT_FAIL() A nine-patch series adds the __printf() attribute, and fixes all of the issues uncovered. * tag 'linux_kselftest-kunit-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Annotate _MSG assertion variants with gnu printf specifiers drm: tests: Fix invalid printf format specifiers in KUnit tests drm/xe/tests: Fix printf format specifiers in xe_migrate test net: test: Fix printf format specifier in skb_segment kunit test rtc: test: Fix invalid format specifier. time: test: Fix incorrect format specifier lib: memcpy_kunit: Fix an invalid format specifier in an assertion msg lib/cmdline: Fix an invalid format specifier in an assertion msg kunit: test: Log the correct filter string in executor_test kunit: Setup DMA masks on the kunit device kunit: make kunit_bus_type const kunit: Mark filter* params as rw kunit: tool: Print UML command
2024-03-11Merge tag 'linux_kselftest-next-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest update from Shuah Khan: - livepatch restructuring to move the module out of lib to be built as a out-of-tree modules during kselftest build. This makes it easier change, debug and rebuild the tests by running make on the selftests/livepatch directory, which is not currently possible since the modules on lib/livepatch are build and installed using the main makefile modules target. - livepatch restructuring fixes for problems found by kernel test robot. The change skips the test if kernel-devel isn't installed (default value of KDIR), or if KDIR variable passed doesn't exists. - resctrl test restructuring and new non-contiguous CBMs CAT test - new ktap_helpers to print diagnostic messages, pass/fail tests based on exit code, abort test, and finish the test. - a new test verify power supply properties. - a new ftrace to exercise function tracer across cpu hotplug. - timeout increase for mqueue test to allow the test to run on i3.metal AWS instances. - minor spelling corrections in several tests. - missing gitignore files and changes to existing gitignore files. * tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits) kselftest: Add basic test for probing the rust sample modules selftests: lib.mk: Do not process TEST_GEN_MODS_DIR selftests: livepatch: Avoid running the tests if kernel-devel is missing selftests: livepatch: Add initial .gitignore selftests/resctrl: Add non-contiguous CBMs CAT test selftests/resctrl: Add resource_info_file_exists() selftests/resctrl: Split validate_resctrl_feature_request() selftests/resctrl: Add a helper for the non-contiguous test selftests/resctrl: Add test groups and name L3 CAT test L3_CAT selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy" selftests/mqueue: Set timeout to 180 seconds selftests/ftrace: Add test to exercize function tracer across cpu hotplug selftest: ftrace: fix minor typo in log selftests: thermal: intel: workload_hint: add missing gitignore selftests: thermal: intel: power_floor: add missing gitignore selftests: uevent: add missing gitignore selftests: Add test to verify power supply properties selftests: ktap_helpers: Add a helper to finish the test selftests: ktap_helpers: Add a helper to abort the test selftests: ktap_helpers: Add helper to pass/fail test based on exit code ...
2024-03-11thermal/drivers/rcar_gen3: Add support for R-Car V4MGeert Uytterhoeven
Add support for the Thermal Sensor/Chip Internal Voltage Monitor/Core Voltage Monitor (THS/CIVM/CVM) on the Renesas R-Car V4M (R8A779H0) SoC. The conversion formulas for R-Car V4M are the same as for other R-Car Gen4 SoCs. Based on a patch in the BSP by Duy Nguyen. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/bd5b002a802c1e058e0048592f17862db1d04263.1709722342.git.geert+renesas@glider.be
2024-03-11dt-bindings: thermal: rcar-gen3-thermal: Add r8a779h0 supportDuy Nguyen
Document support for the Thermal Sensor/Chip Internal Voltage Monitor/Core Voltage Monitor (THS/CIVM/CVM) on the Renesas R-Car V4M (R8A779H0) SoC. Just like on other R-Car Gen4 SoCs, interrupts are not routed to the INTC-AP (GIC) but to the Error Control Module (ECM). Signed-off-by: Duy Nguyen <duy.nguyen.rh@renesas.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/b3d135f8b63b9fe2d0f0aa2e48c8a2211b2e947e.1709722342.git.geert+renesas@glider.be
2024-03-11thermal/of: Assume polling-delay(-passive) 0 when absentKonrad Dybcio
Currently, thermal zones associated with providers that have interrupts for signaling hot/critical trips are required to set a polling-delay of 0 to indicate no polling. This feels a bit backwards. Change the code such that "no polling delay" also means "no polling". Suggested-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240125-topic-thermal-v1-2-3c9d4dced138@linaro.org
2024-03-11dt-bindings: thermal-zones: Don't require polling-delay(-passive)Konrad Dybcio
Currently, thermal zones associated with providers that have interrupts for signaling hot/critical trips are required to set a polling-delay of 0 to indicate no polling. This feels a bit backwards. Assume 0 (no polling) when these properties are not defined. Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240125-topic-thermal-v1-1-3c9d4dced138@linaro.org
2024-03-11thermal/drivers/qoriq: Fix getting tmu rangePeng Fan
TMU Version 1 has 4 TTRCRs, while TMU Version >=2 has 16 TTRCRs. So limit the len to 4 will report "invalid range data" for i.MX93. This patch drop the local array with allocated ttrcr array and able to support larger tmu ranges. Fixes: f12d60c81fce ("thermal/drivers/qoriq: Support version 2.1") Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240226003657.3012880-1-peng.fan@oss.nxp.com
2024-03-11thermal/drivers/sun8i: Don't fail probe due to zone registration failureMark Brown
Currently the sun8i thermal driver will fail to probe if any of the thermal zones it is registering fails to register with the thermal core. Since we currently do not define any trip points for the GPU thermal zones on at least A64 or H5 this means that we have no thermal support on these platforms: [ 1.698703] thermal_sys: Failed to find 'trips' node [ 1.698707] thermal_sys: Failed to find trip points for thermal-sensor id=1 even though the main CPU thermal zone on both SoCs is fully configured. This does not seem ideal, while we may not be able to use all the zones it seems better to have those zones which are usable be operational. Instead just carry on registering zones if we get any non-deferral error, allowing use of those zones which are usable. This means that we also need to update the interrupt handler to not attempt to notify the core for events on zones which we have not registered, I didn't see an ability to mask individual interrupts and I would expect that interrupts would still be indicated in the ISR even if they were masked. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240123-thermal-sun8i-registration-v3-1-3e5771b1bbdd@kernel.org
2024-03-11thermal/drivers/sun8i: Add support for H616 THS controllerMartin Botka
Add support for the thermal sensor found in H616 SoCs, is the same as the H6 thermal sensor controller, but with four sensors. Also the registers readings are wrong, unless a bit in the first SYS_CFG register cleared, so set exercise the SRAM regmap to take care of that. Signed-off-by: Martin Botka <martin.botka@somainline.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240219153639.179814-7-andre.przywara@arm.com
2024-03-11thermal/drivers/sun8i: Add SRAM register access codeAndre Przywara
The Allwinner H616 SoC needs to clear a bit in one register in the SRAM controller, to report reasonable temperature values. On reset, bit 16 in register 0x3000000 is set, which leads to the driver reporting temperatures around 200C. Clearing this bit brings the values down to the expected range. The BSP code does a one-time write in U-Boot, with a comment just mentioning the effect on the THS, but offering no further explanation. To not rely on firmware to set things up for us, add code that queries the SRAM controller device via a DT phandle link, then clear just this single bit. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240219153639.179814-6-andre.przywara@arm.com
2024-03-11thermal/drivers/sun8i: Extend H6 calibration to support 4 sensorsMaksim Kiselev
The H616 SoC resembles the H6 thermal sensor controller, with a few changes like four sensors. Extend sun50i_h6_ths_calibrate() function to support calibration of these sensors. Co-developed-by: Martin Botka <martin.botka@somainline.org> Signed-off-by: Martin Botka <martin.botka@somainline.org> Signed-off-by: Maksim Kiselev <bigunclemax@gmail.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240219153639.179814-5-andre.przywara@arm.com
2024-03-11thermal/drivers/sun8i: Explain unknown H6 register valueAndre Przywara
So far we were ORing in some "unknown" value into the THS control register on the Allwinner H6. This part of the register is not explained in the H6 manual, but the H616 manual details those bits, and on closer inspection the THS IP blocks in both SoCs seem very close: - The BSP code for both SoCs writes the same values into THS_CTRL. - The reset values of at least the first three registers are the same. Replace the "unknown" value with its proper meaning: "acquire time", most probably the sample part of the sample & hold circuit of the ADC, according to its explanation in the H616 manual. No functional change, just a macro rename and adjustment. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240219153639.179814-4-andre.przywara@arm.com
2024-03-11dt-bindings: thermal: sun8i: Add H616 THS controllerMartin Botka
This controller is similar to the H6, but covers four sensors and uses slightly different calibration methods. Also the H616 requires to poke a bit in the SYS_CFG register range for correct operation, so add a phandle property to point there. Signed-off-by: Martin Botka <martin.botka@somainline.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240219153639.179814-3-andre.przywara@arm.com
2024-03-11soc: sunxi: sram: export register 0 for THS on H616Andre Przywara
The Allwinner H616 SoC contains a mysterious bit at register offset 0x0 in the SRAM control block. If bit 16 is set (the reset value), the temperature readings of the THS are way off, leading to reports about 200C, at normal ambient temperatures. Clearing this bits brings the reported values down to the expected values. The BSP code clears this bit in firmware (U-Boot), and has an explicit comment about this, but offers no real explanation. Experiments in U-Boot show that register 0x0 has no effect on the SRAM C visibility: all tested bit settings still allow full read and write access by the CPU to the whole of SRAM C. Only bit 24 of the register at offset 0x4 makes all of SRAM C inaccessible by the CPU. So modelling the THS switch functionality as an SRAM region would not reflect reality. Since we should not rely on firmware settings, allow other code (the THS driver) to access this register, by exporting it through the already existing regmap. This mimics what we already do for the LDO control and the EMAC register. To avoid concurrent accesses to the same register at the same time, by the SRAM switch code and the regmap code, use the same lock to protect the access. The regmap subsystem allows to use an existing lock, so we just need to hook in there. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240219153639.179814-2-andre.przywara@arm.com
2024-03-11dt-bindings: thermal: qoriq-thermal: Adjust fsl,tmu-range min/maxItemsFabio Estevam
The number of fsl,tmu-range entries vary among the several NXP SoCs. - lx2160a has two fsl,tmu-range entries (fsl,qoriq-tmu compatible) - imx8mq has four fsl,tmu-range entries. (fsl,imx8mq-tmu compatible) - imx93 has seven fsl,tmu-range entries. (fsl,qoriq-tmu compatible) Change minItems and maxItems accordingly. This fixes the following schema warning: imx93-11x11-evk.dtb: tmu@44482000: fsl,tmu-range: 'oneOf' conditional failed, one must be fixed: [2147483866, 2147483881, 2147483906, 2147483946, 2147484006, 2147484071, 2147484086] is too long Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240104124952.1975160-1-festevam@gmail.com
2024-03-11thermal: Drop spaces before TABsGeert Uytterhoeven
There is never a need to have a space before a TAB, but it hurts the eyes of vim users. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/480478a53fd42621e97b2db36e181903cc0f53e3.1708001426.git.geert+renesas@glider.be
2024-03-11thermal/drivers/mediatek: Fix control buffer enablement on MT7896Frank Wunderlich
Reading thermal sensor on mt7986 devices returns invalid temperature: bpi-r3 ~ # cat /sys/class/thermal/thermal_zone0/temp -274000 Fix this by adding missing members in mtk_thermal_data struct which were used in mtk_thermal_turn_on_buffer after commit 33140e668b10. Cc: stable@vger.kernel.org Fixes: 33140e668b10 ("thermal/drivers/mediatek: Control buffer enablement tweaks") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20230907112018.52811-1-linux@fw-web.de
2024-03-11thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error ↵Christophe JAILLET
handling path If devm_krealloc() fails, then 'efuse' is leaking. So free it to avoid a leak. Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/481d345233862d58c3c305855a93d0dbc2bbae7e.1706431063.git.christophe.jaillet@wanadoo.fr
2024-03-11ARM: 9359/1: flush: check if the folio is reserved for no-mapping addressesYongqiang Liu
Since commit a4d5613c4dc6 ("arm: extend pfn_valid to take into account freed memory map alignment") changes the semantics of pfn_valid() to check presence of the memory map for a PFN. A valid page for an address which is reserved but not mapped by the kernel[1], the system crashed during some uio test with the following memory layout: node 0: [mem 0x00000000c0a00000-0x00000000cc8fffff] node 0: [mem 0x00000000d0000000-0x00000000da1fffff] the uio layout is:0xc0900000, 0x100000 the crash backtrace like: Unable to handle kernel paging request at virtual address bff00000 [...] CPU: 1 PID: 465 Comm: startapp.bin Tainted: G O 5.10.0 #1 Hardware name: Generic DT based system PC is at b15_flush_kern_dcache_area+0x24/0x3c LR is at __sync_icache_dcache+0x6c/0x98 [...] (b15_flush_kern_dcache_area) from (__sync_icache_dcache+0x6c/0x98) (__sync_icache_dcache) from (set_pte_at+0x28/0x54) (set_pte_at) from (remap_pfn_range+0x1a0/0x274) (remap_pfn_range) from (uio_mmap+0x184/0x1b8 [uio]) (uio_mmap [uio]) from (__mmap_region+0x264/0x5f4) (__mmap_region) from (__do_mmap_mm+0x3ec/0x440) (__do_mmap_mm) from (do_mmap+0x50/0x58) (do_mmap) from (vm_mmap_pgoff+0xfc/0x188) (vm_mmap_pgoff) from (ksys_mmap_pgoff+0xac/0xc4) (ksys_mmap_pgoff) from (ret_fast_syscall+0x0/0x5c) Code: e0801001 e2423001 e1c00003 f57ff04f (ee070f3e) ---[ end trace 09cf0734c3805d52 ]--- Kernel panic - not syncing: Fatal exception So check if PG_reserved was set to solve this issue. [1]: https://lore.kernel.org/lkml/Zbtdue57RO0QScJM@linux.ibm.com/ Fixes: a4d5613c4dc6 ("arm: extend pfn_valid to take into account freed memory map alignment") Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-03-11selftests/bpf: Add fexit and kretprobe triggering benchmarksAndrii Nakryiko
We already have kprobe and fentry benchmarks. Let's add kretprobe and fexit ones for completeness. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240309005124.3004446-1-andrii@kernel.org
2024-03-11mm: Introduce vmap_page_range() to map pages in PCI address spaceAlexei Starovoitov
ioremap_page_range() should be used for ranges within vmalloc range only. The vmalloc ranges are allocated by get_vm_area(). PCI has "resource" allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence introduce vmap_page_range() to be used exclusively to map pages in PCI address space. Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.") Reported-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/bpf/CANiq72ka4rir+RTN2FQoT=Vvprp_Ao-CvoYEkSNqtSY+RZj+AA@mail.gmail.com
2024-03-11ARM: 9354/1: ptrace: Use bitfield helpersGeert Uytterhoeven
The isa_mode() macro extracts two fields, and recombines them into a single value. Make this more obvious by using the FIELD_GET() helper, and shifting the result into its final resting place. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-03-11ASoC: amd: yc: Fix non-functional mic on ASUS M7600REM Cooley
The ASUS M7600RE (Vivobook Pro 16X OLED) needs a quirks-table entry for the internal microphone to function properly. Signed-off-by: Mitch Cooley <m.cooley.198@gmail.com> Link: https://msgid.link/r/CALijGznExWW4fujNWwMzmn_K=wo96sGzV_2VkT7NjvEUdkg7Gw@mail.gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-11Merge remote-tracking branches 'ras/edac-drivers', 'ras/edac-misc' and ↵Borislav Petkov (AMD)
'ras/edac-amd-atl' into edac-updates-for-v6.9 * ras/edac-drivers: EDAC/i10nm: Add Intel Grand Ridge micro-server support EDAC/igen6: Add one more Intel Alder Lake-N SoC support * ras/edac-misc: EDAC/versal: Convert to platform remove callback returning void EDAC/versal: Make the bit position of injected errors configurable EDAC/synopsys: Convert to devm_platform_ioremap_resource() * ras/edac-amd-atl: RAS/AMD/FMPM: Fix off by one when unwinding on error RAS/AMD/FMPM: Add debugfs interface to print record entries RAS/AMD/FMPM: Save SPA values RAS: Export helper to get ras_debugfs_dir RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2() RAS: Introduce a FRU memory poison manager RAS/AMD/ATL: Add MI300 row retirement support Documentation: Move RAS section to admin-guide RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300() RAS/AMD/ATL: Add MI300 support Documentation: RAS: Add index and address translation section EDAC/amd64: Use new AMD Address Translation Library RAS: Introduce AMD Address Translation Library Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-11Merge tag 'opp-updates-6.9' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm Merge OPP (operating performance points) updates for 6.9 from Viresh Kumar: "- Fix couple of warnings related to W=1 builds. (Viresh Kumar). - Move Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar). - Extend dev_pm_opp_data with turbo support (Sibi Sankar). - dt-bindings: drop maxItems from inner items (David Heidelberg)." * tag 'opp-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: dt-bindings: opp: drop maxItems from inner items OPP: debugfs: Fix warning around icc_get_name() OPP: debugfs: Fix warning with W=1 builds cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h OPP: Extend dev_pm_opp_data with turbo support
2024-03-11Merge tag 'asoc-v6.9' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v6.9 This has been quite a small release, there's a lot of driver specific cleanups and minor enhancements but hardly anything on the core and only one new driver. Highlights include: - SoundWire support for AMD ACP 6.3 systems. - Support for reporting version information for AVS firmware. - Support DSPless mode for Intel Soundwire systems. - Support for configuring CS35L56 amplifiers using EFI calibration data. - Log which component is being operated on as part of power management trace events. - Support for Microchip SAM9x7, NXP i.MX95 and Qualcomm WCD939x
2024-03-11Merge branch 'pm-em'Rafael J. Wysocki
Merge Enery Model changes for 6.9-rc1: - Allow the Energy Model to be updated dynamically (Lukasz Luba). * pm-em: (24 commits) PM: EM: Fix nr_states warnings in static checks Documentation: EM: Update with runtime modification design PM: EM: Add em_dev_compute_costs() PM: EM: Remove old table PM: EM: Change debugfs configuration to use runtime EM table data drivers/thermal/devfreq_cooling: Use new Energy Model interface drivers/thermal/cpufreq_cooling: Use new Energy Model interface powercap/dtpm_devfreq: Use new Energy Model interface to get table powercap/dtpm_cpu: Use new Energy Model interface to get table PM: EM: Optimize em_cpu_energy() and remove division PM: EM: Support late CPUs booting and capacity adjustment PM: EM: Add performance field to struct em_perf_state and optimize PM: EM: Add em_perf_state_from_pd() to get performance states table PM: EM: Introduce em_dev_update_perf_domain() for EM updates PM: EM: Add functions for memory allocations for new EM tables PM: EM: Use runtime modified EM for CPUs energy estimation in EAS PM: EM: Introduce runtime modifiable table PM: EM: Split the allocation and initialization of the EM table PM: EM: Check if the get_cost() callback is present in em_compute_costs() PM: EM: Introduce em_compute_costs() ...
2024-03-11Merge branches 'pm-powercap' and 'pm-tools'Rafael J. Wysocki
Merge power capping changes and power management utilities updates for 6.9-rc1: - Address multiple issues in the TPMI RAPL driver and add support for new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui). - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel Lezcano). - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li). - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth Norway Ananda). - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil). * pm-powercap: powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function powercap: dtpm_cpu: Fix error check against freq_qos_add_request() powercap: intel_rapl: Add support for Arrow Lake powercap: intel_rapl: Add support for Lunar Lake-M paltform powercap: intel_rapl_tpmi: Fix System Domain probing powercap: intel_rapl_tpmi: Fix a register bug powercap: intel_rapl: Fix locking in TPMI RAPL powercap: intel_rapl: Fix a NULL pointer dereference * pm-tools: Fix cpupower-frequency-info.1 man page typo tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num()
2024-03-11Merge branch 'pm-cpuidle'Rafael J. Wysocki
Merge cpuidle updates for 6.9-rc1: - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle). - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng). - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang). * pm-cpuidle: cpuidle: ACPI/intel: fix MWAIT hint target C-state computation cpuidle: Avoid potential overflow in integer multiplication cpuidle: haltpoll: do not shrink guest poll_limit_ns below grow_start
2024-03-11Merge tag 'kvm-x86-xen-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM Xen and pfncache changes for 6.9: - Rip out the half-baked support for using gfn_to_pfn caches to manage pages that are "mapped" into guests via physical addresses. - Add support for using gfn_to_pfn caches with only a host virtual address, i.e. to bypass the "gfn" stage of the cache. The primary use case is overlay pages, where the guest may change the gfn used to reference the overlay page, but the backing hva+pfn remains the same. - Add an ioctl() to allow mapping Xen's shared_info page using an hva instead of a gpa, so that userspace doesn't need to reconfigure and invalidate the cache/mapping if the guest changes the gpa (but userspace keeps the resolved hva the same). - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation. - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior). - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs. - Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to refresh), and drop a now-redundant acquisition of xen_lock (that was protecting the shared_info cache) to fix a deadlock due to recursively acquiring xen_lock.
2024-03-11Merge tag 'kvm-x86-pmu-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 PMU changes for 6.9: - Fix several bugs where KVM speciously prevents the guest from utilizing fixed counters and architectural event encodings based on whether or not guest CPUID reports support for the _architectural_ encoding. - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads, priority of VMX interception vs #GP, PMC types in architectural PMUs, etc. - Add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID, i.e. are difficult to validate via KVM-Unit-Tests. - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting cycles, e.g. when checking if a PMC event needs to be synthesized when skipping an instruction. - Optimize triggering of emulated events, e.g. for "count instructions" events when skipping an instruction, which yields a ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest. - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit.
2024-03-11Merge tag 'kvm-x86-vmx-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM VMX changes for 6.9: - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace due to an unexpected VM-Exit while the CPU was vectoring an exception. - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support. - Clean up the logic for massaging the passthrough MSR bitmaps when userspace changes its MSR filter.
2024-03-11Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq changes for 6.9-rc1: - Enable preferred core support in the amd-pstate cpufreq driver (Meng Li). - Fix min_perf assignment in amd_pstate_adjust_perf() and make the min/max limit perf values in amd-pstate always stay within the (highest perf, lowest perf) range (Tor Vic, Meng Li). - Change default transition delay in cpufreq to 2ms (Qais Yousef). - Drop long-unused cpudata::prev_cummulative_iowait from the intel_pstate cpufreq driver (Jiri Slaby). - Allow intel_pstate to assign model-specific values to strings used in the EPP sysfs interface and make it do so on Meteor Lake (Srinivas Pandruvada). - Remove references to 10ms minimum sampling rate from comments in the cpufreq code (Pierre Gondois). - Prevent scaling_cur_freq from exceeding scaling_max_freq when the latter is an inefficient frequency (Shivnandan Kumar). - Honour transition_latency over transition_delay_us in cpufreq (Qais Yousef). - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar). - General enhancements / cleanups to ARM cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova). - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan). - Make the SCMI cpufreq driver get a transition delay value from firmware (Pierre Gondois). * pm-cpufreq: (28 commits) cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpufreq: Don't unregister cpufreq cooling on CPU hotplug cpufreq: Honour transition_latency over transition_delay_us cpufreq: Limit resolving a frequency to policy min/max cpufreq: amd-pstate: adjust min/max limit perf cpufreq: Remove references to 10ms min sampling rate cpufreq: intel_pstate: Update default EPPs for Meteor Lake cpufreq: intel_pstate: Allow model specific EPPs cpufreq: qcom-hw: add CONFIG_COMMON_CLK dependency cpufreq: dt-platdev: block SDM670 in cpufreq-dt-platdev cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowait cpufreq: Change default transition delay to 2ms cpufreq: amd-pstate: Fix min_perf assignment in amd_pstate_adjust_perf() Documentation: PM: amd-pstate: Fix section title underline Documentation: introduce amd-pstate preferrd core mode kernel command line options Documentation: amd-pstate: introduce amd-pstate preferred core cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically ACPI: cpufreq: Add highest perf change notification ...
2024-03-11Merge tag 'kvm-x86-mmu-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 MMU changes for 6.9: - Clean up code related to unprotecting shadow pages when retrying a guest instruction after failed #PF-induced emulation. - Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if a reschedule is needed, e.g. if a high priority task needs to run. Because KVM doesn't support yielding in the middle of processing a zapped non-leaf SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when attempting to schedule in a high priority. - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot. - Allocate write-tracking metadata on-demand to avoid the memory overhead when running kernels built with KVMGT support (external write-tracking enabled), but for workloads that don't use nested virtualization (shadow paging) or KVMGT.
2024-03-11Merge tag 'cpufreq-arm-updates-6.9' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Merge ARM cpufreq updates for 6.9 from Viresh Kumar: "- General enhancements / cleanups to cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova). - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan). - scmi: get transition delay from firmware (Pierre Gondois)." * tag 'cpufreq-arm-updates-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpufreq: qcom-hw: add CONFIG_COMMON_CLK dependency cpufreq: dt-platdev: block SDM670 in cpufreq-dt-platdev cpufreq: mediatek-hw: Don't error out if supply is not found Documentation: power: Use kcalloc() instead of kzalloc() cpufreq: mediatek-hw: Wait for CPU supplies before probing cpufreq: brcmstb-avs-cpufreq: add check for cpufreq_cpu_get's return value cpufreq: imx6: use regmap to read ocotp register
2024-03-11Merge tag 'kvm-x86-misc-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.9: - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives (though in fairness in KMSAN, it's comically difficult to see that the uninitialized memory is never truly consumed). - Fix the deubgregs ABI for 32-bit KVM, and clean up code related to reading DR6 and DR7. - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit. This allows VMX to further optimize handling preemption timer exits, and allows SVM to avoid sending a duplicate IPI (SVM also has a need to force an exit). - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, and add WARN to guard against similar bugs. - Provide a dedicated arch hook for checking if a different vCPU was in-kernel (for directed yield), and simplify the logic for checking if the currently loaded vCPU is in-kernel. - Misc cleanups and fixes.