summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-10-24bpf: Add the missing BPF_LINK_TYPE invocation for sockmapHou Tao
There is an out-of-bounds read in bpf_link_show_fdinfo() for the sockmap link fd. Fix it by adding the missing BPF_LINK_TYPE invocation for sockmap link Also add comments for bpf_link_type to prevent missing updates in the future. Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241024013558.1135167-2-houtao@huaweicloud.com
2024-10-24ACPI: EC: make EC support compile-time conditionalArnd Bergmann
The embedded controller code is mainly used on x86 laptops and cannot work without PC style I/O port access. Make this a user-visible configuration option that is default enabled on x86 but otherwise disabled, and that can never be enabled unless CONFIG_HAS_IOPORT is also available. The empty stubs in internal.h help ignore the EC code in configurations that don't support it. In order to see those stubs, the sbshc code also has to include this header and drop duplicate declarations. All the direct callers of ec_read/ec_write already had an x86 dependency and now also need to depend on APCI_EC. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://patch.msgid.link/20241011061948.3211423-1-arnd@kernel.org [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-24thermal: netlink: Add the commands and the events for the thresholdsDaniel Lezcano
The thresholds exist but there is no notification neither action code related to them yet. These changes implement the netlink for the notifications when the thresholds are crossed, added, deleted or flushed as well as the commands which allows to get the list of the thresholds, flush them, add and delete. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20241022155147.463475-3-daniel.lezcano@linaro.org [ rjw: Use the thermal_zone guard for locking, subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-24thermal: core: Add and use cooling device guardRafael J. Wysocki
Add and use a special guard for cooling devices. This allows quite a few error code paths to be simplified among other things and brings in code size reduction for a good measure. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/5837621.DvuYhMxLoT@rjwysocki.net Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-10-24pidfd: add ioctl to retrieve pid infoLuca Boccassi
A common pattern when using pid fds is having to get information about the process, which currently requires /proc being mounted, resolving the fd to a pid, and then do manual string parsing of /proc/N/status and friends. This needs to be reimplemented over and over in all userspace projects (e.g.: I have reimplemented resolving in systemd, dbus, dbus-daemon, polkit so far), and requires additional care in checking that the fd is still valid after having parsed the data, to avoid races. Having a programmatic API that can be used directly removes all these requirements, including having /proc mounted. As discussed at LPC24, add an ioctl with an extensible struct so that more parameters can be added later if needed. Start with returning pid/tgid/ppid and creds unconditionally, and cgroupid optionally. Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com> Link: https://lore.kernel.org/r/20241010155401.2268522-1-luca.boccassi@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24afs: Fix missing subdir edit when renamed between parent dirsDavid Howells
When rename moves an AFS subdirectory between parent directories, the subdir also needs a bit of editing: the ".." entry needs updating to point to the new parent (though I don't make use of the info) and the DV needs incrementing by 1 to reflect the change of content. The server also sends a callback break notification on the subdirectory if we have one, but we can take care of recovering the promise next time we access the subdir. This can be triggered by something like: mount -t afs %example.com:xfstest.test20 /xfstest.test/ mkdir /xfstest.test/{aaa,bbb,aaa/ccc} touch /xfstest.test/bbb/ccc/d mv /xfstest.test/{aaa/ccc,bbb/ccc} touch /xfstest.test/bbb/ccc/e When the pathwalk for the second touch hits "ccc", kafs spots that the DV is incorrect and downloads it again (so the fix is not critical). Fix this, if the rename target is a directory and the old and new parents are different, by: (1) Incrementing the DV number of the target locally. (2) Editing the ".." entry in the target to refer to its new parent's vnode ID and uniquifier. Link: https://lore.kernel.org/r/3340431.1729680010@warthog.procyon.org.uk Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") cc: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24Merge tag 'for-net-2024-10-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_core: Disable works on hci_unregister_dev - SCO: Fix UAF on sco_sock_timeout - ISO: Fix UAF on iso_sock_timeout * tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: ISO: Fix UAF on iso_sock_timeout Bluetooth: SCO: Fix UAF on sco_sock_timeout Bluetooth: hci_core: Disable works on hci_unregister_dev ==================== Link: https://patch.msgid.link/20241023143005.2297694-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24Merge tag 'ipsec-2024-10-22' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-10-22 1) Fix routing behavior that relies on L4 information for xfrm encapsulated packets. From Eyal Birger. 2) Remove leftovers of pernet policy_inexact lists. From Florian Westphal. 3) Validate new SA's prefixlen when the selector family is not set from userspace. From Sabrina Dubroca. 4) Fix a kernel-infoleak when dumping an auth algorithm. From Petr Vaganov. Please pull or let me know if there are problems. ipsec-2024-10-22 * tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix one more kernel-infoleak in algo dumping xfrm: validate new SA's prefixlen using SA family when sel.family is unset xfrm: policy: remove last remnants of pernet inexact list xfrm: respect ip protocols rules criteria when performing dst lookups xfrm: extract dst lookup parameters into a struct ==================== Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24Merge tag 'asoc-fix-v6.12-rc4' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.12 An uncomfortably large set of fixes due to me not getting round to sending them for longer than I should due to travel and illness. This is mostly smaller driver specific changes, but there are a couple of generic changes: - Bumping the minimal topology ABI we check for during validation, the code had support for v4 removed previously but the update of the define used for initial validation was missed. - Fix the assumption that DAPM structs will be embedded in a component which isn't true for card widgets when doing name comparisons, though fortunately this is rarely triggered. We've pulled in one Soundwire fix which was part of a larger series fixing cleanup issues in on Intel Soundwire systems.
2024-10-23rcu: Delete unused rcu_gp_might_be_stalled() functionPaul E. McKenney
The rcu_gp_might_be_stalled() function is no longer used, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-23kasan: Fix Software Tag-Based KASAN with GCCMarco Elver
Per [1], -fsanitize=kernel-hwaddress with GCC currently does not disable instrumentation in functions with __attribute__((no_sanitize_address)). However, __attribute__((no_sanitize("hwaddress"))) does correctly disable instrumentation. Use it instead. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117196 [1] Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854 Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrew Pinski <pinskia@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Fixes: 7b861a53e46b ("kasan: Bump required compiler version") Link: https://lore.kernel.org/r/20241021120013.3209481-1-elver@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-10-23Bluetooth: SCO: Fix UAF on sco_sock_timeoutLuiz Augusto von Dentz
conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock so this checks if the conn->sk is still valid by checking if it part of sco_sk_list. Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465 Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-22bpf: Add MEM_WRITE attributeDaniel Borkmann
Add a MEM_WRITE attribute for BPF helper functions which can be used in bpf_func_proto to annotate an argument type in order to let the verifier know that the helper writes into the memory passed as an argument. In the past MEM_UNINIT has been (ab)used for this function, but the latter merely tells the verifier that the passed memory can be uninitialized. There have been bugs with overloading the latter but aside from that there are also cases where the passed memory is read + written which currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22Merge branch 'for-6.13/block-atomic' into for-6.13/blockJens Axboe
Merge in block/fs prep patches for the atomic write support. * for-6.13/block-atomic: block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()
2024-10-22block: sed-opal: add ioctl IOC_OPAL_SET_SID_PWGreg Joyce
After a SED drive is provisioned, there is no way to change the SID password via the ioctl() interface. A new ioctl IOC_OPAL_SET_SID_PW will allow the password to be changed. The valid current password is required. Signed-off-by: Greg Joyce <gjoyce@linux.ibm.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20240829175639.6478-2-gjoyce@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22ublk: support device recovery without I/O queueingUday Shankar
ublk currently supports the following behaviors on ublk server exit: A: outstanding I/Os get errors, subsequently issued I/Os get errors B: outstanding I/Os get errors, subsequently issued I/Os queue C: outstanding I/Os get reissued, subsequently issued I/Os queue and the following behaviors for recovery of preexisting block devices by a future incarnation of the ublk server: 1: ublk devices stopped on ublk server exit (no recovery possible) 2: ublk devices are recoverable using start/end_recovery commands The userspace interface allows selection of combinations of these behaviors using flags specified at device creation time, namely: default behavior: A + 1 UBLK_F_USER_RECOVERY: B + 2 UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_REISSUE: C + 2 The behavior A + 2 is currently unsupported. Add support for this behavior under the new flag combination UBLK_F_USER_RECOVERY|UBLK_F_USER_RECOVERY_FAIL_IO. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241007182419.3263186-5-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22block: enable passthrough command statisticsKeith Busch
Applications using the passthrough interfaces for IO want to continue seeing the disk stats. These requests had been fenced off from this block layer feature. While the block layer doesn't necessarily know what a passthrough command does, we do know the data size and direction, which is enough to account for the command's stats. Since tracking these has the potential to produce unexpected results, the passthrough stats are locked behind a new queue flag that needs to be enabled with the /sys/block/<dev>/queue/iostats_passthrough attribute. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241007153236.2818562-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22block: introduce add_disk_fwnode()Christian Marangi
Introduce add_disk_fwnode() as a replacement of device_add_disk() that permits to pass and attach a fwnode to disk dev. This variant can be useful for eMMC that might have the partition table for the disk defined in DT. A parser can later make use of the attached fwnode to parse the related table and init the hardcoded partition for the disk. device_add_disk() is converted to a simple wrapper of add_disk_fwnode() with the fwnode entry set as NULL. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241002221306.4403-4-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22block: remove redundant passthrough check in blk_mq_need_time_stamp()Jens Axboe
Simply checking the rq_flags is enough to determine if accounting is being done for this request. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22acl: Annotate struct posix_acl with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member a_entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Use struct_size() to calculate the number of bytes to allocate for new and cloned acls and remove the local size variables. Change the posix_acl_alloc() function parameter count from int to unsigned int to match posix_acl's a_count data type. Add identifier names to the function definition to silence two checkpatch warnings. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241018121426.155247-2-thorsten.blum@linux.dev Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22acl: Realign struct posix_acl to save 8 bytesThorsten Blum
Reduce posix_acl's struct size by 8 bytes by realigning its members. Cc: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241015202158.2376-1-thorsten.blum@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22epoll: Add synchronous wakeup support for ep_poll_callbackXuewen Yan
Now, the epoll only use wake_up() interface to wake up task. However, sometimes, there are epoll users which want to use the synchronous wakeup flag to hint the scheduler, such as Android binder driver. So add a wake_up_sync() define, and use the wake_up_sync() when the sync is true in ep_poll_callback(). Co-developed-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Link: https://lore.kernel.org/r/20240426080548.8203-1-xuewen.yan@unisoc.com Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Brian Geffon <bgeffon@google.com> Reported-by: Benoit Lize <lizeb@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22epoll: annotate racy checkChristian Brauner
Epoll relies on a racy fastpath check during __fput() in eventpoll_release() to avoid the hit of pointlessly acquiring a semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE(). Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+3b6b32dc50537a49bb4a@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix the guest view of the ID registers, making the relevant fields writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1) - Correcly expose S1PIE to guests, fixing a regression introduced in 6.12-rc1 with the S1POE support - Fix the recycling of stage-2 shadow MMUs by tracking the context (are we allowed to block or not) as well as the recycling state - Address a couple of issues with the vgic when userspace misconfigures the emulation, resulting in various splats. Headaches courtesy of our Syzkaller friends - Stop wasting space in the HYP idmap, as we are dangerously close to the 4kB limit, and this has already exploded in -next - Fix another race in vgic_init() - Fix a UBSAN error when faking the cache topology with MTE enabled RISCV: - RISCV: KVM: use raw_spinlock for critical section in imsic x86: - A bandaid for lack of XCR0 setup in selftests, which causes trouble if the compiler is configured to have x86-64-v3 (with AVX) as the default ISA. Proper XCR0 setup will come in the next merge window. - Fix an issue where KVM would not ignore low bits of the nested CR3 and potentially leak up to 31 bytes out of the guest memory's bounds - Fix case in which an out-of-date cached value for the segments could by returned by KVM_GET_SREGS. - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL - Override MTRR state for KVM confidential guests, making it WB by default as is already the case for Hyper-V guests. Generic: - Remove a couple of unused functions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits) RISCV: KVM: use raw_spinlock for critical section in imsic KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups KVM: selftests: x86: Avoid using SSE/AVX instructions KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range() KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot x86/kvm: Override default caching mode for SEV-SNP and TDX KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic KVM: Remove unused kvm_vcpu_gfn_to_pfn KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS KVM: arm64: Fix shift-out-of-bounds bug KVM: arm64: Shave a few bytes from the EL2 idmap code KVM: arm64: Don't eagerly teardown the vgic on init error KVM: arm64: Expose S1PIE to guests KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule KVM: arm64: nv: Punt stage-2 recycling to a vCPU request KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed ...
2024-10-21Merge tag 'vfs-6.12-rc5.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "afs: - Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock netfs: - Drop the references to a folio immediately after the folio has been extracted to prevent races with future I/O collection - Fix a documenation build error - Downgrade the i_rwsem for buffered writes to fix a cifs reported performance regression when switching to netfslib vfs: - Explicitly return -E2BIG from openat2() if the specified size is unexpectedly large. This aligns openat2() with other extensible struct based system calls - When copying a mount namespace ensure that we only try to remove the new copy from the mount namespace rbtree if it has already been added to it nilfs: - Clear the buffer delay flag when clearing the buffer state clags when a buffer head is discarded to prevent a kernel OOPs ocfs2: - Fix an unitialized value warning in ocfs2_setattr() proc: - Fix a kernel doc warning" * tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: proc: Fix W=1 build kernel-doc warning afs: Fix lock recursion fs: Fix uninitialized value issue in from_kuid and from_kgid fs: don't try and remove empty rbtree node netfs: Downgrade i_rwsem for a buffered write nilfs2: fix kernel bug due to missing clearing of buffer delay flag openat2: explicitly return -E2BIG for (usize > PAGE_SIZE) netfs: fix documentation build error netfs: In readahead, put the folio refs as soon extracted
2024-10-21iomap: turn iomap_want_unshare_iter into an inline functionChristoph Hellwig
iomap_want_unshare_iter currently sits in fs/iomap/buffered-io.c, which depends on CONFIG_BLOCK. It is also in used in fs/dax.c whіch has no such dependency. Given that it is a trivial check turn it into an inline in include/linux/iomap.h to fix the DAX && !BLOCK build. Fixes: 6ef6a0e821d3 ("iomap: share iomap_unshare_iter predicate code with fsdax") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241015041350.118403-1-hch@lst.de Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21uaccess: add copy_struct_to_user helperAleksa Sarai
This is based on copy_struct_from_user(), but there is one additional case to consider when creating a syscall that returns an extensible-struct to userspace -- how should data in the struct that cannot fit into the userspace struct be handled (ksize > usize)? There are three possibilies: 1. The interface is like sched_getattr(2), where new information will be silently not provided to userspace. This is probably what most interfaces will want to do, as it provides the most possible backwards-compatibility. 2. The interface is like lsm_list_modules(2), where you want to return an error like -EMSGSIZE if not providing information could result in the userspace program making a serious mistake (such as one that could lead to a security problem) or if you want to provide some flag to userspace so they know that they are missing some information. 3. The interface is like statx(2), where there some kind of a request mask that indicates what data userspace would like. One could imagine that statx2(2) (using extensible structs) would want to return -EMSGSIZE if the user explicitly requested a field that their structure is too small to fit, but not return an error if the field was not explicitly requested. This is kind of a mix between (1) and (2) based on the requested mask. The copy_struct_to_user() helper includes a an extra argument that is used to return a boolean flag indicating whether there was a non-zero byte in the trailing bytes that were not copied to userspace. This can be used in the following ways to handle all three cases, respectively: 1. Just pass NULL, as you don't care about this case. 2. Return an error (say -EMSGSIZE) if the argument was set to true by copy_struct_to_user(). 3. If the argument was set to true by copy_struct_to_user(), check if there is a flag that implies a field larger than usize. This is the only case where callers of copy_struct_to_user() should check usize themselves. This will probably require scanning an array that specifies what flags were added for each version of the flags struct and returning an error if the request mask matches any of the flags that were added in versions of the struct that are larger than usize. At the moment we don't have any users of (3), so this patch doesn't include any helpers to make the necessary scanning easier, but it should be fairly easy to add some if necessary. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20241010-extensible-structs-check_fields-v3-1-d2833dfe6edd@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel spaceBibo Mao
There are two pages in one TLB entry on LoongArch system. For kernel space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit set, otherwise HW treats it as non-global tlb, there will be potential problems if tlb entry for kernel space is not global. Such as fail to flush kernel tlb with the function local_flush_tlb_kernel_range() which supposed only flush tlb with global bit. Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and kasan areas. For these areas both two consecutive page table entries should be enabled with PAGE_GLOBAL bit. So with function set_pte() and pte_clear(), pte buddy entry is checked and set besides its own pte entry. However it is not atomic operation to set both two pte entries, there is problem with test_vmalloc test case. So function kernel_pte_init() is added to init a pte table when it is created for kernel address space, and the default initial pte value is PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry need update with function set_pte() and pte_clear(), nothing to do with the pte buddy entry. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-10-21fs/super.c: introduce get_tree_bdev_flags()Gao Xiang
As Allison reported [1], currently get_tree_bdev() will store "Can't lookup blockdev" error message. Although it makes sense for pure bdev-based fses, this message may mislead users who try to use EROFS file-backed mounts since get_tree_nodev() is used as a fallback then. Add get_tree_bdev_flags() to specify extensible flags [2] and GET_TREE_BDEV_QUIET_LOOKUP to silence "Can't lookup blockdev" message since it's misleading to EROFS file-backed mounts now. [1] https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com [2] https://lore.kernel.org/r/ZwUkJEtwIpUA4qMz@infradead.org Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241009033151.2334888-1-hsiangkao@linux.alibaba.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21net: fix races in netdev_tx_sent_queue()/dev_watchdog()Eric Dumazet
Some workloads hit the infamous dev_watchdog() message: "NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out" It seems possible to hit this even for perfectly normal BQL enabled drivers: 1) Assume a TX queue was idle for more than dev->watchdog_timeo (5 seconds unless changed by the driver) 2) Assume a big packet is sent, exceeding current BQL limit. 3) Driver ndo_start_xmit() puts the packet in TX ring, and netdev_tx_sent_queue() is called. 4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue() before txq->trans_start has been written. 5) txq->trans_start is written later, from netdev_start_xmit() if (rc == NETDEV_TX_OK) txq_trans_update(txq) dev_watchdog() running on another cpu could read the old txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5) did not happen yet. To solve the issue, write txq->trans_start right before one XOFF bit is set : - _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue() - __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue() From dev_watchdog(), we have to read txq->state before txq->trans_start. Add memory barriers to enforce correct ordering. In the future, we could avoid writing over txq->trans_start for normal operations, and rename this field to txq->xoff_start_time. Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-20Merge tag 'tty-6.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for 6.12-rc4: - qcom-geni serial driver fixes, wow what a mess of a UART chip that thing is... - vt infoleak fix for odd font sizes - imx serial driver bugfix - yet-another n_gsm ldisc bugfix, slowly chipping down the issues in that piece of code All of these have been in linux-next for over a week with no reported issues" * tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: qcom-geni: rename suspend functions serial: qcom-geni: drop unused receive parameter serial: qcom-geni: drop flip buffer WARN() serial: qcom-geni: fix rx cancel dma status bit serial: qcom-geni: fix receiver enable serial: qcom-geni: fix dma rx cancellation serial: qcom-geni: fix shutdown race serial: qcom-geni: revert broken hibernation support serial: qcom-geni: fix polled console initialisation serial: imx: Update mctrl old_status on RTSD interrupt tty: n_gsm: Fix use-after-free in gsm_cleanup_mux vt: prevent kernel-infoleak in con_font_get()
2024-10-20Merge tag 'irq_urgent_for_v6.12_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a case for sifive-plic where an interrupt gets disabled *and* masked and remains masked when it gets reenabled later - Plug a small race in GIC-v4 where userspace can force an affinity change of a virtual CPU (vPE) in its unmapping path - Do not mix the two sets of ocelot irqchip's registers in the mask calculation of the main interrupt sticky register - Other smaller fixlets and cleanups * tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/renesas-rzg2l: Fix missing put_device irqchip/riscv-intc: Fix SMP=n boot with ACPI irqchip/sifive-plic: Unmask interrupt in plic_irq_enable() irqchip/gic-v4: Don't allow a VMOVP on a dying VPE irqchip/sifive-plic: Return error code on failure irqchip/riscv-imsic: Fix output text of base address irqchip/ocelot: Comment sticky register clearing code irqchip/ocelot: Fix trigger register address irqchip: Remove obsolete config ARM_GIC_V3_ITS_PCI
2024-10-20Merge tag 'sched_urgent_for_v6.12_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduling fixes from Borislav Petkov: - Add PREEMPT_RT maintainers - Fix another aspect of delayed dequeued tasks wrt determining their state, i.e., whether they're runnable or blocked - Handle delayed dequeued tasks and their migration wrt PSI properly - Fix the situation where a delayed dequeue task gets enqueued into a new class, which should not happen - Fix a case where memory allocation would happen while the runqueue lock is held, which is a no-no - Do not over-schedule when tasks with shorter slices preempt the currently running task - Make sure delayed to deque entities are properly handled before unthrottling - Other smaller cleanups and improvements * tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Add an entry for PREEMPT_RT. sched/fair: Fix external p->on_rq users sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug sched/core: Dequeue PSI signals for blocked tasks that are delayed sched: Fix delayed_dequeue vs switched_from_fair() sched/core: Disable page allocation in task_tick_mm_cid() sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl() sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running sched: Fix sched_delayed vs cfs_bandwidth
2024-10-20Merge tag 'for-linus-6.12a-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A single fix for a build failure introduced this merge window" * tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Remove dependency between pciback and privcmd
2024-10-20Merge tag 'dma-mapping-6.12-2024-10-20' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: "Just another small tracing fix from Sean" * tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory
2024-10-20KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomicDr. David Alan Gilbert
The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit 1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-ID: <20241001141354.18009-3-linux@treblig.org> [Adjust Documentation/virt/kvm/locking.rst. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20KVM: Remove unused kvm_vcpu_gfn_to_pfnDr. David Alan Gilbert
The last use of kvm_vcpu_gfn_to_pfn was removed by commit b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-ID: <20241001141354.18009-2-linux@treblig.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-19getname_maybe_null() - the third variant of pathname copy-inAl Viro
Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string, ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and NULL are accepted. Calling conventions: getname_maybe_null(user_pointer, flags) returns * pointer to struct filename when non-empty string had been successfully read * ERR_PTR(...) on error * NULL if an empty string or NULL pointer had been given with AT_EMPTY_PATH in the flags argument. It tries to avoid allocation in the last case; it's not always able to do so, in which case the temporary struct filename instance is freed and NULL returned anyway. Fast path is inlined. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-19block: Add bdev atomic write limits helpersJohn Garry
Add helpers to get atomic write limits for a bdev, so that we don't access request_queue helpers outside the block layer. We check if the bdev can actually atomic write in these helpers, so we can avoid users missing using this check. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241019125113.369994-4-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-19fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()John Garry
Currently FMODE_CAN_ATOMIC_WRITE is set if the bdev can atomic write and the file is open for direct IO. This does not work if the file is not opened for direct IO, yet fcntl(O_DIRECT) is used on the fd later. Change to check for direct IO on a per-IO basis in generic_atomic_write_valid(). Since we want to report -EOPNOTSUPP for non-direct IO for an atomic write, change to return an error code. Relocate the block fops atomic write checks to the common write path, as to catch non-direct IO. Fixes: c34fc6f26ab8 ("fs: Initial atomic write support") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241019125113.369994-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-19block/fs: Pass an iocb to generic_atomic_write_valid()John Garry
Darrick and Hannes both thought it better that generic_atomic_write_valid() should be passed a struct iocb, and not just the member of that struct which is referenced; see [0] and [1]. I think that makes a more generic and clean API, so make that change. [0] https://lore.kernel.org/linux-block/680ce641-729b-4150-b875-531a98657682@suse.de/ [1] https://lore.kernel.org/linux-xfs/20240620212401.GA3058325@frogsfrogsfrogs/ Fixes: c34fc6f26ab8 ("fs: Initial atomic write support") Suggested-by: Darrick J. Wong <djwong@kernel.org> Suggested-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241019125113.369994-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-19fs: add file_refChristian Brauner
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop it has O(N^2) behaviour under contention with N concurrent operations and it is in a hot path in __fget_files_rcu(). The rcuref infrastructures remedies this problem by using an unconditional increment relying on safe- and dead zones to make this work and requiring rcu protection for the data structure in question. This not just scales better it also introduces overflow protection. However, in contrast to generic rcuref, files require a memory barrier and thus cannot rely on *_relaxed() atomic operations and also require to be built on atomic_long_t as having massive amounts of reference isn't unheard of even if it is just an attack. As suggested by Linus, add a file specific variant instead of making this a generic library. Files are SLAB_TYPESAFE_BY_RCU and thus don't have "regular" rcu protection. In short, freeing of files isn't delayed until a grace period has elapsed. Instead, they are freed immediately and thus can be reused (multiple times) within the same grace period. So when picking a file from the file descriptor table via its file descriptor number it is thus possible to see an elevated reference count on file->f_count even though the file has already been recycled possibly multiple times by another task. To guard against this the vfs will pick the file from the file descriptor table twice. Once before the refcount increment and once after to compare the pointers (grossly simplified). If they match then the file is still valid. If not the caller needs to fput() it. The unconditional increment makes the following race possible as illustrated by rcuref: > Deconstruction race > =================== > > The release operation must be protected by prohibiting a grace period in > order to prevent a possible use after free: > > T1 T2 > put() get() > // ref->refcnt = ONEREF > if (!atomic_add_negative(-1, &ref->refcnt)) > return false; <- Not taken > > // ref->refcnt == NOREF > --> preemption > // Elevates ref->refcnt to ONEREF > if (!atomic_add_negative(1, &ref->refcnt)) > return true; <- taken > > if (put(&p->ref)) { <-- Succeeds > remove_pointer(p); > kfree_rcu(p, rcu); > } > > RCU grace period ends, object is freed > > atomic_cmpxchg(&ref->refcnt, NOREF, DEAD); <- UAF > > [...] it prevents the grace period which keeps the object alive until > all put() operations complete. Having files by SLAB_TYPESAFE_BY_RCU shouldn't cause any problems for this deconstruction race. Afaict, the only interesting case would be someone freeing the file and someone immediately recycling it within the same grace period and reinitializing file->f_count to ONEREF while a concurrent fput() is doing atomic_cmpxchg(&ref->refcnt, NOREF, DEAD) as in the race above. But this is safe from SLAB_TYPESAFE_BY_RCU's perspective and it should be safe from rcuref's perspective. T1 T2 T3 fput() fget() // f_count->refcnt = ONEREF if (!atomic_add_negative(-1, &f_count->refcnt)) return false; <- Not taken // f_count->refcnt == NOREF --> preemption // Elevates f_count->refcnt to ONEREF if (!atomic_add_negative(1, &f_count->refcnt)) return true; <- taken if (put(&f_count)) { <-- Succeeds remove_pointer(p); /* * Cache is SLAB_TYPESAFE_BY_RCU * so this is freed without a grace period. */ kmem_cache_free(p); } kmem_cache_alloc() init_file() { // Sets f_count->refcnt to ONEREF rcuref_long_init(&f->f_count, 1); } Object has been reused within the same grace period via kmem_cache_alloc()'s SLAB_TYPESAFE_BY_RCU. /* * With SLAB_TYPESAFE_BY_RCU this would be a safe UAF access and * it would work correctly because the atomic_cmpxchg() * will fail because the refcount has been reset to ONEREF by T3. */ atomic_cmpxchg(&ref->refcnt, NOREF, DEAD); <- UAF However, there are other cases to consider: (1) Benign race due to multiple atomic_long_read() CPU1 CPU2 file_ref_put() // last reference // => count goes negative/FILE_REF_NOREF atomic_long_add_negative_release(-1, &ref->refcnt) -> __file_ref_put() file_ref_get() // goes back from negative/FILE_REF_NOREF to 0 // and file_ref_get() succeeds atomic_long_add_negative(1, &ref->refcnt) // This is immediately followed by file_ref_put() // managing to set FILE_REF_DEAD file_ref_put() // __file_ref_put() continues and sees // cnt > FILE_REF_RELEASED // and splats with // "imbalanced put on file reference count" cnt = atomic_long_read(&ref->refcnt); The race however is benign and the problem is the atomic_long_read(). Instead of performing a separate read this uses atomic_long_dec_return() and pass the value to __file_ref_put(). Thanks to Linus for pointing out that braino. (2) SLAB_TYPESAFE_BY_RCU may cause recycled files to be marked dead When a file is recycled the following race exists: CPU1 CPU2 // @file is already dead and thus // cnt >= FILE_REF_RELEASED. file_ref_get(file) atomic_long_add_negative(1, &ref->refcnt) // We thus call into __file_ref_get() -> __file_ref_get() // which sees cnt >= FILE_REF_RELEASED cnt = atomic_long_read(&ref->refcnt); // In the meantime @file gets freed kmem_cache_free() // and is immediately recycled file = kmem_cache_zalloc() // and the reference count is reinitialized // and the file alive again in someone // else's file descriptor table file_ref_init(&ref->refcnt, 1); // the __file_ref_get() slowpath now continues // and as it saw earlier that cnt >= FILE_REF_RELEASED // it wants to ensure that we're staying in the middle // of the deadzone and unconditionally sets // FILE_REF_DEAD. // This marks @file dead for CPU2... atomic_long_set(&ref->refcnt, FILE_REF_DEAD); // Caller issues a close() system call to close @file close(fd) file = file_close_fd_locked() filp_flush() // The caller sees that cnt >= FILE_REF_RELEASED // and warns the first time... CHECK_DATA_CORRUPTION(file_count(file) == 0) // and then splats a second time because // __file_ref_put() sees cnt >= FILE_REF_RELEASED file_ref_put(&ref->refcnt); -> __file_ref_put() My initial inclination was to replace the unconditional atomic_long_set() with an atomic_long_try_cmpxchg() but Linus pointed out that: > I think we should just make file_ref_get() do a simple > > return !atomic_long_add_negative(1, &ref->refcnt)); > > and nothing else. Yes, multiple CPU's can race, and you can increment > more than once, but the gap - even on 32-bit - between DEAD and > becoming close to REF_RELEASED is so big that we simply don't care. > That's the point of having a gap. I've been testing this with will-it-scale using fstat() on a machine that Jens gave me access (thank you very much!): processor : 511 vendor_id : AuthenticAMD cpu family : 25 model : 160 model name : AMD EPYC 9754 128-Core Processor and I consistently get a 3-5% improvement on 256+ threads. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202410151043.5d224a27-oliver.sang@intel.com Closes: https://lore.kernel.org/all/202410151611.f4cd71f2-oliver.sang@intel.com Link: https://lore.kernel.org/r/20241007-brauner-file-rcuref-v2-2-387e24dc9163@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-18Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to not affect subreg_def marks in its range propagation (Eduard Zingerman) - Fix a truncation bug in the BPF verifier's handling of coerce_reg_to_size_sx (Dimitar Kanaliev) - Fix the BPF verifier's delta propagation between linked registers under 32-bit addition (Daniel Borkmann) - Fix a NULL pointer dereference in BPF devmap due to missing rxq information (Florian Kauer) - Fix a memory leak in bpf_core_apply (Jiri Olsa) - Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for arrays of nested structs (Hou Tao) - Fix build ID fetching where memory areas backing the file were created with memfd_secret (Andrii Nakryiko) - Fix BPF task iterator tid filtering which was incorrectly using pid instead of tid (Jordan Rome) - Several fixes for BPF sockmap and BPF sockhash redirection in combination with vsocks (Michal Luczaj) - Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri) - Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility of an infinite BPF tailcall (Pu Lehui) - Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot be resolved (Thomas Weißschuh) - Fix a bug in kfunc BTF caching for modules where the wrong BTF object was returned (Toke Høiland-Jørgensen) - Fix a BPF selftest compilation error in cgroup-related tests with musl libc (Tony Ambardar) - Several fixes to BPF link info dumps to fill missing fields (Tyrone Wu) - Add BPF selftests for kfuncs from multiple modules, checking that the correct kfuncs are called (Simon Sundberg) - Ensure that internal and user-facing bpf_redirect flags don't overlap (Toke Høiland-Jørgensen) - Switch to use kvzmalloc to allocate BPF verifier environment (Rik van Riel) - Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat under RT (Wander Lairson Costa) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits) lib/buildid: Handle memfd_secret() files in build_id_parse() selftests/bpf: Add test case for delta propagation bpf: Fix print_reg_state's constant scalar dump bpf: Fix incorrect delta propagation between linked registers bpf: Properly test iter/task tid filtering bpf: Fix iter/task tid filtering riscv, bpf: Make BPF_CMPXCHG fully ordered bpf, vsock: Drop static vsock_bpf_prot initialization vsock: Update msg_count on read_skb() vsock: Update rx_bytes on read_skb() bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock selftests/bpf: Add asserts for netfilter link info bpf: Fix link info netfilter flags to populate defrag flag selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx() selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx() bpf: Fix truncation bug in coerce_reg_to_size_sx() selftests/bpf: Assert link info uprobe_multi count & path_size if unset bpf: Fix unpopulated path_size when uprobe_multi fields unset selftests/bpf: Fix cross-compiling urandom_read selftests/bpf: Add test for kfunc module order ...
2024-10-18Merge tag 'block-6.12-20241018' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Fix target passthrough identifier (Nilay) - Fix tcp locking (Hannes) - Replace list with sbitmap for tracking RDMA rsp tags (Guixen) - Remove unnecessary fallthrough statements (Tokunori) - Remove ready-without-media support (Greg) - Fix multipath partition scan deadlock (Keith) - Fix concurrent PCI reset and remove queue mapping (Maurizio) - Fabrics shutdown fixes (Nilay) - Fix for a kerneldoc warning (Keith) - Fix a race with blk-rq-qos and wakeups (Omar) - Cleanup of checking for always-set tag_set (SurajSonawane2415) - Fix for a crash with CPU hotplug notifiers (Ming) - Don't allow zero-copy ublk on unprivileged device (Ming) - Use array_index_nospec() for CDROM (Josh) - Remove dead code in drbd (David) - Tweaks to elevator loading (Breno) * tag 'block-6.12-20241018' of git://git.kernel.dk/linux: cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed() nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function nvme: make keep-alive synchronous operation nvme-loop: flush off pending I/O while shutting down loop controller nvme-pci: fix race condition between reset and nvme_dev_disable() ublk: don't allow user copy for unprivileged device blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race nvme-multipath: defer partition scanning blk-mq: setup queue ->tag_set before initializing hctx elevator: Remove argument from elevator_find_get elevator: do not request_module if elevator exists drbd: Remove unused conn_lowest_minor nvme: disable CC.CRIME (NVME_CC_CRIME) nvme: delete unnecessary fallthru comment nvmet-rdma: use sbitmap to replace rsp free list block: Fix elevator_get_default() checking for NULL q->tag_set nvme: tcp: avoid race between queue_lock lock and destroy nvmet-passthru: clear EUID/NGUID/UUID while using loop target block: fix blk_rq_map_integrity_sg kernel-doc
2024-10-18Merge tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: - Fix integer overflow in xrep_bmap - Fix stale dealloc punching for COW IO * tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: punch delalloc extents from the COW fork for COW writes xfs: set IOMAP_F_SHARED for all COW fork allocations xfs: share more code in xfs_buffered_write_iomap_begin xfs: support the COW fork in xfs_bmap_punch_delalloc_range xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof xfs: factor out a xfs_file_write_zero_eof helper iomap: move locking out of iomap_write_delalloc_release iomap: remove iomap_file_buffered_write_punch_delalloc iomap: factor out a iomap_last_written_block helper xfs: fix integer overflow in xrep_bmap
2024-10-18Merge tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes, msm and xe are the two main ones, with a bunch of scattered fixes including a largish revert in mgag200, then amdgpu, vmwgfx and scattering of other minor ones. All seems pretty regular. msm: - Display: - move CRTC resource assignment to atomic_check otherwise to make consecutive calls to atomic_check() consistent - fix rounding / sign-extension issues with pclk calculation in case of DSC - cleanups to drop incorrect null checks in dpu snapshots - fix to use kvzalloc in dpu snapshot to avoid allocation issues in heavily loaded system cases - Fix to not program merge_3d block if dual LM is not being used - Fix to not flush merge_3d block if its not enabled otherwise this leads to false timeouts - GPU: - a7xx: add a fence wait before SMMU table update xe: - New workaround to Xe2 (Aradhya) - Fix unbalanced rpm put (Matthew Auld) - Remove fragile lock optimization (Matthew Brost) - Fix job release, delegating it to the drm scheduler (Matthew Brost) - Fix timestamp bit width for Xe2 (Lucas) - Fix external BO's dma-resv usag (Matthew Brost) - Fix returning success for timeout in wait_token (Nirmoy) - Initialize fence to avoid it being detected as signaled (Matthew Auld) - Improve cache flush for BMG (Matthew Auld) - Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka) amdgpu: - SR-IOV fix - CS chunk handling fix - MES fixes - SMU13 fixes amdkfd: - VRAM usage reporting fix radeon: - Fix possible_clones handling i915: - Two DP bandwidth related MST fixes ast: - Clear EDID on unplugged connectors host1x: - Fix boot on Tegra186 - Set DMA parameters mgag200: - Revert VBLANK support panel: - himax-hx83192: Adjust power and gamma qaic: - Sgtable loop fixes vmwgfx: - Limit display layout allocatino size - Handle allocation errors in connector checks - Clean up KMS code for 2d-only setup - Report surface-check errors correctly - Remove NULL test around kvfree()" * tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel: (45 commits) drm/ast: vga: Clear EDID if no display is connected drm/ast: sil164: Clear EDID if no display is connected Revert "drm/mgag200: Add vblank support" drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater drm/xe/bmg: improve cache flushing behaviour drm/xe/xe_sync: initialise ufence.signalled drm/xe/ufence: ufence can be signaled right after wait_woken drm/xe: Use bookkeep slots for external BO's in exec IOCTL drm/xe/query: Increase timestamp width drm/xe: Don't free job in TDR drm/xe: Take job list lock in xe_sched_add_pending_job drm/xe: fix unbalanced rpm put() with declare_wedged() drm/xe: fix unbalanced rpm put() with fence_fini() drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation drm/msm/a6xx+: Insert a fence wait before SMMU table update drm/msm/dpu: don't always program merge_3d block drm/msm/dpu: Don't always set merge_3d pending flush ...
2024-10-18xen: Remove dependency between pciback and privcmdJiqian Chen
Commit 2fae6bb7be32 ("xen/privcmd: Add new syscall to get gsi from dev") adds a weak reverse dependency to the config XEN_PRIVCMD definition, that dependency causes xen-privcmd can't be loaded on domU, because dependent xen-pciback isn't always be loaded successfully on domU. To solve above problem, remove that dependency, and do not call pcistub_get_gsi_from_sbdf() directly, instead add a hook in drivers/xen/apci.c, xen-pciback register the real call function, then in privcmd_ioctl_pcidev_get_gsi call that hook. Fixes: 2fae6bb7be32 ("xen/privcmd: Add new syscall to get gsi from dev") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20241012084537.1543059-1-Jiqian.Chen@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-17Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "28 hotfixes. 13 are cc:stable. 23 are MM. It is the usual shower of unrelated singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits) maple_tree: add regression test for spanning store bug maple_tree: correct tree corruption on spanning store mm/mglru: only clear kswapd_failures if reclaimable mm/swapfile: skip HugeTLB pages for unuse_vma selftests: mm: fix the incorrect usage() info of khugepaged MAINTAINERS: add Jann as memory mapping/VMA reviewer mm: swap: prevent possible data-race in __try_to_reclaim_swap mm: khugepaged: fix the incorrect statistics when collapsing large file folios MAINTAINERS: kasan, kcov: add bugzilla links mm: don't install PMD mappings when THPs are disabled by the hw/process/vma mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs Docs/damon/maintainer-profile: add missing '_' suffixes for external web links maple_tree: check for MA_STATE_BULK on setting wr_rebalance mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets() mm: remove unused stub for can_swapin_thp() mailmap: add an entry for Andy Chiu MAINTAINERS: add memory mapping/VMA co-maintainers fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization ...
2024-10-18Merge tag 'drm-misc-fixes-2024-10-17' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: ast: - Clear EDID on unplugged connectors host1x: - Fix boot on Tegra186 - Set DMA parameters mgag200: - Revert VBLANK support panel: - himax-hx83192: Adjust power and gamma qaic: - Sgtable loop fixes vmwgfx: - Limit display layout allocatino size - Handle allocation errors in connector checks - Clean up KMS code for 2d-only setup - Report surface-check errors correctly - Remove NULL test around kvfree() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20241017115516.GA196624@linux.fritz.box
2024-10-17Merge tag 'sound-6.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes, nothing really stands out: - Usual HD-audio quirks / device-specific fixes - Kconfig dependency fix for UM - A series of minor fixes for SoundWire - Updates of USB-audio LINE6 contact address" * tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2 ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller() ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects ALSA: scarlett2: Add error check after retrieving PEQ filter values ALSA: hda/cs8409: Fix possible NULL dereference sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML ALSA: line6: update contact information ALSA: usb-audio: Fix NULL pointer deref in snd_usb_power_domain_set() ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2 ALSA: hda: Sound support for HP Spectre x360 16 inch model 2024