summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-07-29Merge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
2025-07-29Merge tag 'tty-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the big set of TTY and Serial driver updates for 6.17-rc1. Included in here is the following types of changes: - another cleanup round from Jiri for the 8250 serial driver and some other tty drivers, things are slowly getting better with our apis thanks to this work. This touched many tty drivers all over the tree. - qcom_geni_serial driver update for new platforms and devices - 8250 quirk handling fixups - dt serial binding updates for different boards/platforms - other minor cleanups and fixes All of these have been in linux-next with no reported issues" * tag 'tty-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits) dt-bindings: serial: snps-dw-apb-uart: Allow use of a power-domain serial: 8250: fix panic due to PSLVERR dt-bindings: serial: samsung: add samsung,exynos2200-uart compatible vt: defkeymap: Map keycodes above 127 to K_HOLE vt: keyboard: Don't process Unicode characters in K_OFF mode serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms serial: qcom-geni: Enable PM runtime for serial driver serial: qcom-geni: move clock-rate logic to separate function serial: qcom-geni: move resource control logic to separate functions serial: qcom-geni: move resource initialization to separate function soc: qcom: geni-se: Enable QUPs on SA8255p Qualcomm platforms dt-bindings: qcom: geni-se: describe SA8255p dt-bindings: serial: describe SA8255p serial: 8250_dw: Fix typo "notifer" dt-bindings: serial: 8250: spacemit: set clocks property as required dt-bindings: serial: renesas: Document RZ/V2N SCIF serial: 8250_ce4100: Fix CONFIG_SERIAL_8250=n build tty: omit need_resched() before cond_resched() serial: 8250_ni: Reorder local variables serial: 8250_ni: Fix build warning ...
2025-07-28Merge tag 'libcrypto-updates-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.17. The main focus this cycle is on reorganizing the SHA-1 and SHA-2 code, providing high-quality library APIs for SHA-1 and SHA-2 including HMAC support, and establishing conventions for lib/crypto/ going forward: - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares most of the SHA-512 code) into lib/crypto/. This includes both the generic and architecture-optimized code. Greatly simplify how the architecture-optimized code is integrated. Add an easy-to-use library API for each SHA variant, including HMAC support. Finally, reimplement the crypto_shash support on top of the library API. - Apply the same reorganization to the SHA-256 code (and also SHA-224 which shares most of the SHA-256 code). This is a somewhat smaller change, due to my earlier work on SHA-256. But this brings in all the same additional improvements that I made for SHA-1 and SHA-512. There are also some smaller changes: - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For these algorithms it's just a move, not a full reorganization yet. - Fix the MIPS chacha-core.S to build with the clang assembler. - Fix the Poly1305 functions to work in all contexts. - Fix a performance regression in the x86_64 Poly1305 code. - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code. Note that since the new organization of the SHA code is much simpler, the diffstat of this pull request is negative, despite the addition of new fully-documented library APIs for multiple SHA and HMAC-SHA variants. These APIs will allow further simplifications across the kernel as users start using them instead of the old-school crypto API. (I've already written a lot of such conversion patches, removing over 1000 more lines of code. But most of those will target 6.18 or later)" * tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits) lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils lib/crypto: x86/sha1-ni: Convert to use rounds macros lib/crypto: x86/sha1-ni: Minor optimizations and cleanup crypto: sha1 - Remove sha1_base.h lib/crypto: x86/sha1: Migrate optimized code into library lib/crypto: sparc/sha1: Migrate optimized code into library lib/crypto: s390/sha1: Migrate optimized code into library lib/crypto: powerpc/sha1: Migrate optimized code into library lib/crypto: mips/sha1: Migrate optimized code into library lib/crypto: arm64/sha1: Migrate optimized code into library lib/crypto: arm/sha1: Migrate optimized code into library crypto: sha1 - Use same state format as legacy drivers crypto: sha1 - Wrap library and add HMAC support lib/crypto: sha1: Add HMAC support lib/crypto: sha1: Add SHA-1 library functions lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() crypto: x86/sha1 - Rename conflicting symbol lib/crypto: sha2: Add hmac_sha*_init_usingrawkey() lib/crypto: arm/poly1305: Remove unneeded empty weak function lib/crypto: x86/poly1305: Fix performance regression on short messages ...
2025-07-28Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: - Optimization to avoid reference counts on non-cloned registered buffers. This is how these buffers were handled prior to having cloning support, and we can still use that approach as long as the buffers haven't been cloned to another ring. - Cleanup and improvement for uring_cmd, where btrfs was the only user of storing allocated data for the lifetime of the uring_cmd. Clean that up so we can get rid of the need to do that. - Avoid unnecessary memory copies in uring_cmd usage. This is particularly important as a lot of uring_cmd usage necessitates the use of 128b SQEs. - A few updates for recv multishot, where it's now possible to add fairness limits for limiting how much is transferred for each retry loop. Additionally, recv multishot now supports an overall cap as well, where once reached the multishot recv will terminate. The latter is useful for buffer management and juggling many recv streams at the same time. - Add support for returning the TX timestamps via a new socket command. This feature can work in either singleshot or multishot mode, where the latter triggers a completion whenever new timestamps are available. This is an alternative to using the existing error queue. - Add support for an io_uring "mock" file, which is the start of being able to do 100% targeted testing in terms of exercising io_uring request handling. The idea is to have a file type that can be anything the tester would like, and behave exactly how you want it to behave in terms of hitting the code paths you want. - Improve zcrx by using sgtables to de-duplicate and improve dma address handling. - Prep work for supporting larger pages for zcrx. - Various little improvements and fixes. * tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits) io_uring/zcrx: fix leaking pages on sg init fail io_uring/zcrx: don't leak pages on account failure io_uring/zcrx: fix null ifq on area destruction io_uring: fix breakage in EXPERT menu io_uring/cmd: remove struct io_uring_cmd_data btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag io_uring/zcrx: account area memory io_uring: export io_[un]account_mem io_uring/net: Support multishot receive len cap io_uring: deduplicate wakeup handling io_uring/net: cast min_not_zero() type io_uring/poll: cleanup apoll freeing io_uring/net: allow multishot receive per-invocation cap io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags io_uring/net: use passed in 'len' in io_recv_buf_select() io_uring/zcrx: prepare fallback for larger pages io_uring/zcrx: assert area type in io_zcrx_iov_page io_uring/zcrx: allocate sgtable for umem areas io_uring/zcrx: introduce io_populate_area_dma ...
2025-07-28Merge tag 'vfs-6.17-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - persistent info Persist exit and coredump information independent of whether anyone currently holds a pidfd for the struct pid. The current scheme allocated pidfs dentries on-demand repeatedly. This scheme is reaching it's limits as it makes it impossible to pin information that needs to be available after the task has exited or coredumped and that should not be lost simply because the pidfd got closed temporarily. The next opener should still see the stashed information. This is also a prerequisite for supporting extended attributes on pidfds to allow attaching meta information to them. If someone opens a pidfd for a struct pid a pidfs dentry is allocated and stashed in pid->stashed. Once the last pidfd for the struct pid is closed the pidfs dentry is released and removed from pid->stashed. So if 10 callers create a pidfs dentry for the same struct pid sequentially, i.e., each closing the pidfd before the other creates a new one then a new pidfs dentry is allocated every time. Because multiple tasks acquiring and releasing a pidfd for the same struct pid can race with each another a task may still find a valid pidfs entry from the previous task in pid->stashed and reuse it. Or it might find a dead dentry in there and fail to reuse it and so stashes a new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry but only one will succeed, the other ones will put their dentry. The current scheme aims to ensure that a pidfs dentry for a struct pid can only be created if the task is still alive or if a pidfs dentry already existed before the task was reaped and so exit information has been was stashed in the pidfs inode. That's great except that it's buggy. If a pidfs dentry is stashed in pid->stashed after pidfs_exit() but before __unhash_process() is called we will return a pidfd for a reaped task without exit information being available. The pidfds_pid_valid() check does not guard against this race as it doens't sync at all with pidfs_exit(). The pid_has_task() check might be successful simply because we're before __unhash_process() but after pidfs_exit(). Introduce a new scheme where the lifetime of information associated with a pidfs entry (coredump and exit information) isn't bound to the lifetime of the pidfs inode but the struct pid itself. The first time a pidfs dentry is allocated for a struct pid a struct pidfs_attr will be allocated which will be used to store exit and coredump information. If all pidfs for the pidfs dentry are closed the dentry and inode can be cleaned up but the struct pidfs_attr will stick until the struct pid itself is freed. This will ensure minimal memory usage while persisting relevant information. The new scheme has various advantages. First, it allows to close the race where we end up handing out a pidfd for a reaped task for which no exit information is available. Second, it minimizes memory usage. Third, it allows to remove complex lifetime tracking via dentries when registering a struct pid with pidfs. There's no need to get or put a reference. Instead, the lifetime of exit and coredump information associated with a struct pid is bound to the lifetime of struct pid itself. - extended attributes Now that we have a way to persist information for pidfs dentries we can start supporting extended attributes on pidfds. This will allow userspace to attach meta information to tasks. One natural extension would be to introduce a custom pidfs.* extended attribute space and allow for the inheritance of extended attributes across fork() and exec(). The first simple scheme will allow privileged userspace to set trusted extended attributes on pidfs inodes. - Allow autonomous pidfs file handles Various filesystems such as pidfs and drm support opening file handles without having to require a file descriptor to identify the filesystem. The filesystem are global single instances and can be trivially identified solely on the information encoded in the file handle. This makes it possible to not have to keep or acquire a sentinal file descriptor just to pass it to open_by_handle_at() to identify the filesystem. That's especially useful when such sentinel file descriptor cannot or should not be acquired. For pidfs this means a file handle can function as full replacement for storing a pid in a file. Instead a file handle can be stored and reopened purely based on the file handle. Such autonomous file handles can be opened with or without specifying a a file descriptor. If no proper file descriptor is used the FD_PIDFS_ROOT sentinel must be passed. This allows us to define further special negative fd sentinels in the future. Userspace can trivially test for support by trying to open the file handle with an invalid file descriptor. - Allow pidfds for reaped tasks with SCM_PIDFD messages This is a logical continuation of the earlier work to create pidfds for reaped tasks through the SO_PEERPIDFD socket option merged in 923ea4d4482b ("Merge patch series "net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid""). - Two minor fixes: * Fold fs_struct->{lock,seq} into a seqlock * Don't bother with path_{get,put}() in unix_open_file() * tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits) don't bother with path_get()/path_put() in unix_open_file() fold fs_struct->{lock,seq} into a seqlock selftests: net: extend SCM_PIDFD test to cover stale pidfds af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD af_unix: stash pidfs dentry when needed af_unix/scm: fix whitespace errors af_unix: introduce and use scm_replace_pid() helper af_unix: introduce unix_skb_to_scm helper af_unix: rework unix_maybe_add_creds() to allow sleep selftests/pidfd: decode pidfd file handles withou having to specify an fd fhandle, pidfs: support open_by_handle_at() purely based on file handle uapi/fcntl: add FD_PIDFS_ROOT uapi/fcntl: add FD_INVALID fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP uapi/fcntl: mark range as reserved fhandle: reflow get_path_anchor() pidfs: add pidfs_root_path() helper fhandle: rename to get_path_anchor() fhandle: hoist copy_from_user() above get_path_from_fd() fhandle: raise FILEID_IS_DIR in handle_type ...
2025-07-28Merge tag 'vfs-6.17-rc1.nsfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains namespace updates. This time specifically for nsfs: - Userspace heavily relies on the root inode numbers for namespaces to identify the initial namespaces. That's already a hard dependency. So we cannot change that anymore. Move the initial inode numbers to a public header and align the only two namespaces that currently don't do that with all the other namespaces. - The root inode of /proc having a fixed inode number has been part of the core kernel ABI since its inception, and recently some userspace programs (mainly container runtimes) have started to explicitly depend on this behaviour. The main reason this is useful to userspace is that by checking that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO, they can then use openat2() together with RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a bind-mount that replaces some procfs file with a different one. This kind of attack has lead to security issues in container runtimes in the past (such as CVE-2019-19921) and libraries like libpathrs[1] use this feature of procfs to provide safe procfs handling functions" * tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: uapi: export PROCFS_ROOT_INO mntns: use stable inode number for initial mount ns netns: use stable inode number for initial mount ns nsfs: move root inode number to uapi
2025-07-28Merge tag 'pull-rpc_pipefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rpc_pipefs updates from Al Viro: "Massage rpc_pipefs to use saner primitives and clean up the APIs provided to the rest of the kernel" * tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rpc_create_client_dir(): return 0 or -E... rpc_create_client_dir(): don't bother with rpc_populate() rpc_new_dir(): the last argument is always NULL rpc_pipe: expand the calls of rpc_mkdir_populate() rpc_gssd_dummy_populate(): don't bother with rpc_populate() rpc_mkpipe_dentry(): switch to simple_start_creating() rpc_pipe: saner primitive for creating regular files rpc_pipe: saner primitive for creating subdirectories rpc_pipe: don't overdo directory locking rpc_mkpipe_dentry(): saner calling conventions rpc_unlink(): saner calling conventions rpc_populate(): lift cleanup into callers rpc_unlink(): use simple_recursive_removal() rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead rpc_pipe: clean failure exits in fill_super new helper: simple_start_creating()
2025-07-28Merge tag 'pull-dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dentry d_flags updates from Al Viro: "The current exclusion rules for dentry->d_flags stores are rather unpleasant. The basic rules are simple: - stores to dentry->d_flags are OK under dentry->d_lock - stores to dentry->d_flags are OK in the dentry constructor, before becomes potentially visible to other threads Unfortunately, there's a couple of exceptions to that, and that's where the headache comes from. The main PITA comes from d_set_d_op(); that primitive sets ->d_op of dentry and adjusts the flags that correspond to presence of individual methods. It's very easy to misuse; existing uses _are_ safe, but proof of correctness is brittle. Use in __d_alloc() is safe (we are within a constructor), but we might as well precalculate the initial value of 'd_flags' when we set the default ->d_op for given superblock and set 'd_flags' directly instead of messing with that helper. The reasons why other uses are safe are bloody convoluted; I'm not going to reproduce it here. See [1] for gory details, if you care. The critical part is using d_set_d_op() only just prior to d_splice_alias(), which makes a combination of d_splice_alias() with setting ->d_op, etc a natural replacement primitive. Better yet, if we go that way, it's easy to take setting ->d_op and modifying 'd_flags' under ->d_lock, which eliminates the headache as far as 'd_flags' exclusion rules are concerned. Other exceptions are minor and easy to deal with. What this series does: - d_set_d_op() is no longer available; instead a new primitive (d_splice_alias_ops()) is provided, equivalent to combination of d_set_d_op() and d_splice_alias(). - new field of struct super_block - 's_d_flags'. This sets the default value of 'd_flags' to be used when allocating dentries on this filesystem. - new primitive for setting 's_d_op': set_default_d_op(). This replaces stores to 's_d_op' at mount time. All in-tree filesystems converted; out-of-tree ones will get caught by the compiler ('s_d_op' is renamed, so stores to it will be caught). 's_d_flags' is set by the same primitive to match the 's_d_op'. - a lot of filesystems had sb->s_d_op->d_delete equal to always_delete_dentry; that is equivalent to setting DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well set that bit in 's_d_flags' and drop 'd_delete()' from dentry_operations. In quite a few cases that results in empty dentry_operations, which means that we can get rid of those. - kill simple_dentry_operations - not needed anymore - massage d_alloc_parallel() to get rid of the other exception wrt 'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry; no need to delay that until we commit to using the sucker. As the result, 'd_flags' stores are all either under ->d_lock or done before the dentry becomes visible in any shared data structures" Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1] * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits) configfs: use DCACHE_DONTCACHE debugfs: use DCACHE_DONTCACHE efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry() 9p: don't bother with always_delete_dentry ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE kill simple_dentry_operations devpts, sunrpc, hostfs: don't bother with ->d_op shmem: no dentry retention past the refcount reaching zero d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier make d_set_d_op() static simple_lookup(): just set DCACHE_DONTCACHE tracefs: Add d_delete to remove negative dentries set_default_d_op(): calculate the matching value for ->d_flags correct the set of flags forbidden at d_set_d_op() time split d_flags calculation out of d_set_d_op() new helper: set_default_d_op() fuse: no need for special dentry_operations for root dentry switch procfs from d_set_d_op() to d_splice_alias_ops() new helper: d_splice_alias_ops() procfs: kill ->proc_dops ...
2025-07-28Merge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "NFSD is finally able to offer write delegations to clients that open files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting this to accelerate a few interesting corner cases. The cap on the number of operations per NFSv4 COMPOUND has been lifted. Now, clients that send COMPOUNDs containing dozens of operations (for example, a long stream of LOOKUP operations to walk a pathname in a single round trip) will no longer be rejected. This release re-enables the ability for NFSD to perform NFSv4.2 COPY operations asynchronously. This feature has been disabled to mitigate the risk of denial-of-service when too many such requests arrive. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.17 development cycle" * tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits) nfsd: Drop dprintk in blocklayout xdr functions sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer sunrpc: rearrange struct svc_rqst for fewer cachelines sunrpc: return better error in svcauth_gss_accept() on alloc failure sunrpc: reset rq_accept_statp when starting a new RPC sunrpc: remove SVC_SYSERR sunrpc: fix handling of unknown auth status codes NFSD: Simplify struct knfsd_fh NFSD: Access a knfsd_fh's fsid by pointer Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous" NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings NFSD: Use vfs_iocb_iter_write() NFSD: Use vfs_iocb_iter_read() NFSD: Clean up kdoc for nfsd_open_local_fh() NFSD: Clean up kdoc for nfsd_file_put_local() NFSD: Remove definition for trace_nfsd_ctl_maxconn NFSD: Remove definition for trace_nfsd_file_gc_recent NFSD: Remove definitions for unused trace_nfsd_file_lru trace points NFSD: Remove definition for trace_nfsd_file_unhash_and_queue nfsd: Use correct error code when decoding extents ...
2025-07-24Merge tag 'ipsec-2025-07-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-07-23 1) Premption fixes for xfrm_state_find. From Sabrina Dubroca. 2) Initialize offload path also for SW IPsec GRO. This fixes a performance regression on SW IPsec offload. From Leon Romanovsky. 3) Fix IPsec UDP GRO for IKE packets. From Tobias Brunner, 4) Fix transport header setting for IPcomp after decompressing. From Fernando Fernandez Mancera. 5) Fix use-after-free when xfrmi_changelink tries to change collect_md for a xfrm interface. From Eyal Birger . 6) Delete the special IPcomp x->tunnel state along with the state x to avoid refcount problems. From Sabrina Dubroca. Please pull or let me know if there are problems. * tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: Revert "xfrm: destroy xfrm_state synchronously on net exit path" xfrm: delete x->tunnel as we delete x xfrm: interface: fix use-after-free after changing collect_md xfrm interface xfrm: ipcomp: adjust transport header after decompressing xfrm: Set transport header to fix UDP GRO handling xfrm: always initialize offload path xfrm: state: use a consistent pcpu_id in xfrm_state_find xfrm: state: initialize state_ptrs earlier in xfrm_state_find ==================== Link: https://patch.msgid.link/20250723075417.3432644-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in ↵Xiang Mei
qfq_delete_class might_sleep could be trigger in the atomic context in qfq_delete_class. qfq_destroy_class was moved into atomic context locked by sch_tree_lock to avoid a race condition bug on qfq_aggregate. However, might_sleep could be triggered by qfq_destroy_class, which introduced sleeping in atomic context (path: qfq_destroy_class->qdisc_put->__qdisc_destroy->lockdep_unregister_key ->might_sleep). Considering the race is on the qfq_aggregate objects, keeping qfq_rm_from_agg in the lock but moving the left part out can solve this issue. Fixes: 5e28d5a3f774 ("net/sched: sch_qfq: Fix race condition on qfq_aggregate") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/4a04e0cc-a64b-44e7-9213-2880ed641d77@sabinyo.mountain Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/20250717230128.159766-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-21net: appletalk: Fix use-after-free in AARP proxy probeKito Xu (veritas501)
The AARP proxy‐probe routine (aarp_proxy_probe_network) sends a probe, releases the aarp_lock, sleeps, then re-acquires the lock. During that window an expire timer thread (__aarp_expire_timer) can remove and kfree() the same entry, leading to a use-after-free. race condition: cpu 0 | cpu 1 atalk_sendmsg() | atif_proxy_probe_device() aarp_send_ddp() | aarp_proxy_probe_network() mod_timer() | lock(aarp_lock) // LOCK!! timeout around 200ms | alloc(aarp_entry) and then call | proxies[hash] = aarp_entry aarp_expire_timeout() | aarp_send_probe() | unlock(aarp_lock) // UNLOCK!! lock(aarp_lock) // LOCK!! | msleep(100); __aarp_expire_timer(&proxies[ct]) | free(aarp_entry) | unlock(aarp_lock) // UNLOCK!! | | lock(aarp_lock) // LOCK!! | UAF aarp_entry !! ================================================================== BUG: KASAN: slab-use-after-free in aarp_proxy_probe_network+0x560/0x630 net/appletalk/aarp.c:493 Read of size 4 at addr ffff8880123aa360 by task repro/13278 CPU: 3 UID: 0 PID: 13278 Comm: repro Not tainted 6.15.2 #3 PREEMPT(full) Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xc1/0x630 mm/kasan/report.c:521 kasan_report+0xca/0x100 mm/kasan/report.c:634 aarp_proxy_probe_network+0x560/0x630 net/appletalk/aarp.c:493 atif_proxy_probe_device net/appletalk/ddp.c:332 [inline] atif_ioctl+0xb58/0x16c0 net/appletalk/ddp.c:857 atalk_ioctl+0x198/0x2f0 net/appletalk/ddp.c:1818 sock_do_ioctl+0xdc/0x260 net/socket.c:1190 sock_ioctl+0x239/0x6a0 net/socket.c:1311 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl fs/ioctl.c:892 [inline] __x64_sys_ioctl+0x194/0x200 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcb/0x250 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Allocated: aarp_alloc net/appletalk/aarp.c:382 [inline] aarp_proxy_probe_network+0xd8/0x630 net/appletalk/aarp.c:468 atif_proxy_probe_device net/appletalk/ddp.c:332 [inline] atif_ioctl+0xb58/0x16c0 net/appletalk/ddp.c:857 atalk_ioctl+0x198/0x2f0 net/appletalk/ddp.c:1818 Freed: kfree+0x148/0x4d0 mm/slub.c:4841 __aarp_expire net/appletalk/aarp.c:90 [inline] __aarp_expire_timer net/appletalk/aarp.c:261 [inline] aarp_expire_timeout+0x480/0x6e0 net/appletalk/aarp.c:317 The buggy address belongs to the object at ffff8880123aa300 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 96 bytes inside of freed 192-byte region [ffff8880123aa300, ffff8880123aa3c0) Memory state around the buggy address: ffff8880123aa200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880123aa280: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc >ffff8880123aa300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880123aa380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880123aa400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kito Xu (veritas501) <hxzene@gmail.com> Link: https://patch.msgid.link/20250717012843.880423-1-hxzene@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21Merge tag 'v6.16-rc7' into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17Merge tag 'for-net-2025-07-17' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: fix connectable extended advertising when using static random address - hci_core: fix typos in macros - hci_core: add missing braces when using macro parameters - hci_core: replace 'quirks' integer by 'quirk_flags' bitmap - SMP: If an unallowed command is received consider it a failure - SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout - L2CAP: Fix null-ptr-deref in l2cap_sock_resume_cb() - L2CAP: Fix attempting to adjust outgoing MTU - btintel: Check if controller is ISO capable on btintel_classify_pkt_type - btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID * tag 'for-net-2025-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmap Bluetooth: hci_core: add missing braces when using macro parameters Bluetooth: hci_core: fix typos in macros Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout Bluetooth: SMP: If an unallowed command is received consider it a failure Bluetooth: btintel: Check if controller is ISO capable on btintel_classify_pkt_type Bluetooth: hci_sync: fix connectable extended advertising when using static random address Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb() ==================== Link: https://patch.msgid.link/20250717142849.537425-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix to use conn aborts for conn-wide failuresDavid Howells
Fix rxrpc to use connection-level aborts for things that affect the whole connection, such as the service ID not matching a local service. Fixes: 57af281e5389 ("rxrpc: Tidy up abort generation infrastructure") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix transmission of an abort in response to an abortDavid Howells
Under some circumstances, such as when a server socket is closing, ABORT packets will be generated in response to incoming packets. Unfortunately, this also may include generating aborts in response to incoming aborts - which may cause a cycle. It appears this may be made possible by giving the client a multicast address. Fix this such that rxrpc_reject_packet() will refuse to generate aborts in response to aborts. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix notification vs call-release vs recvmsgDavid Howells
When a call is released, rxrpc takes the spinlock and removes it from ->recvmsg_q in an effort to prevent racing recvmsg() invocations from seeing the same call. Now, rxrpc_recvmsg() only takes the spinlock when actually removing a call from the queue; it doesn't, however, take it in the lead up to that when it checks to see if the queue is empty. It *does* hold the socket lock, which prevents a recvmsg/recvmsg race - but this doesn't prevent sendmsg from ending the call because sendmsg() drops the socket lock and relies on the call->user_mutex. Fix this by firstly removing the bit in rxrpc_release_call() that dequeues the released call and, instead, rely on recvmsg() to simply discard released calls (done in a preceding fix). Secondly, rxrpc_notify_socket() is abandoned if the call is already marked as released rather than trying to be clever by setting both pointers in call->recvmsg_link to NULL to trick list_empty(). This isn't perfect and can still race, resulting in a released call on the queue, but recvmsg() will now clean that up. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix recv-recv race of completed callDavid Howells
If a call receives an event (such as incoming data), the call gets placed on the socket's queue and a thread in recvmsg can be awakened to go and process it. Once the thread has picked up the call off of the queue, further events will cause it to be requeued, and once the socket lock is dropped (recvmsg uses call->user_mutex to allow the socket to be used in parallel), a second thread can come in and its recvmsg can pop the call off the socket queue again. In such a case, the first thread will be receiving stuff from the call and the second thread will be blocked on call->user_mutex. The first thread can, at this point, process both the event that it picked call for and the event that the second thread picked the call for and may see the call terminate - in which case the call will be "released", decoupling the call from the user call ID assigned to it (RXRPC_USER_CALL_ID in the control message). The first thread will return okay, but then the second thread will wake up holding the user_mutex and, if it sees that the call has been released by the first thread, it will BUG thusly: kernel BUG at net/rxrpc/recvmsg.c:474! Fix this by just dequeuing the call and ignoring it if it is seen to be already released. We can't tell userspace about it anyway as the user call ID has become stale. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix irq-disabled in local_bh_enable()David Howells
The rxrpc_assess_MTU_size() function calls down into the IP layer to find out the MTU size for a route. When accepting an incoming call, this is called from rxrpc_new_incoming_call() which holds interrupts disabled across the code that calls down to it. Unfortunately, the IP layer uses local_bh_enable() which, config dependent, throws a warning if IRQs are enabled: WARNING: CPU: 1 PID: 5544 at kernel/softirq.c:387 __local_bh_enable_ip+0x43/0xd0 ... RIP: 0010:__local_bh_enable_ip+0x43/0xd0 ... Call Trace: <TASK> rt_cache_route+0x7e/0xa0 rt_set_nexthop.isra.0+0x3b3/0x3f0 __mkroute_output+0x43a/0x460 ip_route_output_key_hash+0xf7/0x140 ip_route_output_flow+0x1b/0x90 rxrpc_assess_MTU_size.isra.0+0x2a0/0x590 rxrpc_new_incoming_peer+0x46/0x120 rxrpc_alloc_incoming_call+0x1b1/0x400 rxrpc_new_incoming_call+0x1da/0x5e0 rxrpc_input_packet+0x827/0x900 rxrpc_io_thread+0x403/0xb60 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 ... hardirqs last enabled at (23): _raw_spin_unlock_irq+0x24/0x50 hardirqs last disabled at (24): _raw_read_lock_irq+0x17/0x70 softirqs last enabled at (0): copy_process+0xc61/0x2730 softirqs last disabled at (25): rt_add_uncached_list+0x3c/0x90 Fix this by moving the call to rxrpc_assess_MTU_size() out of rxrpc_init_peer() and further up the stack where it can be done without interrupts disabled. It shouldn't be a problem for rxrpc_new_incoming_call() to do it after the locks are dropped as pmtud is going to be performed by the I/O thread - and we're in the I/O thread at this point. Fixes: a2ea9a907260 ("rxrpc: Use irq-disabling spinlocks between app and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtreeWilliam Liu
htb_lookup_leaf has a BUG_ON that can trigger with the following: tc qdisc del dev lo root tc qdisc add dev lo root handle 1: htb default 1 tc class add dev lo parent 1: classid 1:1 htb rate 64bit tc qdisc add dev lo parent 1:1 handle 2: netem tc qdisc add dev lo parent 2:1 handle 3: blackhole ping -I lo -c1 -W0.001 127.0.0.1 The root cause is the following: 1. htb_dequeue calls htb_dequeue_tree which calls the dequeue handler on the selected leaf qdisc 2. netem_dequeue calls enqueue on the child qdisc 3. blackhole_enqueue drops the packet and returns a value that is not just NET_XMIT_SUCCESS 4. Because of this, netem_dequeue calls qdisc_tree_reduce_backlog, and since qlen is now 0, it calls htb_qlen_notify -> htb_deactivate -> htb_deactiviate_prios -> htb_remove_class_from_row -> htb_safe_rb_erase 5. As this is the only class in the selected hprio rbtree, __rb_change_child in __rb_erase_augmented sets the rb_root pointer to NULL 6. Because blackhole_dequeue returns NULL, netem_dequeue returns NULL, which causes htb_dequeue_tree to call htb_lookup_leaf with the same hprio rbtree, and fail the BUG_ON The function graph for this scenario is shown here: 0) | htb_enqueue() { 0) + 13.635 us | netem_enqueue(); 0) 4.719 us | htb_activate_prios(); 0) # 2249.199 us | } 0) | htb_dequeue() { 0) 2.355 us | htb_lookup_leaf(); 0) | netem_dequeue() { 0) + 11.061 us | blackhole_enqueue(); 0) | qdisc_tree_reduce_backlog() { 0) | qdisc_lookup_rcu() { 0) 1.873 us | qdisc_match_from_root(); 0) 6.292 us | } 0) 1.894 us | htb_search(); 0) | htb_qlen_notify() { 0) 2.655 us | htb_deactivate_prios(); 0) 6.933 us | } 0) + 25.227 us | } 0) 1.983 us | blackhole_dequeue(); 0) + 86.553 us | } 0) # 2932.761 us | qdisc_warn_nonwc(); 0) | htb_lookup_leaf() { 0) | BUG_ON(); ------------------------------------------ The full original bug report can be seen here [1]. We can fix this just by returning NULL instead of the BUG_ON, as htb_dequeue_tree returns NULL when htb_lookup_leaf returns NULL. [1] https://lore.kernel.org/netdev/pF5XOOIim0IuEfhI-SOxTgRvNoDwuux7UHKnE_Y5-zVd4wmGvNk2ceHjKb8ORnzw0cGwfmVu42g9dL7XyJLf1NEzaztboTWcm0Ogxuojoeo=@willsroot.io/ Fixes: 512bb43eb542 ("pkt_sched: sch_htb: Optimize WARN_ONs in htb_dequeue_tree() etc.") Signed-off-by: William Liu <will@willsroot.io> Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io> Link: https://patch.msgid.link/20250717022816.221364-1-will@willsroot.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: bridge: Do not offload IGMP/MLD messagesJoseph Huang
Do not offload IGMP/MLD messages as it could lead to IGMP/MLD Reports being unintentionally flooded to Hosts. Instead, let the bridge decide where to send these IGMP/MLD messages. Consider the case where the local host is sending out reports in response to a remote querier like the following: mcast-listener-process (IP_ADD_MEMBERSHIP) \ br0 / \ swp1 swp2 | | QUERIER SOME-OTHER-HOST In the above setup, br0 will want to br_forward() reports for mcast-listener-process's group(s) via swp1 to QUERIER; but since the source hwdom is 0, the report is eligible for tx offloading, and is flooded by hardware to both swp1 and swp2, reaching SOME-OTHER-HOST as well. (Example and illustration provided by Tobias.) Fixes: 472111920f1c ("net: bridge: switchdev: allow the TX data plane forwarding to be offloaded") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716153551.1830255-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtimeDong Chenchen
Assuming the "rx-vlan-filter" feature is enabled on a net device, the 8021q module will automatically add or remove VLAN 0 when the net device is put administratively up or down, respectively. There are a couple of problems with the above scheme. The first problem is a memory leak that can happen if the "rx-vlan-filter" feature is disabled while the device is running: # ip link add bond1 up type bond mode 0 # ethtool -K bond1 rx-vlan-filter off # ip link del dev bond1 When the device is put administratively down the "rx-vlan-filter" feature is disabled, so the 8021q module will not remove VLAN 0 and the memory will be leaked [1]. Another problem that can happen is that the kernel can automatically delete VLAN 0 when the device is put administratively down despite not adding it when the device was put administratively up since during that time the "rx-vlan-filter" feature was disabled. null-ptr-unref or bug_on[2] will be triggered by unregister_vlan_dev() for refcount imbalance if toggling filtering during runtime: $ ip link add bond0 type bond mode 0 $ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q $ ethtool -K bond0 rx-vlan-filter off $ ifconfig bond0 up $ ethtool -K bond0 rx-vlan-filter on $ ifconfig bond0 down $ ip link del vlan0 Root cause is as below: step1: add vlan0 for real_dev, such as bond, team. register_vlan_dev vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1 step2: disable vlan filter feature and enable real_dev step3: change filter from 0 to 1 vlan_device_event vlan_filter_push_vids ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0 step4: real_dev down vlan_device_event vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0 vlan_info_rcu_free //free vlan0 step5: delete vlan0 unregister_vlan_dev BUG_ON(!vlan_info); //vlan_info is null Fix both problems by noting in the VLAN info whether VLAN 0 was automatically added upon NETDEV_UP and based on that decide whether it should be deleted upon NETDEV_DOWN, regardless of the state of the "rx-vlan-filter" feature. [1] unreferenced object 0xffff8880068e3100 (size 256): comm "ip", pid 384, jiffies 4296130254 hex dump (first 32 bytes): 00 20 30 0d 80 88 ff ff 00 00 00 00 00 00 00 00 . 0............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 81ce31fa): __kmalloc_cache_noprof+0x2b5/0x340 vlan_vid_add+0x434/0x940 vlan_device_event.cold+0x75/0xa8 notifier_call_chain+0xca/0x150 __dev_notify_flags+0xe3/0x250 rtnl_configure_link+0x193/0x260 rtnl_newlink_create+0x383/0x8e0 __rtnl_newlink+0x22c/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 [2] kernel BUG at net/8021q/vlan.c:99! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 #61 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1)) RSP: 0018:ffff88810badf310 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8 RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80 R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000 R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e FS: 00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0 Call Trace: <TASK> rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553) rtnetlink_rcv_msg (net/core/rtnetlink.c:6945) netlink_rcv_skb (net/netlink/af_netlink.c:2535) netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339) netlink_sendmsg (net/netlink/af_netlink.c:1883) ____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566) ___sys_sendmsg (net/socket.c:2622) __sys_sendmsg (net/socket.c:2652) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Reported-by: syzbot+a8b046e462915c65b10b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250716034504.2285203-2-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17tls: always refresh the queue when reading sockJakub Kicinski
After recent changes in net-next TCP compacts skbs much more aggressively. This unearthed a bug in TLS where we may try to operate on an old skb when checking if all skbs in the queue have matching decrypt state and geometry. BUG: KASAN: slab-use-after-free in tls_strp_check_rcv+0x898/0x9a0 [tls] (net/tls/tls_strp.c:436 net/tls/tls_strp.c:530 net/tls/tls_strp.c:544) Read of size 4 at addr ffff888013085750 by task tls/13529 CPU: 2 UID: 0 PID: 13529 Comm: tls Not tainted 6.16.0-rc5-virtme Call Trace: kasan_report+0xca/0x100 tls_strp_check_rcv+0x898/0x9a0 [tls] tls_rx_rec_wait+0x2c9/0x8d0 [tls] tls_sw_recvmsg+0x40f/0x1aa0 [tls] inet_recvmsg+0x1c3/0x1f0 Always reload the queue, fast path is to have the record in the queue when we wake, anyway (IOW the path going down "if !strp->stm.full_len"). Fixes: 0d87bbd39d7f ("tls: strp: make sure the TCP skbs do not have overlapping data") Link: https://patch.msgid.link/20250716143850.1520292-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()Nathan Chancellor
A new warning in clang [1] points out a place in pep_sock_accept() where dst is uninitialized then passed as a const pointer to pep_find_pipe(): net/phonet/pep.c:829:37: error: variable 'dst' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer] 829 | newsk = pep_find_pipe(&pn->hlist, &dst, pipe_handle); | ^~~: Move the call to pn_skb_get_dst_sockaddr(), which initializes dst, to before the call to pep_find_pipe(), so that dst is consistently used initialized throughout the function. Cc: stable@vger.kernel.org Fixes: f7ae8d59f661 ("Phonet: allocate sock from accept syscall rather than soft IRQ") Link: https://github.com/llvm/llvm-project/commit/00dacf8c22f065cb52efb14cd091d441f19b319e [1] Closes: https://github.com/ClangBuiltLinux/linux/issues/2101 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20250715-net-phonet-fix-uninit-const-pointer-v1-1-8efd1bd188b3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Bluetooth: L2CAP: Fix attempting to adjust outgoing MTULuiz Augusto von Dentz
Configuration request only configure the incoming direction of the peer initiating the request, so using the MTU is the other direction shall not be used, that said the spec allows the peer responding to adjust: Bluetooth Core 6.1, Vol 3, Part A, Section 4.5 'Each configuration parameter value (if any is present) in an L2CAP_CONFIGURATION_RSP packet reflects an ‘adjustment’ to a configuration parameter value that has been sent (or, in case of default values, implied) in the corresponding L2CAP_CONFIGURATION_REQ packet.' That said adjusting the MTU in the response shall be limited to ERTM channels only as for older modes the remote stack may not be able to detect the adjustment causing it to silently drop packets. Link: https://github.com/bluez/bluez/issues/1422 Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/149 Link: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4793 Fixes: 042bb9603c44 ("Bluetooth: L2CAP: Fix L2CAP MTU negotiation") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-17Merge tag 'nf-25-07-17' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Three patches to enhance conntrack selftests for resize and clash resolution, from Florian Westphal. 2) Expand nft_concat_range.sh selftest to improve coverage from error path, from Florian Westphal. 3) Hide clash bit to userspace from netlink dumps until there is a good reason to expose, from Florian Westphal. 4) Revert notification for device registration/unregistration for nftables basechains and flowtables, we decided to go for a better way to handle this through the nfnetlink_hook infrastructure which will come via nf-next, patch from Phil Sutter. 5) Fix crash in conntrack due to race related to SLAB_TYPESAFE_BY_RCU that results in removing a recycled object that is not yet in the hashes. Move IPS_CONFIRM setting after the object is in the hashes. From Florian Westphal. netfilter pull request 25-07-17 * tag 'nf-25-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Revert "netfilter: nf_tables: Add notifications for hook changes" netfilter: nf_tables: hide clash bit from userspace selftests: netfilter: nft_concat_range.sh: send packets to empty set selftests: netfilter: conntrack_resize.sh: also use udpclash tool selftests: netfilter: add conntrack clash resolution test case selftests: netfilter: conntrack_resize.sh: extend resize test ==================== Link: https://patch.msgid.link/20250717095808.41725-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17netfilter: nf_conntrack: fix crash due to removal of uninitialised entryFlorian Westphal
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack] [..] The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state: ct hlist pointer is garbage; looks like the ct hash value (hence crash). ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected ct->timeout is 30000 (=30s), which is unexpected. Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend its 0, the entry matches those that are newly allocated but not yet inserted into the hash: - ct hlist pointers are overloaded and store/cache the raw tuple hash - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value. If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry. Theory is that we did hit following race: cpu x cpu y cpu z found entry E found entry E E is expired <preemption> nf_ct_delete() return E to rcu slab init_conntrack E is re-inited, ct->status set to 0 reply tuplehash hnnode.pprev stores hash value. cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit. ->refcnt set to 1 E now owned by skb ->timeout set to 30000 If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit. nf_conntrack_confirm gets called sets: ct->status |= CONFIRMED This is wrong: E is not yet added to hashtable. cpu y resumes, it observes E as expired but CONFIRMED: <resumes> nf_ct_expired() -> yes (ct->timeout is 30s) confirmed bit set. cpu y will try to delete E from the hashtable: nf_ct_delete() -> set DYING bit __nf_ct_delete_from_lists Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks: wait for spinlock held by z CONFIRMED is set but there is no guarantee ct will be added to hash: "chaintoolong" or "clash resolution" logic both skip the insert step. reply hnnode.pprev still stores the hash value. unlocks spinlock return NF_DROP <unblocks, then crashes on hlist_nulls_del_rcu pprev> In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs. Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy. To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock. Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this. It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped. Also change nf_ct_should_gc() to first check the confirmed bit. The gc sequence is: 1. Check if entry has expired, if not skip to next entry 2. Obtain a reference to the expired entry. 3. Call nf_ct_should_gc() to double-check step 1. nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry. Without this change to nf_ct_should_gc() we could still get this sequence: 1. Check if entry has expired. 2. Obtain a reference. 3. Call nf_ct_should_gc() to double-check step 1: 4 - entry is still observed as expired 5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set 6 - confirm bit is seen 7 - valid entry is removed again First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects. This change cannot be backported to releases before 5.19. Without commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes. Cc: Razvan Cojocaru <rzvncj@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/ Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/ Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-17net: fix segmentation after TCP/UDP fraglist GROFelix Fietkau
Since "net: gro: use cb instead of skb->network_header", the skb network header is no longer set in the GRO path. This breaks fraglist segmentation, which relies on ip_hdr()/tcp_hdr() to check for address/port changes. Fix this regression by selectively setting the network header for merged segment skbs. Fixes: 186b1ea73ad8 ("net: gro: use cb instead of skb->network_header") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250705150622.10699-1-nbd@nbd.name Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-16ipv6: mcast: Delay put pmc->idev in mld_del_delrec()Yue Haibing
pmc->idev is still used in ip6_mc_clear_src(), so as mld_clear_delrec() does, the reference should be put after ip6_mc_clear_src() return. Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250714141957.3301871-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmapChristian Eggers
The 'quirks' member already ran out of bits on some platforms some time ago. Replace the integer member by a bitmap in order to have enough bits in future. Replace raw bit operations by accessor macros. Fixes: ff26b2dd6568 ("Bluetooth: Add quirk for broken READ_VOICE_SETTING") Fixes: 127881334eaa ("Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE") Suggested-by: Pauli Virtanen <pav@iki.fi> Tested-by: Ivan Pravdin <ipravdin.official@gmail.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeoutLuiz Augusto von Dentz
This replaces the usage of HCI_ERROR_REMOTE_USER_TERM, which as the name suggest is to indicate a regular disconnection initiated by an user, with HCI_ERROR_AUTH_FAILURE to indicate the session has timeout thus any pairing shall be considered as failed. Fixes: 1e91c29eb60c ("Bluetooth: Use hci_disconnect for immediate disconnection from SMP") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: SMP: If an unallowed command is received consider it a failureLuiz Augusto von Dentz
If a command is received while a bonding is ongoing consider it a pairing failure so the session is cleanup properly and the device is disconnected immediately instead of continuing with other commands that may result in the session to get stuck without ever completing such as the case bellow: > ACL Data RX: Handle 2048 flags 0x02 dlen 21 SMP: Identity Information (0x08) len 16 Identity resolving key[16]: d7e08edef97d3e62cd2331f82d8073b0 > ACL Data RX: Handle 2048 flags 0x02 dlen 21 SMP: Signing Information (0x0a) len 16 Signature key[16]: 1716c536f94e843a9aea8b13ffde477d Bluetooth: hci0: unexpected SMP command 0x0a from XX:XX:XX:XX:XX:XX > ACL Data RX: Handle 2048 flags 0x02 dlen 12 SMP: Identity Address Information (0x09) len 7 Address: XX:XX:XX:XX:XX:XX (Intel Corporate) While accourding to core spec 6.1 the expected order is always BD_ADDR first first then CSRK: When using LE legacy pairing, the keys shall be distributed in the following order: LTK by the Peripheral EDIV and Rand by the Peripheral IRK by the Peripheral BD_ADDR by the Peripheral CSRK by the Peripheral LTK by the Central EDIV and Rand by the Central IRK by the Central BD_ADDR by the Central CSRK by the Central When using LE Secure Connections, the keys shall be distributed in the following order: IRK by the Peripheral BD_ADDR by the Peripheral CSRK by the Peripheral IRK by the Central BD_ADDR by the Central CSRK by the Central According to the Core 6.1 for commands used for key distribution "Key Rejected" can be used: '3.6.1. Key distribution and generation A device may reject a distributed key by sending the Pairing Failed command with the reason set to "Key Rejected". Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_sync: fix connectable extended advertising when using static ↵Alessandro Gasbarroni
random address Currently, the connectable flag used by the setup of an extended advertising instance drives whether we require privacy when trying to pass a random address to the advertising parameters (Own Address). If privacy is not required, then it automatically falls back to using the controller's public address. This can cause problems when using controllers that do not have a public address set, but instead use a static random address. e.g. Assume a BLE controller that does not have a public address set. The controller upon powering is set with a random static address by default by the kernel. < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 Address: E4:AF:26:D8:3E:3A (Static) > HCI Event: Command Complete (0x0e) plen 4 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) Setting non-connectable extended advertisement parameters in bluetoothctl mgmt add-ext-adv-params -r 0x801 -x 0x802 -P 2M -g 1 correctly sets Own address type as Random < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 ... Own address type: Random (0x01) Setting connectable extended advertisement parameters in bluetoothctl mgmt add-ext-adv-params -r 0x801 -x 0x802 -P 2M -g -c 1 mistakenly sets Own address type to Public (which causes to use Public Address 00:00:00:00:00:00) < HCI Command: LE Set Extended Advertising Parameters (0x08|0x0036) plen 25 ... Own address type: Public (0x00) This causes either the controller to emit an Invalid Parameters error or to mishandle the advertising. This patch makes sure that we use the already set static random address when requesting a connectable extended advertising when we don't require privacy and our public address is not set (00:00:00:00:00:00). Fixes: 3fe318ee72c5 ("Bluetooth: move hci_get_random_address() to hci_sync") Signed-off-by: Alessandro Gasbarroni <alex.gasbarroni@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()Kuniyuki Iwashima
syzbot reported null-ptr-deref in l2cap_sock_resume_cb(). [0] l2cap_sock_resume_cb() has a similar problem that was fixed by commit 1bff51ea59a9 ("Bluetooth: fix use-after-free error in lock_sock_nested()"). Since both l2cap_sock_kill() and l2cap_sock_resume_cb() are executed under l2cap_sock_resume_cb(), we can avoid the issue simply by checking if chan->data is NULL. Let's not access to the killed socket in l2cap_sock_resume_cb(). [0]: BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:82 [inline] BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] BUG: KASAN: null-ptr-deref in l2cap_sock_resume_cb+0xb4/0x17c net/bluetooth/l2cap_sock.c:1711 Write of size 8 at addr 0000000000000570 by task kworker/u9:0/52 CPU: 1 UID: 0 PID: 52 Comm: kworker/u9:0 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Workqueue: hci0 hci_rx_work Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:501 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 print_report+0x58/0x84 mm/kasan/report.c:524 kasan_report+0xb0/0x110 mm/kasan/report.c:634 check_region_inline mm/kasan/generic.c:-1 [inline] kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189 __kasan_check_write+0x20/0x30 mm/kasan/shadow.c:37 instrument_atomic_write include/linux/instrumented.h:82 [inline] clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline] l2cap_sock_resume_cb+0xb4/0x17c net/bluetooth/l2cap_sock.c:1711 l2cap_security_cfm+0x524/0xea0 net/bluetooth/l2cap_core.c:7357 hci_auth_cfm include/net/bluetooth/hci_core.h:2092 [inline] hci_auth_complete_evt+0x2e8/0xa4c net/bluetooth/hci_event.c:3514 hci_event_func net/bluetooth/hci_event.c:7511 [inline] hci_event_packet+0x650/0xe9c net/bluetooth/hci_event.c:7565 hci_rx_work+0x320/0xb18 net/bluetooth/hci_core.c:4070 process_one_work+0x7e8/0x155c kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3321 [inline] worker_thread+0x958/0xed8 kernel/workqueue.c:3402 kthread+0x5fc/0x75c kernel/kthread.c:464 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:847 Fixes: d97c899bde33 ("Bluetooth: Introduce L2CAP channel callback for resuming") Reported-by: syzbot+e4d73b165c3892852d22@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686c12bd.a70a0220.29fe6c.0b13.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-15mptcp: reset fallback status gracefully at disconnect() timePaolo Abeni
mptcp_disconnect() clears the fallback bit unconditionally, without touching the associated flags. The bit clear is safe, as no fallback operation can race with that -- all subflow are already in TCP_CLOSE status thanks to the previous FASTCLOSE -- but we need to consistently reset all the fallback related status. Also acquire the relevant lock, to avoid fouling static analyzers. Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-3-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: plug races between subflow fail and subflow creationPaolo Abeni
We have races similar to the one addressed by the previous patch between subflow failing and additional subflow creation. They are just harder to trigger. The solution is similar. Use a separate flag to track the condition 'socket state prevent any additional subflow creation' protected by the fallback lock. The socket fallback makes such flag true, and also receiving or sending an MP_FAIL option. The field 'allow_infinite_fallback' is now always touched under the relevant lock, we can drop the ONCE annotation on write. Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-2-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-15mptcp: make fallback action and fallback decision atomicPaolo Abeni
Syzkaller reported the following splat: WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 __mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 check_fully_established net/mptcp/options.c:982 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Modules linked in: CPU: 1 UID: 0 PID: 7704 Comm: syz.3.1419 Not tainted 6.16.0-rc3-gbd5ce2324dba #20 PREEMPT(voluntary) Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:__mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] RIP: 0010:mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] RIP: 0010:check_fully_established net/mptcp/options.c:982 [inline] RIP: 0010:mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Code: 24 18 e8 bb 2a 00 fd e9 1b df ff ff e8 b1 21 0f 00 e8 ec 5f c4 fc 44 0f b7 ac 24 b0 00 00 00 e9 54 f1 ff ff e8 d9 5f c4 fc 90 <0f> 0b 90 e9 b8 f4 ff ff e8 8b 2a 00 fd e9 8d e6 ff ff e8 81 2a 00 RSP: 0018:ffff8880a3f08448 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880180a8000 RCX: ffffffff84afcf45 RDX: ffff888090223700 RSI: ffffffff84afdaa7 RDI: 0000000000000001 RBP: ffff888017955780 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880180a8910 R14: ffff8880a3e9d058 R15: 0000000000000000 FS: 00005555791b8500(0000) GS:ffff88811c495000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c2800b7 CR3: 0000000058e44000 CR4: 0000000000350ef0 Call Trace: <IRQ> tcp_reset+0x26f/0x2b0 net/ipv4/tcp_input.c:4432 tcp_validate_incoming+0x1057/0x1b60 net/ipv4/tcp_input.c:5975 tcp_rcv_established+0x5b5/0x21f0 net/ipv4/tcp_input.c:6166 tcp_v4_do_rcv+0x5dc/0xa70 net/ipv4/tcp_ipv4.c:1925 tcp_v4_rcv+0x3473/0x44a0 net/ipv4/tcp_ipv4.c:2363 ip_protocol_deliver_rcu+0xba/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2f1/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_local_deliver+0x1be/0x560 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:469 [inline] ip_rcv_finish net/ipv4/ip_input.c:447 [inline] NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_rcv+0x514/0x810 net/ipv4/ip_input.c:567 __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5975 __netif_receive_skb+0x1f/0x120 net/core/dev.c:6088 process_backlog+0x301/0x1360 net/core/dev.c:6440 __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7453 napi_poll net/core/dev.c:7517 [inline] net_rx_action+0xb44/0x1010 net/core/dev.c:7644 handle_softirqs+0x1d0/0x770 kernel/softirq.c:579 do_softirq+0x3f/0x90 kernel/softirq.c:480 </IRQ> <TASK> __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:407 local_bh_enable include/linux/bottom_half.h:33 [inline] inet_csk_listen_stop+0x2c5/0x1070 net/ipv4/inet_connection_sock.c:1524 mptcp_check_listen_stop.part.0+0x1cc/0x220 net/mptcp/protocol.c:2985 mptcp_check_listen_stop net/mptcp/mib.h:118 [inline] __mptcp_close+0x9b9/0xbd0 net/mptcp/protocol.c:3000 mptcp_close+0x2f/0x140 net/mptcp/protocol.c:3066 inet_release+0xed/0x200 net/ipv4/af_inet.c:435 inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:487 __sock_release+0xb3/0x270 net/socket.c:649 sock_close+0x1c/0x30 net/socket.c:1439 __fput+0x402/0xb70 fs/file_table.c:465 task_work_run+0x150/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xd4/0xe0 kernel/entry/common.c:114 exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline] do_syscall_64+0x245/0x360 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fc92f8a36ad Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcf52802d8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 00007ffcf52803a8 RCX: 00007fc92f8a36ad RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007fc92fae7ba0 R08: 0000000000000001 R09: 0000002800000000 R10: 00007fc92f700000 R11: 0000000000000246 R12: 00007fc92fae5fac R13: 00007fc92fae5fa0 R14: 0000000000026d00 R15: 0000000000026c51 </TASK> irq event stamp: 4068 hardirqs last enabled at (4076): [<ffffffff81544816>] __up_console_sem+0x76/0x80 kernel/printk/printk.c:344 hardirqs last disabled at (4085): [<ffffffff815447fb>] __up_console_sem+0x5b/0x80 kernel/printk/printk.c:342 softirqs last enabled at (3096): [<ffffffff840e1be0>] local_bh_enable include/linux/bottom_half.h:33 [inline] softirqs last enabled at (3096): [<ffffffff840e1be0>] inet_csk_listen_stop+0x2c0/0x1070 net/ipv4/inet_connection_sock.c:1524 softirqs last disabled at (3097): [<ffffffff813b6b9f>] do_softirq+0x3f/0x90 kernel/softirq.c:480 Since we need to track the 'fallback is possible' condition and the fallback status separately, there are a few possible races open between the check and the actual fallback action. Add a spinlock to protect the fallback related information and use it close all the possible related races. While at it also remove the too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect(): the field will be correctly cleared by subflow_finish_connect() if/when the connection will complete successfully. If fallback is not possible, as per RFC, reset the current subflow. Since the fallback operation can now fail and return value should be checked, rename the helper accordingly. Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status") Cc: stable@vger.kernel.org Reported-by: Matthieu Baerts <matttbe@kernel.org> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570 Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-1-391aff963322@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14smc: Fix various oops due to inet_sock type confusion.Kuniyuki Iwashima
syzbot reported weird splats [0][1] in cipso_v4_sock_setattr() while freeing inet_sk(sk)->inet_opt. The address was freed multiple times even though it was read-only memory. cipso_v4_sock_setattr() did nothing wrong, and the root cause was type confusion. The cited commit made it possible to create smc_sock as an INET socket. The issue is that struct smc_sock does not have struct inet_sock as the first member but hijacks AF_INET and AF_INET6 sk_family, which confuses various places. In this case, inet_sock.inet_opt was actually smc_sock.clcsk_data_ready(), which is an address of a function in the text segment. $ pahole -C inet_sock vmlinux struct inet_sock { ... struct ip_options_rcu * inet_opt; /* 784 8 */ $ pahole -C smc_sock vmlinux struct smc_sock { ... void (*clcsk_data_ready)(struct sock *); /* 784 8 */ The same issue for another field was reported before. [2][3] At that time, an ugly hack was suggested [4], but it makes both INET and SMC code error-prone and hard to change. Also, yet another variant was fixed by a hacky commit 98d4435efcbf3 ("net/smc: prevent NULL pointer dereference in txopt_get"). Instead of papering over the root cause by such hacks, we should not allow non-INET socket to reuse the INET infra. Let's add inet_sock as the first member of smc_sock. [0]: kvfree_call_rcu(): Double-freed call. rcu_head 000000006921da73 WARNING: CPU: 0 PID: 6718 at mm/slab_common.c:1956 kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 Modules linked in: CPU: 0 UID: 0 PID: 6718 Comm: syz.0.17 Tainted: G W 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Tainted: [W]=WARN Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 lr : kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 sp : ffff8000a03a7730 x29: ffff8000a03a7730 x28: 00000000fffffff5 x27: 1fffe000184823d3 x26: dfff800000000000 x25: ffff0000c2411e9e x24: ffff0000dd88da00 x23: ffff8000891ac9a0 x22: 00000000ffffffea x21: ffff8000891ac9a0 x20: ffff8000891ac9a0 x19: ffff80008afc2480 x18: 00000000ffffffff x17: 0000000000000000 x16: ffff80008ae642c8 x15: ffff700011ede14c x14: 1ffff00011ede14c x13: 0000000000000004 x12: ffffffffffffffff x11: ffff700011ede14c x10: 0000000000ff0100 x9 : 5fa3c1ffaf0ff000 x8 : 5fa3c1ffaf0ff000 x7 : 0000000000000001 x6 : 0000000000000001 x5 : ffff8000a03a7078 x4 : ffff80008f766c20 x3 : ffff80008054d360 x2 : 0000000000000000 x1 : 0000000000000201 x0 : 0000000000000000 Call trace: kvfree_call_rcu+0x94/0x3f0 mm/slab_common.c:1955 (P) cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295 vfs_setxattr+0x158/0x2ac fs/xattr.c:321 do_setxattr fs/xattr.c:636 [inline] file_setxattr+0x1b8/0x294 fs/xattr.c:646 path_setxattrat+0x2ac/0x320 fs/xattr.c:711 __do_sys_fsetxattr fs/xattr.c:761 [inline] __se_sys_fsetxattr fs/xattr.c:758 [inline] __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 [1]: Unable to handle kernel write to read-only memory at virtual address ffff8000891ac9a8 KASAN: probably user-memory-access in range [0x0000000448d64d40-0x0000000448d64d47] Mem abort info: ESR = 0x000000009600004e EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x0e: level 2 permission fault Data abort info: ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000207144000 [ffff8000891ac9a8] pgd=0000000000000000, p4d=100000020f950003, pud=100000020f951003, pmd=0040000201000781 Internal error: Oops: 000000009600004e [#1] SMP Modules linked in: CPU: 0 UID: 0 PID: 6946 Comm: syz.0.69 Not tainted 6.16.0-rc4-syzkaller-g7482bb149b9f #0 PREEMPT Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1971 lr : add_ptr_to_bulk_krc_lock mm/slab_common.c:1838 [inline] lr : kvfree_call_rcu+0xfc/0x3f0 mm/slab_common.c:1963 sp : ffff8000a28a7730 x29: ffff8000a28a7730 x28: 00000000fffffff5 x27: 1fffe00018b09bb3 x26: 0000000000000001 x25: ffff80008f66e000 x24: ffff00019beaf498 x23: ffff00019beaf4c0 x22: 0000000000000000 x21: ffff8000891ac9a0 x20: ffff8000891ac9a0 x19: 0000000000000000 x18: 00000000ffffffff x17: ffff800093363000 x16: ffff80008052c6e4 x15: ffff700014514ecc x14: 1ffff00014514ecc x13: 0000000000000004 x12: ffffffffffffffff x11: ffff700014514ecc x10: 0000000000000001 x9 : 0000000000000001 x8 : ffff00019beaf7b4 x7 : ffff800080a94154 x6 : 0000000000000000 x5 : ffff8000935efa60 x4 : 0000000000000008 x3 : ffff80008052c7fc x2 : 0000000000000001 x1 : ffff8000891ac9a0 x0 : 0000000000000001 Call trace: kvfree_call_rcu+0x31c/0x3f0 mm/slab_common.c:1967 (P) cipso_v4_sock_setattr+0x2f0/0x3f4 net/ipv4/cipso_ipv4.c:1914 netlbl_sock_setattr+0x240/0x334 net/netlabel/netlabel_kapi.c:1000 smack_netlbl_add+0xa8/0x158 security/smack/smack_lsm.c:2581 smack_inode_setsecurity+0x378/0x430 security/smack/smack_lsm.c:2912 security_inode_setsecurity+0x118/0x3c0 security/security.c:2706 __vfs_setxattr_noperm+0x174/0x5c4 fs/xattr.c:251 __vfs_setxattr_locked+0x1ec/0x218 fs/xattr.c:295 vfs_setxattr+0x158/0x2ac fs/xattr.c:321 do_setxattr fs/xattr.c:636 [inline] file_setxattr+0x1b8/0x294 fs/xattr.c:646 path_setxattrat+0x2ac/0x320 fs/xattr.c:711 __do_sys_fsetxattr fs/xattr.c:761 [inline] __se_sys_fsetxattr fs/xattr.c:758 [inline] __arm64_sys_fsetxattr+0xc0/0xdc fs/xattr.c:758 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x58/0x180 arch/arm64/kernel/entry-common.c:879 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 Code: aa1f03e2 52800023 97ee1e8d b4000195 (f90006b4) Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Reported-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686d9b50.050a0220.1ffab7.0020.GAE@google.com/ Tested-by: syzbot+40bf00346c3fe40f90f2@syzkaller.appspotmail.com Reported-by: syzbot+f22031fad6cbe52c70e7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/686da0f3.050a0220.1ffab7.0022.GAE@google.com/ Reported-by: syzbot+271fed3ed6f24600c364@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=271fed3ed6f24600c364 # [2] Link: https://lore.kernel.org/netdev/99f284be-bf1d-4bc4-a629-77b268522fff@huawei.com/ # [3] Link: https://lore.kernel.org/netdev/20250331081003.1503211-1-wangliang74@huawei.com/ # [4] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250711060808.2977529-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14sunrpc: make svc_tcp_sendmsg() take a signed sentp pointerJeff Layton
The return value of sock_sendmsg() is signed, and svc_tcp_sendto() wants a signed value to return. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: return better error in svcauth_gss_accept() on alloc failureJeff Layton
This ends up returning AUTH_BADCRED when memory allocation fails today. Fix it to return AUTH_FAILED, which better indicates a failure on the server. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: reset rq_accept_statp when starting a new RPCJeff Layton
rq_accept_statp should point to the location of the accept_status in the reply. This field is not reset between RPCs so if svc_authenticate or pg_authenticate return SVC_DENIED without setting the pointer, it could result in the status being written to the wrong place. This pointer starts its lifetime as NULL. Reset it on every iteration so we get consistent behavior if this happens. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: remove SVC_SYSERRJeff Layton
Nothing returns this error code. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: fix handling of unknown auth status codesJeff Layton
In the case of an unknown error code from svc_authenticate or pg_authenticate, return AUTH_ERROR with a status of AUTH_FAILED. Also add the other auth_stat value from RFC 5531, and document all the status codes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: new tracepoints around svc thread wakeupsJeff Layton
Convert the svc_wake_up tracepoint into svc_pool_thread_event class. Have it also record the pool id, and add new tracepoints for when the thread is already running and for when there are no idle threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: unexport csum_partial_copy_to_xdrChristoph Hellwig
csum_partial_copy_to_xdr is only used inside the sunrpc module, so remove the export. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: simplify xdr_partial_copy_from_skbChristoph Hellwig
csum_partial_copy_to_xdr can handle a checksumming and non-checksumming case and implements this using a callback, which leads to a lot of boilerplate code and indirect calls in the fast path. Switch to storing a need_checksum flag in struct xdr_skb_reader instead to remove the indirect call and simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: simplify xdr_init_encode_pagesChristoph Hellwig
The rqst argument to xdr_init_encode_pages is set to NULL by all callers, and pages is always set to buf->pages. Remove the two arguments and hardcode the assignments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()Eric Biggers
Rename the existing sha1_init() to sha1_init_raw(), since it conflicts with the upcoming library function. This will later be removed, but this keeps the kernel building for the introduction of the library. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14Revert "netfilter: nf_tables: Add notifications for hook changes"Phil Sutter
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82. Such notifications fit better into core or nfnetlink_hook code, following the NFNL_MSG_HOOK_GET message format. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14netfilter: nf_tables: hide clash bit from userspaceFlorian Westphal
Its a kernel implementation detail, at least at this time: We can later decide to revert this patch if there is a compelling reason, but then we should also remove the ifdef that prevents exposure of ip_conntrack_status enum IPS_NAT_CLASH value in the uapi header. Clash entries are not included in dumps (true for both old /proc and ctnetlink) either. So for now exclude the clash bit when dumping. Fixes: 7e5c6aa67e6f ("netfilter: nf_tables: add packets conntrack state to debug trace info") Link: https://lore.kernel.org/netfilter-devel/aGwf3dCggwBlRKKC@strlen.de/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>