summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
21 hoursMerge tag 'bpf-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Remove usermode driver (UMD) framework (Thomas Weißschuh) - Introduce Strongly Connected Component (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman) - Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman) - Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan) - Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai) - Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi) - Teach the verifier to insert runtime speculation barrier (lfence on x86) to mitigate speculative execution instead of rejecting the programs (Luis Gerhorst) - Various improvements for 'veristat' (Mykyta Yatsenko) - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon) - Support BPF private stack on arm64 (Puranjay Mohan) - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu) - Introduce kfuncs for read-only string opreations (Viktor Malik) - Implement show_fdinfo() for bpf_links (Tao Chen) - Reduce verifier's stack consumption (Yonghong Song) - Implement mprog API for cgroup-bpf programs (Yonghong Song) * tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits) selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite selftests/bpf: Add selftest for attaching tracing programs to functions in deny list bpf: Add log for attaching tracing programs to functions in deny list bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions bpf: Fix various typos in verifier.c comments bpf: Add third round of bounds deduction selftests/bpf: Test invariants on JSLT crossing sign selftests/bpf: Test cross-sign 64bits range refinement selftests/bpf: Update reg_bound range refinement logic bpf: Improve bounds when s64 crosses sign boundary bpf: Simplify bounds refinement from s32 selftests/bpf: Enable private stack tests for arm64 bpf, arm64: JIT support for private stack bpf: Move bpf_jit_get_prog_name() to core.c bpf, arm64: Fix fp initialization for exception boundary umd: Remove usermode driver framework bpf/preload: Don't select USERMODE_DRIVER selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure selftests/bpf: Increase xdp data size for arm64 64K page size ...
22 hoursMerge tag 'net-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
35 hoursMerge tag 's390-6.17-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Standardize on the __ASSEMBLER__ macro that is provided by GCC and Clang compilers and replace __ASSEMBLY__ with __ASSEMBLER__ in both uapi and non-uapi headers - Explicitly include <linux/export.h> in architecture and driver files which contain an EXPORT_SYMBOL() and remove the include from the files which do not contain the EXPORT_SYMBOL() - Use the full title of "z/Architecture Principles of Operation" manual and the name of a section where facility bits are listed - Use -D__DISABLE_EXPORTS for files in arch/s390/boot to avoid unnecessary slowing down of the build and confusing external kABI tools that process symtypes data - Print additional unrecoverable machine check information to make the root cause analysis easier - Move cmpxchg_user_key() handling to uaccess library code, since the generated code is large anyway and there is no benefit if it is inlined - Fix a problem when cmpxchg_user_key() is executing a code with a non-default key: if a system is IPL-ed with "LOAD NORMAL", and the previous system used storage keys where the fetch-protection bit was set for some pages, and the cmpxchg_user_key() is located within such page, a protection exception happens - Either the external call or emergency signal order is used to send an IPI to a remote CPU. Use the external order only, since it is at least as good and sometimes even better, than the emergency signal - In case of an early crash the early program check handler prints more or less random value of the last breaking event address, since it is not initialized properly. Copy the last breaking event address from the lowcore to pt_regs to address this - During STP synchronization check udelay() can not be used, since the first CPU modifies tod_clock_base and get_tod_clock_monotonic() might return a non-monotonic time. Instead, busy-loop on other CPUs, while the the first CPU actually handles the synchronization operation - When debugging the early kernel boot using QEMU with the -S flag and GDB attached, skip the decompressor and start directly in kernel - Rename PAI Crypto event 4210 according to z16 and z17 "z/Architecture Principles of Operation" manual - Remove the in-kernel time steering support in favour of the new s390 PTP driver, which allows the kernel clock steered more precisely - Remove a possible false-positive warning in pte_free_defer(), which could be triggered in a valid case KVM guest process is initializing * tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits) s390/mm: Remove possible false-positive warning in pte_free_defer() s390/stp: Default to enabled s390/stp: Remove leap second support s390/time: Remove in-kernel time steering s390/sclp: Use monotonic clock in sclp_sync_wait() s390/smp: Use monotonic clock in smp_emergency_stop() s390/time: Use monotonic clock in get_cycles() s390/pai_crypto: Rename PAI Crypto event 4210 scripts/gdb/symbols: make lx-symbols skip the s390 decompressor s390/boot: Introduce jump_to_kernel() function s390/stp: Remove udelay from stp_sync_clock() s390/early: Copy last breaking event address to pt_regs s390/smp: Remove conditional emergency signal order code usage s390/uaccess: Merge cmpxchg_user_key() inline assemblies s390/uaccess: Prevent kprobes on cmpxchg_user_key() functions s390/uaccess: Initialize code pages executed with non-default access key s390/skey: Provide infrastructure for executing with non-default access key s390/uaccess: Make cmpxchg_user_key() library code s390/page: Add memory clobber to page_set_storage_key() s390/page: Cleanup page_set_storage_key() inline assemblies ...
43 hoursMerge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
45 hoursMerge tag 'tty-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the big set of TTY and Serial driver updates for 6.17-rc1. Included in here is the following types of changes: - another cleanup round from Jiri for the 8250 serial driver and some other tty drivers, things are slowly getting better with our apis thanks to this work. This touched many tty drivers all over the tree. - qcom_geni_serial driver update for new platforms and devices - 8250 quirk handling fixups - dt serial binding updates for different boards/platforms - other minor cleanups and fixes All of these have been in linux-next with no reported issues" * tag 'tty-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (79 commits) dt-bindings: serial: snps-dw-apb-uart: Allow use of a power-domain serial: 8250: fix panic due to PSLVERR dt-bindings: serial: samsung: add samsung,exynos2200-uart compatible vt: defkeymap: Map keycodes above 127 to K_HOLE vt: keyboard: Don't process Unicode characters in K_OFF mode serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms serial: qcom-geni: Enable PM runtime for serial driver serial: qcom-geni: move clock-rate logic to separate function serial: qcom-geni: move resource control logic to separate functions serial: qcom-geni: move resource initialization to separate function soc: qcom: geni-se: Enable QUPs on SA8255p Qualcomm platforms dt-bindings: qcom: geni-se: describe SA8255p dt-bindings: serial: describe SA8255p serial: 8250_dw: Fix typo "notifer" dt-bindings: serial: 8250: spacemit: set clocks property as required dt-bindings: serial: renesas: Document RZ/V2N SCIF serial: 8250_ce4100: Fix CONFIG_SERIAL_8250=n build tty: omit need_resched() before cond_resched() serial: 8250_ni: Reorder local variables serial: 8250_ni: Fix build warning ...
3 daysMerge tag 'libcrypto-updates-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.17. The main focus this cycle is on reorganizing the SHA-1 and SHA-2 code, providing high-quality library APIs for SHA-1 and SHA-2 including HMAC support, and establishing conventions for lib/crypto/ going forward: - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares most of the SHA-512 code) into lib/crypto/. This includes both the generic and architecture-optimized code. Greatly simplify how the architecture-optimized code is integrated. Add an easy-to-use library API for each SHA variant, including HMAC support. Finally, reimplement the crypto_shash support on top of the library API. - Apply the same reorganization to the SHA-256 code (and also SHA-224 which shares most of the SHA-256 code). This is a somewhat smaller change, due to my earlier work on SHA-256. But this brings in all the same additional improvements that I made for SHA-1 and SHA-512. There are also some smaller changes: - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For these algorithms it's just a move, not a full reorganization yet. - Fix the MIPS chacha-core.S to build with the clang assembler. - Fix the Poly1305 functions to work in all contexts. - Fix a performance regression in the x86_64 Poly1305 code. - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code. Note that since the new organization of the SHA code is much simpler, the diffstat of this pull request is negative, despite the addition of new fully-documented library APIs for multiple SHA and HMAC-SHA variants. These APIs will allow further simplifications across the kernel as users start using them instead of the old-school crypto API. (I've already written a lot of such conversion patches, removing over 1000 more lines of code. But most of those will target 6.18 or later)" * tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits) lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils lib/crypto: x86/sha1-ni: Convert to use rounds macros lib/crypto: x86/sha1-ni: Minor optimizations and cleanup crypto: sha1 - Remove sha1_base.h lib/crypto: x86/sha1: Migrate optimized code into library lib/crypto: sparc/sha1: Migrate optimized code into library lib/crypto: s390/sha1: Migrate optimized code into library lib/crypto: powerpc/sha1: Migrate optimized code into library lib/crypto: mips/sha1: Migrate optimized code into library lib/crypto: arm64/sha1: Migrate optimized code into library lib/crypto: arm/sha1: Migrate optimized code into library crypto: sha1 - Use same state format as legacy drivers crypto: sha1 - Wrap library and add HMAC support lib/crypto: sha1: Add HMAC support lib/crypto: sha1: Add SHA-1 library functions lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() crypto: x86/sha1 - Rename conflicting symbol lib/crypto: sha2: Add hmac_sha*_init_usingrawkey() lib/crypto: arm/poly1305: Remove unneeded empty weak function lib/crypto: x86/poly1305: Fix performance regression on short messages ...
3 daysMerge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: - Optimization to avoid reference counts on non-cloned registered buffers. This is how these buffers were handled prior to having cloning support, and we can still use that approach as long as the buffers haven't been cloned to another ring. - Cleanup and improvement for uring_cmd, where btrfs was the only user of storing allocated data for the lifetime of the uring_cmd. Clean that up so we can get rid of the need to do that. - Avoid unnecessary memory copies in uring_cmd usage. This is particularly important as a lot of uring_cmd usage necessitates the use of 128b SQEs. - A few updates for recv multishot, where it's now possible to add fairness limits for limiting how much is transferred for each retry loop. Additionally, recv multishot now supports an overall cap as well, where once reached the multishot recv will terminate. The latter is useful for buffer management and juggling many recv streams at the same time. - Add support for returning the TX timestamps via a new socket command. This feature can work in either singleshot or multishot mode, where the latter triggers a completion whenever new timestamps are available. This is an alternative to using the existing error queue. - Add support for an io_uring "mock" file, which is the start of being able to do 100% targeted testing in terms of exercising io_uring request handling. The idea is to have a file type that can be anything the tester would like, and behave exactly how you want it to behave in terms of hitting the code paths you want. - Improve zcrx by using sgtables to de-duplicate and improve dma address handling. - Prep work for supporting larger pages for zcrx. - Various little improvements and fixes. * tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits) io_uring/zcrx: fix leaking pages on sg init fail io_uring/zcrx: don't leak pages on account failure io_uring/zcrx: fix null ifq on area destruction io_uring: fix breakage in EXPERT menu io_uring/cmd: remove struct io_uring_cmd_data btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag io_uring/zcrx: account area memory io_uring: export io_[un]account_mem io_uring/net: Support multishot receive len cap io_uring: deduplicate wakeup handling io_uring/net: cast min_not_zero() type io_uring/poll: cleanup apoll freeing io_uring/net: allow multishot receive per-invocation cap io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags io_uring/net: use passed in 'len' in io_recv_buf_select() io_uring/zcrx: prepare fallback for larger pages io_uring/zcrx: assert area type in io_zcrx_iov_page io_uring/zcrx: allocate sgtable for umem areas io_uring/zcrx: introduce io_populate_area_dma ...
3 daysMerge tag 'vfs-6.17-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - persistent info Persist exit and coredump information independent of whether anyone currently holds a pidfd for the struct pid. The current scheme allocated pidfs dentries on-demand repeatedly. This scheme is reaching it's limits as it makes it impossible to pin information that needs to be available after the task has exited or coredumped and that should not be lost simply because the pidfd got closed temporarily. The next opener should still see the stashed information. This is also a prerequisite for supporting extended attributes on pidfds to allow attaching meta information to them. If someone opens a pidfd for a struct pid a pidfs dentry is allocated and stashed in pid->stashed. Once the last pidfd for the struct pid is closed the pidfs dentry is released and removed from pid->stashed. So if 10 callers create a pidfs dentry for the same struct pid sequentially, i.e., each closing the pidfd before the other creates a new one then a new pidfs dentry is allocated every time. Because multiple tasks acquiring and releasing a pidfd for the same struct pid can race with each another a task may still find a valid pidfs entry from the previous task in pid->stashed and reuse it. Or it might find a dead dentry in there and fail to reuse it and so stashes a new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry but only one will succeed, the other ones will put their dentry. The current scheme aims to ensure that a pidfs dentry for a struct pid can only be created if the task is still alive or if a pidfs dentry already existed before the task was reaped and so exit information has been was stashed in the pidfs inode. That's great except that it's buggy. If a pidfs dentry is stashed in pid->stashed after pidfs_exit() but before __unhash_process() is called we will return a pidfd for a reaped task without exit information being available. The pidfds_pid_valid() check does not guard against this race as it doens't sync at all with pidfs_exit(). The pid_has_task() check might be successful simply because we're before __unhash_process() but after pidfs_exit(). Introduce a new scheme where the lifetime of information associated with a pidfs entry (coredump and exit information) isn't bound to the lifetime of the pidfs inode but the struct pid itself. The first time a pidfs dentry is allocated for a struct pid a struct pidfs_attr will be allocated which will be used to store exit and coredump information. If all pidfs for the pidfs dentry are closed the dentry and inode can be cleaned up but the struct pidfs_attr will stick until the struct pid itself is freed. This will ensure minimal memory usage while persisting relevant information. The new scheme has various advantages. First, it allows to close the race where we end up handing out a pidfd for a reaped task for which no exit information is available. Second, it minimizes memory usage. Third, it allows to remove complex lifetime tracking via dentries when registering a struct pid with pidfs. There's no need to get or put a reference. Instead, the lifetime of exit and coredump information associated with a struct pid is bound to the lifetime of struct pid itself. - extended attributes Now that we have a way to persist information for pidfs dentries we can start supporting extended attributes on pidfds. This will allow userspace to attach meta information to tasks. One natural extension would be to introduce a custom pidfs.* extended attribute space and allow for the inheritance of extended attributes across fork() and exec(). The first simple scheme will allow privileged userspace to set trusted extended attributes on pidfs inodes. - Allow autonomous pidfs file handles Various filesystems such as pidfs and drm support opening file handles without having to require a file descriptor to identify the filesystem. The filesystem are global single instances and can be trivially identified solely on the information encoded in the file handle. This makes it possible to not have to keep or acquire a sentinal file descriptor just to pass it to open_by_handle_at() to identify the filesystem. That's especially useful when such sentinel file descriptor cannot or should not be acquired. For pidfs this means a file handle can function as full replacement for storing a pid in a file. Instead a file handle can be stored and reopened purely based on the file handle. Such autonomous file handles can be opened with or without specifying a a file descriptor. If no proper file descriptor is used the FD_PIDFS_ROOT sentinel must be passed. This allows us to define further special negative fd sentinels in the future. Userspace can trivially test for support by trying to open the file handle with an invalid file descriptor. - Allow pidfds for reaped tasks with SCM_PIDFD messages This is a logical continuation of the earlier work to create pidfds for reaped tasks through the SO_PEERPIDFD socket option merged in 923ea4d4482b ("Merge patch series "net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid""). - Two minor fixes: * Fold fs_struct->{lock,seq} into a seqlock * Don't bother with path_{get,put}() in unix_open_file() * tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits) don't bother with path_get()/path_put() in unix_open_file() fold fs_struct->{lock,seq} into a seqlock selftests: net: extend SCM_PIDFD test to cover stale pidfds af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD af_unix: stash pidfs dentry when needed af_unix/scm: fix whitespace errors af_unix: introduce and use scm_replace_pid() helper af_unix: introduce unix_skb_to_scm helper af_unix: rework unix_maybe_add_creds() to allow sleep selftests/pidfd: decode pidfd file handles withou having to specify an fd fhandle, pidfs: support open_by_handle_at() purely based on file handle uapi/fcntl: add FD_PIDFS_ROOT uapi/fcntl: add FD_INVALID fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP uapi/fcntl: mark range as reserved fhandle: reflow get_path_anchor() pidfs: add pidfs_root_path() helper fhandle: rename to get_path_anchor() fhandle: hoist copy_from_user() above get_path_from_fd() fhandle: raise FILEID_IS_DIR in handle_type ...
3 daysMerge tag 'vfs-6.17-rc1.nsfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains namespace updates. This time specifically for nsfs: - Userspace heavily relies on the root inode numbers for namespaces to identify the initial namespaces. That's already a hard dependency. So we cannot change that anymore. Move the initial inode numbers to a public header and align the only two namespaces that currently don't do that with all the other namespaces. - The root inode of /proc having a fixed inode number has been part of the core kernel ABI since its inception, and recently some userspace programs (mainly container runtimes) have started to explicitly depend on this behaviour. The main reason this is useful to userspace is that by checking that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO, they can then use openat2() together with RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a bind-mount that replaces some procfs file with a different one. This kind of attack has lead to security issues in container runtimes in the past (such as CVE-2019-19921) and libraries like libpathrs[1] use this feature of procfs to provide safe procfs handling functions" * tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: uapi: export PROCFS_ROOT_INO mntns: use stable inode number for initial mount ns netns: use stable inode number for initial mount ns nsfs: move root inode number to uapi
3 daysMerge tag 'pull-rpc_pipefs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rpc_pipefs updates from Al Viro: "Massage rpc_pipefs to use saner primitives and clean up the APIs provided to the rest of the kernel" * tag 'pull-rpc_pipefs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rpc_create_client_dir(): return 0 or -E... rpc_create_client_dir(): don't bother with rpc_populate() rpc_new_dir(): the last argument is always NULL rpc_pipe: expand the calls of rpc_mkdir_populate() rpc_gssd_dummy_populate(): don't bother with rpc_populate() rpc_mkpipe_dentry(): switch to simple_start_creating() rpc_pipe: saner primitive for creating regular files rpc_pipe: saner primitive for creating subdirectories rpc_pipe: don't overdo directory locking rpc_mkpipe_dentry(): saner calling conventions rpc_unlink(): saner calling conventions rpc_populate(): lift cleanup into callers rpc_unlink(): use simple_recursive_removal() rpc_{rmdir_,}depopulate(): use simple_recursive_removal() instead rpc_pipe: clean failure exits in fill_super new helper: simple_start_creating()
3 daysMerge tag 'pull-dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dentry d_flags updates from Al Viro: "The current exclusion rules for dentry->d_flags stores are rather unpleasant. The basic rules are simple: - stores to dentry->d_flags are OK under dentry->d_lock - stores to dentry->d_flags are OK in the dentry constructor, before becomes potentially visible to other threads Unfortunately, there's a couple of exceptions to that, and that's where the headache comes from. The main PITA comes from d_set_d_op(); that primitive sets ->d_op of dentry and adjusts the flags that correspond to presence of individual methods. It's very easy to misuse; existing uses _are_ safe, but proof of correctness is brittle. Use in __d_alloc() is safe (we are within a constructor), but we might as well precalculate the initial value of 'd_flags' when we set the default ->d_op for given superblock and set 'd_flags' directly instead of messing with that helper. The reasons why other uses are safe are bloody convoluted; I'm not going to reproduce it here. See [1] for gory details, if you care. The critical part is using d_set_d_op() only just prior to d_splice_alias(), which makes a combination of d_splice_alias() with setting ->d_op, etc a natural replacement primitive. Better yet, if we go that way, it's easy to take setting ->d_op and modifying 'd_flags' under ->d_lock, which eliminates the headache as far as 'd_flags' exclusion rules are concerned. Other exceptions are minor and easy to deal with. What this series does: - d_set_d_op() is no longer available; instead a new primitive (d_splice_alias_ops()) is provided, equivalent to combination of d_set_d_op() and d_splice_alias(). - new field of struct super_block - 's_d_flags'. This sets the default value of 'd_flags' to be used when allocating dentries on this filesystem. - new primitive for setting 's_d_op': set_default_d_op(). This replaces stores to 's_d_op' at mount time. All in-tree filesystems converted; out-of-tree ones will get caught by the compiler ('s_d_op' is renamed, so stores to it will be caught). 's_d_flags' is set by the same primitive to match the 's_d_op'. - a lot of filesystems had sb->s_d_op->d_delete equal to always_delete_dentry; that is equivalent to setting DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well set that bit in 's_d_flags' and drop 'd_delete()' from dentry_operations. In quite a few cases that results in empty dentry_operations, which means that we can get rid of those. - kill simple_dentry_operations - not needed anymore - massage d_alloc_parallel() to get rid of the other exception wrt 'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry; no need to delay that until we commit to using the sucker. As the result, 'd_flags' stores are all either under ->d_lock or done before the dentry becomes visible in any shared data structures" Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1] * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits) configfs: use DCACHE_DONTCACHE debugfs: use DCACHE_DONTCACHE efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry() 9p: don't bother with always_delete_dentry ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE kill simple_dentry_operations devpts, sunrpc, hostfs: don't bother with ->d_op shmem: no dentry retention past the refcount reaching zero d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier make d_set_d_op() static simple_lookup(): just set DCACHE_DONTCACHE tracefs: Add d_delete to remove negative dentries set_default_d_op(): calculate the matching value for ->d_flags correct the set of flags forbidden at d_set_d_op() time split d_flags calculation out of d_set_d_op() new helper: set_default_d_op() fuse: no need for special dentry_operations for root dentry switch procfs from d_set_d_op() to d_splice_alias_ops() new helper: d_splice_alias_ops() procfs: kill ->proc_dops ...
3 daysMerge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "NFSD is finally able to offer write delegations to clients that open files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting this to accelerate a few interesting corner cases. The cap on the number of operations per NFSv4 COMPOUND has been lifted. Now, clients that send COMPOUNDs containing dozens of operations (for example, a long stream of LOOKUP operations to walk a pathname in a single round trip) will no longer be rejected. This release re-enables the ability for NFSD to perform NFSv4.2 COPY operations asynchronously. This feature has been disabled to mitigate the risk of denial-of-service when too many such requests arrive. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.17 development cycle" * tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits) nfsd: Drop dprintk in blocklayout xdr functions sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer sunrpc: rearrange struct svc_rqst for fewer cachelines sunrpc: return better error in svcauth_gss_accept() on alloc failure sunrpc: reset rq_accept_statp when starting a new RPC sunrpc: remove SVC_SYSERR sunrpc: fix handling of unknown auth status codes NFSD: Simplify struct knfsd_fh NFSD: Access a knfsd_fh's fsid by pointer Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous" NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings NFSD: Use vfs_iocb_iter_write() NFSD: Use vfs_iocb_iter_read() NFSD: Clean up kdoc for nfsd_open_local_fh() NFSD: Clean up kdoc for nfsd_file_put_local() NFSD: Remove definition for trace_nfsd_ctl_maxconn NFSD: Remove definition for trace_nfsd_file_gc_recent NFSD: Remove definitions for unused trace_nfsd_file_lru trace points NFSD: Remove definition for trace_nfsd_file_unhash_and_queue nfsd: Use correct error code when decoding extents ...
5 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.17 net-next PR. Conflicts: net/core/neighbour.c 1bbb76a89948 ("neighbour: Fix null-ptr-deref in neigh_flush_dev().") 13a936bb99fb ("neighbour: Protect tbl->phash_buckets[] with a dedicated mutex.") 03dc03fa0432 ("neighbor: Add NTF_EXT_VALIDATED flag for externally validated entries") Adjacent changes: drivers/net/usb/usbnet.c 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event") 2c04d279e857 ("net: usb: Convert tasklet API to new bottom half workqueue mechanism") net/ipv6/route.c 31d7d67ba127 ("ipv6: annotate data-races around rt->fib6_nsiblings") 1caf27297215 ("ipv6: adopt dst_dev() helper") 3b3ccf9ed05e ("net: Remove unnecessary NULL check for lwtunnel_fill_encap()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysipv6: annotate data-races around rt->fib6_nsiblingsEric Dumazet
rt->fib6_nsiblings can be read locklessly, add corresponding READ_ONCE() and WRITE_ONCE() annotations. Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250725140725.3626540-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysipv6: fix possible infinite loop in fib6_info_uses_dev()Eric Dumazet
fib6_info_uses_dev() seems to rely on RCU without an explicit protection. Like the prior fix in rt6_nlmsg_size(), we need to make sure fib6_del_route() or fib6_add_rt2node() have not removed the anchor from the list, or we risk an infinite loop. Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250725140725.3626540-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysipv6: prevent infinite loop in rt6_nlmsg_size()Eric Dumazet
While testing prior patch, I was able to trigger an infinite loop in rt6_nlmsg_size() in the following place: list_for_each_entry_rcu(sibling, &f6i->fib6_siblings, fib6_siblings) { rt6_nh_nlmsg_size(sibling->fib6_nh, &nexthop_len); } This is because fib6_del_route() and fib6_add_rt2node() uses list_del_rcu(), which can confuse rcu readers, because they might no longer see the head of the list. Restart the loop if f6i->fib6_nsiblings is zero. Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250725140725.3626540-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysipv6: add a retry logic in net6_rt_notify()Eric Dumazet
inet6_rt_notify() can be called under RCU protection only. This means the route could be changed concurrently and rt6_fill_node() could return -EMSGSIZE. Re-size the skb when this happens and retry, removing one WARN_ON() that syzbot was able to trigger: WARNING: CPU: 3 PID: 6291 at net/ipv6/route.c:6342 inet6_rt_notify+0x475/0x4b0 net/ipv6/route.c:6342 Modules linked in: CPU: 3 UID: 0 PID: 6291 Comm: syz.0.77 Not tainted 6.16.0-rc7-syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:inet6_rt_notify+0x475/0x4b0 net/ipv6/route.c:6342 Code: fc ff ff e8 6d 52 ea f7 e9 47 fc ff ff 48 8b 7c 24 08 4c 89 04 24 e8 5a 52 ea f7 4c 8b 04 24 e9 94 fd ff ff e8 9c fe 84 f7 90 <0f> 0b 90 e9 bd fd ff ff e8 6e 52 ea f7 e9 bb fb ff ff 48 89 df e8 RSP: 0018:ffffc900035cf1d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffc900035cf540 RCX: ffffffff8a36e790 RDX: ffff88802f7e8000 RSI: ffffffff8a36e9d4 RDI: 0000000000000005 RBP: ffff88803c230f00 R08: 0000000000000005 R09: 00000000ffffffa6 R10: 00000000ffffffa6 R11: 0000000000000001 R12: 00000000ffffffa6 R13: 0000000000000900 R14: ffff888032ea4100 R15: 0000000000000000 FS: 00007fac7b89a6c0(0000) GS:ffff8880d6a20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fac7b899f98 CR3: 0000000034b3f000 CR4: 0000000000352ef0 Call Trace: <TASK> ip6_route_mpath_notify+0xde/0x280 net/ipv6/route.c:5356 ip6_route_multipath_add+0x1181/0x1bd0 net/ipv6/route.c:5536 inet6_rtm_newroute+0xe4/0x1a0 net/ipv6/route.c:5647 rtnetlink_rcv_msg+0x95e/0xe90 net/core/rtnetlink.c:6944 netlink_rcv_skb+0x155/0x420 net/netlink/af_netlink.c:2552 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x58d/0x850 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2566 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620 Fixes: 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://patch.msgid.link/20250725140725.3626540-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnet/sched: taprio: align entry index attr validation with mqprioSimon Horman
Both taprio and mqprio have code to validate respective entry index attributes. The validation is indented to ensure that the attribute is present, and that it's value is in range, and that each value is only used once. The purpose of this patch is to align the implementation of taprio with that of mqprio as there seems to be no good reason for them to differ. For one thing, this way, bugs will be present in both or neither. As a follow-up some consideration could be given to a common function used by both sch. No functional change intended. Except of tdc run: the results of the taprio tests # ok 81 ba39 - Add taprio Qdisc to multi-queue device (8 queues) # ok 82 9462 - Add taprio Qdisc with multiple sched-entry # ok 83 8d92 - Add taprio Qdisc with txtime-delay # ok 84 d092 - Delete taprio Qdisc with valid handle # ok 85 8471 - Show taprio class # ok 86 0a85 - Add taprio Qdisc to single-queue device # ok 87 6f62 - Add taprio Qdisc with too short interval # ok 88 831f - Add taprio Qdisc with too short cycle-time # ok 89 3e1e - Add taprio Qdisc with an invalid cycle-time # ok 90 39b4 - Reject grafting taprio as child qdisc of software taprio # ok 91 e8a1 - Reject grafting taprio as child qdisc of offloaded taprio # ok 92 a7bf - Graft cbs as child of software taprio # ok 93 6a83 - Graft cbs as child of offloaded taprio Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Cc: Maher Azzouzi <maherazz04@gmail.com> Link: https://lore.kernel.org/netdev/20250723125521.GA2459@horms.kernel.org/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://patch.msgid.link/20250725-taprio-idx-parse-v1-1-b582fffcde37@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysvsock: remove unnecessary null check in vsock_getname()Wang Liang
The local variable 'vm_addr' is always not NULL, no need to check it. Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250725013808.337924-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysMerge tag 'nf-next-25-07-25' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following series contains Netfilter/IPVS updates for net-next: 1) Display netns inode in conntrack table full log, from lvxiafei. 2) Autoload nf_log_syslog in case no logging backend is available, from Lance Yang. 3) Three patches to remove unused functions in x_tables, nf_tables and conntrack. From Yue Haibing. 4) Exclude LEGACY TABLES on PREEMPT_RT: Add NETFILTER_XTABLES_LEGACY to exclude xtables legacy infrastructure. 5) Restore selftests by toggling NETFILTER_XTABLES_LEGACY where needed. From Florian Westphal. 6) Use CONFIG_INET_SCTP_DIAG in tools/testing/selftests/net/netfilter/config, from Sebastian Andrzej Siewior. 7) Use timer_delete in comment in IPVS codebase, from WangYuli. 8) Dump flowtable information in nfnetlink_hook, this includes an initial patch to consolidate common code in helper function, from Phil Sutter. 9) Remove unused arguments in nft_pipapo set backend, from Florian Westphal. 10) Return nft_set_ext instead of boolean in set lookup function, from Florian Westphal. 11) Remove indirection in dynamic set infrastructure, also from Florian. 12) Consolidate pipapo_get/lookup, from Florian. 13) Use kvmalloc in nft_pipapop, from Florian Westphal. 14) syzbot reports slab-out-of-bounds in xt_nfacct log message, fix from Florian Westphal. 15) Ignored tainted kernels in selftest nft_interface_stress.sh, from Phil Sutter. 16) Fix IPVS selftest by disabling rp_filter with ipip tunnel device, from Yi Chen. * tag 'nf-next-25-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0 selftests: netfilter: Ignore tainted kernels in interface stress test netfilter: xt_nfacct: don't assume acct name is null-terminated netfilter: nft_set_pipapo: prefer kvmalloc for scratch maps netfilter: nft_set_pipapo: merge pipapo_get/lookup netfilter: nft_set: remove indirection from update API call netfilter: nft_set: remove one argument from lookup and update functions netfilter: nft_set_pipapo: remove unused arguments netfilter: nfnetlink_hook: Dump flowtable info netfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helper ipvs: Rename del_timer in comment in ip_vs_conn_expire_now() selftests: netfilter: Enable CONFIG_INET_SCTP_DIAG selftests: net: Enable legacy netfilter legacy options. netfilter: Exclude LEGACY TABLES on PREEMPT_RT. netfilter: conntrack: Remove unused net in nf_conntrack_double_lock() netfilter: nf_tables: Remove unused nft_reduce_is_readonly() netfilter: x_tables: Remove unused functions xt_{in|out}name() netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid netfilter: conntrack: table full detailed log ==================== Link: https://patch.msgid.link/20250725170340.21327-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet/sched: Add precise drop reason for pfifo_fast queue overflowsFan Yu
Currently, packets dropped by pfifo_fast due to queue overflow are marked with a generic SKB_DROP_REASON_QDISC_DROP in __dev_xmit_skb(). This patch adds explicit drop reason SKB_DROP_REASON_QDISC_OVERLIMIT for queue-full cases, providing better distinction from other qdisc drops. Signed-off-by: Fan Yu <fan.yu9@zte.com.cn> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250724212837119BP9HOs0ibXDRWgsXMMir7@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysneighbour: Fix null-ptr-deref in neigh_flush_dev().Kuniyuki Iwashima
kernel test robot reported null-ptr-deref in neigh_flush_dev(). [0] The cited commit introduced per-netdev neighbour list and converted neigh_flush_dev() to use it instead of the global hash table. One thing we missed is that neigh_table_clear() calls neigh_ifdown() with NULL dev. Let's restore the hash table iteration. Note that IPv6 module is no longer unloadable, so neigh_table_clear() is called only when IPv6 fails to initialise, which is unlikely to happen. [0]: IPv6: Attempt to unregister permanent protocol 136 IPv6: Attempt to unregister permanent protocol 17 Oops: general protection fault, probably for non-canonical address 0xdffffc00000001a0: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000d00-0x0000000000000d07] CPU: 1 UID: 0 PID: 1 Comm: systemd Tainted: G T 6.12.0-rc6-01246-gf7f52738637f #1 Tainted: [T]=RANDSTRUCT Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:neigh_flush_dev.llvm.6395807810224103582+0x52/0x570 Code: c1 e8 03 42 8a 04 38 84 c0 0f 85 15 05 00 00 31 c0 41 83 3e 0a 0f 94 c0 48 8d 1c c3 48 81 c3 f8 0c 00 00 48 89 d8 48 c1 e8 03 <42> 80 3c 38 00 74 08 48 89 df e8 f7 49 93 fe 4c 8b 3b 4d 85 ff 0f RSP: 0000:ffff88810026f408 EFLAGS: 00010206 RAX: 00000000000001a0 RBX: 0000000000000d00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc0631640 RBP: ffff88810026f470 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffffffffc0625250 R14: ffffffffc0631640 R15: dffffc0000000000 FS: 00007f575cb83940(0000) GS:ffff8883aee00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f575db40008 CR3: 00000002bf936000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __neigh_ifdown.llvm.6395807810224103582+0x44/0x390 neigh_table_clear+0xb1/0x268 ndisc_cleanup+0x21/0x38 [ipv6] init_module+0x2f5/0x468 [ipv6] do_one_initcall+0x1ba/0x628 do_init_module+0x21a/0x530 load_module+0x2550/0x2ea0 __se_sys_finit_module+0x3d2/0x620 __x64_sys_finit_module+0x76/0x88 x64_sys_call+0x7ff/0xde8 do_syscall_64+0xfb/0x1e8 entry_SYSCALL_64_after_hwframe+0x67/0x6f RIP: 0033:0x7f575d6f2719 Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007fff82a2a268 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000557827b45310 RCX: 00007f575d6f2719 RDX: 0000000000000000 RSI: 00007f575d584efd RDI: 0000000000000004 RBP: 00007f575d584efd R08: 0000000000000000 R09: 0000557827b47b00 R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000020000 R13: 0000000000000000 R14: 0000557827b470e0 R15: 00007f575dbb4270 </TASK> Modules linked in: ipv6(+) Fixes: f7f52738637f4 ("neighbour: Create netdev->neighbour association") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507200931.7a89ecd8-lkp@intel.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250723195443.448163-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysipv6: add `force_forwarding` sysctl to enable per-interface forwardingGabriel Goller
It is currently impossible to enable ipv6 forwarding on a per-interface basis like in ipv4. To enable forwarding on an ipv6 interface we need to enable it on all interfaces and disable it on the other interfaces using a netfilter rule. This is especially cumbersome if you have lots of interfaces and only want to enable forwarding on a few. According to the sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding for all interfaces, while the interface-specific `net.ipv6.conf.<interface>.forwarding` configures the interface Host/Router configuration. Introduce a new sysctl flag `force_forwarding`, which can be set on every interface. The ip6_forwarding function will then check if the global forwarding flag OR the force_forwarding flag is active and forward the packet. To preserve backwards-compatibility reset the flag (on all interfaces) to 0 if the net.ipv6.conf.all.forwarding flag is set to 0. Add a short selftest that checks if a packet gets forwarded with and without `force_forwarding`. [0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Gabriel Goller <g.goller@proxmox.com> Link: https://patch.msgid.link/20250722081847.132632-1-g.goller@proxmox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysmptcp: remove pr_fallback()Paolo Abeni
We can now track fully the fallback status of a given connection via the relevant mibs, the mentioned helper is redundant. Remove it completely. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250723-net-next-mptcp-track-fallbacks-v1-2-a83cce08f2d5@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysmptcp: track fallbacks accurately via mibsPaolo Abeni
Add the mibs required to cover the few possible fallback causes still lacking suck info. Move the relevant mib increment into the fallback helper, so that no eventual future fallback operation will miss a paired mib increment. Additionally track failed fallback via its own mib, such mib is incremented only when a fallback mandated by the protocol fails - due to racing subflow creation. While at the above, rename an existing helper to reduce long lines problems all along. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250723-net-next-mptcp-track-fallbacks-v1-1-a83cce08f2d5@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnetfilter: xt_nfacct: don't assume acct name is null-terminatedFlorian Westphal
BUG: KASAN: slab-out-of-bounds in .. lib/vsprintf.c:721 Read of size 1 at addr ffff88801eac95c8 by task syz-executor183/5851 [..] string+0x231/0x2b0 lib/vsprintf.c:721 vsnprintf+0x739/0xf00 lib/vsprintf.c:2874 [..] nfacct_mt_checkentry+0xd2/0xe0 net/netfilter/xt_nfacct.c:41 xt_check_match+0x3d1/0xab0 net/netfilter/x_tables.c:523 nfnl_acct_find_get() handles non-null input, but the error printk relied on its presence. Reported-by: syzbot+4ff165b9251e4d295690@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4ff165b9251e4d295690 Tested-by: syzbot+4ff165b9251e4d295690@syzkaller.appspotmail.com Fixes: ceb98d03eac5 ("netfilter: xtables: add nfacct match to support extended accounting") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nft_set_pipapo: prefer kvmalloc for scratch mapsFlorian Westphal
The scratchmap size depends on the number of elements in the set. For huge sets, each scratch map can easily require very large allocations, e.g. for 100k entries each scratch map will require close to 64kbyte of memory. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nft_set_pipapo: merge pipapo_get/lookupFlorian Westphal
The matching algorithm has implemented thrice: 1. data path lookup, generic version 2. data path lookup, avx2 version 3. control plane lookup Merge 1 and 3 by refactoring pipapo_get as a common helper, then make nft_pipapo_lookup and nft_pipapo_get both call the common helper. Aside from the code savings this has the benefit that we no longer allocate temporary scratch maps for each control plane get and insertion operation. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nft_set: remove indirection from update API callFlorian Westphal
This stems from a time when sets and nft_dynset resided in different kernel modules. We can replace this with a direct call. We could even remove both ->update and ->delete, given its only supported by rhashtable, but on the off-chance we'll see runtime add/delete for other types or a new set type keep that as-is for now. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nft_set: remove one argument from lookup and update functionsFlorian Westphal
Return the extension pointer instead of passing it as a function argument to be filled in by the callee. As-is, whenever false is returned, the extension pointer is not used. For all set types, when true is returned, the extension pointer was set to the matching element. Only exception: nft_set_bitmap doesn't support extensions. Return a pointer to a static const empty element extension container. return false -> return NULL return true -> return the elements' extension pointer. This saves one function argument. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nft_set_pipapo: remove unused argumentsFlorian Westphal
They are not used anymore, so remove them. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nfnetlink_hook: Dump flowtable infoPhil Sutter
Introduce NFNL_HOOK_TYPE_NFT_FLOWTABLE to distinguish flowtable hooks from base chain ones. Nested attributes are shared with the old NFTABLES hook info type since they fit apart from their misleading name. Old nftables in user space will ignore this new hook type and thus continue to print flowtable hooks just like before, e.g.: | family netdev { | hook ingress device test0 { | 0000000000 nf_flow_offload_ip_hook [nf_flow_table] | } | } With this patch in place and support for the new hook info type, output becomes more useful: | family netdev { | hook ingress device test0 { | 0000000000 flowtable ip mytable myft [nf_flow_table] | } | } Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helperPhil Sutter
Introduce a helper routine adding the nested attribute for use by a second caller later. Note how this introduces cancelling of 'nest2' for categorical reasons. Since always followed by cancelling of the outer 'nest', it is technically not needed. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysipvs: Rename del_timer in comment in ip_vs_conn_expire_now()WangYuli
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()") switched del_timer to timer_delete, but did not modify the comment for ip_vs_conn_expire_now(). Now fix it. Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: Exclude LEGACY TABLES on PREEMPT_RT.Pablo Neira Ayuso
The seqcount xt_recseq is used to synchronize the replacement of xt_table::private in xt_replace_table() against all readers such as ipt_do_table() To ensure that there is only one writer, the writing side disables bottom halves. The sequence counter can be acquired recursively. Only the first invocation modifies the sequence counter (signaling that a writer is in progress) while the following (recursive) writer does not modify the counter. The lack of a proper locking mechanism for the sequence counter can lead to live lock on PREEMPT_RT if the high prior reader preempts the writer. Additionally if the per-CPU lock on PREEMPT_RT is removed from local_bh_disable() then there is no synchronisation for the per-CPU sequence counter. The affected code is "just" the legacy netfilter code which is replaced by "netfilter tables". That code can be disabled without sacrificing functionality because everything is provided by the newer implementation. This will only requires the usage of the "-nft" tools instead of the "-legacy" ones. The long term plan is to remove the legacy code so lets accelerate the progress. Relax dependencies on iptables legacy, replace select with depends on, this should cause no harm to existing kernel configs and users can still toggle IP{6}_NF_IPTABLES_LEGACY in any case. Make EBTABLES_LEGACY, IPTABLES_LEGACY and ARPTABLES depend on NETFILTER_XTABLES_LEGACY. Hide xt_recseq and its users, xt_register_table() and xt_percpu_counter_alloc() behind NETFILTER_XTABLES_LEGACY. Let NETFILTER_XTABLES_LEGACY depend on !PREEMPT_RT. This will break selftest expecing the legacy options enabled and will be addressed in a following patch. Co-developed-by: Florian Westphal <fw@strlen.de> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: conntrack: Remove unused net in nf_conntrack_double_lock()Yue Haibing
Since commit a3efd81205b1 ("netfilter: conntrack: move generation seqcnt out of netns_ct") this param is unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: load nf_log_syslog on enabling nf_conntrack_log_invalidLance Yang
When no logger is registered, nf_conntrack_log_invalid fails to log invalid packets, leaving users unaware of actual invalid traffic. Improve this by loading nf_log_syslog, similar to how 'iptables -I FORWARD 1 -m conntrack --ctstate INVALID -j LOG' triggers it. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Zi Li <zi.li@linux.dev> Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: conntrack: table full detailed loglvxiafei
Add the netns field in the "nf_conntrack: table full, dropping packet" log to help locate the specific netns when the table is full. Signed-off-by: lvxiafei <lvxiafei@sensetime.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
7 daysnet: define an enum for the napi threaded stateSamiullah Khawaja
Instead of using '0' and '1' for napi threaded state use an enum with 'disabled' and 'enabled' states. Tested: ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250723013031.2911384-4-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet: Use netif_threaded_enable instead of netif_set_threaded in driversSamiullah Khawaja
Prepare for adding an enum type for NAPI threaded states by adding netif_threaded_enable API. De-export the existing netif_set_threaded API and only use it internally. Update existing drivers to use netif_threaded_enable instead of the de-exported netif_set_threaded. Note that dev_set_threaded used by mt76 debugfs file is unchanged. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250723013031.2911384-3-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet: Create separate gro_flush_normal functionSamiullah Khawaja
Move multiple copies of same code snippet doing `gro_flush` and `gro_normal_list` into separate helper function. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250723013031.2911384-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'for-net-next-2025-07-23' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - hci_sync: fix double free in 'hci_discovery_filter_clear()' - hci_event: Mask data status from LE ext adv reports - hci_devcd_dump: fix out-of-bounds via dev_coredumpv - ISO: add socket option to report packet seqnum via CMSG - hci_event: Add support for handling LE BIG Sync Lost event - ISO: Support SCM_TIMESTAMPING for ISO TS - hci_core: Add PA_LINK to distinguish BIG sync and PA sync connections - hci_sock: Reset cookie to zero in hci_sock_free_cookie() drivers: - btusb: Add new VID/PID 0489/e14e for MT7925 - btusb: Add a new VID/PID 2c7c/7009 for MT7925 - btusb: Add RTL8852BE device 0x13d3:0x3618 - btusb: Add support for variant of RTL8851BE (USB ID 13d3:3601) - btusb: Add USB ID 3625:010b for TP-LINK Archer TX10UB Nano - btusb: QCA: Support downloading custom-made firmwares - btusb: Add one more ID 0x28de:0x1401 for Qualcomm WCN6855 - nxp: add support for supply and reset - btnxpuart: Add support for 4M baudrate - btnxpuart: Correct the Independent Reset handling after FW dump - btnxpuart: Add uevents for FW dump and FW download complete - btintel: Define a macro for Intel Reset vendor command - btintel_pcie: Support Function level reset - btintel_pcie: Add support for device 0x4d76 - btintel_pcie: Make driver wait for alive interrupt - btintel_pcie: Fix Alive Context State Handling - hci_qca: Enable ISO data packet RX * tag 'for-net-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (42 commits) Bluetooth: Add PA_LINK to distinguish BIG sync and PA sync connections Bluetooth: hci_event: Mask data status from LE ext adv reports Bluetooth: btintel_pcie: Fix Alive Context State Handling Bluetooth: btintel_pcie: Make driver wait for alive interrupt Bluetooth: hci_devcd_dump: fix out-of-bounds via dev_coredumpv Bluetooth: hci_sync: fix double free in 'hci_discovery_filter_clear()' Bluetooth: btusb: Add one more ID 0x28de:0x1401 for Qualcomm WCN6855 Bluetooth: btusb: Sort WCN6855 device IDs by VID and PID Bluetooth: btusb: QCA: Support downloading custom-made firmwares Bluetooth: btnxpuart: Add uevents for FW dump and FW download complete Bluetooth: btnxpuart: Correct the Independent Reset handling after FW dump Bluetooth: ISO: Support SCM_TIMESTAMPING for ISO TS Bluetooth: ISO: add socket option to report packet seqnum via CMSG Bluetooth: btintel: Define a macro for Intel Reset vendor command Bluetooth: Fix typos in comments Bluetooth: RFCOMM: Fix typos in comments Bluetooth: aosp: Fix typo in comment Bluetooth: hci_bcm4377: Fix typo in comment Bluetooth: btrtl: Fix typo in comment Bluetooth: btmtk: Fix typo in log string ... ==================== Link: https://patch.msgid.link/20250723190233.166823-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-07-24 We've added 3 non-merge commits during the last 3 day(s) which contain a total of 4 files changed, 40 insertions(+), 15 deletions(-). The main changes are: 1) Improved verifier error message for incorrect narrower load from pointer field in ctx, from Paul Chaignon. 2) Disabled migration in nf_hook_run_bpf to address a syzbot report, from Kuniyuki Iwashima. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Test invalid narrower ctx load bpf: Reject narrower access to pointer ctx fields bpf: Disable migration in nf_hook_run_bpf(). ==================== Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'wireless-next-2025-07-24' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another wireless update: - rtw89: - STA+P2P concurrency - support for USB devices RTL8851BU/RTL8852BU - ath9k: OF support - ath12k: - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - iwlwifi: some FIPS interoperability - brcm80211: support SDIO 43751 device - rt2x00: better DT/OF support - cfg80211/mac80211: - improved S1G support - beacon monitor for MLO * tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (199 commits) ssb: use new GPIO line value setter callbacks for the second GPIO chip wifi: Fix typos wifi: brcmsmac: Use str_true_false() helper wifi: brcmfmac: fix EXTSAE WPA3 connection failure due to AUTH TX failure wifi: brcm80211: Remove yet more unused functions wifi: brcm80211: Remove more unused functions wifi: brcm80211: Remove unused functions wifi: iwlwifi: Revert "wifi: iwlwifi: remove support of several iwl_ppag_table_cmd versions" wifi: iwlwifi: check validity of the FW API range wifi: iwlwifi: don't export symbols that we shouldn't wifi: iwlwifi: mld: use spec link id and not FW link id wifi: iwlwifi: mld: decode EOF bit for AMPDUs wifi: iwlwifi: Remove support for rx OMI bandwidth reduction wifi: iwlwifi: stop supporting iwl_omi_send_status_notif ver 1 wifi: iwlwifi: remove SC2F firmware support wifi: iwlwifi: mvm: Remove NAN support wifi: iwlwifi: mld: avoid outdated reorder buffer head_sn wifi: iwlwifi: mvm: avoid outdated reorder buffer head_sn wifi: iwlwifi: disable certain features for fips_enabled wifi: iwlwifi: mld: support channel survey collection for ACS scans ... ==================== Link: https://patch.msgid.link/20250724100349.21564-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc8). Conflicts: drivers/net/ethernet/microsoft/mana/gdma_main.c 9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion") 755391121038 ("net: mana: Allocate MSI-X vectors dynamically") https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/icssg/icssg_prueth.h 6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG") ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'ipsec-next-2025-07-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2025-07-23 1) Optimize to hold device only for the asynchronous decryption, where it is really needed. From Jianbo Liu. 2) Align our inbund SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup. From Aakash Kumar S. 3) Skip redundant statistics update for xfrm crypto offload. From Jianbo Liu. Please pull or let me know if there are problems. * tag 'ipsec-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Skip redundant statistics update for crypto offload xfrm: Duplicate SPI Handling xfrm: hold device only for the asynchronous decryption ==================== Link: https://patch.msgid.link/20250723080402.3439619-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysMerge tag 'ipsec-2025-07-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-07-23 1) Premption fixes for xfrm_state_find. From Sabrina Dubroca. 2) Initialize offload path also for SW IPsec GRO. This fixes a performance regression on SW IPsec offload. From Leon Romanovsky. 3) Fix IPsec UDP GRO for IKE packets. From Tobias Brunner, 4) Fix transport header setting for IPcomp after decompressing. From Fernando Fernandez Mancera. 5) Fix use-after-free when xfrmi_changelink tries to change collect_md for a xfrm interface. From Eyal Birger . 6) Delete the special IPcomp x->tunnel state along with the state x to avoid refcount problems. From Sabrina Dubroca. Please pull or let me know if there are problems. * tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: Revert "xfrm: destroy xfrm_state synchronously on net exit path" xfrm: delete x->tunnel as we delete x xfrm: interface: fix use-after-free after changing collect_md xfrm interface xfrm: ipcomp: adjust transport header after decompressing xfrm: Set transport header to fix UDP GRO handling xfrm: always initialize offload path xfrm: state: use a consistent pcpu_id in xfrm_state_find xfrm: state: initialize state_ptrs earlier in xfrm_state_find ==================== Link: https://patch.msgid.link/20250723075417.3432644-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysbpf: Reject narrower access to pointer ctx fieldsPaul Chaignon
The following BPF program, simplified from a syzkaller repro, causes a kernel warning: r0 = *(u8 *)(r1 + 169); exit; With pointer field sk being at offset 168 in __sk_buff. This access is detected as a narrower read in bpf_skb_is_valid_access because it doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed and later proceeds to bpf_convert_ctx_access. Note that for the "is_narrower_load" case in the convert_ctx_accesses(), the insn->off is aligned, so the cnt may not be 0 because it matches the offsetof(struct __sk_buff, sk) in the bpf_convert_ctx_access. However, the target_size stays 0 and the verifier errors with a kernel warning: verifier bug: error during ctx access conversion(1) This patch fixes that to return a proper "invalid bpf_context access off=X size=Y" error on the load instruction. The same issue affects multiple other fields in context structures that allow narrow access. Some other non-affected fields (for sk_msg, sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for consistency. Note this syzkaller crash was reported in the "Closes" link below, which used to be about a different bug, fixed in commit fce7bd8e385a ("bpf/verifier: Handle BPF_LOAD_ACQ instructions in insn_def_regno()"). Because syzbot somehow confused the two bugs, the new crash and repro didn't get reported to the mailing list. Fixes: f96da09473b52 ("bpf: simplify narrower ctx access") Fixes: 0df1a55afa832 ("bpf: Warn on internal verifier errors") Reported-by: syzbot+0ef84a7bdf5301d4cbec@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0ef84a7bdf5301d4cbec Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/3b8dcee67ff4296903351a974ddd9c4dca768b64.1753194596.git.paul.chaignon@gmail.com
8 daysbpf: Disable migration in nf_hook_run_bpf().Kuniyuki Iwashima
syzbot reported that the netfilter bpf prog can be called without migration disabled in xmit path. Then the assertion in __bpf_prog_run() fails, triggering the splat below. [0] Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf(). [0]: BUG: assuming non migratable context at ./include/linux/filter.h:703 in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session 3 locks held by sshd-session/5829: #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline] #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline] #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241 CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120 __cant_migrate kernel/sched/core.c:8860 [inline] __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834 __bpf_prog_run include/linux/filter.h:703 [inline] bpf_prog_run include/linux/filter.h:725 [inline] nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline] nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623 nf_hook+0x370/0x680 include/linux/netfilter.h:272 NF_HOOK_COND include/linux/netfilter.h:305 [inline] ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:459 [inline] ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline] tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021 tcp_push+0x225/0x700 net/ipv4/tcp.c:759 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] sock_write_iter+0x4aa/0x5b0 net/socket.c:1131 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x6c7/0x1150 fs/read_write.c:686 ksys_write+0x1f8/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe7d365d407 Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: Fixes: fd9c663b9ad67 ("bpf: minimal support for programs hooked into netfilter framework") Reported-by: syzbot+40f772d37250b6d10efc@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6879466d.a00a0220.3af5df.0022.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Tested-by: syzbot+40f772d37250b6d10efc@syzkaller.appspotmail.com Acked-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250722224041.112292-1-kuniyu@google.com
8 dayssched: Add enqueue/dequeue of dualpi2 qdiscKoen De Schepper
DualPI2 provides L4S-type low latency & loss to traffic that uses a scalable congestion controller (e.g. TCP-Prague, DCTCP) without degrading the performance of 'classic' traffic (e.g. Reno, Cubic etc.). It is to be the reference implementation of IETF RFC9332 DualQ Coupled AQM (https://datatracker.ietf.org/doc/html/rfc9332). Note that creating two independent queues cannot meet the goal of DualPI2 mentioned in RFC9332: "...to preserve fairness between ECN-capable and non-ECN-capable traffic." Further, it could even lead to starvation of Classic traffic, which is also inconsistent with the requirements in RFC9332: "...although priority MUST be bounded in order not to starve Classic traffic." DualPI2 is designed to maintain approximate per-flow fairness on L-queue and C-queue by forming a single qdisc using the coupling factor and scheduler between two queues. The qdisc provides two queues called low latency and classic. It classifies packets based on the ECN field in the IP headers. By default it directs non-ECN and ECT(0) into the classic queue and ECT(1) and CE into the low latency queue, as per the IETF spec. Each queue runs its own AQM: * The classic AQM is called PI2, which is similar to the PIE AQM but more responsive and simpler. Classic traffic requires a decent target queue (default 15ms for Internet deployment) to fully utilize the link and to avoid high drop rates. * The low latency AQM is, by default, a very shallow ECN marking threshold (1ms) similar to that used for DCTCP. The DualQ isolates the low queuing delay of the Low Latency queue from the larger delay of the 'Classic' queue. However, from a bandwidth perspective, flows in either queue will share out the link capacity as if there was just a single queue. This bandwidth pooling effect is achieved by coupling together the drop and ECN-marking probabilities of the two AQMs. The PI2 AQM has two main parameters in addition to its target delay. The integral gain factor alpha is used to slowly correct any persistent standing queue error from the target delay, while the proportional gain factor beta is used to quickly compensate for queue changes (growth or shrinkage). Either alpha and beta are given as a parameter, or they can be calculated by tc from alternative typical and maximum RTT parameters. Internally, the output of a linear Proportional Integral (PI) controller is used for both queues. This output is squared to calculate the drop or ECN-marking probability of the classic queue. This counterbalances the square-root rate equation of Reno/Cubic, which is the trick that balances flow rates across the queues. For the ECN-marking probability of the low latency queue, the output of the base AQM is multiplied by a coupling factor. This determines the balance between the flow rates in each queue. The default setting makes the flow rates roughly equal, which should be generally applicable. If DUALPI2 AQM has detected overload (due to excessive non-responsive traffic in either queue), it will switch to signaling congestion solely using drop, irrespective of the ECN field. Alternatively, it can be configured to limit the drop probability and let the queue grow and eventually overflow (like tail-drop). GSO splitting in DUALPI2 is configurable from userspace while the default behavior is to split gso. When running DUALPI2 at unshaped 10gigE with 4 download streams test, splitting gso apart results in halving the latency with no loss in throughput: Summary of tcp_4down run 'no_split_gso': avg median # data pts Ping (ms) ICMP : 0.53 0.30 ms 350 TCP download avg : 2326.86 N/A Mbits/s 350 TCP download sum : 9307.42 N/A Mbits/s 350 TCP download::1 : 2672.99 2568.73 Mbits/s 350 TCP download::2 : 2586.96 2570.51 Mbits/s 350 TCP download::3 : 1786.26 1798.82 Mbits/s 350 TCP download::4 : 2261.21 2309.49 Mbits/s 350 Summart of tcp_4down run 'split_gso': avg median # data pts Ping (ms) ICMP   : 0.22 0.23 ms 350 TCP download avg : 2335.02 N/A Mbits/s 350 TCP download sum : 9340.09 N/A Mbits/s 350 TCP download::1 : 2335.30 2334.22 Mbits/s 350 TCP download::2 : 2334.72 2334.20 Mbits/s 350 TCP download::3 : 2335.28 2334.58 Mbits/s 350 TCP download::4 : 2334.79 2334.39 Mbits/s 350 A similar result is observed when running DUALPI2 at unshaped 1gigE with 1 download stream test: Summary of tcp_1down run 'no_split_gso': avg median # data pts Ping (ms) ICMP : 1.13 1.25 ms 350 TCP download : 941.41 941.46 Mbits/s 350 Summart of tcp_1down run 'split_gso': avg median # data pts Ping (ms) ICMP : 0.51 0.55 ms 350 TCP download : 941.41 941.45 Mbits/s 350 Additional details can be found in the draft: https://datatracker.ietf.org/doc/html/rfc9332 Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com> Co-developed-by: Olga Albisser <olga@albisser.org> Signed-off-by: Olga Albisser <olga@albisser.org> Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com> Co-developed-by: Henrik Steen <henrist@henrist.net> Signed-off-by: Henrik Steen <henrist@henrist.net> Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: Bob Briscoe <research@bobbriscoe.net> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Acked-by: Dave Taht <dave.taht@gmail.com> Link: https://patch.msgid.link/20250722095915.24485-4-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>