summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-29ublk: enhance check for register/unregister io buffer commandMing Lei
The simple check of UBLK_IO_FLAG_OWNED_BY_SRV can avoid incorrect register/unregister io buffer easily, so check it before calling starting to register/un-register io buffer. Also only allow io buffer register/unregister uring_cmd in case of UBLK_F_SUPPORT_ZERO_COPY. Also mark argument 'ublk_queue *' of ublk_register_io_buf as const. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429022941.1718671-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-29ublk: decouple zero copy from user copyMing Lei
UBLK_F_USER_COPY and UBLK_F_SUPPORT_ZERO_COPY are two different features, and shouldn't be coupled together. Commit 1f6540e2aabb ("ublk: zc register/unregister bvec") enables user copy automatically in case of UBLK_F_SUPPORT_ZERO_COPY, this way isn't correct. So decouple zero copy from user copy, and use independent helper to check each one. Fixes: 1f6540e2aabb ("ublk: zc register/unregister bvec") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429022941.1718671-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-29selftests: ublk: fix UBLK_F_NEED_GET_DATAMing Lei
Commit 57e13a2e8cd2 ("selftests: ublk: support user recovery") starts to support UBLK_F_NEED_GET_DATA for covering recovery feature, however the ublk utility implementation isn't done correctly. Fix it by supporting UBLK_F_NEED_GET_DATA correctly. Also add test generic_07 for covering UBLK_F_NEED_GET_DATA. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 57e13a2e8cd2 ("selftests: ublk: support user recovery") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429022941.1718671-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-29pinctrl: qcom: Fix PINGROUP definition for sm8750Maulik Shah
On newer SoCs intr_target_bit position is at 8 instead of 5. Fix it. Also add missing intr_wakeup_present_bit and intr_wakeup_enable_bit which enables forwarding of GPIO interrupts to parent PDC interrupt controller. Fixes: afe9803e3b82 ("pinctrl: qcom: Add sm8750 pinctrl driver") Signed-off-by: Maulik Shah <maulik.shah@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Melody Olvera <melody.olvera@oss.qualcomm.com> Link: https://lore.kernel.org/20250429-pinctrl_sm8750-v2-1-87d45dd3bd82@oss.qualcomm.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-04-29i2c: imx-lpi2c: Fix clock count when probe defersClark Wang
Deferred probe with pm_runtime_put() may delay clock disable, causing incorrect clock usage count. Use pm_runtime_put_sync() to ensure the clock is disabled immediately. Fixes: 13d6eb20fc79 ("i2c: imx-lpi2c: add runtime pm support") Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Carlos Song <carlos.song@nxp.com> Cc: <stable@vger.kernel.org> # v4.16+ Link: https://lore.kernel.org/r/20250421062341.2471922-1-carlos.song@nxp.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-04-28drm/xe/guc: Fix capture of steering registersJohn Harrison
The list of registers to capture on a GPU hang includes some that require steering. Unfortunately, the flag to say this was being wiped to due a missing OR on the assignment of the next flag field. Fix that. Fixes: b170d696c1e2 ("drm/xe/guc: Add XE_LP steered register lists") Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Link: https://lore.kernel.org/r/20250417195215.3002210-2-John.C.Harrison@Intel.com (cherry picked from commit 532da44b54a10d50ebad14a8a02bd0b78ec23e8b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-28drm/xe/svm: fix dereferencing error pointer in drm_gpusvm_range_alloc()Harshit Mogalapalli
xe_svm_range_alloc() returns ERR_PTR(-ENOMEM) on failure and there is a dereference of "range" after that: --> range->gpusvm = gpusvm; In xe_svm_range_alloc(), when memory allocation fails return NULL instead to handle this situation. Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/adaef4dd-5866-48ca-bc22-4a1ddef20381@stanley.mountain/ Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250323124907.3946370-1-harshit.m.mogalapalli@oracle.com (cherry picked from commit 7a0322122cfdd9a6f10fc7701023d75c98eb3d22) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-29fs/erofs/fileio: call erofs_onlinefolio_split() after bio_add_folio()Max Kellermann
If bio_add_folio() fails (because it is full), erofs_fileio_scan_folio() needs to submit the I/O request via erofs_fileio_rq_submit() and allocate a new I/O request with an empty `struct bio`. Then it retries the bio_add_folio() call. However, at this point, erofs_onlinefolio_split() has already been called which increments `folio->private`; the retry will call erofs_onlinefolio_split() again, but there will never be a matching erofs_onlinefolio_end() call. This leaves the folio locked forever and all waiters will be stuck in folio_wait_bit_common(). This bug has been added by commit ce63cb62d794 ("erofs: support unencoded inodes for fileio"), but was practically unreachable because there was room for 256 folios in the `struct bio` - until commit 9f74ae8c9ac9 ("erofs: shorten bvecs[] for file-backed mounts") which reduced the array capacity to 16 folios. It was now trivial to trigger the bug by manually invoking readahead from userspace, e.g.: posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED); This should be fixed by invoking erofs_onlinefolio_split() only after bio_add_folio() has succeeded. This is safe: asynchronous completions invoking erofs_onlinefolio_end() will not unlock the folio because erofs_fileio_scan_folio() is still holding a reference to be released by erofs_onlinefolio_end() at the end. Fixes: ce63cb62d794 ("erofs: support unencoded inodes for fileio") Fixes: 9f74ae8c9ac9 ("erofs: shorten bvecs[] for file-backed mounts") Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Tested-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20250428230933.3422273-1-max.kellermann@ionos.com Signed-off-by: Gao Xiang <xiang@kernel.org>
2025-04-28bcachefs: Topology error after insert is now an EROKent Overstreet
A user hit this, and this will naturally be easier to debug if we don't panic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Use bch2_kvmalloc() for journal keys arrayKent Overstreet
We can hit this limit fairly easy when we have to reconstuct large amounts of alloc info on large filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: More informative error message when shutting down due to errorKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data ↵Kent Overstreet
btrees If loosing a btree won't cause data loss - i.e. it's an alloc btree, or we can easily reconstruct it - we shouldn't require user action to continue repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28scsi: ufs: core: Remove redundant query_complete traceKeoseong Park
The query_complete trace was not removed after ufshcd_issue_dev_cmd() was called from the bsg path, resulting in duplicate output. Below is an example of the trace: ufs-utils-773 [000] ..... 218.176933: ufshcd_upiu: query_send: 0000:00:04.0: HDR:16 00 00 1f 00 01 00 00 00 00 00 00, OSF:03 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ufs-utils-773 [000] ..... 218.177145: ufshcd_upiu: query_complete: 0000:00:04.0: HDR:36 00 00 1f 00 01 00 00 00 00 00 00, OSF:03 07 00 00 00 00 00 00 00 00 00 08 00 00 00 00 ufs-utils-773 [000] ..... 218.177146: ufshcd_upiu: query_complete: 0000:00:04.0: HDR:36 00 00 1f 00 01 00 00 00 00 00 00, OSF:03 07 00 00 00 00 00 00 00 00 00 08 00 00 00 00 Remove the redundant trace call in the bsg path, preventing duplication. Signed-off-by: Keoseong Park <keosung.park@samsung.com> Link: https://lore.kernel.org/r/20250425010605epcms2p67e89b351398832fe0fd547404d3afc65@epcms2p6 Fixes: 71aabb747d5f ("scsi: ufs: core: Reuse exec_dev_cmd") Reviewed-by: Avri Altman <avri.altman@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-28scsi: myrb: Fix spelling mistake "statux" -> "status"Colin Ian King
There is a spelling mistake in a dev_err() message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20250422170347.66792-1-colin.i.king@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-04-28net: ti: icssg-prueth: Add ICSSG FW StatsMD Danish Anwar
The ICSSG firmware maintains set of stats called PA_STATS. Currently the driver only dumps 4 stats. Add support for dumping more stats. The offset for different stats are defined as MACROs in icssg_switch_map.h file. All the offsets are for Slice0. Slice1 offsets are slice0 + 4. The offset calculation is taken care while reading the stats in emac_update_hardware_stats(). The statistics are documented in Documentation/networking/device_drivers/icssg_prueth.rst Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20250424095316.2643573-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28tools/Makefile: Add ynl targetJoe Damato
Add targets to build, clean, and install ynl headers, libynl.a, and python tooling. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250423204647.190784-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28rtase: Modify the format specifier in snprintf to %uJustin Lai
Modify the format specifier in snprintf to %u. Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250425064057.30035-1-justinlai0215@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28Merge tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix three potential use after frees: in session logoff, in krb5 auth, and in RPC open - Fix missing rc check in session setup authentication * tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix use-after-free in session logoff ksmbd: fix use-after-free in kerberos authentication ksmbd: fix use-after-free in ksmbd_session_rpc_open smb: server: smb2pdu: check return value of xa_store()
2025-04-28Merge branch 'phase-out-hybrid-pci-devres-api'Jakub Kicinski
Philipp Stanner says: ==================== Phase out hybrid PCI devres API Fixes a number of minor issues with the usage of the PCI API in net. Notbaly, it replaces calls to the sometimes-managed pci_request_regions() to the always-managed pcim_request_all_regions(), enabling us to remove that hybrid functionality from PCI. ==================== Link: https://patch.msgid.link/20250425085740.65304-2-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: thunder_bgx: Don't disable PCI device manuallyPhilipp Stanner
thunder_bgx's PCI device is enabled with pcim_enable_device(), a managed devres function which ensures that the device gets enabled on driver detach automatically. Remove the calls to pci_disable_device(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-10-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: thunder_bgx: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-9-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: mdio: thunder: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-8-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: ethernet: sis900: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Daniele Venzano <venza@brownhat.org> Link: https://patch.msgid.link/20250425085740.65304-7-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: ethernet: natsemi: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-6-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: tulip: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-5-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: octeontx2: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20250425085740.65304-4-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net: prestera: Use pure PCI devres APIPhilipp Stanner
The currently used function pci_request_regions() is one of the problematic "hybrid devres" PCI functions, which are sometimes managed through devres, and sometimes not (depending on whether pci_enable_device() or pcim_enable_device() has been called before). The PCI subsystem wants to remove this behavior and, therefore, needs to port all users to functions that don't have this problem. Furthermore, the PCI function being managed implies that it's not necessary to call pci_release_regions() manually. Remove the calls to pci_release_regions(). Replace pci_request_regions() with pcim_request_all_regions(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Acked-by: Elad Nachman <enachman@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250425085740.65304-3-phasta@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28Merge branch 'intel-net-queue-100GbE'Jakub Kicinski
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-04-22 (ice, idpf) For ice: Paul removes setting of ICE_AQ_FLAG_RD in ice_get_set_tx_topo() on E830 devices. Xuanqiang Luo adds error check for NULL VF VSI. For idpf: Madhu fixes misreporting of, currently, unsupported encapsulated packets. ==================== Link: https://patch.msgid.link/20250425222636.3188441-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28idpf: fix offloads support for encapsulated packetsMadhu Chittim
Split offloads into csum, tso and other offloads so that tunneled packets do not by default have all the offloads enabled. Stateless offloads for encapsulated packets are not yet supported in firmware/software but in the driver we were setting the features same as non encapsulated features. Fixed naming to clarify CSUM bits are being checked for Tx. Inherit netdev features to VLAN interfaces as well. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Tested-by: Zachary Goldstein <zachmgoldstein@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250425222636.3188441-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28ice: Check VF VSI Pointer Value in ice_vc_add_fdir_fltr()Xuanqiang Luo
As mentioned in the commit baeb705fd6a7 ("ice: always check VF VSI pointer values"), we need to perform a null pointer check on the return value of ice_get_vf_vsi() before using it. Fixes: 6ebbe97a4881 ("ice: Add a per-VF limit on number of FDIR filters") Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250425222636.3188441-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28ice: fix Get Tx Topology AQ command error on E830Paul Greenwalt
The Get Tx Topology AQ command (opcode 0x0418) has different read flag requirements depending on the hardware/firmware. For E810, E822, and E823 firmware the read flag must be set, and for newer hardware (E825 and E830) it must not be set. This results in failure to configure Tx topology and the following warning message during probe: DDP package does not support Tx scheduling layers switching feature - please update to the latest DDP package and try again The current implementation only handles E825-C but not E830. It is confusing as we first check ice_is_e825c() and then set the flag in the set case. Finally, we check ice_is_e825c() again and set the flag for all other hardware in both the set and get case. Instead, notice that we always need the read flag for set, but only need the read flag for get on E810, E822, and E823 firmware. Fix the logic to check the MAC type and set the read flag in get only on the older devices which require it. Fixes: ba1124f58afd ("ice: Add E830 device IDs, MAC type and registers") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250425222636.3188441-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28Merge branch 'net_sched-adapt-qdiscs-for-reentrant-enqueue-cases'Jakub Kicinski
Victor Nogueira says: ==================== net_sched: Adapt qdiscs for reentrant enqueue cases As described in Gerrard's report [1], there are cases where netem can make the qdisc enqueue callback reentrant. Some qdiscs (drr, hfsc, ets, qfq) break whenever the enqueue callback has reentrant behaviour. This series addresses these issues by adding extra checks that cater for these reentrant corner cases. This series has passed all relevant test cases in the TDC suite. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ ==================== Link: https://patch.msgid.link/20250425220710.3964791-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28selftests: tc-testing: Add TDC tests that exercise reentrant enqueue behaviourVictor Nogueira
Add 5 TDC tests that exercise the reentrant enqueue behaviour in drr, ets, qfq, and hfsc: - Test DRR's enqueue reentrant behaviour with netem (which caused a double list add) - Test ETS's enqueue reentrant behaviour with netem (which caused a double list add) - Test QFQ's enqueue reentrant behaviour with netem (which caused a double list add) - Test HFSC's enqueue reentrant behaviour with netem (which caused a UAF) - Test nested DRR's enqueue reentrant behaviour with netem (which caused a double list add) Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250425220710.3964791-6-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net_sched: qfq: Fix double list add in class with netem as child qdiscVictor Nogueira
As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of qfq, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. This patch checks whether the class was already added to the agg->active list (cl_is_active) before doing the addition to cater for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250425220710.3964791-5-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net_sched: ets: Fix double list add in class with netem as child qdiscVictor Nogueira
As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of ets, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. In addition to checking for qlen being zero, this patch checks whether the class was already added to the active_list (cl_is_active) before doing the addition to cater for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250425220710.3964791-4-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net_sched: hfsc: Fix a UAF vulnerability in class with netem as child qdiscVictor Nogueira
As described in Gerrard's report [1], we have a UAF case when an hfsc class has a netem child qdisc. The crux of the issue is that hfsc is assuming that checking for cl->qdisc->q.qlen == 0 guarantees that it hasn't inserted the class in the vttree or eltree (which is not true for the netem duplicate case). This patch checks the n_active class variable to make sure that the code won't insert the class in the vttree or eltree twice, catering for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs") Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250425220710.3964791-3-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28net_sched: drr: Fix double list add in class with netem as child qdiscVictor Nogueira
As described in Gerrard's report [1], there are use cases where a netem child qdisc will make the parent qdisc's enqueue callback reentrant. In the case of drr, there won't be a UAF, but the code will add the same classifier to the list twice, which will cause memory corruption. In addition to checking for qlen being zero, this patch checks whether the class was already added to the active_list (cl_is_active) before adding to the list to cover for the reentrant case. [1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/ Fixes: 37d9cf1a3ce3 ("sched: Fix detection of empty queues in child qdiscs") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250425220710.3964791-2-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28pds_core: remove write-after-free of client_idShannon Nelson
A use-after-free error popped up in stress testing: [Mon Apr 21 21:21:33 2025] BUG: KFENCE: use-after-free write in pdsc_auxbus_dev_del+0xef/0x160 [pds_core] [Mon Apr 21 21:21:33 2025] Use-after-free write at 0x000000007013ecd1 (in kfence-#47): [Mon Apr 21 21:21:33 2025] pdsc_auxbus_dev_del+0xef/0x160 [pds_core] [Mon Apr 21 21:21:33 2025] pdsc_remove+0xc0/0x1b0 [pds_core] [Mon Apr 21 21:21:33 2025] pci_device_remove+0x24/0x70 [Mon Apr 21 21:21:33 2025] device_release_driver_internal+0x11f/0x180 [Mon Apr 21 21:21:33 2025] driver_detach+0x45/0x80 [Mon Apr 21 21:21:33 2025] bus_remove_driver+0x83/0xe0 [Mon Apr 21 21:21:33 2025] pci_unregister_driver+0x1a/0x80 The actual device uninit usually happens on a separate thread scheduled after this code runs, but there is no guarantee of order of thread execution, so this could be a problem. There's no actual need to clear the client_id at this point, so simply remove the offending code. Fixes: 10659034c622 ("pds_core: add the aux client API") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250425203857.71547-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28Merge tag 'for-net-2025-04-25' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - btmtksdio: Check function enabled before doing close - btmtksdio: Do close if SDIO card removed without close - btusb: avoid NULL pointer dereference in skb_dequeue() - btintel_pcie: Avoid redundant buffer allocation - btintel_pcie: Add additional to checks to clear TX/RX paths - hci_conn: Fix not setting conn_timeout for Broadcast Receiver - hci_conn: Fix not setting timeout for BIG Create Sync * tag 'for-net-2025-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: copy RX timestamp to new fragments Bluetooth: btintel_pcie: Add additional to checks to clear TX/RX paths Bluetooth: btmtksdio: Do close if SDIO card removed without close Bluetooth: btmtksdio: Check function enabled before doing close Bluetooth: btusb: avoid NULL pointer dereference in skb_dequeue() Bluetooth: btintel_pcie: Avoid redundant buffer allocation Bluetooth: hci_conn: Fix not setting timeout for BIG Create Sync Bluetooth: hci_conn: Fix not setting conn_timeout for Broadcast Receiver ==================== Link: https://patch.msgid.link/20250425192412.1578759-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28netlink: specs: ethtool: Remove UAPI duplication of phy-upstream enumKory Maincent
The phy-upstream enum is already defined in the ethtool.h UAPI header and used by the ethtool userspace tool. However, the ethtool spec does not reference it, causing YNL to auto-generate a duplicate and redundant enum. Fix this by updating the spec to reference the existing UAPI enum in ethtool.h. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250425171419.947352-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28Merge branch 'virtio-net-disable-delayed-refill-when-pausing-rx'Jakub Kicinski
Bui Quang Minh says: ==================== virtio-net: disable delayed refill when pausing rx Hi everyone, This only includes the selftest for virtio-net deadlock bug. The fix commit has been applied already. Link: https://lore.kernel.org/virtualization/174537302875.2111809.8543884098526067319.git-patchwork-notify@kernel.org/T/ ==================== Link: https://patch.msgid.link/20250425071018.36078-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28selftests: net: add a virtio_net deadlock selftestBui Quang Minh
The selftest reproduces the deadlock scenario when binding/unbinding XDP program, XDP socket, rx ring resize on virtio_net interface. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20250425071018.36078-5-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28selftests: net: retry when bind returns EBUSY in xdp_helperBui Quang Minh
When binding the XDP socket, we may get EBUSY because the deferred destructor of XDP socket in previous test has not been executed yet. If that is the case, just sleep and retry some times. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20250425071018.36078-4-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28selftests: net: add flag to force zerocopy mode in xdp_helperBui Quang Minh
This commit adds an optional -z flag to xdp_helper. When this flag is provided, the XDP socket binding is forced to be in zerocopy mode. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20250425071018.36078-3-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28selftests: net: move xdp_helper to net/libBui Quang Minh
Move xdp_helper to net/lib to make it easier for other selftests to use the helper. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20250425071018.36078-2-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29netfilter: nf_tables: export set count and backend name to userspaceFlorian Westphal
nf_tables picks a suitable set backend implementation (bitmap, hash, rbtree..) based on the userspace requirements. Figuring out the chosen backend requires information about the set flags and the kernel version. Export this to userspace so nft can include this information in '--debug=netlink' output. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-29docs: tproxy: fix formatting for nft code blockChen Linxuan
The nft command snippet for redirecting traffic isn't formatted in a literal code block like the rest of snippets. Fix the formatting inconsistency. Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-29netfilter: conntrack: Remove redundant NFCT_ALIGN callXuanqiang Luo
The "nf_ct_tmpl_alloc" function had a redundant call to "NFCT_ALIGN" when aligning the pointer "p". Since "NFCT_ALIGN" always gives the same result for the same input. Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-28Merge branch 'veth-qdisc-backpressure-and-qdisc-check-refactor'Jakub Kicinski
Jesper Dangaard Brouer says: ==================== veth: qdisc backpressure and qdisc check refactor This patch series addresses TX drops seen on veth devices under load, particularly when using threaded NAPI, which is our setup in production. The root cause is that the NAPI consumer often runs on a different CPU than the producer. Combined with scheduling delays or simply slower consumption, this increases the chance that the ptr_ring fills up before packets are drained, resulting in drops from veth_xmit() (ndo_start_xmit()). To make this easier to reproduce, we’ve created a script that sets up a test scenario using network namespaces. The script inserts 1000 iptables rules in the consumer namespace to slow down packet processing and amplify the issue. Reproducer script: https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_setup01_NAPI_TX_drops.sh This series first introduces a helper to detect no-queue qdiscs and then uses it in the veth driver to conditionally apply qdisc-level backpressure when a real qdisc is attached. The behavior is off by default and opt-in, ensuring minimal impact and easy activation. v6: https://lore.kernel.org/174549933665.608169.392044991754158047.stgit@firesoul v5: https://lore.kernel.org/174489803410.355490.13216831426556849084.stgit@firesoul v4 https://lore.kernel.org/174472463778.274639.12670590457453196991.stgit@firesoul v3: https://lore.kernel.org/174464549885.20396.6987653753122223942.stgit@firesoul v2: https://lore.kernel.org/174412623473.3702169.4235683143719614624.stgit@firesoul RFC-v1: https://lore.kernel.org/174377814192.3376479.16481605648460889310.stgit@firesoul ==================== Link: https://patch.msgid.link/174559288731.827981.8748257839971869213.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-28veth: apply qdisc backpressure on full ptr_ring to reduce TX dropsJesper Dangaard Brouer
In production, we're seeing TX drops on veth devices when the ptr_ring fills up. This can occur when NAPI mode is enabled, though it's relatively rare. However, with threaded NAPI - which we use in production - the drops become significantly more frequent. The underlying issue is that with threaded NAPI, the consumer often runs on a different CPU than the producer. This increases the likelihood of the ring filling up before the consumer gets scheduled, especially under load, leading to drops in veth_xmit() (ndo_start_xmit()). This patch introduces backpressure by returning NETDEV_TX_BUSY when the ring is full, signaling the qdisc layer to requeue the packet. The txq (netdev queue) is stopped in this condition and restarted once veth_poll() drains entries from the ring, ensuring coordination between NAPI and qdisc. Backpressure is only enabled when a qdisc is attached. Without a qdisc, the driver retains its original behavior - dropping packets immediately when the ring is full. This avoids unexpected behavior changes in setups without a configured qdisc. With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management (AQM) to fairly schedule packets across flows and reduce collateral damage from elephant flows. A known limitation of this approach is that the full ring sits in front of the qdisc layer, effectively forming a FIFO buffer that introduces base latency. While AQM still improves fairness and mitigates flow dominance, the latency impact is measurable. In hardware drivers, this issue is typically addressed using BQL (Byte Queue Limits), which tracks in-flight bytes needed based on physical link rate. However, for virtual drivers like veth, there is no fixed bandwidth constraint - the bottleneck is CPU availability and the scheduler's ability to run the NAPI thread. It is unclear how effective BQL would be in this context. This patch serves as a first step toward addressing TX drops. Future work may explore adapting a BQL-like mechanism to better suit virtual devices like veth. Reported-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://patch.msgid.link/174559294022.827981.1282809941662942189.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>