summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-27netdevsim: don't assume core pre-populates HDS params on GETJakub Kicinski
Syzbot reports: BUG: KMSAN: uninit-value in nsim_get_ringparam+0xa8/0xe0 drivers/net/netdevsim/ethtool.c:77 nsim_get_ringparam+0xa8/0xe0 drivers/net/netdevsim/ethtool.c:77 ethtool_set_ringparam+0x268/0x570 net/ethtool/ioctl.c:2072 __dev_ethtool net/ethtool/ioctl.c:3209 [inline] dev_ethtool+0x126d/0x2a40 net/ethtool/ioctl.c:3398 dev_ioctl+0xb0e/0x1280 net/core/dev_ioctl.c:759 This is the SET path, where we call GET to either check user request against max values, or check if any of the settings will change. The logic in netdevsim is trying to report the default (ENABLED) if user has not requested any specific setting. The user setting is recorded in dev->cfg, don't depend on kernel_ringparam being pre-populated with it. Fixes: 928459bbda19 ("net: ethtool: populate the default HDS params in the core") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com Tested-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com Link: https://patch.msgid.link/20250123221410.1067678-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27MAINTAINERS: add Paul Fertser as a NC-SI reviewerJakub Kicinski
Paul has been providing very solid reviews for NC-SI changes lately, so much so I started CCing him on all NC-SI patches. Make the designation official. Reviewed-by: Paul Fertser <fercerpav@gmail.com> Link: https://patch.msgid.link/20250123155540.943243-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27Merge branch 'eth-fix-calling-napi_enable-in-atomic-context'Jakub Kicinski
Jakub Kicinski says: ==================== eth: fix calling napi_enable() in atomic context Dan has reported that I missed a lot of drivers which call napi_enable() in atomic with the naive coccinelle search for spin locks: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Fix them. Most of the fixes involve taking the netdev_lock() before the spin lock. mt76 is special because we can just move napi_enable() from the BH section. All patches compile tested only. v2: https://lore.kernel.org/20250123004520.806855-1-kuba@kernel.org v1: https://lore.kernel.org/20250121221519.392014-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250124031841.1179756-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27wifi: mt76: move napi_enable() from under BHJakub Kicinski
mt76 does a lot of: local_bh_disable(); napi_enable(...napi); napi_schedule(...napi); local_bh_enable(); local_bh_disable() is not a real lock, its most likely taken because napi_schedule() requires that we invoke softirqs at some point. napi_enable() needs to take a mutex, so move it from under the BH protection. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250124031841.1179756-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27eth: via-rhine: fix calling napi_enable() in atomic contextJakub Kicinski
napi_enable() may sleep now, take netdev_lock() before rp->lock. napi_enable() is hidden inside init_registers(). Note that this patch orders netdev_lock after rp->task_lock, to avoid having to take the netdev_lock() around disable path. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250124031841.1179756-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27eth: niu: fix calling napi_enable() in atomic contextJakub Kicinski
napi_enable() may sleep now, take netdev_lock() before np->lock. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250124031841.1179756-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27eth: 8139too: fix calling napi_enable() in atomic contextJakub Kicinski
napi_enable() may sleep now, take netdev_lock() before tp->lock and tp->rx_lock. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Link: https://patch.msgid.link/20250124031841.1179756-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27eth: forcedeth: fix calling napi_enable() in atomic contextJakub Kicinski
napi_enable() may sleep now, take netdev_lock() before np->lock. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250124031841.1179756-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27eth: forcedeth: remove local wrappers for napi enable/disableJakub Kicinski
The local helpers for calling napi_enable() and napi_disable() don't serve much purpose and they will complicate the fix in the subsequent patch. Remove them, call the core functions directly. Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250124031841.1179756-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27eth: tg3: fix calling napi_enable() in atomic contextJakub Kicinski
tg3 has a spin lock protecting most of the config, switch to taking netdev_lock() explicitly on enable/start paths. Disable/stop paths seem to not be under the spin lock (since napi_disable() already needs to sleep), so leave that side as is. tg3_restart_hw() releases and re-takes the spin lock, we need to do the same because dev_close() needs to take netdev_lock(). Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250124031841.1179756-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27tools: ynl: c: correct reverse decode of empty attrsJakub Kicinski
netlink reports which attribute was incorrect by sending back an attribute offset. Offset points to the address of struct nlattr, but to interpret the type we also need the nesting path. Attribute IDs have different meaning in different nests of the same message. Correct the condition for "is the offset within current attribute". ynl_attr_data_len() does not include the attribute header, so the end offset was off by 4 bytes. This means that we'd always skip over flags and empty nests. The devmem tests, for example, issues an invalid request with empty queue nests, resulting in the following error: YNL failed: Kernel error: missing attribute: .queues.ifindex The message is incorrect, "queues" nest does not have an "ifindex" attribute defined. With this fix we decend correctly into the nest: YNL failed: Kernel error: missing attribute: .queues.id Fixes: 86878f14d71a ("tools: ynl: user space helpers") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250124012130.1121227-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27ptp: Ensure info->enable callback is always setThomas Weißschuh
The ioctl and sysfs handlers unconditionally call the ->enable callback. Not all drivers implement that callback, leading to NULL dereferences. Example of affected drivers: ptp_s390.c, ptp_vclock.c and ptp_mock.c. Instead use a dummy callback if no better was specified by the driver. Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250123-ptp-enable-v1-1-b015834d3a47@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27documentation: networking: fix spelling mistakesKhaled Elnaggar
Fix a couple of typos/spelling mistakes in the documentation. Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20250123082521.59997-1-khaledelnaggarlinux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27net/mlx5e: add missing cpu_to_node to kvzalloc_node in mlx5e_open_xdpredirect_sqStanislav Fomichev
kvzalloc_node is not doing a runtime check on the node argument (__alloc_pages_node_noprof does have a VM_BUG_ON, but it expands to nothing on !CONFIG_DEBUG_VM builds), so doing any ethtool/netlink operation that calls mlx5e_open on a CPU that's larger that MAX_NUMNODES triggers OOB access and panic (see the trace below). Add missing cpu_to_node call to convert cpu id to node id. [ 165.427394] mlx5_core 0000:5c:00.0 beth1: Link up [ 166.479327] BUG: unable to handle page fault for address: 0000000800000010 [ 166.494592] #PF: supervisor read access in kernel mode [ 166.505995] #PF: error_code(0x0000) - not-present page ... [ 166.816958] Call Trace: [ 166.822380] <TASK> [ 166.827034] ? __die_body+0x64/0xb0 [ 166.834774] ? page_fault_oops+0x2cd/0x3f0 [ 166.843862] ? exc_page_fault+0x63/0x130 [ 166.852564] ? asm_exc_page_fault+0x22/0x30 [ 166.861843] ? __kvmalloc_node_noprof+0x43/0xd0 [ 166.871897] ? get_partial_node+0x1c/0x320 [ 166.880983] ? deactivate_slab+0x269/0x2b0 [ 166.890069] ___slab_alloc+0x521/0xa90 [ 166.898389] ? __kvmalloc_node_noprof+0x43/0xd0 [ 166.908442] __kmalloc_node_noprof+0x216/0x3f0 [ 166.918302] ? __kvmalloc_node_noprof+0x43/0xd0 [ 166.928354] __kvmalloc_node_noprof+0x43/0xd0 [ 166.938021] mlx5e_open_channels+0x5e2/0xc00 [ 166.947496] mlx5e_open_locked+0x3e/0xf0 [ 166.956201] mlx5e_open+0x23/0x50 [ 166.963551] __dev_open+0x114/0x1c0 [ 166.971292] __dev_change_flags+0xa2/0x1b0 [ 166.980378] dev_change_flags+0x21/0x60 [ 166.988887] do_setlink+0x38d/0xf20 [ 166.996628] ? ep_poll_callback+0x1b9/0x240 [ 167.005910] ? __nla_validate_parse.llvm.10713395753544950386+0x80/0xd70 [ 167.020782] ? __wake_up_sync_key+0x52/0x80 [ 167.030066] ? __mutex_lock+0xff/0x550 [ 167.038382] ? security_capable+0x50/0x90 [ 167.047279] rtnl_setlink+0x1c9/0x210 [ 167.055403] ? ep_poll_callback+0x1b9/0x240 [ 167.064684] ? security_capable+0x50/0x90 [ 167.073579] rtnetlink_rcv_msg+0x2f9/0x310 [ 167.082667] ? rtnetlink_bind+0x30/0x30 [ 167.091173] netlink_rcv_skb+0xb1/0xe0 [ 167.099492] netlink_unicast+0x20f/0x2e0 [ 167.108191] netlink_sendmsg+0x389/0x420 [ 167.116896] __sys_sendto+0x158/0x1c0 [ 167.125024] __x64_sys_sendto+0x22/0x30 [ 167.133534] do_syscall_64+0x63/0x130 [ 167.141657] ? __irq_exit_rcu.llvm.17843942359718260576+0x52/0xd0 [ 167.155181] entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: bb135e40129d ("net/mlx5e: move XDP_REDIRECT sq to dynamic allocation") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250123000407.3464715-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27net: netdevsim: try to close UDP port harness racesJakub Kicinski
syzbot discovered that we remove the debugfs files after we free the netdev. Try to clean up the relevant dir while the device is still around. Reported-by: syzbot+2e5de9e3ab986b71d2bf@syzkaller.appspotmail.com Fixes: 424be63ad831 ("netdevsim: add UDP tunnel port offload support") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250122224503.762705-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27net: rose: fix timer races against user threadsEric Dumazet
Rose timers only acquire the socket spinlock, without checking if the socket is owned by one user thread. Add a check and rearm the timers if needed. BUG: KASAN: slab-use-after-free in rose_timer_expiry+0x31d/0x360 net/rose/rose_timer.c:174 Read of size 2 at addr ffff88802f09b82a by task swapper/0/0 CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc5-syzkaller-00172-gd1bf27c4e176 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x169/0x550 mm/kasan/report.c:489 kasan_report+0x143/0x180 mm/kasan/report.c:602 rose_timer_expiry+0x31d/0x360 net/rose/rose_timer.c:174 call_timer_fn+0x187/0x650 kernel/time/timer.c:1793 expire_timers kernel/time/timer.c:1844 [inline] __run_timers kernel/time/timer.c:2418 [inline] __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2430 run_timer_base kernel/time/timer.c:2439 [inline] run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2449 handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561 __do_softirq kernel/softirq.c:595 [inline] invoke_softirq kernel/softirq.c:435 [inline] __irq_exit_rcu+0xf7/0x220 kernel/softirq.c:662 irq_exit_rcu+0x9/0x30 kernel/softirq.c:678 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1049 </IRQ> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250122180244.1861468-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27net: the appletalk subsystem no longer uses ndo_do_ioctl谢致邦 (XIE Zhibang)
ndo_do_ioctl is no longer used by the appletalk subsystem after commit 45bd1c5ba758 ("net: appletalk: Drop aarp_send_probe_phase1()"). Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/tencent_4AC6ED413FEA8116B4253D3ED6947FDBCF08@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27net/ncsi: use dev_set_mac_address() for Get MC MAC Address handlingPaul Fertser
Copy of the rationale from 790071347a0a1a89e618eedcd51c687ea783aeb3: Change ndo_set_mac_address to dev_set_mac_address because dev_set_mac_address provides a way to notify network layer about MAC change. In other case, services may not aware about MAC change and keep using old one which set from network adapter driver. As example, DHCP client from systemd do not update MAC address without notification from net subsystem which leads to the problem with acquiring the right address from DHCP server. Since dev_set_mac_address requires RTNL lock the operation can not be performed directly in the response handler, see 9e2bbab94b88295dcc57c7580393c9ee08d7314d. The way of selecting the first suitable MAC address from the list is changed, instead of having the driver check it this patch just assumes any valid MAC should be good. Fixes: b8291cf3d118 ("net/ncsi: Add NC-SI 1.2 Get MC MAC Address command") Signed-off-by: Paul Fertser <fercerpav@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-24iavf: allow changing VLAN state without calling PFMichal Swiatkowski
First case: > ip l a l $VF name vlanx type vlan id 100 > ip l d vlanx > ip l a l $VF name vlanx type vlan id 100 As workqueue can be execute after sometime, there is a window to have call trace like that: - iavf_del_vlan - iavf_add_vlan - iavf_del_vlans (wq) It means that our VLAN 100 will change the state from IAVF_VLAN_ACTIVE to IAVF_VLAN_REMOVE (iavf_del_vlan). After that in iavf_add_vlan state won't be changed because VLAN 100 is on the filter list. The final result is that the VLAN 100 filter isn't added in hardware (no iavf_add_vlans call). To fix that change the state if the filter wasn't removed yet directly to active. It is save as IAVF_VLAN_REMOVE means that virtchnl message wasn't sent yet. Second case: > ip l a l $VF name vlanx type vlan id 100 Any type of VF reset ex. change trust > ip l s $PF vf $VF_NUM trust on > ip l d vlanx > ip l a l $VF name vlanx type vlan id 100 In case of reset iavf driver is responsible for readding all filters that are being used. To do that all VLAN filters state are changed to IAVF_VLAN_ADD. Here is even longer window for changing VLAN state from kernel side, as workqueue isn't called immediately. We can have call trace like that: - changing to IAVF_VLAN_ADD (after reset) - iavf_del_vlan (called from kernel ops) - iavf_del_vlans (wq) Not exsisitng VLAN filters will be removed from hardware. It isn't a bug, ice driver will handle it fine. However, we can have call trace like that: - changing to IAVF_VLAN_ADD (after reset) - iavf_del_vlan (called from kernel ops) - iavf_add_vlan (called from kernel ops) - iavf_del_vlans (wq) With fix for previous case we end up with no VLAN filters in hardware. We have to remove VLAN filters if the state is IAVF_VLAN_ADD and delete VLAN was called. It is save as IAVF_VLAN_ADD means that virtchnl message wasn't sent yet. Fixes: 0c0da0e95105 ("iavf: refactor VLAN filter states") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24ice: remove invalid parameter of equalizerMateusz Polchlopek
It occurred that in the commit 70838938e89c ("ice: Implement driver functionality to dump serdes equalizer values") the invalid DRATE parameter for reading has been added. The output of the command: $ ethtool -d <ethX> returns the garbage value in the place where DRATE value should be stored. Remove mentioned parameter to prevent return of corrupted data to userspace. Fixes: 70838938e89c ("ice: Implement driver functionality to dump serdes equalizer values") Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24ice: fix ice_parser_rt::bst_key array sizePrzemek Kitszel
Fix &ice_parser_rt::bst_key size. It was wrongly set to 10 instead of 20 in the initial impl commit (see Fixes tag). All usage code assumed it was of size 20. That was also the initial size present up to v2 of the intro series [2], but halved by v3 [3] refactor described as "Replace magic hardcoded values with macros." The introducing series was so big that some ugliness was unnoticed, same for bugs :/ ICE_BST_KEY_TCAM_SIZE and ICE_BST_TCAM_KEY_SIZE were differing by one. There was tmp variable @j in the scope of edited function, but was not used in all places. This ugliness is now gone. I'm moving ice_parser_rt::pg_prio a few positions up, to fill up one of the holes in order to compensate for the added 10 bytes to the ::bst_key, resulting in the same size of the whole as prior to the fix, and minimal changes in the offsets of the fields. Extend also the debug dump print of the key to cover all bytes. To not have string with 20 "%02x" and 20 params, switch to ice_debug_array_w_prefix(). This fix obsoletes Ahmed's attempt at [1]. [1] https://lore.kernel.org/intel-wired-lan/20240823230847.172295-1-ahmed.zaki@intel.com [2] https://lore.kernel.org/intel-wired-lan/20230605054641.2865142-13-junfeng.guo@intel.com [3] https://lore.kernel.org/intel-wired-lan/20230817093442.2576997-13-junfeng.guo@intel.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/intel-wired-lan/b1fb6ff9-b69e-4026-9988-3c783d86c2e0@stanley.mountain Fixes: 9a4c07aaa0f5 ("ice: add parser execution main loop") CC: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24wifi: ath12k: fix handling of 6 GHz rulesAditya Kumar Singh
In the US country code, to avoid including 6 GHz rules in the 5 GHz rules list, the number of 5 GHz rules is set to a default constant value of 4 (REG_US_5G_NUM_REG_RULES). However, if there are more than 4 valid 5 GHz rules, the current logic will bypass the legitimate 6 GHz rules. For example, if there are 5 valid 5 GHz rules and 1 valid 6 GHz rule, the current logic will only consider 4 of the 5 GHz rules, treating the last valid rule as a 6 GHz rule. Consequently, the actual 6 GHz rule is never processed, leading to the eventual disabling of 6 GHz channels. To fix this issue, instead of hardcoding the value to 4, use a helper function to determine the number of 6 GHz rules present in the 5 GHz rules list and ignore only those rules. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 Cc: stable@vger.kernel.org Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> Link: https://patch.msgid.link/20250123-fix_6ghz_rules_handling-v1-1-d734bfa58ff4@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-01-24idpf: add more info during virtchnl transaction timeout/salt mismatchManoj Vishwanathan
Add more information related to the transaction like cookie, vc_op, salt when transaction times out and include similar information when transaction salt does not match. Info output for transaction timeout: ------------------- (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms) ------------------- before it was: ------------------- (op 5015, 60000ms) ------------------- Signed-off-by: Manoj Vishwanathan <manojvishy@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24idpf: convert workqueues to unboundMarco Leogrande
When a workqueue is created with `WQ_UNBOUND`, its work items are served by special worker-pools, whose host workers are not bound to any specific CPU. In the default configuration (i.e. when `queue_delayed_work` and friends do not specify which CPU to run the work item on), `WQ_UNBOUND` allows the work item to be executed on any CPU in the same node of the CPU it was enqueued on. While this solution potentially sacrifices locality, it avoids contention with other processes that might dominate the CPU time of the processor the work item was scheduled on. This is not just a theoretical problem: in a particular scenario misconfigured process was hogging most of the time from CPU0, leaving less than 0.5% of its CPU time to the kworker. The IDPF workqueues that were using the kworker on CPU0 suffered large completion delays as a result, causing performance degradation, timeouts and eventual system crash. Tested: * I have also run a manual test to gauge the performance improvement. The test consists of an antagonist process (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This process is run under `taskset 01` to bind it to CPU0, and its priority is changed with `chrt -pQ 9900 10000 ${pid}` and `renice -n -20 ${pid}` after start. Then, the IDPF driver is forced to prefer CPU0 by editing all calls to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0. Finally, `ktraces` for the workqueue events are collected. Without the current patch, the antagonist process can force arbitrary delays between `workqueue_queue_work` and `workqueue_execute_start`, that in my tests were as high as `30ms`. With the current patch applied, the workqueue can be migrated to another unloaded CPU in the same node, and, keeping everything else equal, the maximum delay I could see was `6us`. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Signed-off-by: Marco Leogrande <leogrande@google.com> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24idpf: Acquire the lock before accessing the xn->saltManoj Vishwanathan
The transaction salt was being accessed before acquiring the idpf_vc_xn_lock when idpf has to forward the virtchnl reply. Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Signed-off-by: Manoj Vishwanathan <manojvishy@google.com> Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24idpf: fix transaction timeouts on resetEmil Tantilov
Restore the call to idpf_vc_xn_shutdown() at the beginning of idpf_vc_core_deinit() provided the function is not called on remove. In the reset path the mailbox is destroyed, leading to all transactions timing out. Fixes: 09d0fb5cb30e ("idpf: deinit virtchnl transaction manager after vport and vectors") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24idpf: add read memory barrier when checking descriptor done bitEmil Tantilov
Add read memory barrier to ensure the order of operations when accessing control queue descriptors. Specifically, we want to avoid cases where loads can be reordered: 1. Load #1 is dispatched to read descriptor flags. 2. Load #2 is dispatched to read some other field from the descriptor. 3. Load #2 completes, accessing memory/cache at a point in time when the DD flag is zero. 4. NIC DMA overwrites the descriptor, now the DD flag is one. 5. Any fields loaded before step 4 are now inconsistent with the actual descriptor state. Add read memory barrier between steps 1 and 2, so that load #2 is not executed until load #1 has completed. Fixes: 8077c727561a ("idpf: add controlq init and reset checks") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Suggested-by: Lance Richardson <rlance@google.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-24xfrm: Don't disable preemption while looking up cache state.Sebastian Sewior
For the state cache lookup xfrm_input_state_lookup() first disables preemption, to remain on the CPU and then retrieves a per-CPU pointer. Within the preempt-disable section it also acquires netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be acquired with explicit disabled preemption (such as by get_cpu()) because this lock becomes a sleeping lock on PREEMPT_RT. To remain on the same CPU is just an optimisation for the CPU local lookup. The actual modification of the per-CPU variable happens with netns_xfrm::xfrm_state_lock acquired. Remove get_cpu() and use the state_cache_input on the current CPU. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/ Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-01-23ipmr: do not call mr_mfc_uses_dev() for unres entriesEric Dumazet
syzbot found that calling mr_mfc_uses_dev() for unres entries would crash [1], because c->mfc_un.res.minvif / c->mfc_un.res.maxvif alias to "struct sk_buff_head unresolved", which contain two pointers. This code never worked, lets remove it. [1] Unable to handle kernel paging request at virtual address ffff5fff2d536613 KASAN: maybe wild-memory-access in range [0xfffefff96a9b3098-0xfffefff96a9b309f] Modules linked in: CPU: 1 UID: 0 PID: 7321 Comm: syz.0.16 Not tainted 6.13.0-rc7-syzkaller-g1950a0af2d55 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mr_mfc_uses_dev net/ipv4/ipmr_base.c:290 [inline] pc : mr_table_dump+0x5a4/0x8b0 net/ipv4/ipmr_base.c:334 lr : mr_mfc_uses_dev net/ipv4/ipmr_base.c:289 [inline] lr : mr_table_dump+0x694/0x8b0 net/ipv4/ipmr_base.c:334 Call trace: mr_mfc_uses_dev net/ipv4/ipmr_base.c:290 [inline] (P) mr_table_dump+0x5a4/0x8b0 net/ipv4/ipmr_base.c:334 (P) mr_rtm_dumproute+0x254/0x454 net/ipv4/ipmr_base.c:382 ipmr_rtm_dumproute+0x248/0x4b4 net/ipv4/ipmr.c:2648 rtnl_dump_all+0x2e4/0x4e8 net/core/rtnetlink.c:4327 rtnl_dumpit+0x98/0x1d0 net/core/rtnetlink.c:6791 netlink_dump+0x4f0/0xbc0 net/netlink/af_netlink.c:2317 netlink_recvmsg+0x56c/0xe64 net/netlink/af_netlink.c:1973 sock_recvmsg_nosec net/socket.c:1033 [inline] sock_recvmsg net/socket.c:1055 [inline] sock_read_iter+0x2d8/0x40c net/socket.c:1125 new_sync_read fs/read_write.c:484 [inline] vfs_read+0x740/0x970 fs/read_write.c:565 ksys_read+0x15c/0x26c fs/read_write.c:708 Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Reported-by: syzbot+5cfae50c0e5f2c500013@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/678fe2d1.050a0220.15cac.00b3.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250121181241.841212-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-23selftests/net: packetdrill: more xfail changes (and a correction)Jakub Kicinski
Recent change to add more cases to XFAIL has a broken regex, the matching needs a real regex not a glob pattern. While at it add the cases Willem pointed out during review. Fixes: 3030e3d57ba8 ("selftests/net: packetdrill: make tcp buf limited timing tests benign") Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250121143423.215261-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-23net: mvneta: fix locking in mvneta_cpu_online()Harshit Mogalapalli
When port is stopped, unlock before returning Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250121005002.3938236-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-23net: fec: implement TSO descriptor cleanupDheeraj Reddy Jonnalagadda
Implement cleanup of descriptors in the TSO error path of fec_enet_txq_submit_tso(). The cleanup - Unmaps DMA buffers for data descriptors skipping TSO header - Clears all buffer descriptors - Handles extended descriptors by clearing cbd_esc when enabled Fixes: 79f339125ea3 ("net: fec: Add software TSO support") Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250120085430.99318-1-dheeraj.linuxdev@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-23net: phy: marvell-88q2xxx: Fix temperature measurement with reset-gpiosDimitri Fedrau
When using temperature measurement on Marvell 88Q2XXX devices and the reset-gpios property is set in DT, the device does a hardware reset when interface is brought down and up again. That means that the content of the register MDIO_MMD_PCS_MV_TEMP_SENSOR2 is reset to default and that leads to permanent deactivation of the temperature measurement, because activation is done in mv88q2xxx_probe. To fix this move activation of temperature measurement to mv88q222x_config_init. Fixes: a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor") Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250118-marvell-88q2xxx-fix-hwmon-v2-1-402e62ba2dcb@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-22net: hns3: fix oops when unload drivers parallelingJian Shen
When unload hclge driver, it tries to disable sriov first for each ae_dev node from hnae3_ae_dev_list. If user unloads hns3 driver at the time, because it removes all the ae_dev nodes, and it may cause oops. But we can't simply use hnae3_common_lock for this. Because in the process flow of pci_disable_sriov(), it will trigger the remove flow of VF, which will also take hnae3_common_lock. To fixes it, introduce a new mutex to protect the unload process. Fixes: 0dd8a25f355b ("net: hns3: disable sriov before unload hclge layer") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20250118094741.3046663-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-22dt-bindings: net: qcom,ethqos: Correct fallback compatible for ↵Yijie Yang
qcom,qcs615-ethqos The qcs615-ride utilizes the same EMAC as the qcs404, rather than the sm8150. The current incorrect fallback could result in packet loss. The Ethernet on qcs615-ride is currently not utilized by anyone. Therefore, there is no need to worry about any ABI impact. Fixes: 32535b9410b8 ("dt-bindings: net: qcom,ethqos: add description for qcs615") Signed-off-by: Yijie Yang <quic_yijiyang@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250120-schema_qcs615-v4-1-d9d122f89e64@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-22net/ncsi: wait for the last response to Deselect Package before configuring ↵Paul Fertser
channel The NCSI state machine as it's currently implemented assumes that transition to the next logical state is performed either explicitly by calling `schedule_work(&ndp->work)` to re-queue itself or implicitly after processing the predefined (ndp->pending_req_num) number of replies. Thus to avoid the configuration FSM from advancing prematurely and getting out of sync with the process it's essential to not skip waiting for a reply. This patch makes the code wait for reception of the Deselect Package response for the last package probed before proceeding to channel configuration. Thanks go to Potin Lai and Cosmo Chou for the initial investigation and testing. Fixes: 8e13f70be05e ("net/ncsi: Probe single packages to avoid conflict") Cc: stable@vger.kernel.org Signed-off-by: Paul Fertser <fercerpav@gmail.com> Link: https://patch.msgid.link/20250116152900.8656-1-fercerpav@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-22net: airoha: Fix wrong GDM4 register definitionChristian Marangi
Fix wrong GDM4 register definition, in Airoha SDK GDM4 is defined at offset 0x2400 but this doesn't make sense as it does conflict with the CDM4 that is in the same location. Following the pattern where each GDM base is at the FWD_CFG, currently GDM4 base offset is set to 0x2500. This is correct but REG_GDM4_FWD_CFG and REG_GDM4_SRC_PORT_SET are still using the SDK reference with the 0x2400 offset. Fix these 2 define by subtracting 0x100 to each register to reflect the real address location. Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250120154148.13424-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-22NFC: nci: Add bounds checking in nci_hci_create_pipe()Dan Carpenter
The "pipe" variable is a u8 which comes from the network. If it's more than 127, then it results in memory corruption in the caller, nci_hci_connect_gate(). Cc: stable@vger.kernel.org Fixes: a1b0b9415817 ("NFC: nci: Create pipe on specific gate in nci_hci_connect_gate") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/bcf5453b-7204-4297-9c20-4d8c7dacf586@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-22net: sched: fix ets qdisc OOB IndexingJamal Hadi Salim
Haowei Yan <g1042620637@gmail.com> found that ets_class_from_arg() can index an Out-Of-Bound class in ets_class_from_arg() when passed clid of 0. The overflow may cause local privilege escalation. [ 18.852298] ------------[ cut here ]------------ [ 18.853271] UBSAN: array-index-out-of-bounds in net/sched/sch_ets.c:93:20 [ 18.853743] index 18446744073709551615 is out of range for type 'ets_class [16]' [ 18.854254] CPU: 0 UID: 0 PID: 1275 Comm: poc Not tainted 6.12.6-dirty #17 [ 18.854821] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 18.856532] Call Trace: [ 18.857441] <TASK> [ 18.858227] dump_stack_lvl+0xc2/0xf0 [ 18.859607] dump_stack+0x10/0x20 [ 18.860908] __ubsan_handle_out_of_bounds+0xa7/0xf0 [ 18.864022] ets_class_change+0x3d6/0x3f0 [ 18.864322] tc_ctl_tclass+0x251/0x910 [ 18.864587] ? lock_acquire+0x5e/0x140 [ 18.865113] ? __mutex_lock+0x9c/0xe70 [ 18.866009] ? __mutex_lock+0xa34/0xe70 [ 18.866401] rtnetlink_rcv_msg+0x170/0x6f0 [ 18.866806] ? __lock_acquire+0x578/0xc10 [ 18.867184] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 18.867503] netlink_rcv_skb+0x59/0x110 [ 18.867776] rtnetlink_rcv+0x15/0x30 [ 18.868159] netlink_unicast+0x1c3/0x2b0 [ 18.868440] netlink_sendmsg+0x239/0x4b0 [ 18.868721] ____sys_sendmsg+0x3e2/0x410 [ 18.869012] ___sys_sendmsg+0x88/0xe0 [ 18.869276] ? rseq_ip_fixup+0x198/0x260 [ 18.869563] ? rseq_update_cpu_node_id+0x10a/0x190 [ 18.869900] ? trace_hardirqs_off+0x5a/0xd0 [ 18.870196] ? syscall_exit_to_user_mode+0xcc/0x220 [ 18.870547] ? do_syscall_64+0x93/0x150 [ 18.870821] ? __memcg_slab_free_hook+0x69/0x290 [ 18.871157] __sys_sendmsg+0x69/0xd0 [ 18.871416] __x64_sys_sendmsg+0x1d/0x30 [ 18.871699] x64_sys_call+0x9e2/0x2670 [ 18.871979] do_syscall_64+0x87/0x150 [ 18.873280] ? do_syscall_64+0x93/0x150 [ 18.874742] ? lock_release+0x7b/0x160 [ 18.876157] ? do_user_addr_fault+0x5ce/0x8f0 [ 18.877833] ? irqentry_exit_to_user_mode+0xc2/0x210 [ 18.879608] ? irqentry_exit+0x77/0xb0 [ 18.879808] ? clear_bhb_loop+0x15/0x70 [ 18.880023] ? clear_bhb_loop+0x15/0x70 [ 18.880223] ? clear_bhb_loop+0x15/0x70 [ 18.880426] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 18.880683] RIP: 0033:0x44a957 [ 18.880851] Code: ff ff e8 fc 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 8974 24 10 [ 18.881766] RSP: 002b:00007ffcdd00fad8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 18.882149] RAX: ffffffffffffffda RBX: 00007ffcdd010db8 RCX: 000000000044a957 [ 18.882507] RDX: 0000000000000000 RSI: 00007ffcdd00fb70 RDI: 0000000000000003 [ 18.885037] RBP: 00007ffcdd010bc0 R08: 000000000703c770 R09: 000000000703c7c0 [ 18.887203] R10: 0000000000000080 R11: 0000000000000246 R12: 0000000000000001 [ 18.888026] R13: 00007ffcdd010da8 R14: 00000000004ca7d0 R15: 0000000000000001 [ 18.888395] </TASK> [ 18.888610] ---[ end trace ]--- Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Reported-by: Haowei Yan <g1042620637@gmail.com> Suggested-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250111145740.74755-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-22Merge tag 'net-next-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "This is slightly smaller than usual, with the most interesting work being still around RTNL scope reduction. Core: - More core refactoring to reduce the RTNL lock contention, including preparatory work for the per-network namespace RTNL lock, replacing RTNL lock with a per device-one to protect NAPI-related net device data and moving synchronize_net() calls outside such lock. - Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge and more specific TCP coverage. - Reduce network namespace tear-down time by removing per-subsystems synchronize_net() in tipc and sched. - Add flow label selector support for fib rules, allowing traffic redirection based on such header field. Netfilter: - Do not remove netdev basechain when last device is gone, allowing netdev basechains without devices. - Revisit the flowtable teardown strategy, dealing better with fin, reset and re-open events. - Scale-up IP-vs connection dumping by avoiding linear search on each restart. Protocols: - A significant XDP socket refactor, consolidating and optimizing several helpers into the core - Better scaling of ICMP rate-limiting, by removing false-sharing in inet peers handling. - Introduces netlink notifications for multicast IPv4 and IPv6 address changes. - Add ipsec support for IP-TFS/AggFrag encapsulation, allowing aggregation and fragmentation of the inner IP. - Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to avoid local port exhaustion issues when the average connection lifetime is very short. - Support updating keys (re-keying) for connections using kernel TLS (for TLS 1.3 only). - Support ipv4-mapped ipv6 address clients in smc-r v2. - Add support for jumbo data packet transmission in RxRPC sockets, gluing multiple data packets in a single UDP packet. - Support RxRPC RACK-TLP to manage packet loss and retransmission in conjunction with the congestion control algorithm. Driver API: - Introduce a unified and structured interface for reporting PHY statistics, exposing consistent data across different H/W via ethtool. - Make timestamping selectable, allow the user to select the desired hwtstamp provider (PHY or MAC) administratively. - Add support for configuring a header-data-split threshold (HDS) value via ethtool, to deal with partial or buggy H/W implementation. - Consolidate DSA drivers Energy Efficiency Ethernet support. - Add EEE management to phylink, making use of the phylib implementation. - Add phylib support for in-band capabilities negotiation. - Simplify how phylib-enabled mac drivers expose the supported interfaces. Tests and tooling: - Make the YNL tool package-friendly to make it easier to deploy it separately from the kernel. - Increase TCP selftest coverage importing several packetdrill test-cases. - Regenerate the ethtool uapi header from the YNL spec, to ease maintenance and future development. - Add YNL support for decoding the link types used in net self-tests, allowing a single build to run both net and drivers/net. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - add cross E-Switch QoS support - add SW Steering support for ConnectX-8 - implement support for HW-Managed Flow Steering, improving the rule deletion/insertion rate - support for multi-host LAG - Intel (ixgbe, ice, igb): - ice: add support for devlink health events - ixgbe: add initial support for E610 chipset variant - igb: add support for AF_XDP zero-copy - Meta: - add support for basic RSS config - allow changing the number of channels - add hardware monitoring support - Broadcom (bnxt): - implement TCP data split and HDS threshold ethtool support, enabling Device Memory TCP. - Marvell Octeon: - implement egress ipsec offload support for the cn10k family - Hisilicon (HIBMC): - implement unicast MAC filtering - Ethernet NICs embedded and virtual: - Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding contented atomic operations for drop counters - Freescale: - quicc: phylink conversion - enetc: support Tx and Rx checksum offload and improve TSO performances - MediaTek: - airoha: introduce support for ETS and HTB Qdisc offload - Microchip: - lan78XX USB: preparation work for phylink conversion - Synopsys (stmmac): - support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45 - refactor EEE support to leverage the new driver API - optimize DMA and cache access to increase raw RX performances by 40% - TI: - icssg-prueth: add multicast filtering support for VLAN interface - netkit: - add ability to configure head/tailroom - VXLAN: - accepts packets with user-defined reserved bit - Ethernet switches: - Microchip: - lan969x: add RGMII support - lan969x: improve TX and RX performance using the FDMA engine - nVidia/Mellanox: - move Tx header handling to PCI driver, to ease XDP support - Ethernet PHYs: - Texas Instruments DP83822: - add support for GPIO2 clock output - Realtek: - 8169: add support for RTL8125D rev.b - rtl822x: add hwmon support for the temperature sensor - Microchip: - add support for RDS PTP hardware - consolidate periodic output signal generation - CAN: - several DT-bindings to DT schema conversions - tcan4x5x: - add HW standby support - support nWKRQ voltage selection - kvaser: - allowing Bus Error Reporting runtime configuration - WiFi: - the on-going Multi-Link Operation (MLO) effort continues, affecting both the stack and in drivers - mac80211/cfg80211: - Emergency Preparedness Communication Services (EPCS) station mode support - support for adding and removing station links for MLO - add support for WiFi 7/EHT mesh over 320 MHz channels - report Tx power info for each link - RealTek (rtw88): - enable USB Rx aggregation and USB 3 to improve performance - LED support - RealTek (rtw89): - refactor power save to support Multi-Link Operations - add support for RTL8922AE-VS variant - MediaTek (mt76): - single wiphy multiband support (preparation for MLO) - p2p device support - add TP-Link TXE50UH USB adapter support - Qualcomm (ath10k): - support for the QCA6698AQ IP core - Qualcomm (ath12k): - enable MLO for QCN9274 - Bluetooth: - Allow sysfs to trigger hdev reset, to allow recovering devices not responsive from user-space - MediaTek: add support for MT7922, MT7925, MT7921e devices - Realtek: add support for RTL8851BE devices - Qualcomm: add support for WCN785x devices - ISO: allow BIG re-sync" * tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits) net/rose: prevent integer overflows in rose_setsockopt() net: phylink: fix regression when binding a PHY net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL. ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL. ipv6: Move lifetime validation to inet6_rtm_newaddr(). ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr(). ipv6: Pass dev to inet6_addr_add(). ipv6: Convert inet6_ioctl() to per-netns RTNL. ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup(). ipv6: Hold rtnl_net_lock() in addrconf_dad_work(). ipv6: Hold rtnl_net_lock() in addrconf_verify_work(). ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL. ipv6: Add __in6_dev_get_rtnl_net(). net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags net: mii: Fix the Speed display when the network cable is not connected sysctl net: Remove macro checks for CONFIG_SYSCTL eth: bnxt: update header sizing defaults ...
2025-01-21cachestat: fix page cache statistics permission checkingLinus Torvalds
When the 'cachestat()' system call was added in commit cf264e1329fb ("cachestat: implement cachestat syscall"), it was meant to be a much more convenient (and performant) version of mincore() that didn't need mapping things into the user virtual address space in order to work. But it ended up missing the "check for writability or ownership" fix for mincore(), done in commit 134fca9063ad ("mm/mincore.c: make mincore() more conservative"). This just adds equivalent logic to 'cachestat()', modified for the file context (rather than vma). Reported-by: Sudheendra Raghav Neela <sneela@tugraz.at> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall") Tested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-21Merge tag 'audit-pr-20250121' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit update from Paul Moore: "A single audit patch that fixes a problem when collecting pathnames for audit PATH records that was caused by some faulty pathname matching logic" * tag 'audit-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix suffixed '/' filename matching
2025-01-21Merge tag 'selinux-pr-20250121' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Extended permissions supported in conditional policy The SELinux extended permissions, aka "xperms", allow security admins to target individuals ioctls, and recently netlink messages, with their SELinux policy. Adding support for conditional policies allows admins to toggle the granular xperms using SELinux booleans, helping pave the way for greater use of xperms in general purpose SELinux policies. This change bumps the maximum SELinux policy version to 34. - Fix a SCTP/SELinux error return code inconsistency Depending on the loaded SELinux policy, specifically it's EXTSOCKCLASS support, the bind(2) LSM/SELinux hook could return different error codes due to the SELinux code checking the socket's SELinux object class (which can vary depending on EXTSOCKCLASS) and not the socket's sk_protocol field. We fix this by doing the obvious, and looking at the sock->sk_protocol field instead of the object class. - Makefile fixes to properly cleanup av_permissions.h Add av_permissions.h to "targets" so that it is properly cleaned up using the kbuild infrastructure. - A number of smaller improvements by Christian Göttsche A variety of straightforward changes to reduce code duplication, reduce pointer lookups, migrate void pointers to defined types, simplify code, constify function parameters, and correct iterator types. * tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: make more use of str_read() when loading the policy selinux: avoid unnecessary indirection in struct level_datum selinux: use known type instead of void pointer selinux: rename comparison functions for clarity selinux: rework match_ipv6_addrmask() selinux: constify and reconcile function parameter names selinux: avoid using types indicating user space interaction selinux: supply missing field initializers selinux: add netlink nlmsg_type audit message selinux: add support for xperms in conditional policies selinux: Fix SCTP error inconsistency in selinux_socket_bind() selinux: use native iterator types selinux: add generated av_permissions.h to targets
2025-01-21Merge tag 'lsm-pr-20250121' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Improved handling of LSM "secctx" strings through lsm_context struct The LSM secctx string interface is from an older time when only one LSM was supported, migrate over to the lsm_context struct to better support the different LSMs we now have and make it easier to support new LSMs in the future. These changes explain the Rust, VFS, and networking changes in the diffstat. - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are enabled Small tweak to be a bit smarter about when we build the LSM's common audit helpers. - Check for absurdly large policies from userspace in SafeSetID SafeSetID policies rules are fairly small, basically just "UID:UID", it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which helps quiet a number of syzbot related issues. While work is being done to address the syzbot issues through other mechanisms, this is a trivial and relatively safe fix that we can do now. - Various minor improvements and cleanups A collection of improvements to the kernel selftests, constification of some function parameters, removing redundant assignments, and local variable renames to improve readability. * tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lockdown: initialize local array before use to quiet static analysis safesetid: check size of policy writes net: corrections for security_secid_to_secctx returns lsm: rename variable to avoid shadowing lsm: constify function parameters security: remove redundant assignment to return variable lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test binder: initialize lsm_context structure rust: replace lsm context+len with lsm_context lsm: secctx provider check on release lsm: lsm_context in security_dentry_init_security lsm: use lsm_context in security_inode_getsecctx lsm: replace context+len with lsm_context lsm: ensure the correct LSM context releaser
2025-01-21Merge tag 'Smack-for-6.14' of https://github.com/cschaufler/smack-nextLinus Torvalds
Pull smack update from Casey Schaufler: "One minor code improvement for v6.14" * tag 'Smack-for-6.14' of https://github.com/cschaufler/smack-next: smack: deduplicate access to string conversion
2025-01-21Merge tag 'integrity-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "There's just a couple of changes: two kernel messages addressed, a measurement policy collision addressed, and one policy cleanup. Please note that the contents of the IMA measurement list is potentially affected. The builtin tmpfs IMA policy rule change might introduce additional measurements, while detecting a reboot might eliminate some measurements" * tag 'integrity-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: ignore suffixed policy rule comments ima: limit the builtin 'tcb' dont_measure tmpfs policy rule ima: kexec: silence RCU list traversal warning ima: Suspend PCR extends and log appends when rebooting
2025-01-21Merge tag 'chrome-platform-firmware-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform firmware updates from Tzung-Bi Shih: - Constify 'struct bin_attribute'. * tag 'chrome-platform-firmware-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: firmware: google: vpd: Use const 'struct bin_attribute' callback firmware: google: memconsole: Use const 'struct bin_attribute' callback firmware: google: gsmi: Constify 'struct bin_attribute' firmware: google: cbmem: Constify 'struct bin_attribute'
2025-01-21Merge tag 'chrome-platform-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Tzung-Bi Shih: "New: - Support new EC if the memory region information comes from the CRS ACPI resource descriptor in cros_ec_lpc Improvements: - Make sure EC is in RW before probing - Only check events on MKBP notifies to reduce the number of query commands in cros_ec_lpc Cleanups: - Remove unused code and DT bindings for cros-kbd-led-backlight - Constify 'struct bin_attribute' in cros_ec_vbc - Use str_enabled_disabled() in cros_usbpd_logger" * tag 'chrome-platform-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_lpc: Handle EC without CRS section platform/chrome: cros_usbpd_logger: Use str_enabled_disabled() helper platform/chrome: cros_ec_lpc: Support direct EC register memory access platform/chrome: cros_ec_lpc: Merge lpc_driver_ops into ec private structure platform/chrome: Update ChromeOS EC command tracing platform/chrome: cros_ec_lpc: Only check for events on MKBP notifies platform/chrome: cros_ec_vbc: Constify 'struct bin_attribute' dt-bindings: cros-ec: Remove google,cros-kbd-led-backlight platform/chrome: cros_kbd_led_backlight: Remove OF match platform/chrome: cros_ec_proto: remove unnecessary retries platform/chrome: cros_ec: jump to RW before probing platform/chrome: cros_kbd_led_backlight: remove unneeded if-statement
2025-01-21Merge tag 'docs-6.14' of git://git.lwn.net/linuxLinus Torvalds
Pull Documentation updates from Jonathan Corbet: - Quite a bit of Chinese and Spanish translation work - Clarifying that Git commit IDs >12chars are OK - A new nvme-multipath document - A reorganization of the admin-guide top-level page to make it readable - Clarification of the role of Acked-by and maintainer discretion on their acceptance - Some reorganization of debugging-oriented docs ... and typo fixes, documentation updates, etc as usual * tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits) Documentation: Fix x86_64 UEFI outdated references to elilo Documentation/sysctl: Add timer_migration to kernel.rst docs/mm: Physical memory: Remove zone_t docs: submitting-patches: clarify that signers may use their discretion on tags docs: submitting-patches: clarify difference between Acked-by and Reviewed-by docs: submitting-patches: clarify Acked-by and introduce "# Suffix" Documentation: bug-hunting.rst: remove odd contact information docs/zh_CN: Add sak index Chinese translation doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes doc: module: Fix documented type of namespace Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst docs/zh_CN: Add landlock index Chinese translation Documentation: Fix typo localmodonfig -> localmodconfig overlayfs.rst: Fix and improve grammar docs/zh_CN: Add siphash index Chinese translation docs/zh_CN: Add security IMA-templates Chinese translation docs/zh_CN: Add security digsig Chinese translation Align git commit ID abbreviation guidelines and checks docs: process: submitting-patches: split canonical patch format section docs/zh_CN: Add security lsm Chinese translation ...
2025-01-21Merge tag 'rust-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Finish the move to custom FFI integer types started in the previous cycle and finally map 'long' to 'isize' and 'char' to 'u8'. Do a few cleanups on top thanks to that. - Start to use 'derive(CoercePointee)' on Rust >= 1.84.0. This is a major milestone on the path to build the kernel using only stable Rust features. In particular, previously we were using the unstable features 'coerce_unsized', 'dispatch_from_dyn' and 'unsize', and now we will use the new 'derive_coerce_pointee' one, which is on track to stabilization. This new feature is a macro that essentially expands into code that internally uses the unstable features that we were using before, without having to expose those. With it, stable Rust users, including the kernel, will be able to build custom smart pointers that work with trait objects, e.g.: fn f(p: &Arc<dyn Display>) { pr_info!("{p}\n"); } let a: Arc<dyn Display> = Arc::new(42i32, GFP_KERNEL)?; let b: Arc<dyn Display> = Arc::new("hello there", GFP_KERNEL)?; f(&a); // Prints "42". f(&b); // Prints "hello there". Together with the 'arbitrary_self_types' feature that we started using in the previous cycle, using our custom smart pointers like 'Arc' will eventually only rely in stable Rust. - Introduce 'PROCMACROLDFLAGS' environment variable to allow to link Rust proc macros using different flags than those used for linking Rust host programs (e.g. when 'rustc' uses a different C library than the host programs' one), which Android needs. - Help kernel builds under macOS with Rust enabled by accomodating other naming conventions for dynamic libraries (i.e. '.so' vs. '.dylib') which are used for Rust procedural macros. The actual support for macOS (i.e. the rest of the pieces needed) is provided out-of-tree by others, following the policy used for other parts of the kernel by Kbuild. - Run Clippy for 'rusttest' code too and clean the bits it spotted. - Provide Clippy with the minimum supported Rust version to improve the suggestions it gives. - Document 'bindgen' 0.71.0 regression. 'kernel' crate: - 'build_error!': move users of the hidden function to the documented macro, prevent such uses in the future by moving the function elsewhere and add the macro to the prelude. - 'types' module: add improved version of 'ForeignOwnable::borrow_mut' (which was removed in the past since it was problematic); change 'ForeignOwnable' pointer type to '*mut'. - 'alloc' module: implement 'Display' for 'Box' and align the 'Debug' implementation to it; add example (doctest) for 'ArrayLayout::new()' - 'sync' module: document 'PhantomData' in 'Arc'; use 'NonNull::new_unchecked' in 'ForeignOwnable for Arc' impl. - 'uaccess' module: accept 'Vec's with different allocators in 'UserSliceReader::read_all'. - 'workqueue' module: enable run-testing a couple more doctests. - 'error' module: simplify 'from_errno()'. - 'block' module: fix formatting in code documentation (a lint to catch these is being implemented). - Avoid 'unwrap()'s in doctests, which also improves the examples by showing how kernel code is supposed to be written. - Avoid 'as' casts with 'cast{,_mut}' calls which are a bit safer. And a few other cleanups" * tag 'rust-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (32 commits) kbuild: rust: add PROCMACROLDFLAGS rust: uaccess: generalize userSliceReader to support any Vec rust: kernel: add improved version of `ForeignOwnable::borrow_mut` rust: kernel: reorder `ForeignOwnable` items rust: kernel: change `ForeignOwnable` pointer to mut rust: arc: split unsafe block, add missing comment rust: types: avoid `as` casts rust: arc: use `NonNull::new_unchecked` rust: use derive(CoercePointee) on rustc >= 1.84.0 rust: alloc: add doctest for `ArrayLayout::new()` rust: init: update `stack_try_pin_init` examples rust: error: import `kernel`'s `LayoutError` instead of `core`'s rust: str: replace unwraps with question mark operators rust: page: remove unnecessary helper function from doctest rust: rbtree: remove unwrap in asserts rust: init: replace unwraps with question mark operators rust: use host dylib naming convention to support macOS rust: add `build_error!` to the prelude rust: kernel: move `build_error` hidden function to prevent mistakes rust: use the `build_error!` macro, not the hidden function ...