linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-10-24	Merge branch 'net-dsa-mv88e6xxx-fix-mv88e6393x-phc-frequency-on-internal-clock'	Paolo Abeni
	Shenghao Yang says: ==================== net: dsa: mv88e6xxx: fix MV88E6393X PHC frequency on internal clock The MV88E6393X family of switches can additionally run their cycle counters using a 250MHz internal clock instead of the usual 125MHz external clock [1]. The driver currently assumes all designs utilize that external clock, but MikroTik's RB5009 uses the internal source - causing the PHC to be seen running at 2x real time in userspace, making synchronization with ptp4l impossible. This series adds support for reading off the cycle counter frequency known to the hardware in the TAI_CLOCK_PERIOD register and picking an appropriate set of scaling coefficients instead of using a fixed set for each switch family. Patch 1 groups those cycle counter coefficients into a new structure to make it easier to pass them around. Patch 2 modifies PTP initialization to probe TAI_CLOCK_PERIOD and use an appropriate set of coefficients. Patch 3 adds support for 4000ps cycle counter periods. Changes since v2 [2]: - Patch 1: "net: dsa: mv88e6xxx: group cycle counter coefficients" - Moved declaration of mv88e6xxx_cc_coeffs to avoid moving that in Patch 2. - Patch 2: "net: dsa: mv88e6xxx: read cycle counter period from hardware" - Removed move of mv88e6xxx_cc_coeffs declaration. - Patch 3: "net: dsa: mv88e6xxx: support 4000ps cycle counter periods" - No change. [1] https://lore.kernel.org/netdev/d6622575-bf1b-445a-b08f-2739e3642aae@lunn.ch/ [2] https://lore.kernel.org/netdev/20241006145951.719162-1-me@shenghaoyang.info/ ==================== Link: https://patch.msgid.link/20241020063833.5425-1-me@shenghaoyang.info Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	net: dsa: mv88e6xxx: support 4000ps cycle counter period	Shenghao Yang
	The MV88E6393X family of devices can run its cycle counter off an internal 250MHz clock instead of an external 125MHz one. Add support for this cycle counter period by adding another set of coefficients and lowering the periodic cycle counter read interval to compensate for faster overflows at the increased frequency. Otherwise, the PHC runs at 2x real time in userspace and cannot be synchronized. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Shenghao Yang <me@shenghaoyang.info> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	net: dsa: mv88e6xxx: read cycle counter period from hardware	Shenghao Yang
	Instead of relying on a fixed mapping of hardware family to cycle counter frequency, pull this information from the MV88E6XXX_TAI_CLOCK_PERIOD register. This lets us support switches whose cycle counter frequencies depend on board design. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Shenghao Yang <me@shenghaoyang.info> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	net: dsa: mv88e6xxx: group cycle counter coefficients	Shenghao Yang
	Instead of having them as individual fields in ptp_ops, wrap the coefficients in a separate struct so they can be referenced together. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Shenghao Yang <me@shenghaoyang.info> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition	Reinhard Speyerer
	Add Fibocom FG132 0x0112 composition: T: Bus=03 Lev=02 Prnt=06 Port=01 Cnt=02 Dev#= 10 Spd=12 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2cb7 ProdID=0112 Rev= 5.15 S: Manufacturer=Fibocom Wireless Inc. S: Product=Fibocom Module S: SerialNumber=xxxxxxxx C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Link: https://patch.msgid.link/ZxLKp5YZDy-OM0-e@arcor.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event	Haiyang Zhang
	The existing code moves VF to the same namespace as the synthetic NIC during netvsc_register_vf(). But, if the synthetic device is moved to a new namespace after the VF registration, the VF won't be moved together. To make the behavior more consistent, add a namespace check for synthetic NIC's NETDEV_REGISTER event (generated during its move), and move the VF if it is not in the same namespace. Cc: stable@vger.kernel.org Fixes: c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device") Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1729275922-17595-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x	Tim Harvey
	The well-known errata regarding EEE not being functional on various KSZ switches has been refactored a few times. Recently the refactoring has excluded several switches that the errata should also apply to. Disable EEE for additional switches with this errata and provide additional comments referring to the public errata document. The original workaround for the errata was applied with a register write to manually disable the EEE feature in MMD 7:60 which was being applied for KSZ9477/KSZ9897/KSZ9567 switch ID's. Then came commit 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver") and commit 6068e6d7ba50 ("net: dsa: microchip: remove KSZ9477 PHY errata handling") which moved the errata from the switch driver to the PHY driver but only for PHY_ID_KSZ9477 (PHY ID) however that PHY code was dead code because an entry was never added for PHY_ID_KSZ9477 via MODULE_DEVICE_TABLE. This was apparently realized much later and commit 54a4e5c16382 ("net: phy: micrel: add Microchip KSZ 9477 to the device table") added the PHY_ID_KSZ9477 to the PHY driver but as the errata was only being applied to PHY_ID_KSZ9477 it's not completely clear what switches that relates to. Later commit 6149db4997f5 ("net: phy: micrel: fix KSZ9477 PHY issues after suspend/resume") breaks this again for all but KSZ9897 by only applying the errata for that PHY ID. Following that this was affected with commit 08c6d8bae48c("net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)") which removes the blatant register write to MMD 7:60 and replaces it by setting phydev->eee_broken_modes = -1 so that the generic phy-c45 code disables EEE but this is only done for the KSZ9477_CHIP_ID (Switch ID). Lastly commit 0411f73c13af ("net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.") adds some additional switches that were missing to the errata due to the previous changes. This commit adds an additional set of switches. Fixes: 0411f73c13af ("net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.") Signed-off-by: Tim Harvey <tharvey@gateworks.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241018160658.781564-1-tharvey@gateworks.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	Merge tag 'for-net-2024-10-23' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_core: Disable works on hci_unregister_dev - SCO: Fix UAF on sco_sock_timeout - ISO: Fix UAF on iso_sock_timeout * tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: ISO: Fix UAF on iso_sock_timeout Bluetooth: SCO: Fix UAF on sco_sock_timeout Bluetooth: hci_core: Disable works on hci_unregister_dev ==================== Link: https://patch.msgid.link/20241023143005.2297694-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24	Merge tag 'ipsec-2024-10-22' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-10-22 1) Fix routing behavior that relies on L4 information for xfrm encapsulated packets. From Eyal Birger. 2) Remove leftovers of pernet policy_inexact lists. From Florian Westphal. 3) Validate new SA's prefixlen when the selector family is not set from userspace. From Sabrina Dubroca. 4) Fix a kernel-infoleak when dumping an auth algorithm. From Petr Vaganov. Please pull or let me know if there are problems. ipsec-2024-10-22 * tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix one more kernel-infoleak in algo dumping xfrm: validate new SA's prefixlen using SA family when sel.family is unset xfrm: policy: remove last remnants of pernet inexact list xfrm: respect ip protocols rules criteria when performing dst lookups xfrm: extract dst lookup parameters into a struct ==================== Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23	Bluetooth: ISO: Fix UAF on iso_sock_timeout	Luiz Augusto von Dentz
	conn->sk maybe have been unlinked/freed while waiting for iso_conn_lock so this checks if the conn->sk is still valid by checking if it part of iso_sk_list. Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23	Bluetooth: SCO: Fix UAF on sco_sock_timeout	Luiz Augusto von Dentz
	conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock so this checks if the conn->sk is still valid by checking if it part of sco_sk_list. Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465 Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23	Bluetooth: hci_core: Disable works on hci_unregister_dev	Luiz Augusto von Dentz
	This make use of disable_work_* on hci_unregister_dev since the hci_dev is about to be freed new submissions are not disarable. Fixes: 0d151a103775 ("Bluetooth: hci_core: cancel all works upon hci_unregister_dev()") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23	posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()	Jinjie Ruan
	If get_clock_desc() succeeds, it calls fget() for the clockid's fd, and get the clk->rwsem read lock, so the error path should release the lock to make the lock balance and fput the clockid's fd to make the refcount balance and release the fd related resource. However the below commit left the error path locked behind resulting in unbalanced locking. Check timespec64_valid_strict() before get_clock_desc() to fix it, because the "ts" is not changed after that. Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()") Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Anna-Maria Behnsen <anna-maria@linutronix.de> [pabeni@redhat.com: fixed commit message typo] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23	r8169: avoid unsolicited interrupts	Heiner Kallweit
	It was reported that after resume from suspend a PCI error is logged and connectivity is broken. Error message is: PCI error (cmd = 0x0407, status_errs = 0x0000) The message seems to be a red herring as none of the error bits is set, and the PCI command register value also is normal. Exception handling for a PCI error includes a chip reset what apparently brakes connectivity here. The interrupt status bit triggering the PCI error handling isn't actually used on PCIe chip versions, so it's not clear why this bit is set by the chip. Fix this by ignoring this bit on PCIe chip versions. Fixes: 0e4851502f84 ("r8169: merge with version 8.001.00 of Realtek's r8168 driver") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219388 Tested-by: Atlas Yu <atlas.yu@canonical.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/78e2f535-438f-4212-ad94-a77637ac6c9c@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23	net: sched: use RCU read-side critical section in taprio_dump()	Dmitry Antipov
	Fix possible use-after-free in 'taprio_dump()' by adding RCU read-side critical section there. Never seen on x86 but found on a KASAN-enabled arm64 system when investigating https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa: [T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0 [T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862 [T15862] [T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2 [T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024 [T15862] Call trace: [T15862] dump_backtrace+0x20c/0x220 [T15862] show_stack+0x2c/0x40 [T15862] dump_stack_lvl+0xf8/0x174 [T15862] print_report+0x170/0x4d8 [T15862] kasan_report+0xb8/0x1d4 [T15862] __asan_report_load4_noabort+0x20/0x2c [T15862] taprio_dump+0xa0c/0xbb0 [T15862] tc_fill_qdisc+0x540/0x1020 [T15862] qdisc_notify.isra.0+0x330/0x3a0 [T15862] tc_modify_qdisc+0x7b8/0x1838 [T15862] rtnetlink_rcv_msg+0x3c8/0xc20 [T15862] netlink_rcv_skb+0x1f8/0x3d4 [T15862] rtnetlink_rcv+0x28/0x40 [T15862] netlink_unicast+0x51c/0x790 [T15862] netlink_sendmsg+0x79c/0xc20 [T15862] __sock_sendmsg+0xe0/0x1a0 [T15862] ____sys_sendmsg+0x6c0/0x840 [T15862] ___sys_sendmsg+0x1ac/0x1f0 [T15862] __sys_sendmsg+0x110/0x1d0 [T15862] __arm64_sys_sendmsg+0x74/0xb0 [T15862] invoke_syscall+0x88/0x2e0 [T15862] el0_svc_common.constprop.0+0xe4/0x2a0 [T15862] do_el0_svc+0x44/0x60 [T15862] el0_svc+0x50/0x184 [T15862] el0t_64_sync_handler+0x120/0x12c [T15862] el0t_64_sync+0x190/0x194 [T15862] [T15862] Allocated by task 15857: [T15862] kasan_save_stack+0x3c/0x70 [T15862] kasan_save_track+0x20/0x3c [T15862] kasan_save_alloc_info+0x40/0x60 [T15862] __kasan_kmalloc+0xd4/0xe0 [T15862] __kmalloc_cache_noprof+0x194/0x334 [T15862] taprio_change+0x45c/0x2fe0 [T15862] tc_modify_qdisc+0x6a8/0x1838 [T15862] rtnetlink_rcv_msg+0x3c8/0xc20 [T15862] netlink_rcv_skb+0x1f8/0x3d4 [T15862] rtnetlink_rcv+0x28/0x40 [T15862] netlink_unicast+0x51c/0x790 [T15862] netlink_sendmsg+0x79c/0xc20 [T15862] __sock_sendmsg+0xe0/0x1a0 [T15862] ____sys_sendmsg+0x6c0/0x840 [T15862] ___sys_sendmsg+0x1ac/0x1f0 [T15862] __sys_sendmsg+0x110/0x1d0 [T15862] __arm64_sys_sendmsg+0x74/0xb0 [T15862] invoke_syscall+0x88/0x2e0 [T15862] el0_svc_common.constprop.0+0xe4/0x2a0 [T15862] do_el0_svc+0x44/0x60 [T15862] el0_svc+0x50/0x184 [T15862] el0t_64_sync_handler+0x120/0x12c [T15862] el0t_64_sync+0x190/0x194 [T15862] [T15862] Freed by task 6192: [T15862] kasan_save_stack+0x3c/0x70 [T15862] kasan_save_track+0x20/0x3c [T15862] kasan_save_free_info+0x4c/0x80 [T15862] poison_slab_object+0x110/0x160 [T15862] __kasan_slab_free+0x3c/0x74 [T15862] kfree+0x134/0x3c0 [T15862] taprio_free_sched_cb+0x18c/0x220 [T15862] rcu_core+0x920/0x1b7c [T15862] rcu_core_si+0x10/0x1c [T15862] handle_softirqs+0x2e8/0xd64 [T15862] __do_softirq+0x14/0x20 Fixes: 18cdd2f0998a ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20241018051339.418890-2-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23	net: sched: fix use-after-free in taprio_change()	Dmitry Antipov
	In 'taprio_change()', 'admin' pointer may become dangling due to sched switch / removal caused by 'advance_sched()', and critical section protected by 'q->current_entry_lock' is too small to prevent from such a scenario (which causes use-after-free detected by KASAN). Fix this by prefer 'rcu_replace_pointer()' over 'rcu_assign_pointer()' to update 'admin' immediately before an attempt to schedule freeing. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: syzbot+b65e0af58423fc8a73aa@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20241018051339.418890-1-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23	net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions ↵	Vladimir Oltean
	created by classifiers tcf_action_init() has logic for checking mismatches between action and filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run on the transition between the new tc_act_bind(flags) returning true (aka now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning false (aka action was not bound to classifier before). Otherwise, the check is skipped. For the case where an action is not standalone, but rather it was created by a classifier and is bound to it, tcf_action_init() skips the check entirely, and this means it allows mismatched flags to occur. Taking the matchall classifier code path as an example (with mirred as an action), the reason is the following: 1 \| mall_change() 2 \| -> mall_replace_hw_filter() 3 \| -> tcf_exts_validate_ex() 4 \| -> flags \|= TCA_ACT_FLAGS_BIND; 5 \| -> tcf_action_init() 6 \| -> tcf_action_init_1() 7 \| -> a_o->init() 8 \| -> tcf_mirred_init() 9 \| -> tcf_idr_create_from_flags() 10 \| -> tcf_idr_create() 11 \| -> p->tcfa_flags = flags; 12 \| -> tc_act_bind(flags)) 13 \| -> tc_act_bind(act->tcfa_flags) When invoked from tcf_exts_validate_ex() like matchall does (but other classifiers validate their extensions as well), tcf_action_init() runs in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by line 4). So line 12 is always true, and line 13 is always true as well. No transition ever takes place, and the check is skipped. The code was added in this form in commit c86e0209dc77 ("flow_offload: validate flags of filter and actions"), but I'm attributing the blame even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and TCA_ACT_FLAGS_SKIP_SW were added to the UAPI. Following the development process of this change, the check did not always exist in this form. A change took place between v3 [1] and v4 [2], AFAIU due to review feedback that it doesn't make sense for action flags to be different than classifier flags. I think I agree with that feedback, but it was translated into code that omits enforcing this for "classic" actions created at the same time with the filters themselves. There are 3 more important cases to discuss. First there is this command: $ tc qdisc add dev eth0 clasct $ tc filter add dev eth0 ingress matchall skip_sw \ action mirred ingress mirror dev eth1 which should be allowed, because prior to the concept of dedicated action flags, it used to work and it used to mean the action inherited the skip_sw/skip_hw flags from the classifier. It's not a mismatch. Then we have this command: $ tc qdisc add dev eth0 clasct $ tc filter add dev eth0 ingress matchall skip_sw \ action mirred ingress mirror dev eth1 skip_hw where there is a mismatch and it should be rejected. Finally, we have: $ tc qdisc add dev eth0 clasct $ tc filter add dev eth0 ingress matchall skip_sw \ action mirred ingress mirror dev eth1 skip_sw where the offload flags coincide, and this should be treated the same as the first command based on inheritance, and accepted. [1]: https://lore.kernel.org/netdev/20211028110646.13791-9-simon.horman@corigine.com/ [2]: https://lore.kernel.org/netdev/20211118130805.23897-10-simon.horman@corigine.com/ Fixes: 7adc57651211 ("flow_offload: add skip_hw and skip_sw to control if offload the action") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241017161049.3570037-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22	net: usb: usbnet: fix name regression	Oliver Neukum
	The fix for MAC addresses broke detection of the naming convention because it gave network devices no random MAC before bind() was called. This means that the check for the local assignment bit was always negative as the address was zeroed from allocation, instead of from overwriting the MAC with a unique hardware address. The correct check for whether bind() has altered the MAC is done with is_zero_ether_addr Signed-off-by: Oliver Neukum <oneukum@suse.com> Reported-by: Greg Thelen <gthelen@google.com> Diagnosed-by: John Sperbeck <jsperbeck@google.com> Fixes: bab8eb0dd4cb9 ("usbnet: modern method to get random MAC") Link: https://patch.msgid.link/20241017071849.389636-1-oneukum@suse.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22	mlxsw: spectrum_router: fix xa_store() error checking	Yuan Can
	It is meant to use xa_err() to extract the error encoded in the return value of xa_store(). Fixes: 44c2fbebe18a ("mlxsw: spectrum_router: Share nexthop counters in resilient groups") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20241017023223.74180-1-yuancan@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22	Merge tag 'nf-24-10-21' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== This patchset contains Netfilter fixes for net: 1) syzkaller managed to triger UaF due to missing reference on netns in bpf infrastructure, from Florian Westphal. 2) Fix incorrect conversion from NFPROTO_UNSPEC to NFPROTO_{IPV4,IPV6} in the following xtables targets: MARK and NFLOG. Moreover, add missing I have my half share in this mistake, I did not take the necessary time to review this: For several years I have been struggling to keep working on Netfilter, juggling a myriad of side consulting projects to stop burning my own savings. I have extended the iptables-tests.py test infrastructure to improve the coverage of ip6tables and detect similar problems in the future. This is a v2 including a extended PR with one more fix. netfilter pull request 24-10-21 * tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: xtables: fix typo causing some targets not to load on IPv6 netfilter: bpf: must hold reference on net namespace ==================== Link: https://patch.msgid.link/20241021094536.81487-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21	virtio_net: fix integer overflow in stats	Michael S. Tsirkin
	Static analysis on linux-next has detected the following issue in function virtnet_stats_ctx_init, in drivers/net/virtio_net.c : if (vi->device_stats_cap & VIRTIO_NET_STATS_TYPE_CVQ) { queue_type = VIRTNET_Q_TYPE_CQ; ctx->bitmap[queue_type] \|= VIRTIO_NET_STATS_TYPE_CVQ; ctx->desc_num[queue_type] += ARRAY_SIZE(virtnet_stats_cvq_desc); ctx->size[queue_type] += sizeof(struct virtio_net_stats_cvq); } ctx->bitmap is declared as a u32 however it is being bit-wise or'd with VIRTIO_NET_STATS_TYPE_CVQ and this is defined as 1 << 32: include/uapi/linux/virtio_net.h:#define VIRTIO_NET_STATS_TYPE_CVQ (1ULL << 32) ..and hence the bit-wise or operation won't set any bits in ctx->bitmap because 1ULL < 32 is too wide for a u32. In fact, the field is read into a u64: u64 offset, bitmap; .... bitmap = ctx->bitmap[queue_type]; so to fix, it is enough to make bitmap an array of u64. Fixes: 941168f8b40e5 ("virtio_net: support device stats") Reported-by: "Colin King (gmail)" <colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/53e2bd6728136d5916e384a7840e5dc7eebff832.1729099611.git.mst@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21	net: fix races in netdev_tx_sent_queue()/dev_watchdog()	Eric Dumazet
	Some workloads hit the infamous dev_watchdog() message: "NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out" It seems possible to hit this even for perfectly normal BQL enabled drivers: 1) Assume a TX queue was idle for more than dev->watchdog_timeo (5 seconds unless changed by the driver) 2) Assume a big packet is sent, exceeding current BQL limit. 3) Driver ndo_start_xmit() puts the packet in TX ring, and netdev_tx_sent_queue() is called. 4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue() before txq->trans_start has been written. 5) txq->trans_start is written later, from netdev_start_xmit() if (rc == NETDEV_TX_OK) txq_trans_update(txq) dev_watchdog() running on another cpu could read the old txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5) did not happen yet. To solve the issue, write txq->trans_start right before one XOFF bit is set : - _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue() - __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue() From dev_watchdog(), we have to read txq->state before txq->trans_start. Add memory barriers to enforce correct ordering. In the future, we could avoid writing over txq->trans_start for normal operations, and rename this field to txq->xoff_start_time. Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21	net: wwan: fix global oob in wwan_rtnl_policy	Lin Ma
	The variable wwan_rtnl_link_ops assign a bigger maxtype which leads to a global out-of-bounds read when parsing the netlink attributes. Exactly same bug cause as the oob fixed in commit b33fb5b801c6 ("net: qualcomm: rmnet: fix global oob in rmnet_policy"). ================================================================== BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:388 [inline] BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x19d7/0x29a0 lib/nlattr.c:603 Read of size 1 at addr ffffffff8b09cb60 by task syz.1.66276/323862 CPU: 0 PID: 323862 Comm: syz.1.66276 Not tainted 6.1.70 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x177/0x231 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x14f/0x750 mm/kasan/report.c:395 kasan_report+0x139/0x170 mm/kasan/report.c:495 validate_nla lib/nlattr.c:388 [inline] __nla_validate_parse+0x19d7/0x29a0 lib/nlattr.c:603 __nla_parse+0x3c/0x50 lib/nlattr.c:700 nla_parse_nested_deprecated include/net/netlink.h:1269 [inline] __rtnl_newlink net/core/rtnetlink.c:3514 [inline] rtnl_newlink+0x7bc/0x1fd0 net/core/rtnetlink.c:3623 rtnetlink_rcv_msg+0x794/0xef0 net/core/rtnetlink.c:6122 netlink_rcv_skb+0x1de/0x420 net/netlink/af_netlink.c:2508 netlink_unicast_kernel net/netlink/af_netlink.c:1326 [inline] netlink_unicast+0x74b/0x8c0 net/netlink/af_netlink.c:1352 netlink_sendmsg+0x882/0xb90 net/netlink/af_netlink.c:1874 sock_sendmsg_nosec net/socket.c:716 [inline] __sock_sendmsg net/socket.c:728 [inline] ____sys_sendmsg+0x5cc/0x8f0 net/socket.c:2499 ___sys_sendmsg+0x21c/0x290 net/socket.c:2553 __sys_sendmsg net/socket.c:2582 [inline] __do_sys_sendmsg net/socket.c:2591 [inline] __se_sys_sendmsg+0x19e/0x270 net/socket.c:2589 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x90 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f67b19a24ad RSP: 002b:00007f67b17febb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f67b1b45f80 RCX: 00007f67b19a24ad RDX: 0000000000000000 RSI: 0000000020005e40 RDI: 0000000000000004 RBP: 00007f67b1a1e01d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd2513764f R14: 00007ffd251376e0 R15: 00007f67b17fed40 </TASK> The buggy address belongs to the variable: wwan_rtnl_policy+0x20/0x40 The buggy address belongs to the physical page: page:ffffea00002c2700 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xb09c flags: 0xfff00000001000(reserved\|node=0\|zone=1\|lastcpupid=0x7ff) raw: 00fff00000001000 ffffea00002c2708 ffffea00002c2708 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner info is not present (never set?) Memory state around the buggy address: ffffffff8b09ca00: 05 f9 f9 f9 05 f9 f9 f9 00 01 f9 f9 00 01 f9 f9 ffffffff8b09ca80: 00 00 00 05 f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9 >ffffffff8b09cb00: 00 00 00 00 05 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 ^ ffffffff8b09cb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== According to the comment of `nla_parse_nested_deprecated`, use correct size `IFLA_WWAN_MAX` here to fix this issue. Fixes: 88b710532e53 ("wwan: add interface creation support") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241015131621.47503-1-linma@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21	netfilter: xtables: fix typo causing some targets not to load on IPv6	Pablo Neira Ayuso
	- There is no NFPROTO_IPV6 family for mark and NFLOG. - TRACE is also missing module autoload with NFPROTO_IPV6. This results in ip6tables failing to restore a ruleset. This issue has been reported by several users providing incomplete patches. Very similar to Ilya Katsnelson's patch including a missing chunk in the TRACE extension. Fixes: 0bfcb7b71e73 ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed") Reported-by: Ignat Korchagin <ignat@cloudflare.com> Reported-by: Ilya Katsnelson <me@0upti.me> Reported-by: Krzysztof Olędzki <ole@ans.pl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-21	Merge branch 'fsl-fman-fix-refcount-handling-of-fman-related-devices'	Paolo Abeni
	Aleksandr Mishin says: ==================== fsl/fman: Fix refcount handling of fman-related devices The series is intended to fix refcount handling for fman-related "struct device" objects - the devices are not released upon driver removal or in the error paths during probe. This leads to device reference leaks. The device pointers are now saved to struct mac_device and properly handled in the driver's probe and removal functions. Originally reported by Simon Horman <horms@kernel.org> (https://lore.kernel.org/all/20240702133651.GK598357@kernel.org/) Compile tested only. ==================== Link: https://patch.msgid.link/20241015060122.25709-1-amishin@t-argos.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21	fsl/fman: Fix refcount handling of fman-related devices	Aleksandr Mishin
	In mac_probe() there are multiple calls to of_find_device_by_node(), fman_bind() and fman_port_bind() which takes references to of_dev->dev. Not all references taken by these calls are released later on error path in mac_probe() and in mac_remove() which lead to reference leaks. Add references release. Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21	fsl/fman: Save device references taken in mac_probe()	Aleksandr Mishin
	In mac_probe() there are calls to of_find_device_by_node() which takes references to of_dev->dev. These references are not saved and not released later on error path in mac_probe() and in mac_remove(). Add new fields into mac_device structure to save references taken for future use in mac_probe() and mac_remove(). This is a preparation for further reference leaks fix. Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-20	MAINTAINERS: add samples/pktgen to NETWORKING [GENERAL]	Hangbin Liu
	samples/pktgen is missing in the MAINTAINERS file. Suggested-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Message-ID: <20241018005301.10052-1-liuhangbin@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-20	mailmap: update entry for Jesper Dangaard Brouer	Jesper Dangaard Brouer
	Mapping all my previously used emails to my kernel.org email. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Message-ID: <172909057364.2452383.8019986488234344607.stgit@firesoul> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-20	net: dsa: mv88e6xxx: Fix error when setting port policy on mv88e6393x	Peter Rashleigh
	mv88e6393x_port_set_policy doesn't correctly shift the ptr value when converting the policy format between the old and new styles, so the target register ends up with the ptr being written over the data bits. Shift the pointer to align with the format expected by mv88e6393x_port_policy_write(). Fixes: 6584b26020fc ("net: dsa: mv88e6xxx: implement .port_set_policy for Amethyst") Signed-off-by: Peter Rashleigh <peter@rashleigh.ca> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <20241016040822.3917-1-peter@rashleigh.ca> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	octeon_ep: Add SKB allocation failures handling in __octep_oq_process_rx()	Aleksandr Mishin
	build_skb() returns NULL in case of a memory allocation failure so handle it inside __octep_oq_process_rx() to avoid NULL pointer dereference. __octep_oq_process_rx() is called during NAPI polling by the driver. If skb allocation fails, keep on pulling packets out of the Rx DMA queue: we shouldn't break the polling immediately and thus falsely indicate to the octep_napi_poll() that the Rx pressure is going down. As there is no associated skb in this case, don't process the packets and don't push them up the network stack - they are skipped. Helper function is implemented to unmmap/flush all the fragment buffers used by the dropped packet. 'alloc_failures' counter is incremented to mark the skb allocation error in driver statistics. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	octeon_ep: Implement helper for iterating packets in Rx queue	Aleksandr Mishin
	The common code with some packet and index manipulations is extracted and moved to newly implemented helper to make the code more readable and avoid duplication. This is a preparation for skb allocation failure handling. Found by Linux Verification Center (linuxtesting.org) with SVACE. Suggested-by: Simon Horman <horms@kernel.org> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	bnxt_en: replace ptp_lock with irqsave variant	Vadim Fedorenko
	In netpoll configuration the completion processing can happen in hard irq context which will break with spin_lock_bh() for fullfilling RX timestamp in case of all packets timestamping. Replace it with spin_lock_irqsave() variant. Fixes: 7f5515d19cd7 ("bnxt_en: Get the RX packet timestamp") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Message-ID: <20241016195234.2622004-1-vadfed@meta.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	net: phy: dp83822: Fix reset pin definitions	Michel Alex
	This change fixes a rare issue where the PHY fails to detect a link due to incorrect reset behavior. The SW_RESET definition was incorrectly assigned to bit 14, which is the Digital Restart bit according to the datasheet. This commit corrects SW_RESET to bit 15 and assigns DIG_RESTART to bit 14 as per the datasheet specifications. The SW_RESET define is only used in the phy_reset function, which fully re-initializes the PHY after the reset is performed. The change in the bit definitions should not have any negative impact on the functionality of the PHY. v2: - added Fixes tag - improved commit message Cc: stable@vger.kernel.org Fixes: 5dc39fd5ef35 ("net: phy: DP83822: Add ability to advertise Fiber connection") Signed-off-by: Alex Michel <alex.michel@wiedemann-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Message-ID: <AS1P250MB0608A798661549BF83C4B43EA9462@AS1P250MB0608.EURP250.PROD.OUTLOOK.COM> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	MAINTAINERS: add Simon as an official reviewer	Jakub Kicinski
	Simon has been diligently and consistently reviewing networking changes for at least as long as our development statistics go back. Often if not usually topping the list of reviewers. Make his role official. Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <20241015153005.2854018-1-kuba@kernel.org> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	net: plip: fix break; causing plip to never transmit	Jakub Boehm
	Since commit 71ae2cb30531 ("net: plip: Fix fall-through warnings for Clang") plip was not able to send any packets, this patch replaces one unintended break; with fallthrough; which was originally missed by commit 9525d69a3667 ("net: plip: mark expected switch fall-throughs"). I have verified with a real hardware PLIP connection that everything works once again after applying this patch. Fixes: 71ae2cb30531 ("net: plip: Fix fall-through warnings for Clang") Signed-off-by: Jakub Boehm <boehm.jakub@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <20241015-net-plip-tx-fix-v1-1-32d8be1c7e0b@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	be2net: fix potential memory leak in be_xmit()	Wang Hai
	The be_xmit() returns NETDEV_TX_OK without freeing skb in case of be_xmit_enqueue() fails, add dev_kfree_skb_any() to fix it. Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Message-ID: <20241015144802.12150-1-wanghai38@huawei.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	net/sun3_82586: fix potential memory leak in sun3_82586_send_packet()	Wang Hai
	The sun3_82586_send_packet() returns NETDEV_TX_OK without freeing skb in case of skb->len being too long, add dev_kfree_skb() to fix it. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Message-ID: <20241015144148.7918-1-wanghai38@huawei.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-19	net: pse-pd: Fix out of bound for loop	Kory Maincent
	Adjust the loop limit to prevent out-of-bounds access when iterating over PI structures. The loop should not reach the index pcdev->nr_lines since we allocate exactly pcdev->nr_lines number of PI structures. This fix ensures proper bounds are maintained during iterations. Fixes: 9be9567a7c59 ("net: pse-pd: Add support for PSE PIs") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Message-ID: <20241015130255.125508-1-kory.maincent@bootlin.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-17	Merge tag 'net-6.12-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Current release - new code bugs: - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated Previous releases - regressions: - ipv4: give an IPv4 dev to blackhole_netdev - udp: compute L4 checksum as usual when not segmenting the skb - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink(). - eth: mlx5e: don't call cleanup on profile rollback failure - eth: microchip: vcap api: fix memory leaks in vcap_api_encode_rule_test() - eth: enetc: disable Tx BD rings after they are empty - eth: macb: avoid 20s boot delay by skipping MDIO bus registration for fixed-link PHY Previous releases - always broken: - posix-clock: fix missing timespec64 check in pc_clock_settime() - genetlink: hold RCU in genlmsg_mcast() - mptcp: prevent MPC handshake on port-based signal endpoints - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame - eth: stmmac: dwmac-tegra: fix link bring-up sequence - eth: bcmasp: fix potential memory leak in bcmasp_xmit() Misc: - add Andrew Lunn as a co-maintainer of all networking drivers" * tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net/mlx5e: Don't call cleanup on profile rollback failure net/mlx5: Unregister notifier on eswitch init failure net/mlx5: Fix command bitmask initialization net/mlx5: Check for invalid vector index on EQ creation net/mlx5: HWS, use lock classes for bwc locks net/mlx5: HWS, don't destroy more bwc queue locks than allocated net/mlx5: HWS, fixed double free in error flow of definer layout net/mlx5: HWS, removed wrong access to a number of rules variable mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame net: dsa: vsc73xx: fix reception from VLAN-unaware bridges net: ravb: Only advertise Rx/Tx timestamps if hardware supports it net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test() net: phy: mdio-bcm-unimac: Add BCM6846 support dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio udp: Compute L4 checksum as usual when not segmenting the skb genetlink: hold RCU in genlmsg_mcast() net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361 tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink(). ...
2024-10-17	netfilter: bpf: must hold reference on net namespace	Florian Westphal
	BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0 Read of size 8 at addr ffff8880106fe400 by task repro/72= bpf_nf_link_release+0xda/0x1e0 bpf_link_free+0x139/0x2d0 bpf_link_release+0x68/0x80 __fput+0x414/0xb60 Eric says: It seems that bpf was able to defer the __nf_unregister_net_hook() after exit()/close() time. Perhaps a netns reference is missing, because the netns has been dismantled/freed already. bpf_nf_link_attach() does : link->net = net; But I do not see a reference being taken on net. Add such a reference and release it after hook unreg. Note that I was unable to get syzbot reproducer to work, so I do not know if this resolves this splat. Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs") Diagnosed-by: Eric Dumazet <edumazet@google.com> Reported-by: Lai, Yi <yi1.lai@linux.intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-17	Merge branch 'mlx5-misc-fixes-2024-10-15'	Paolo Abeni
	Tariq Toukan says: ==================== mlx5 misc fixes 2024-10-15 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. Series generated against: commit 174714f0e505 ("selftests: drivers: net: fix name not defined") ==================== Link: https://patch.msgid.link/20241015093208.197603-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5e: Don't call cleanup on profile rollback failure	Cosmin Ratiu
	When profile rollback fails in mlx5e_netdev_change_profile, the netdev profile var is left set to NULL. Avoid a crash when unloading the driver by not calling profile->cleanup in such a case. This was encountered while testing, with the original trigger that the wq rescuer thread creation got interrupted (presumably due to Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by mlx5e_priv_init, the profile rollback also fails for the same reason (signal still active) so the profile is left as NULL, leading to a crash later in _mlx5e_remove. [ 732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2) [ 734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR [ 734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12 [ 734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12 [ 734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR [ 734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12 [ 734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12 [ 745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 745.538222] #PF: supervisor read access in kernel mode <snipped> [ 745.551290] Call Trace: [ 745.551590] <TASK> [ 745.551866] ? __die+0x20/0x60 [ 745.552218] ? page_fault_oops+0x150/0x400 [ 745.555307] ? exc_page_fault+0x79/0x240 [ 745.555729] ? asm_exc_page_fault+0x22/0x30 [ 745.556166] ? mlx5e_remove+0x6b/0xb0 [mlx5_core] [ 745.556698] auxiliary_bus_remove+0x18/0x30 [ 745.557134] device_release_driver_internal+0x1df/0x240 [ 745.557654] bus_remove_device+0xd7/0x140 [ 745.558075] device_del+0x15b/0x3c0 [ 745.558456] mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core] [ 745.559112] mlx5_unregister_device+0x34/0x50 [mlx5_core] [ 745.559686] mlx5_uninit_one+0x46/0xf0 [mlx5_core] [ 745.560203] remove_one+0x4e/0xd0 [mlx5_core] [ 745.560694] pci_device_remove+0x39/0xa0 [ 745.561112] device_release_driver_internal+0x1df/0x240 [ 745.561631] driver_detach+0x47/0x90 [ 745.562022] bus_remove_driver+0x84/0x100 [ 745.562444] pci_unregister_driver+0x3b/0x90 [ 745.562890] mlx5_cleanup+0xc/0x1b [mlx5_core] [ 745.563415] __x64_sys_delete_module+0x14d/0x2f0 [ 745.563886] ? kmem_cache_free+0x1b0/0x460 [ 745.564313] ? lockdep_hardirqs_on_prepare+0xe2/0x190 [ 745.564825] do_syscall_64+0x6d/0x140 [ 745.565223] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 745.565725] RIP: 0033:0x7f1579b1288b Fixes: 3ef14e463f6e ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: Unregister notifier on eswitch init failure	Cosmin Ratiu
	It otherwise remains registered and a subsequent attempt at eswitch enabling might trigger warnings of the sort: [ 682.589148] ------------[ cut here ]------------ [ 682.590204] notifier callback eswitch_vport_event [mlx5_core] already registered [ 682.590256] WARNING: CPU: 13 PID: 2660 at kernel/notifier.c:31 notifier_chain_register+0x3e/0x90 [...snipped] [ 682.610052] Call Trace: [ 682.610369] <TASK> [ 682.610663] ? __warn+0x7c/0x110 [ 682.611050] ? notifier_chain_register+0x3e/0x90 [ 682.611556] ? report_bug+0x148/0x170 [ 682.611977] ? handle_bug+0x36/0x70 [ 682.612384] ? exc_invalid_op+0x13/0x60 [ 682.612817] ? asm_exc_invalid_op+0x16/0x20 [ 682.613284] ? notifier_chain_register+0x3e/0x90 [ 682.613789] atomic_notifier_chain_register+0x25/0x40 [ 682.614322] mlx5_eswitch_enable_locked+0x1d4/0x3b0 [mlx5_core] [ 682.614965] mlx5_eswitch_enable+0xc9/0x100 [mlx5_core] [ 682.615551] mlx5_device_enable_sriov+0x25/0x340 [mlx5_core] [ 682.616170] mlx5_core_sriov_configure+0x50/0x170 [mlx5_core] [ 682.616789] sriov_numvfs_store+0xb0/0x1b0 [ 682.617248] kernfs_fop_write_iter+0x117/0x1a0 [ 682.617734] vfs_write+0x231/0x3f0 [ 682.618138] ksys_write+0x63/0xe0 [ 682.618536] do_syscall_64+0x4c/0x100 [ 682.618958] entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: 7624e58a8b3a ("net/mlx5: E-switch, register event handler before arming the event") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: Fix command bitmask initialization	Shay Drory
	Command bitmask have a dedicated bit for MANAGE_PAGES command, this bit isn't Initialize during command bitmask Initialization, only during MANAGE_PAGES. In addition, mlx5_cmd_trigger_completions() is trying to trigger completion for MANAGE_PAGES command as well. Hence, in case health error occurred before any MANAGE_PAGES command have been invoke (for example, during mlx5_enable_hca()), mlx5_cmd_trigger_completions() will try to trigger completion for MANAGE_PAGES command, which will result in null-ptr-deref error.[1] Fix it by Initialize command bitmask correctly. While at it, re-write the code for better understanding. [1] BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core] Write of size 4 at addr 0000000000000214 by task kworker/u96:2/12078 CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x7e/0xc0 kasan_report+0xb9/0xf0 kasan_check_range+0xec/0x190 mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core] mlx5_cmd_flush+0x94/0x240 [mlx5_core] enter_error_state+0x6c/0xd0 [mlx5_core] mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core] process_one_work+0x787/0x1490 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0xda0/0xda0 ? assign_work+0x168/0x240 worker_thread+0x586/0xd30 ? rescuer_thread+0xae0/0xae0 kthread+0x2df/0x3b0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x2d/0x70 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: Check for invalid vector index on EQ creation	Maher Sanalla
	Currently, mlx5 driver does not enforce vector index to be lower than the maximum number of supported completion vectors when requesting a new completion EQ. Thus, mlx5_comp_eqn_get() fails when trying to acquire an IRQ with an improper vector index. To prevent the case above, enforce that vector index value is valid and lower than maximum in mlx5_comp_eqn_get() before handling the request. Fixes: f14c1a14e632 ("net/mlx5: Allocate completion EQs dynamically") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: HWS, use lock classes for bwc locks	Cosmin Ratiu
	The HWS BWC API uses one lock per queue and usually acquires one of them, except when doing changes which require locking all queues in order. Naturally, lockdep isn't too happy about acquiring the same lock class multiple times, so inform it that each queue lock is a different class to avoid false positives. Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: HWS, don't destroy more bwc queue locks than allocated	Cosmin Ratiu
	hws_send_queues_bwc_locks_destroy destroyed more queue locks than allocated, leading to memory corruption (occasionally) and warnings such as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy because sometimes, the 'mutex' being destroyed was random memory. The severity of this problem is proportional to the number of queues configured because the code overreaches beyond the end of the bwc_send_queue_locks array by 2x its length. Fix that by using the correct number of bwc queues. Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: HWS, fixed double free in error flow of definer layout	Yevgeny Kliteynik
	Fix error flow bug that could lead to double free of a buffer during a failure to calculate a suitable definer layout. Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17	net/mlx5: HWS, removed wrong access to a number of rules variable	Yevgeny Kliteynik
	Removed wrong access to the num_of_rules field of the matcher. This is a usual u32 variable, but the access was as if it was atomic. This fixes the following CI warnings: mlx5hws_bwc.c:708:17: warning: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment] Fixes: 510f9f61a112 ("net/mlx5: HWS, added API and enabled HWS support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409291101.6NdtMFVC-lkp@intel.com/ Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>