Age | Commit message | Author
2023-08-09 | net: enetc: remove of_device_is_available() handling | Vladimir Oltean
Since commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status"), this is redundant and does nothing, because enetc_pf_probe() no longer even gets called. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-09 | net: enetc: reimplement RFS/RSS memory clearing as PCI quirk | Vladimir Oltean
The workaround implemented in commit 3222b5b613db ("net: enetc: initialize RFS/RSS memories for unused ports too") is no longer effective after commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status"). Thus, it has introduced a regression and we see AER errors being reported again:

$ ip link set sw2p0 up && dhclient -i sw2p0 && ip addr show sw2p0
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 2.5Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp2: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control off
sja1105 spi2.2 sw2p0: configuring for phy/rgmii-id link mode
sja1105 spi2.2 sw2p0: Link is Up - 1Gbps/Full - flow control off
pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
pcieport 0000:00:1f.0: AER: can't find device of ID0000

Rob's suggestion is to reimplement the enetc driver workaround as a PCI fixup, and to modify the PCI core to run the fixups for all PCI functions. This change handles the first part.

We refactor the common code in enetc_psi_create() and enetc_psi_destroy(), and use the PCI fixup only for those functions for which enetc_pf_probe() won't get called. This avoids some work being done twice for the PFs which are enabled.

Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-09 | PCI: move OF status = "disabled" detection to dev->match_driver | Vladimir Oltean
The blamed commit has broken probing on arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi when &enetc_port0 (PCI function 0) has status = "disabled".

Background: pci_scan_slot() has logic to say that if the function 0 of a device is absent, the entire device is absent and we can skip the other functions entirely. Traditionally, this has meant that pci_bus_read_dev_vendor_id() returns an error code for that function.

However, since the blamed commit, there is an extra confounding condition: function 0 of the device exists and has a valid vendor id, but it is disabled in the device tree. In that case, pci_scan_slot() would incorrectly skip the entire device instead of just that function.

In the case of NXP LS1028A, status = "disabled" does not mean that the PCI function's config space is not available for reading. It is, but the Ethernet port is just not functionally useful with a particular SerDes protocol configuration (0x9999) due to pinmuxing constraints of the SoC.

So, pci_scan_slot() skips all other functions on the ENETC ECAM (enetc_port1, enetc_port2, enetc_mdio_pf3 etc) when just enetc_port0 had to not be probed.

There is an additional regression introduced by the change, caused by its fundamental premise. The enetc driver needs to run code for all PCI functions, regardless of whether they're enabled or not in the device tree. That is no longer possible if the driver's probe function is no longer called. But Rob recommends that we move the of_device_is_available() detection to dev->match_driver, and this makes the PCI fixups still run on all functions, while just probing drivers for those functions that are enabled. So, a separate change in the enetc driver will have to move the workarounds to a PCI fixup.

Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/netdev/CAL_JsqLsVYiPLx2kcHkDQ4t=hQVCR7NHziDwi9cCFUFhx48Qow@mail.gmail.com/
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
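The division of labour this commit message describes (fixups run for every function whose config space answers, drivers probe only the functions the device tree leaves enabled) can be modeled in a few lines of user-space C. This is a toy sketch, not kernel code: `struct pci_fn_model`, `scan_slot_model()` and all field names are hypothetical, and the real pci_scan_slot() also deals with details such as the multi-function bit that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PCI function; not a kernel structure. */
struct pci_fn_model {
	bool present;      /* config space answers with a valid vendor ID */
	bool of_available; /* device tree status is not "disabled" */
	bool match_driver; /* models dev->match_driver */
	bool fixup_ran;
	bool probed;
};

/* Returns the number of functions for which a driver was probed. */
static size_t scan_slot_model(struct pci_fn_model *fn, size_t nfn)
{
	size_t probed = 0;

	/* Only a function 0 that is truly absent hides the whole device. */
	if (nfn == 0 || !fn[0].present)
		return 0;

	for (size_t i = 0; i < nfn; i++) {
		if (!fn[i].present)
			continue;
		fn[i].fixup_ran = true;                  /* fixups: every function */
		fn[i].match_driver = fn[i].of_available; /* DT status decides binding */
		if (fn[i].match_driver) {
			fn[i].probed = true;
			probed++;
		}
	}
	return probed;
}
```

With function 0 present but disabled in the device tree, the model still runs the fixup on it and probes the remaining functions, which is the behaviour the message asks for.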
2023-08-08 | iavf: fix potential races for FDIR filters | Piotr Gardocki
Add fdir_fltr_lock locking in unprotected places. The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which iterates over all filters and looks for a duplicate. The filter can be removed from list and freed from memory at the same time it's being compared. All other places where filters are deleted are already protected with spinlock. The remaining changes protect adapter->fdir_active_fltr variable so now all its uses are under a spinlock. Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
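The race described above (a filter being freed while the duplicate check is still comparing against it) is avoided by holding the list lock for the whole walk. A minimal user-space sketch of that pattern follows. It is an assumption-laden model: a pthread mutex stands in for the driver's spinlock, and `struct fltr_model`, `struct adapter_model` and `fdir_is_dup_model()` are hypothetical names, not the iavf data structures.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one Flow Director filter. */
struct fltr_model {
	struct fltr_model *next;
	unsigned int key; /* whatever identifies a duplicate */
};

struct adapter_model {
	pthread_mutex_t fdir_fltr_lock; /* models the driver's spinlock */
	struct fltr_model *fdir_list;
};

/* Walk the list only while holding the lock, so that a concurrent
 * delete cannot free an entry in the middle of the comparison. */
static bool fdir_is_dup_model(struct adapter_model *a,
			      const struct fltr_model *candidate)
{
	bool dup = false;

	pthread_mutex_lock(&a->fdir_fltr_lock);
	for (struct fltr_model *f = a->fdir_list; f; f = f->next) {
		if (f != candidate && f->key == candidate->key) {
			dup = true;
			break;
		}
	}
	pthread_mutex_unlock(&a->fdir_fltr_lock);
	return dup;
}
```

Any code path that removes entries from the list would take the same lock, which is what the message says the other deletion sites already do.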
2023-08-08 | igc: Add lock to safeguard global Qbv variables | Muhammad Husaini Zulkifli
Access to shared variables through hrtimer requires locking in order to protect the variables because actions to write into these variables (oper_gate_closed, admin_gate_closed, and qbv_transition) might potentially occur simultaneously. This patch provides a locking mechanism to avoid such scenarios. Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed") Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230807205129.3129346-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | Merge tag 'mlx5-fixes-2023-08-07' of ↵ | Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-08-07 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Add capability check for vnic counters net/mlx5: Reload auxiliary devices in pci error handlers net/mlx5: Skip clock update work when device is in error state net/mlx5: LAG, Check correct bucket when modifying LAG net/mlx5e: Unoffload post act rule when handling FIB events net/mlx5: Fix devlink controller number for ECVF net/mlx5: Allow 0 for total host VFs net/mlx5: Return correct EC_VF function ID net/mlx5: DR, Fix wrong allocation of modify hdr pattern net/mlx5e: TC, Fix internal port memory leak net/mlx5e: Take RTNL lock when needed before calling xdp_set_features() ==================== Link: https://lore.kernel.org/r/20230807212607.50883-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver' | Jakub Kicinski
Jijie Shao says: ==================== There are some bugfix for the HNS3 ethernet driver There are some bugfix for the HNS3 ethernet driver v1: https://lore.kernel.org/all/20230728075840.4022760-2-shaojijie@huawei.com/ ==================== Link: https://lore.kernel.org/r/20230807113452.474224-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | net: hns3: fix deadlock issue when externel_lb and reset are executed together | Yonglong Liu
When externel_lb and reset are executed together, a deadlock may occur:

[ 3147.217009] INFO: task kworker/u321:0:7 blocked for more than 120 seconds.
[ 3147.230483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3147.238999] task:kworker/u321:0 state:D stack: 0 pid: 7 ppid: 2 flags:0x00000008
[ 3147.248045] Workqueue: hclge hclge_service_task [hclge]
[ 3147.253957] Call trace:
[ 3147.257093] __switch_to+0x7c/0xbc
[ 3147.261183] __schedule+0x338/0x6f0
[ 3147.265357] schedule+0x50/0xe0
[ 3147.269185] schedule_preempt_disabled+0x18/0x24
[ 3147.274488] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 3147.279880] __mutex_lock_slowpath+0x1c/0x30
[ 3147.284839] mutex_lock+0x50/0x60
[ 3147.288841] rtnl_lock+0x20/0x2c
[ 3147.292759] hclge_reset_prepare+0x68/0x90 [hclge]
[ 3147.298239] hclge_reset_subtask+0x88/0xe0 [hclge]
[ 3147.303718] hclge_reset_service_task+0x84/0x120 [hclge]
[ 3147.309718] hclge_service_task+0x2c/0x70 [hclge]
[ 3147.315109] process_one_work+0x1d0/0x490
[ 3147.319805] worker_thread+0x158/0x3d0
[ 3147.324240] kthread+0x108/0x13c
[ 3147.328154] ret_from_fork+0x10/0x18

In the externel_lb process, the hns3 driver calls napi_disable() first. If a reset then happens, the restore step of externel_lb fails and napi_enable() is never called. When externel_lb is run again, napi_disable() is called a second time, which causes a deadlock on rtnl_lock().

This patch uses the HNS3_NIC_STATE_DOWN state to protect the calls to napi_disable() and napi_enable() in the externel_lb process, just as is done in ndo_stop() and ndo_start().

Fixes: 04b6ba143521 ("net: hns3: add support for external loopback test")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230807113452.474224-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
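The guard this message describes, where a "down" state bit decides whether the disable and enable calls may run at all, can be sketched as a small user-space model. Everything here is hypothetical (`struct nic_model`, `nic_model_down()`, `nic_model_up()`), and the plain bit operations are a simplification: a driver would use atomic bit helpers rather than an unprotected read-modify-write.

```c
#include <stdbool.h>

#define NIC_STATE_DOWN (1u << 0) /* models HNS3_NIC_STATE_DOWN */

struct nic_model {
	unsigned int state;
	int napi_disabled; /* how many times NAPI is currently disabled */
};

/* Disable only on the up -> down transition; a repeated call is a no-op. */
static bool nic_model_down(struct nic_model *nic)
{
	if (nic->state & NIC_STATE_DOWN)
		return false; /* already down: do not disable twice */
	nic->state |= NIC_STATE_DOWN;
	nic->napi_disabled++;
	return true;
}

/* Enable only on the down -> up transition. */
static bool nic_model_up(struct nic_model *nic)
{
	if (!(nic->state & NIC_STATE_DOWN))
		return false; /* already up: nothing to re-enable */
	nic->state &= ~NIC_STATE_DOWN;
	nic->napi_disabled--;
	return true;
}
```

Calling `nic_model_down()` twice in a row leaves `napi_disabled` at 1, which is the property that prevents the double napi_disable() in the scenario above.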
2023-08-08 | net: hns3: add wait until mac link down | Jie Wang
In some configuration flows of the hns3 driver, for example changing the MTU, the driver disables the MAC through firmware before configuration. But the firmware disables the MAC asynchronously, so RX traffic may not have stopped yet in this case. Fix it by waiting until the MAC link is down. Fixes: a9775bb64aa7 ("net: hns3: fix set and get link ksettings issue") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-4-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
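Because the firmware acts asynchronously, the fix amounts to polling the link state with a bounded number of attempts before continuing. A minimal sketch of such a loop, under stated assumptions: `struct fake_mac`, `fake_mac_link_is_up()` and `wait_for_link_down()` are hypothetical names, the fake MAC simply reports "up" for a fixed number of polls, and a real driver would sleep between polls and return an errno value on timeout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MAC: reports "up" for a few more polls. */
struct fake_mac {
	int polls_until_down;
};

static bool fake_mac_link_is_up(struct fake_mac *mac)
{
	if (mac->polls_until_down > 0) {
		mac->polls_until_down--;
		return true;
	}
	return false;
}

/* Poll at most max_polls times. Returns 0 once the link reads as down,
 * -1 if it is still up after the last attempt. */
static int wait_for_link_down(struct fake_mac *mac, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (!fake_mac_link_is_up(mac))
			return 0;
	}
	return -1;
}
```

The bounded loop matters: if the firmware never reports link down, the caller gets an error instead of hanging.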
2023-08-08 | net: hns3: refactor hclge_mac_link_status_wait for interface reuse | Jie Wang
Some NIC configurations can only be performed after the link is down, so this patch refactors this API for reuse. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | net: hns3: restore user pause configure when disable autoneg | Jian Shen
Restore the MAC pause state to the user configuration when autoneg is disabled. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230807113452.474224-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | net/unix: use consistent error code in SO_PEERPIDFD | David Rheinsberg
Change the new (unreleased) SO_PEERPIDFD sockopt to return ENODATA rather than ESRCH if a socket type does not support remote peer-PID queries. Currently, SO_PEERPIDFD returns ESRCH when the socket in question is not an AF_UNIX socket. This is quite unexpected, given that one would assume ESRCH means the peer process already exited and thus cannot be found. However, in that case the sockopt actually returns EINVAL (via pidfd_prepare()). This is rather inconsistent with other syscalls, which usually return ESRCH if a given PID refers to a non-existent process. This changes SO_PEERPIDFD to return ENODATA instead. This is also what SO_PEERGROUPS returns, and thus keeps a consistent behavior across sockopts. Note that this code is returned in 2 cases: First, if the socket type is not AF_UNIX, and secondly if the socket was not yet connected. In both cases ENODATA seems suitable. Signed-off-by: David Rheinsberg <david@readahead.eu> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Luca Boccassi <bluca@debian.org> Fixes: 7b26952a91cf ("net: core: add getsockopt SO_PEERPIDFD") Link: https://lore.kernel.org/r/20230807081225.816199-1-david@readahead.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | Merge tag 'hardening-v6.5-rc6' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Replace remaining open-coded struct_size_t() instance (Gustavo A. R. Silva) - Adjust vboxsf's trailing arrays to be proper flexible arrays * tag 'hardening-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: media: venus: Use struct_size_t() helper in pkt_session_unset_buffers() vboxsf: Use flexible arrays for trailing string member
2023-08-08 | MAINTAINERS: update Claudiu Beznea's email address | Claudiu Beznea
Update MAINTAINERS entries with a valid email address as the Microchip one is no longer valid. Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Acked-by: Sebastian Reichel <sre@kernel.org> Link: https://lore.kernel.org/r/20230804050007.235799-1-claudiu.beznea@tuxon.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 | perf stat: Don't display zero tool counts | Ian Rogers
Andi reported (see link below) a regression when printing the 'duration_time' tool event, where it gets printed as "not counted" for most of the CPUs, fix it by skipping zero counts for tool events. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/ZMlrzcVrVi1lTDmn@tassilo/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-08 | Merge tag 'gfs2-v6.4-fixes' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: - Fix a freeze consistency check in gfs2_trans_add_meta() - Don't use filemap_splice_read as it can cause deadlocks on gfs2 * tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Don't use filemap_splice_read gfs2: Fix freeze consistency check in gfs2_trans_add_meta
2023-08-08 | tools arch x86: Sync the msr-index.h copy with the kernel sources | Arnaldo Carvalho de Melo
To pick up the changes from these csets: 522b1d69219d8f08 ("x86/cpu/amd: Add a Zenbleed fix") That cause no changes to tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after $ Just silences this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZND17H7BI4ariERn@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-08 | Revert "perf report: Append inlines to non-DWARF callchains" | Arnaldo Carvalho de Melo
This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e. The tests were made with a specific workload, further tests on a recently updated fedora 38 system with a system wide perf.data file shows 'perf report' taking excessive time resolving inlines in vmlinux, so lets revert this until a full investigation and improvement on the addr2line support code is made. Reported-by: Jesper Dangaard Brouer <hawk@kernel.org> Acked-by: Artem Savkov <asavkov@redhat.com> Tested-by: Jesper Dangaard Brouer <hawk@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/ZMl8VyhdwhClTM5g@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-08 | wifi: cfg80211: fix sband iftype data lookup for AP_VLAN | Felix Fietkau
AP_VLAN interfaces are virtual, so they don't really exist as a type for capabilities. When AP_VLAN is passed in as a type, AP is the one that's really intended. Fixes: c4cbaf7973a7 ("cfg80211: Add support for HE") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230622165919.46841-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-08 | wifi: rtw89: fix 8852AE disconnection caused by RX full flags | Ping-Ke Shih
RX full flags are raised if certain types of RX FIFO are full, and then all following MPDUs of an AMPDU are dropped. In order to resume receiving MPDUs when the RX FIFO becomes available, commit a0d99ebb3ecd ("wifi: rtw89: initialize DMA of CMAC") clears the register bits. But 8852AE needs more settings to support this. To quickly fix the disconnection problem, revert to the previous behavior. Fixes: a0d99ebb3ecd ("wifi: rtw89: initialize DMA of CMAC") Reported-by: Damian B <bronecki.damian@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217710 Cc: <Stable@vger.kernel.org> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Tested-by: Damian B <bronecki.damian@gmail.com> Link: https://lore.kernel.org/r/20230808005426.5327-1-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-08 | MAINTAINERS: Remove tree entry for rtl8180 | Larry Finger
This entry is not needed. Remove it. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Link: https://lore.kernel.org/r/20230804222438.16076-3-Larry.Finger@lwfinger.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-08 | MAINTAINERS: Update entry for rtl8187 | Larry Finger
As Herton Ronaldo Krzesinski is no longer active, remove him as maintainer for rtl8187. The git tree entry is also removed. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Link: https://lore.kernel.org/r/20230804222438.16076-2-Larry.Finger@lwfinger.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-08 | wifi: brcm80211: handle params_v1 allocation failure | Petr Tesarik
Return -ENOMEM from brcmf_run_escan() if kzalloc() fails for v1 params. Fixes: 398ce273d6b1 ("wifi: brcmfmac: cfg80211: Add support for scan params v2") Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Link: https://lore.kernel.org/r/20230802163430.1656-1-petrtesarik@huaweicloud.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-07 | net: marvell: prestera: fix handling IPv4 routes with nhid | Jonas Gorski
Fix handling IPv4 routes referencing a nexthop via its id by replacing calls to fib_info_nh() with fib_info_nhc(). Trying to add an IPv4 route referencing a nextop via nhid: $ ip link set up swp5 $ ip a a 10.0.0.1/24 dev swp5 $ ip nexthop add dev swp5 id 20 via 10.0.0.2 $ ip route add 10.0.1.0/24 nhid 20 triggers warnings when trying to handle the route: [ 528.805763] ------------[ cut here ]------------ [ 528.810437] WARNING: CPU: 3 PID: 53 at include/net/nexthop.h:468 __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.820434] Modules linked in: prestera_pci act_gact act_police sch_ingress cls_u32 cls_flower prestera arm64_delta_tn48m_dn_led(O) arm64_delta_tn48m_dn_cpld(O) [last unloaded: prestera_pci] [ 528.837485] CPU: 3 PID: 53 Comm: kworker/u8:3 Tainted: G O 6.4.5 #1 [ 528.845178] Hardware name: delta,tn48m-dn (DT) [ 528.849641] Workqueue: prestera_ordered __prestera_router_fib_event_work [prestera] [ 528.857352] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 528.864347] pc : __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.870135] lr : prestera_k_arb_fib_evt+0xb20/0xd50 [prestera] [ 528.876007] sp : ffff80000b20bc90 [ 528.879336] x29: ffff80000b20bc90 x28: 0000000000000000 x27: ffff0001374d3a48 [ 528.886510] x26: ffff000105604000 x25: ffff000134af8a28 x24: ffff0001374d3800 [ 528.893683] x23: ffff000101c89148 x22: ffff000101c89000 x21: ffff000101c89200 [ 528.900855] x20: ffff00013641fda0 x19: ffff800009d01088 x18: 0000000000000059 [ 528.908027] x17: 0000000000000277 x16: 0000000000000000 x15: 0000000000000000 [ 528.915198] x14: 0000000000000003 x13: 00000000000fe400 x12: 0000000000000000 [ 528.922371] x11: 0000000000000002 x10: 0000000000000aa0 x9 : ffff8000013d2020 [ 528.929543] x8 : 0000000000000018 x7 : 000000007b1703f8 x6 : 000000001ca72f86 [ 528.936715] x5 : 0000000033399ea7 x4 : 0000000000000000 x3 : ffff0001374d3acc [ 528.943886] x2 : 0000000000000000 x1 : ffff00010200de00 x0 : ffff000134ae3f80 [ 528.951058] Call trace: [ 
528.953516] __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.958952] __prestera_router_fib_event_work+0x100/0x158 [prestera] [ 528.965348] process_one_work+0x208/0x488 [ 528.969387] worker_thread+0x4c/0x430 [ 528.973068] kthread+0x120/0x138 [ 528.976313] ret_from_fork+0x10/0x20 [ 528.979909] ---[ end trace 0000000000000000 ]--- [ 528.984998] ------------[ cut here ]------------ [ 528.989645] WARNING: CPU: 3 PID: 53 at include/net/nexthop.h:468 __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 528.999628] Modules linked in: prestera_pci act_gact act_police sch_ingress cls_u32 cls_flower prestera arm64_delta_tn48m_dn_led(O) arm64_delta_tn48m_dn_cpld(O) [last unloaded: prestera_pci] [ 529.016676] CPU: 3 PID: 53 Comm: kworker/u8:3 Tainted: G W O 6.4.5 #1 [ 529.024368] Hardware name: delta,tn48m-dn (DT) [ 529.028830] Workqueue: prestera_ordered __prestera_router_fib_event_work [prestera] [ 529.036539] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 529.043533] pc : __prestera_fi_is_direct+0x2c/0x68 [prestera] [ 529.049318] lr : __prestera_k_arb_fc_apply+0x280/0x2f8 [prestera] [ 529.055452] sp : ffff80000b20bc60 [ 529.058781] x29: ffff80000b20bc60 x28: 0000000000000000 x27: ffff0001374d3a48 [ 529.065953] x26: ffff000105604000 x25: ffff000134af8a28 x24: ffff0001374d3800 [ 529.073126] x23: ffff000101c89148 x22: ffff000101c89148 x21: ffff00013641fda0 [ 529.080299] x20: ffff000101c89000 x19: ffff000101c89020 x18: 0000000000000059 [ 529.087471] x17: 0000000000000277 x16: 0000000000000000 x15: 0000000000000000 [ 529.094642] x14: 0000000000000003 x13: 00000000000fe400 x12: 0000000000000000 [ 529.101814] x11: 0000000000000002 x10: 0000000000000aa0 x9 : ffff8000013cee80 [ 529.108985] x8 : 0000000000000018 x7 : 000000007b1703f8 x6 : 0000000000000018 [ 529.116157] x5 : 00000000d3497eb6 x4 : ffff000105604081 x3 : 000000008e979557 [ 529.123329] x2 : 0000000000000000 x1 : ffff00010200de00 x0 : ffff000134ae3f80 [ 529.130501] Call trace: [ 529.132958] 
__prestera_fi_is_direct+0x2c/0x68 [prestera] [ 529.138394] prestera_k_arb_fib_evt+0x6b8/0xd50 [prestera] [ 529.143918] __prestera_router_fib_event_work+0x100/0x158 [prestera] [ 529.150313] process_one_work+0x208/0x488 [ 529.154348] worker_thread+0x4c/0x430 [ 529.158030] kthread+0x120/0x138 [ 529.161274] ret_from_fork+0x10/0x20 [ 529.164867] ---[ end trace 0000000000000000 ]--- and results in a non offloaded route: $ ip route 10.0.0.0/24 dev swp5 proto kernel scope link src 10.0.0.1 rt_trap 10.0.1.0/24 nhid 20 via 10.0.0.2 dev swp5 rt_trap When creating a route referencing a nexthop via its ID, the nexthop will be stored in a separate nh pointer instead of the array of nexthops in the fib_info struct. This causes issues since fib_info_nh() only handles the nexthops array, but not the separate nh pointer, and will loudly WARN about it. In contrast fib_info_nhc() handles both, but returns a fib_nh_common pointer instead of a fib_nh pointer. Luckily we only ever access fields from the fib_nh_common parts, so we can just replace all instances of fib_info_nh() with fib_info_nhc() and access the fields via their fib_nh_common names. This allows handling IPv4 routes with an external nexthop, and they now get offloaded as expected: $ ip route 10.0.0.0/24 dev swp5 proto kernel scope link src 10.0.0.1 rt_trap 10.0.1.0/24 nhid 20 via 10.0.0.2 dev swp5 offload rt_offload Fixes: 396b80cb5cc8 ("net: marvell: prestera: Add neighbour cache accounting") Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de> Acked-by: Elad Nachman <enachman@marvell.com> Link: https://lore.kernel.org/r/20230804101220.247515-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
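The difference between the two accessors explained above can be illustrated with a stripped-down model. The structures and functions below are hypothetical, simplified mirrors written for this sketch (they are not the kernel definitions), and where the kernel helper emits a WARN the model returns NULL so that the problem case is visible to a caller.

```c
#include <stddef.h>

/* Simplified, hypothetical mirrors of the kernel structures. */
struct nh_common_model {
	int oif;          /* output interface index */
	unsigned int gw4; /* IPv4 gateway */
};

struct fib_nh_model { struct nh_common_model common; };
struct nexthop_model { struct nh_common_model common; };

struct fib_info_model {
	struct nexthop_model *nh;      /* set when the route uses "nhid N" */
	struct fib_nh_model fib_nh[1]; /* inline nexthop array otherwise */
};

/* Array-only accessor: cannot serve routes that reference a nexthop
 * object. (The kernel helper WARNs here; the model returns NULL.) */
static const struct fib_nh_model *
info_nh_model(const struct fib_info_model *fi, int sel)
{
	return fi->nh ? NULL : &fi->fib_nh[sel];
}

/* Common accessor: works for both kinds of route, but only exposes
 * the fields of the shared "common" part. */
static const struct nh_common_model *
info_nhc_model(const struct fib_info_model *fi, int sel)
{
	return fi->nh ? &fi->nh->common : &fi->fib_nh[sel].common;
}
```

Since the driver only reads fields from the common part, switching every call site to the common accessor loses nothing and makes routes created with `nhid` work.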
2023-08-07 | net: core: remove unnecessary frame_sz check in bpf_xdp_adjust_tail() | Andrew Kanner
Syzkaller reported the following issue:
=======================================
Too BIG xdp->frame_sz = 131072
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121 bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
...
Call Trace:
 <TASK>
 bpf_prog_4add87e5301a4105+0x1a/0x1c
 __bpf_prog_run include/linux/filter.h:600 [inline]
 bpf_prog_run_xdp include/linux/filter.h:775 [inline]
 bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
 netif_receive_generic_xdp net/core/dev.c:4807 [inline]
 do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
 tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
 tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
 call_write_iter include/linux/fs.h:1871 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x650/0xe40 fs/read_write.c:584
 ksys_write+0x12f/0x250 fs/read_write.c:637
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The xdp->frame_sz > PAGE_SIZE check was introduced in commit c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size"). But Jesper Dangaard Brouer <jbrouer@redhat.com> noted that after the introduction of xdp_init_buff(), which all XDP drivers use, it is safe to remove this check. The original intent was to catch cases where XDP drivers have not been updated to use xdp.frame_sz, but that is no longer a concern (since xdp_init_buff).

Running the initial syzkaller repro it was discovered that the contiguous physical memory allocation is used for both xdp paths in tun_get_user(), e.g. tun_build_skb() and tun_alloc_skb(). It was also stated by Jesper Dangaard Brouer <jbrouer@redhat.com> that XDP can work on higher order pages, as long as this is contiguous physical memory (e.g. a page).

Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/
Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
Link: https://lore.kernel.org/all/20230725155403.796-1-andrew.kanner@gmail.com/T/
Fixes: 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230803190316.2380231-1-andrew.kanner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07 | drivers: net: prevent tun_build_skb() to exceed the packet size limit | Andrew Kanner
Using the syzkaller repro with reduced packet size it was discovered that XDP_PACKET_HEADROOM is not checked in tun_can_build_skb(), although pad may be incremented in tun_build_skb(). This may end up with exceeding the PAGE_SIZE limit in tun_build_skb(). Jason Wang <jasowang@redhat.com> proposed to count XDP_PACKET_HEADROOM always (e.g. without rcu_access_pointer(tun->xdp_prog)) in tun_can_build_skb() since there's a window during which XDP program might be attached between tun_can_build_skb() and tun_build_skb(). Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set") Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Link: https://lore.kernel.org/r/20230803185947.2379988-1-andrew.kanner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
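The size arithmetic behind this fix can be worked through with a small model. It is a sketch under explicit assumptions: a 4096-byte page, 64-byte alignment, a 64-byte RX pad and a 320-byte shared-info area are illustrative values chosen here (the real ones depend on architecture and kernel configuration), and all function names are hypothetical. Only the 256-byte XDP_PACKET_HEADROOM is the value the kernel headers define.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_ASSUMED   4096u /* assumption: 4 KiB pages */
#define ALIGN_ASSUMED       64u   /* assumption: cache-line alignment */
#define RX_PAD_ASSUMED      64u   /* assumption: stands in for TUN_RX_PAD */
#define SHINFO_ASSUMED      320u  /* assumption: skb_shared_info size */
#define XDP_PACKET_HEADROOM 256u

#define DATA_ALIGN(x) (((x) + (ALIGN_ASSUMED - 1)) & ~(size_t)(ALIGN_ASSUMED - 1))

/* Bytes the build path needs for a packet of len bytes. */
static size_t build_size_model(size_t len, bool xdp_attached)
{
	size_t pad = RX_PAD_ASSUMED + (xdp_attached ? XDP_PACKET_HEADROOM : 0);

	return DATA_ALIGN(len + pad) + DATA_ALIGN(SHINFO_ASSUMED);
}

/* Check before the fix: headroom is not counted at all. */
static bool can_build_before(size_t len)
{
	return build_size_model(len, false) <= PAGE_SIZE_ASSUMED;
}

/* Check after the fix: headroom is always counted, so an XDP program
 * attached between the check and the build cannot overflow the page. */
static bool can_build_after(size_t len)
{
	return build_size_model(len, true) <= PAGE_SIZE_ASSUMED;
}
```

Under these assumed constants a 3712-byte packet passes the old check, yet needs 4352 bytes once a program is attached. The new check rejects it and admits at most 3456 bytes, which then fits in exactly 4096.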
2023-08-07 | Merge tag 'xsa432-6.5-tag' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen netback buffer overflow fix from Juergen Gross: "The fix for XSA-423 added logic to Linux's netback driver to deal with a frontend splitting a packet in a way such that not all of the headers would come in one piece. Unfortunately the logic introduced there didn't account for the extreme case of the entire packet being split into as many pieces as permitted by the protocol, yet still being smaller than the area that's specially dealt with to keep all (possible) headers together. Such an unusual packet would therefore trigger a buffer overrun in the driver" * tag 'xsa432-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/netback: Fix buffer overrun triggered by unusual packet
2023-08-07 | Merge tag 'gds-for-linus-2023-08-01' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/gds fixes from Dave Hansen: "Mitigate Gather Data Sampling issue: - Add Base GDS mitigation - Support GDS_NO under KVM - Fix a documentation typo" * tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Fix backwards on/off logic about YMM support KVM: Add GDS_NO support to KVM x86/speculation: Add Kconfig option for GDS x86/speculation: Add force option to GDS mitigation x86/speculation: Add Gather Data Sampling mitigation
2023-08-07 | Merge tag 'x86_bugs_srso' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/srso fixes from Borislav Petkov: "Add a mitigation for the speculative RAS (Return Address Stack) overflow vulnerability on AMD processors. In short, this is yet another issue where userspace poisons a microarchitectural structure which can then be used to leak privileged information through a side channel" * tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/srso: Tie SBPB bit setting to microcode patch detection x86/srso: Add a forgotten NOENDBR annotation x86/srso: Fix return thunks in generated code x86/srso: Add IBPB on VMEXIT x86/srso: Add IBPB x86/srso: Add SRSO_NO support x86/srso: Add IBPB_BRTYPE support x86/srso: Add a Speculative RAS Overflow mitigation x86/bugs: Increase the x86 bugs vector size to two u32s
2023-08-07 | Merge tag 'wq-for-6.5-rc5-fixes' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - The recently added cpu_intensive auto detection and warning mechanism was spuriously triggered on slow CPUs. While not causing serious issues, it's still a nuisance and can cause unintended concurrency management behaviors. Relax the threshold on machines with lower BogoMIPS. While BogoMIPS is not an accurate measure of performance by most measures, we don't have to be accurate and it has rough but strong enough correlation. - A correction in Kconfig help text * tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 workqueue: Fix cpu_intensive_thresh_us name in help text
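A rough sketch of what "relax the threshold on machines with lower BogoMIPS" could look like is given below. Only the 4000 BogoMIPS cut-off comes from the commit title. The linear scaling, the one-second cap and the function name `scaled_thresh_us()` are assumptions made for illustration and should be checked against the actual workqueue code.

```c
/* Scale a CPU-intensive threshold (in microseconds) up on slow machines.
 * Assumptions: linear scaling relative to 4000 BogoMIPS and a 1 s cap. */
static unsigned long scaled_thresh_us(unsigned long thresh_us,
				      unsigned long bogomips)
{
	const unsigned long cap_us = 1000000UL; /* assumed upper bound: 1 s */

	if (bogomips < 1)
		bogomips = 1; /* avoid dividing by zero */

	if (bogomips < 4000) {
		unsigned long scaled = thresh_us * 4000 / bogomips;

		thresh_us = scaled < cap_us ? scaled : cap_us;
	}
	return thresh_us;
}
```

Under these assumptions a 10 ms threshold stays at 10 ms on a 4000 BogoMIPS machine and becomes 40 ms on a 1000 BogoMIPS one.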
2023-08-07 | Merge tag 'tpmdd-v6.5-rc6' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "A few more bug fixes" * tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm/tpm_tis: Disable interrupts for Lenovo P620 devices tpm: Disable RNG for all AMD fTPMs sysctl: set variable key_sysctls storage-class-specifier to static tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
2023-08-07Merge branch 'wireguard-fixes-for-6-5-rc6'Jakub Kicinski
Jason A. Donenfeld says: ==================== wireguard fixes for 6.5-rc6 Just one patch this time, somewhat late in the cycle: 1) Fix an off-by-one calculation for the maximum node depth size in the allowedips trie data structure, and also adjust the self-tests to hit this case so it doesn't regress again in the future. ==================== Link: https://lore.kernel.org/r/20230807132146.2191597-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07wireguard: allowedips: expand maximum node depthJason A. Donenfeld
The allowedips self-test inserts nodes into the tree, but it generated an even number of nodes. For checking maximum node depth there is of course also the root node, which makes the total number necessarily odd. With too few nodes added, the test never triggered the maximum depth check as it should have. So, add 129 nodes instead of 128 nodes, and do so with a more straightforward scheme, starting with all the bits set, and shifting over one each time. Then increase the maximum depth to 129, and choose a better name for that variable to make it clear that it represents depth as opposed to bits. Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
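The "all bits set, shifting over one each time" scheme can be illustrated with plain integers. This is only a sketch of the counting argument, not the self-test code: the function name `shifted_addresses` is hypothetical, and how the real test maps each value to a trie node (byte order, prefix length) is not modeled here.

```python
# Model of generating 129 distinct 128-bit values: start with every bit set
# and shift left by one more position each time, ending at all zeros.
BITS = 128
ALL_ONES = (1 << BITS) - 1


def shifted_addresses() -> list[int]:
    """Return the 129 values obtained by shifting all-ones left by 0..128 bits."""
    return [(ALL_ONES << shift) & ALL_ONES for shift in range(BITS + 1)]


if __name__ == "__main__":
    addrs = shifted_addresses()
    # 128 shifts plus the unshifted value give an odd total of 129 entries.
    print(len(addrs), len(set(addrs)))
```

Each value differs from the previous one in exactly one bit position, which is why the sequence produces one entry more than the number of address bits.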
2023-08-07bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slavesZiyang Xuan
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with following testcase: # ip netns add ns1 # ip netns exec ns1 ip link add bond0 type bond mode 0 # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 # ip netns exec ns1 ip link set bond_slave_1 master bond0 # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link set bond_slave_1 nomaster # ip netns del ns1 The logical analysis of the problem is as follows: 1. create ETH_P_8021AD protocol vlan10 for bond_slave_1: register_vlan_dev() vlan_vid_add() vlan_info_alloc() __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1 2. create ETH_P_8021AD protocol bond0_vlan10 for bond0: register_vlan_dev() vlan_vid_add() __vlan_vid_add() vlan_add_rx_filter_info() if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER return 0; if (netif_device_present(dev)) return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid. 3. detach bond_slave_1 from bond0: __bond_release_one() vlan_vids_del_by_dev() list_for_each_entry(vid_info, &vlan_info->vid_list, list) vlan_vid_del(dev, vid_info->proto, vid_info->vid); // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted. // bond_slave_1->vlan_info will be assigned NULL. 4. delete vlan10 during delete ns1: default_device_exit_batch() dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10 vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1 BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!! Add S-VLAN tag related features support to bond driver. So the bond driver will always propagate the VLAN info to its slaves. 
Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support") Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
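The four steps of the analysis above can be replayed with a small reference-count model. This is a user-space sketch under stated assumptions, not kernel code: the `Dev` class, the `scenario` helper, and the use of a dictionary in place of `vlan_info` are all hypothetical, and the function names only mirror the kernel functions mentioned in the commit message.

```python
# Model of per-device VLAN vid reference counts. An empty dict stands in for
# a NULL vlan_info. The stag_filter flag models NETIF_F_HW_VLAN_STAG_FILTER.
ETH_P_8021AD = 0x88A8


class Dev:
    def __init__(self, name, stag_filter=False):
        self.name = name
        self.stag_filter = stag_filter
        self.vids = {}      # (proto, vid) -> reference count
        self.slaves = []


def vlan_vid_add(dev, proto, vid):
    key = (proto, vid)
    dev.vids[key] = dev.vids.get(key, 0) + 1
    # Propagation to slaves only happens when the device is filter capable.
    if dev.stag_filter:
        for slave in dev.slaves:
            vlan_vid_add(slave, proto, vid)


def vlan_vid_del(dev, proto, vid):
    key = (proto, vid)
    if key in dev.vids:
        dev.vids[key] -= 1
        if dev.vids[key] == 0:
            del dev.vids[key]


def release_slave(bond, slave):
    # Models vlan_vids_del_by_dev(): drop every vid the bond knows about.
    for proto, vid in list(bond.vids):
        vlan_vid_del(slave, proto, vid)
    bond.slaves.remove(slave)


def scenario(bond_stag_filter):
    """Return True if the slave still holds its own vid after release."""
    bond = Dev("bond0", stag_filter=bond_stag_filter)
    slave = Dev("bond_slave_1")
    bond.slaves.append(slave)
    vlan_vid_add(slave, ETH_P_8021AD, 10)   # step 1: vlan10 on the slave
    vlan_vid_add(bond, ETH_P_8021AD, 10)    # step 2: bond0_vlan10 on the bond
    release_slave(bond, slave)              # step 3: detach the slave
    return (ETH_P_8021AD, 10) in slave.vids # step 4: would BUG_ON if missing
```

In this model `scenario(False)` leaves the slave with no vid entry, which corresponds to the NULL `vlan_info` that trips the `BUG_ON`, while `scenario(True)` keeps the slave's own reference intact because the bond's vid was propagated before being removed.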
2023-08-07net/mlx5e: Add capability check for vnic countersLama Kayal
Add missing capability check for each of the vnic counters exposed by devlink health reporter, and thus avoid unexpected behavior due to invalid access to registers. While at it, read only the exact number of bits for each counter whether it was 32 bits or 64 bits. Fixes: b0bc615df488 ("net/mlx5: Add vnic devlink health reporter to PFs/VFs") Fixes: a33682e4e78e ("net/mlx5e: Expose catastrophic steering error counters") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Reload auxiliary devices in pci error handlersMoshe Shemesh
Handling PCI errors should fully tear down and reload auxiliary devices, the same as is done in the mlx5 health recovery flow. Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Skip clock update work when device is in error stateMoshe Shemesh
When device is in error state, marked by the flag MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible and so clock update work should be skipped. Furthermore, such access through PCI in error state, after calling mlx5_pci_disable_device() can result in failing to recover from pci errors. Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support") Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com> Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.com Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: LAG, Check correct bucket when modifying LAGShay Drory
The cited patch introduced buckets in hash mode, but did not update the ports/bucket check used when modifying LAG. Fix the check. Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5e: Unoffload post act rule when handling FIB eventsChris Mi
Given the following tc rule on a stack device: filter parent ffff: protocol ip pref 3 flower chain 1 filter parent ffff: protocol ip pref 3 flower chain 1 handle 0x1 dst_mac 24:25:d0:e1:00:00 src_mac 02:25:d0:25:01:02 eth_type ipv4 ct_state +trk+new in_hw in_hw_count 1 action order 1: ct commit zone 0 pipe index 2 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: tunnel_key set src_ip 192.168.1.25 dst_ip 192.168.1.26 key_id 4 dst_port 4789 csum pipe index 3 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 3: mirred (Egress Redirect to device vxlan1) stolen index 9 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed When handling FIB events, the rule in post act is not deleted. Because the post act rule has packet reformat and modify header actions, the following syndromes are also hit: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_MODIFY_HEADER_CONTEXT(0x941) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1ab444), err(-22) mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x179e84), err(-22) Fix it by unoffloading post act rule when handling FIB events. Fixes: 314e1105831b ("net/mlx5e: Add post act offload/unoffload API") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Fix devlink controller number for ECVFDaniel Jurgens
The controller number for ECVFs is always 0, because the ECPF must be the eswitch owner for EC VFs to be enabled. Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Allow 0 for total host VFsDaniel Jurgens
When querying eswitch functions, 0 is a valid number of host VFs. After the introduction of ARM SRIOV, falling through to getting the max value from PCI results in using the total VFs allowed on the ARM for the host. Fixes: 86eec50beaf3 ("net/mlx5: Support querying max VFs from device") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Return correct EC_VF function IDDaniel Jurgens
The ECVF function ID range is 1..max_ec_vfs. Currently mlx5_vport_to_func_id returns 0..max_ec_vfs - 1, which results in a syndrome when querying the caps with more recent firmware, or in reading incorrect caps with older firmware that supports EC VFs. Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
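The off-by-one described above can be shown with a small mapping function. This is a hedged sketch, not the driver code: it assumes EC VF vports are numbered consecutively starting at some base vport, and both `ec_vf_func_id` and the base value used in the usage note are hypothetical.

```python
# Model of mapping an EC VF vport number to a 1-based function ID.
# Assumption: EC VF vports are consecutive, starting at ec_vf_vport_base.
def ec_vf_func_id(vport: int, ec_vf_vport_base: int) -> int:
    """Return the function ID in the range 1..max_ec_vfs for an EC VF vport."""
    if vport < ec_vf_vport_base:
        raise ValueError("vport is below the EC VF vport base")
    # The "+ 1" is the point of the fix: the first EC VF is function 1, not 0.
    return vport - ec_vf_vport_base + 1
```

With a hypothetical base of 100 and four EC VFs, vports 100..103 map to function IDs 1..4 rather than 0..3.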
2023-08-07net/mlx5: DR, Fix wrong allocation of modify hdr patternYevgeny Kliteynik
Fix the wrong calculation of the modify hdr pattern size, where the previously calculated number would not be enough to accommodate the required number of actions. Fixes: da5d0027d666 ("net/mlx5: DR, Add cache for modify header pattern") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5e: TC, Fix internal port memory leakJianbo Liu
The flow rule can be split, and the extra post_act rules are added to the post_act table. A memleak can be triggered when the rule forwards packets from an internal port over a tunnel, for example when CT 'new' state offload is allowed. The int_port object is assigned to the flow attribute of the post_act rule and its refcnt is incremented by mlx5e_tc_int_port_get(), but mlx5e_tc_int_port_put() is not called, so the refcnt is never decremented and int_port is never freed. kmemleak reports the following error: unreferenced object 0xffff888128204b80 (size 64): comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s) hex dump (first 32 bytes): 01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00 ................ 98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff .wgA.....wgA.... backtrace: [<00000000e992680d>] kmalloc_trace+0x27/0x120 [<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core] [<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core] [<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core] [<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core] [<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core] [<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core] [<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410 [<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower] [<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower] [<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010 [<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0 [<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360 [<0000000082dd6c8b>] netlink_unicast+0x438/0x710 [<00000000fc568f70>] netlink_sendmsg+0x794/0xc50 [<0000000016e92590>] sock_sendmsg+0xc5/0x190 So fix this by moving int_port cleanup code to the flow attribute free helper, which is used by all the attribute free cases.
Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
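The leak and its fix follow a common get/put pattern, sketched below. This is a user-space model under assumptions, not the mlx5 code: `IntPort`, `FlowAttr`, `free_flow_attr`, and `demo` are hypothetical names, and the idea that both the main attribute and the post_act attribute hold a reference is an assumption made to keep the example small.

```python
# Model of a reference-counted object shared by several flow attributes.
class IntPort:
    def __init__(self):
        self.refcnt = 0
        self.freed = False


def int_port_get(port):
    port.refcnt += 1
    return port


def int_port_put(port):
    port.refcnt -= 1
    if port.refcnt == 0:
        port.freed = True   # stands in for freeing the object


class FlowAttr:
    def __init__(self, int_port=None):
        self.int_port = int_port


def free_flow_attr(attr):
    # Shared free helper: every attribute that took a reference drops it here.
    if attr.int_port is not None:
        int_port_put(attr.int_port)
        attr.int_port = None


def demo(fixed):
    """Return True if the port ends up freed after both attributes are gone."""
    port = IntPort()
    main_attr = FlowAttr(int_port_get(port))
    post_act_attr = FlowAttr(int_port_get(port))
    if fixed:
        # Cleanup lives in the helper, so no attribute can skip the put.
        free_flow_attr(main_attr)
        free_flow_attr(post_act_attr)
    else:
        # Cleanup only on the main path: the post_act reference is never put.
        int_port_put(main_attr.int_port)
    return port.freed
```

In this model `demo(False)` leaves the reference count at one and the object unfreed, while `demo(True)` releases it, because the put is tied to the one helper that every attribute free path goes through.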
2023-08-07net/mlx5e: Take RTNL lock when needed before calling xdp_set_features()Gal Pressman
Hold RTNL lock when calling xdp_set_features() with a registered netdev, as the call triggers the netdev notifiers. This could happen when switching from uplink rep to nic profile for example. This resolves the following call trace: RTNL: assertion failed at net/core/dev.c (1953) WARNING: CPU: 6 PID: 112670 at net/core/dev.c:1953 call_netdevice_notifiers_info+0x7c/0x80 Modules linked in: sch_mqprio sch_mqprio_lib act_tunnel_key act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress bonding ib_umad ip_gre rdma_ucm mlx5_vfio_pci ipip tunnel4 ip6_gre gre mlx5_ib vfio_pci vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ib_ipoib geneve nf_tables ip6_tunnel tunnel6 iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs] CPU: 6 PID: 112670 Comm: devlink Not tainted 6.4.0-rc7_for_upstream_min_debug_2023_06_28_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80 Code: 90 ff 80 3d 2d 6b f7 00 00 75 c5 ba a1 07 00 00 48 c7 c6 e4 ce 0b 82 48 c7 c7 c8 f4 04 82 c6 05 11 6b f7 00 01 e8 a4 7c 8e ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec RSP: 0018:ffff8882a21c3948 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff82e6f880 RCX: 0000000000000027 RDX: ffff88885f99b5c8 RSI: 0000000000000001 RDI: ffff88885f99b5c0 RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003 R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff8882a21c3968 R13: ffff88811c018940 R14: 0000000000000000 R15: ffff8881274401a0 FS: 00007fe141c81800(0000) GS:ffff88885f980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f787c28b948 CR3: 
000000014bcf3005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x79/0x120 ? call_netdevice_notifiers_info+0x7c/0x80 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? call_netdevice_notifiers_info+0x7c/0x80 ? call_netdevice_notifiers_info+0x7c/0x80 call_netdevice_notifiers+0x2e/0x50 mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core] mlx5e_nic_init+0xf1/0x1a0 [mlx5_core] mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core] mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core] mlx5e_vport_rep_unload+0xaa/0xc0 [mlx5_core] __esw_offloads_unload_rep+0x52/0x60 [mlx5_core] mlx5_esw_offloads_rep_unload+0x52/0x70 [mlx5_core] esw_offloads_unload_rep+0x34/0x70 [mlx5_core] esw_offloads_disable+0x2b/0x90 [mlx5_core] mlx5_eswitch_disable_locked+0x1b9/0x210 [mlx5_core] mlx5_devlink_eswitch_mode_set+0xf5/0x630 [mlx5_core] ? devlink_get_from_attrs_lock+0x9e/0x110 devlink_nl_cmd_eswitch_set_doit+0x60/0xe0 genl_family_rcv_msg_doit.isra.0+0xc2/0x110 genl_rcv_msg+0x17d/0x2b0 ? devlink_get_from_attrs_lock+0x110/0x110 ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290 ? devlink_pernet_pre_exit+0xf0/0xf0 ? genl_family_rcv_msg_doit.isra.0+0x110/0x110 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2c0 netlink_sendmsg+0x232/0x4a0 sock_sendmsg+0x38/0x60 ? _copy_from_user+0x2a/0x60 __sys_sendto+0x110/0x160 ? __count_memcg_events+0x48/0x90 ? handle_mm_fault+0x161/0x260 ? 
do_user_addr_fault+0x278/0x6e0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fe141b1340a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007fff61d03de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000afab00 RCX: 00007fe141b1340a RDX: 0000000000000038 RSI: 0000000000afab00 RDI: 0000000000000003 RBP: 0000000000afa910 R08: 00007fe141d80200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 </TASK> Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07tpm/tpm_tis: Disable interrupts for Lenovo P620 devicesJonathan McDowell
The Lenovo ThinkStation P620 suffers from an irq storm issue like various other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and force polling. It is worth noting that 481c2d14627d (tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs) does not seem to fix the problem on this machine, but setting 'tpm_tis.interrupts=0' on the kernel command line does. [jarkko@kernel.org: truncated the commit ID in the description to 12 characters] Cc: stable@vger.kernel.org # v6.4+ Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Signed-off-by: Jonathan McDowell <noodles@meta.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07tpm: Disable RNG for all AMD fTPMsMario Limonciello
The TPM RNG functionality is not necessary for entropy when the CPU already supports the RDRAND instruction. The TPM RNG functionality was previously disabled on a subset of AMD fTPM series, but reports continue to show stutter on some systems that has been root-caused to the TPM RNG functionality. Expand the disabling of TPM RNG use to all AMD fTPMs, whether or not their versions claim to have fixed the issue. To accomplish this, move the detection into part of the TPM CRB registration and add a flag indicating that the TPM should opt out of registration to hwrng. Cc: stable@vger.kernel.org # 6.1.y+ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs") Reported-by: daniil.stas@posteo.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719 Reported-by: bitlord0xff@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07sysctl: set variable key_sysctls storage-class-specifier to staticTom Rix
smatch reports security/keys/sysctl.c:12:18: warning: symbol 'key_sysctls' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7Takashi Iwai
TUXEDO InfinityBook S 15/17 Gen7 suffers from an IRQ problem on tpm_tis like a few other laptops. Add an entry for the workaround. Cc: stable@vger.kernel.org Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Link: https://bugzilla.suse.com/show_bug.cgi?id=1213645 Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "x86: - Fix SEV race condition ARM: - Fixes for the configuration of SVE/SME traps when hVHE mode is in use - Allow use of pKVM on systems with FF-A implementations that are v1.0 compatible - Request/release percpu IRQs (arch timer, vGIC maintenance) correctly when pKVM is in use - Fix function prototype after __kvm_host_psci_cpu_entry() rename - Skip to the next instruction when emulating writes to TCR_EL1 on AmpereOne systems Selftests: - Fix missing include" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests/rseq: Fix build with undefined __weak KVM: SEV: remove ghcb variable declarations KVM: SEV: only access GHCB fields once KVM: SEV: snapshot the GHCB before accessing it KVM: arm64: Skip instruction after emulating write to TCR_EL1 KVM: arm64: fix __kvm_host_psci_cpu_entry() prototype KVM: arm64: Fix resetting SME trap values on reset for (h)VHE KVM: arm64: Fix resetting SVE trap values on reset for hVHE KVM: arm64: Use the appropriate feature trap register when activating traps KVM: arm64: Helper to write to appropriate feature trap register based on mode KVM: arm64: Disable SME traps for (h)VHE at setup KVM: arm64: Use the appropriate feature trap register for SVE at EL2 setup KVM: arm64: Factor out code for checking (h)VHE mode into a macro KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp KVM: arm64: Fix hardware enable/disable flows for pKVM KVM: arm64: Allow pKVM on v1.0 compatible FF-A implementations