summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2024-03-26i40e: fix vf may be used uninitialized in this function warningAleksandr Loktionov
To fix the regression introduced by commit 52424f974bc5, which causes servers hang in very hard to reproduce conditions with resets races. Using two sources for the information is the root cause. In this function before the fix bumping v didn't mean bumping vf pointer. But the code used this variables interchangeably, so stale vf could point to different/not intended vf. Remove redundant "v" variable and iterate via single VF pointer across whole function instead to guarantee VF pointer validity. Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26i40e: fix i40e_count_filters() to count only active/new filtersAleksandr Loktionov
The bug usually affects untrusted VFs, because they are limited to 18 MACs, it affects them badly, not letting to create MAC all filters. Not stable to reproduce, it happens when VF user creates MAC filters when other MACVLAN operations are happened in parallel. But consequence is that VF can't receive desired traffic. Fix counter to be bumped only for new or active filters. Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26i40e: Enforce software interrupt during busy-poll exitIvan Vecera
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts during poll exit") followed by commit 23be7075b318 ("ice: fix software generating extra interrupts") I'm seeing the similar issue also with i40e driver. In certain situation when busy-loop is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll. Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used. The issue was found when running sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as value busy_pool and busy_read sysctl knobs) and results in huge latency spikes with more than 100000us. The fix is inspired from the ice driver and do the following: 1) During napi poll exit in case of busy-poll (napo_complete_done() returns false) this is recorded to q_vector that we were in busy loop. 2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into built value 2) In i40e_update_enable_itr() enforces a software interrupt trigger if we are exiting busy poll to catch any pending clean-ups 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second to limit the number of these sw interrupts. Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220 After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277 Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by: Hugo Ferreira <hferreir@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26net: hns3: mark unexcuted loopback test result as UNEXECUTEDJian Shen
Currently, loopback test may be skipped when resetting, but the test result will still show as 'PASS', because the driver doesn't set ETH_TEST_FL_FAILED flag. Fix it by setting the flag and initializating the value to UNEXECUTED. Fixes: 4c8dab1c709c ("net: hns3: reconstruct function hns3_self_test") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: hns3: fix kernel crash when devlink reload during pf initializationYonglong Liu
The devlink reload process will access the hardware resources, but the register operation is done before the hardware is initialized. So, processing the devlink reload during initialization may lead to kernel crash. This patch fixes this by taking devl_lock during initialization. Fixes: b741269b2759 ("net: hns3: add support for registering devlink for PF") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: hns3: fix index limit to support all queue statsJie Wang
Currently, hns hardware supports more than 512 queues and the index limit in hclge_comm_tqps_update_stats is wrong. So this patch removes it. Fixes: 287db5c40d15 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-26net: dsa: mt7530: fix improper frames on all 25MHz and 40MHz XTAL MT7530Arınç ÜNAL
The MT7530 switch after reset initialises with a core clock frequency that works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock frequency must be set to 500MHz. The mt7530_pll_setup() function is responsible of setting the core clock frequency. Currently, it runs on MT7530 with 25MHz and 40MHz XTAL. This causes MT7530 switch with 25MHz XTAL to egress and ingress frames improperly. Introduce a check to run it only on MT7530 with 40MHz XTAL. The core clock frequency is set by writing to a switch PHY's register. Access to the PHY's register is done via the MDIO bus the switch is also on. Therefore, it works only when the switch makes switch PHYs listen on the MDIO bus the switch is on. This is controlled either by the state of the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the modifiable trap register. When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means accessing PHY registers via the PHY indirect access control register of the switch. When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means accessing PHY registers via the MDIO bus the switch is on. For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high, the core clock frequency won't be set to 500MHz, causing the switch to egress and ingress frames improperly. Run mt7530_pll_setup() after PHY direct access is set on the modifiable trap register. With these two changes, all MT7530 switches with 25MHz and 40MHz, and P1_LED_1 pulled high or low, will egress and ingress frames properly. Link: https://github.com/BPI-SINOVOIP/BPI-R2-bsp/blob/4a5dd143f2172ec97a2872fa29c7c4cd520f45b5/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c#L1039 Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-25net: wwan: t7xx: Split 64bit accesses to fix alignment issuesBjørn Mork
Some of the registers are aligned on a 32bit boundary, causing alignment faults on 64bit platforms. Unable to handle kernel paging request at virtual address ffffffc084a1d004 Mem abort info: ESR = 0x0000000096000061 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000 [ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711 Internal error: Oops: 0000000096000061 [#1] SMP Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0 Hardware name: Bananapi BPI-R4 (DT) Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx] sp : ffffffc085d63d30 x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05 x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128 x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014 x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68 x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018 x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004 Call trace: t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx] process_one_work+0x154/0x2a0 worker_thread+0x2ac/0x488 kthread+0xe0/0xec ret_from_fork+0x10/0x20 Code: f9400800 91001000 8b214001 d50332bf (f9000022) ---[ end trace 0000000000000000 ]--- The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit accesses can be replaced by pairs of nonatomic 32bit access. Fix alignment by forcing all accesses to be 32bit on 64bit platforms. Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72 Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface") Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Tested-by: Liviu Dudau <liviu@dudau.co.uk> Link: https://lore.kernel.org/r/20240322144000.1683822-1-bjorn@mork.no Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25net: ll_temac: platform_get_resource replaced by wrong functionClaus Hansen Ries
The function platform_get_resource was replaced with devm_platform_ioremap_resource_byname and is called using 0 as name. This eventually ends up in platform_get_resource_byname in the call stack, where it causes a null pointer in strcmp. if (type == resource_type(r) && !strcmp(r->name, name)) It should have been replaced with devm_platform_ioremap_resource. Fixes: bd69058f50d5 ("net: ll_temac: Use devm_platform_ioremap_resource_byname()") Signed-off-by: Claus Hansen Ries <chr@terma.com> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/cca18f9c630a41c18487729770b492bb@terma.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25igc: Remove stale comment about Tx timestampingKurt Kanzenbach
The initial igc Tx timestamping implementation used only one register for retrieving Tx timestamps. Commit 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps") added support for utilizing all four of them e.g., for multiple domain support. Remove the stale comment/FIXME. Fixes: 3ed247e78911 ("igc: Add support for multiple in-flight TX timestamps") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()Przemek Kitszel
Change kzalloc() flags used in ixgbe_ipsec_vf_add_sa() to GFP_ATOMIC, to avoid sleeping in IRQ context. Dan Carpenter, with the help of Smatch, has found following issue: The patch eda0333ac293: "ixgbe: add VF IPsec management" from Aug 13, 2018 (linux-next), leads to the following Smatch static checker warning: drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917 ixgbe_ipsec_vf_add_sa() warn: sleeping in IRQ context The call tree that Smatch is worried about is: ixgbe_msix_other() <- IRQ handler -> ixgbe_msg_task() -> ixgbe_rcv_msg_from_vf() -> ixgbe_ipsec_vf_add_sa() Fixes: eda0333ac293 ("ixgbe: add VF IPsec management") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/intel-wired-lan/db31a0b0-4d9f-4e6b-aed8-88266eb5665c@moroto.mountain Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ice: fix memory corruption bug with suspend and rebuildJesse Brandeburg
The ice driver would previously panic after suspend. This is caused from the driver *only* calling the ice_vsi_free_q_vectors() function by itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") the driver has zeroed out num_q_vectors, and only restored it in ice_vsi_cfg_def(). This further causes the ice_rebuild() function to allocate a zero length buffer, after which num_q_vectors is updated, and then the new value of num_q_vectors is used to index into the zero length buffer, which corrupts memory. The fix entails making sure all the code referencing num_q_vectors only does so after it has been reset via ice_vsi_cfg_def(). I didn't perform a full bisect, but I was able to test against 6.1.77 kernel and that ice driver works fine for suspend/resume with no panic, so sometime since then, this problem was introduced. Also clean up an un-needed init of a local variable in the function being modified. PANIC from 6.8.0-rc1: [1026674.915596] PM: suspend exit [1026675.664697] ice 0000:17:00.1: PTP reset successful [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010 [1026677.192753] ice 0000:17:00.0: PTP reset successful [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time [1026677.197928] #PF: supervisor read access in kernel mode [1026677.197933] #PF: error_code(0x0000) - not-present page [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice 0000:b1:00.1: PTP reset successful [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time [1026677.212575] [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1 [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice] [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202 [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010 [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116] Call Trace: [1026677.388023] <TASK> [1026677.390589] ? __die+0x20/0x70 [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576] ? do_user_addr_fault+0x65/0x6a0 [1026677.403307] ? exc_page_fault+0x6a/0x150 [1026677.407694] ? asm_exc_page_fault+0x22/0x30 [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147] ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ? delay_tsc+0x8f/0xc0 [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231] ice_service_task+0x26/0x440 [ice] [1026677.447180] process_one_work+0x174/0x340 [1026677.451669] worker_thread+0x27e/0x390 [1026677.455890] ? __pfx_worker_thread+0x10/0x10 [1026677.460627] kthread+0xee/0x120 [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445] ret_from_fork+0x2d/0x50 [1026677.472476] ? __pfx_kthread+0x10/0x10 [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050] </TASK> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ice: Refactor FW data type and fix bitmap casting issueSteven Zou
According to the datasheet, the recipe association data is an 8-byte little-endian value. It is described as 'Bitmap of the recipe indexes associated with this profile', it is from 24 to 31 byte area in FW. Therefore, it is defined to '__le64 recipe_assoc' in struct ice_aqc_recipe_to_profile. And then fix the bitmap casting issue, as we must never ever use castings for bitmap type. Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Andrii Staikov <andrii.staikov@intel.com> Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steven Zou <steven.zou@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25wifi: iwlwifi: mvm: include link ID when releasing framesBenjamin Berg
When releasing frames from the reorder buffer, the link ID was not included in the RX status information. This subsequently led mac80211 to drop the frame. Change it so that the link information is set immediately when possible so that it doesn't not need to be filled in anymore when submitting the frame to mac80211. Fixes: b8a85a1d42d7 ("wifi: iwlwifi: mvm: rxmq: report link ID to mac80211") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320232419.bbbd5e9bfe80.Iec1bf5c884e371f7bc5ea2534ed9ea8d3f2c0bf6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: handle debugfs names more carefullyJohannes Berg
With debugfs=off, we can get here with the dbgfs_dir being an ERR_PTR(). Instead of checking for all this, which is often flagged as a mistake, simply handle the names here more carefully by printing them, then we don't need extra checks. Also, while checking, I noticed theoretically 'buf' is too small, so fix that size as well. Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218422 Fixes: c36235acb34f ("wifi: iwlwifi: mvm: rework debugfs handling") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320232419.4dc1eb3dd015.I32f308b0356ef5bcf8d188dd98ce9b210e3ab9fd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: guard against invalid STA ID on removalBenjamin Berg
Guard against invalid station IDs in iwl_mvm_mld_rm_sta_id as that would result in out-of-bounds array accesses. This prevents issues should the driver get into a bad state during error handling. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320232419.d523167bda9c.I1cffd86363805bf86a95d8bdfd4b438bb54baddc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: read txq->read_ptr under lockJohannes Berg
If we read txq->read_ptr without lock, we can read the same value twice, then obtain the lock, and reclaim from there to two different places, but crucially reclaim the same entry twice, resulting in the WARN_ONCE() a little later. Fix that by reading txq->read_ptr under lock. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240319100755.bf4c62196504.I978a7ca56c6bd6f1bf42c15aa923ba03366a840b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: fw: don't always use FW dump trigJohannes Berg
Since the dump_data (struct iwl_fwrt_dump_data) is a union, it's not safe to unconditionally access and use the 'trig' member, it might be 'desc' instead. Access it only if it's known to be 'trig' rather than 'desc', i.e. if ini-debug is present. Cc: stable@vger.kernel.org Fixes: 0eb50c674a1e ("iwlwifi: yoyo: send hcmd to fw after dump collection completes.") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240319100755.e2976bc58b29.I72fbd6135b3623227de53d8a2bb82776066cb72b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: rfi: fix potential response leaksJohannes Berg
If the rx payload length check fails, or if kmemdup() fails, we still need to free the command response. Fix that. Fixes: 21254908cbe9 ("iwlwifi: mvm: add RFI-M support") Co-authored-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240319100755.db2fa0196aa7.I116293b132502ac68a65527330fa37799694b79c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FWIlan Peer
In the non MLD firmware flows, although the deflink is used, the mapping of link ID to BSS configuration was missing, which causes flows that need this mapping to crash. Fix this by adding the link ID to BSS configuration mapping to non MLD flows as well. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240311081938.0b5c361e8f0c.Ib11f41815d2efa5d1ec57f855de4c8563142987b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: consider having one active linkShaul Triebitz
Do not call iwl_mvm_mld_get_primary_link if only one link is active. In that case, the sole active link should be used. iwl_mvm_mld_get_primary_link returns -1 if only one link is active causing a warning. Fixes: 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240311081938.6c50061bf69b.I05b0ac7fa7149eabaa5570a6f65b0d9bfb09a6f1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIFEmmanuel Grumbach
When we want to know whether we should look for the mac_id or the link_id in struct iwl_mvm_session_prot_notif, we should look at the version of SESSION_PROTECTION_NOTIF. This causes WARNINGs: WARNING: CPU: 0 PID: 11403 at drivers/net/wireless/intel/iwlwifi/mvm/time-event.c:959 iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm] RIP: 0010:iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm] Code: 00 49 c7 84 24 48 07 00 00 00 00 00 00 41 c6 84 24 78 07 00 00 ff 4c 89 f7 e8 e9 71 54 d9 e9 7d fd ff ff 0f 0b e9 23 fe ff ff <0f> 0b e9 1c fe ff ff 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffb4bb00003d40 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff9ae63a361000 RCX: ffff9ae4a98b60d4 RDX: ffff9ae4588499c0 RSI: 0000000000000305 RDI: ffff9ae4a98b6358 RBP: ffffb4bb00003d68 R08: 0000000000000003 R09: 0000000000000010 R10: ffffb4bb00003d00 R11: 000000000000000f R12: ffff9ae441399050 R13: ffff9ae4761329e8 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9ae7af400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fb75680018 CR3: 00000003dae32006 CR4: 0000000000f70ef0 PKRU: 55555554 Call Trace: <IRQ> ? show_regs+0x69/0x80 ? __warn+0x8d/0x150 ? iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm] ? report_bug+0x196/0x1c0 ? handle_bug+0x45/0x80 ? exc_invalid_op+0x1c/0xb0 ? asm_exc_invalid_op+0x1f/0x30 ? iwl_mvm_rx_session_protect_notif+0x333/0x340 [iwlmvm] iwl_mvm_rx_common+0x115/0x340 [iwlmvm] iwl_mvm_rx_mq+0xa6/0x100 [iwlmvm] iwl_pcie_rx_handle+0x263/0xa10 [iwlwifi] iwl_pcie_napi_poll_msix+0x32/0xd0 [iwlwifi] Fixes: 085d33c53012 ("wifi: iwlwifi: support link id in SESSION_PROTECTION_NOTIF") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240311081938.39d5618f7b9d.I564d863e53c6cbcb49141467932ecb6a9840b320@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: iwlwifi: mvm: disable MLO for the time beingJohannes Berg
MLO ended up not really fully stable yet, we want to make sure it works well with the ecosystem before enabling it. Thus, remove the flag, but set WIPHY_FLAG_DISABLE_WEXT so we don't get wireless extensions back until we enable MLO for this hardware. Cc: stable@vger.kernel.org Reviewed-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240314110951.d6ad146df98d.I47127e4fdbdef89e4ccf7483641570ee7871d4e6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25mlxbf_gige: stop PHY during open() error pathsDavid Thompson
The mlxbf_gige_open() routine starts the PHY as part of normal initialization. The mlxbf_gige_open() routine must stop the PHY during its error paths. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-23Merge tag 'hardening-v6.9-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more hardening updates from Kees Cook: - CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck) - Fix needless UTF-8 character in arch/Kconfig (Liu Song) - Improve __counted_by warning message in LKDTM (Nathan Chancellor) - Refactor DEFINE_FLEX() for default use of __counted_by - Disable signed integer overflow sanitizer on GCC < 8 * tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lkdtm/bugs: Improve warning message for compilers without counted_by support overflow: Change DEFINE_FLEX to take __counted_by member Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST" arch/Kconfig: eliminate needless UTF-8 character in Kconfig help ubsan: Disable signed integer overflow sanitizer on GCC < 8
2024-03-22overflow: Change DEFINE_FLEX to take __counted_by memberKees Cook
The norm should be flexible array structures with __counted_by annotations, so DEFINE_FLEX() is updated to expect that. Rename the non-annotated version to DEFINE_RAW_FLEX(), and update the few existing users. Additionally add selftests for the macros. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-21Merge tag 'net-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from CAN, netfilter, wireguard and IPsec. I'd like to highlight [ lowlight? - Linus ] Florian W stepping down as a netfilter maintainer due to constant stream of bug reports. Not sure what we can do but IIUC this is not the first such case. Current release - regressions: - rxrpc: fix use of page_frag_alloc_align(), it changed semantics and we added a new caller in a different subtree - xfrm: allow UDP encapsulation only in offload modes Current release - new code bugs: - tcp: fix refcnt handling in __inet_hash_connect() - Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets", conflicted with some expectations in BPF uAPI Previous releases - regressions: - ipv4: raw: fix sending packets from raw sockets via IPsec tunnels - devlink: fix devlink's parallel command processing - veth: do not manipulate GRO when using XDP - esp: fix bad handling of pages from page_pool Previous releases - always broken: - report RCU QS for busy network kthreads (with Paul McK's blessing) - tcp/rds: fix use-after-free on netns with kernel TCP reqsk - virt: vmxnet3: fix missing reserved tailroom with XDP Misc: - couple of build fixes for Documentation" * tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) selftests: forwarding: Fix ping failure due to short timeout MAINTAINERS: step down as netfilter maintainer netfilter: nf_tables: Fix a memory leak in nf_tables_updchain net: dsa: mt7530: fix handling of all link-local frames net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports bpf: report RCU QS in cpumap kthread net: report RCU QS on threaded NAPI repolling rcu: add a helper to report consolidated flavor QS ionic: update documentation for XDP support lib/bitmap: Fix bitmap_scatter() and bitmap_gather() kernel doc netfilter: nf_tables: do not compare internal table flags on updates netfilter: nft_set_pipapo: release elements in clone only from destroy path octeontx2-af: Use separate handlers for interrupts octeontx2-pf: Send UP messages to VF only when VF is up. octeontx2-pf: Use default max_active works instead of one octeontx2-pf: Wait till detach_resources msg is complete octeontx2: Detect the mbox up or down message via register devlink: fix port new reply cmd type tcp: Clear req->syncookie in reqsk_alloc(). net/bnx2x: Prevent access to a freed page in page_pool ...
2024-03-21Merge tag 'char-misc-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc and other driver subsystem updates from Greg KH: "Here is the big set of char/misc and a number of other driver subsystem updates for 6.9-rc1. Included in here are: - IIO driver updates, loads of new ones and evolution of existing ones - coresight driver updates - const cleanups for many driver subsystems - speakup driver additions - platform remove callback void cleanups - mei driver updates - mhi driver updates - cdx driver updates for MSI interrupt handling - nvmem driver updates - other smaller driver updates and cleanups, full details in the shortlog All of these have been in linux-next for a long time with no reported issue, other than a build warning for the speakup driver" The build warning hits clang and is a gcc (and C23) extension, and is fixed up in the merge. Link: https://lore.kernel.org/all/20240321134831.GA2762840@dev-arch.thelio-3990X/ * tag 'char-misc-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (279 commits) binder: remove redundant variable page_addr uio_dmem_genirq: UIO_MEM_DMA_COHERENT conversion uio_pruss: UIO_MEM_DMA_COHERENT conversion cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT uio: introduce UIO_MEM_DMA_COHERENT type cdx: add MSI support for CDX bus pps: use cflags-y instead of EXTRA_CFLAGS speakup: Add /dev/synthu device speakup: Fix 8bit characters from direct synth parport: sunbpp: Convert to platform remove callback returning void parport: amiga: Convert to platform remove callback returning void char: xillybus: Convert to platform remove callback returning void vmw_balloon: change maintainership MAINTAINERS: change the maintainer for hpilo driver char: xilinx_hwicap: Fix NULL vs IS_ERR() bug hpet: remove hpets::hp_clocksource platform: goldfish: move the separate 'default' propery for CONFIG_GOLDFISH char: xilinx_hwicap: drop casting to void in dev_set_drvdata greybus: move is_gb_* functions out of greybus.h greybus: Remove usage of the deprecated ida_simple_xx() API ...
2024-03-21Merge tag 'tty-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the big set of TTY/Serial driver updates and cleanups for 6.9-rc1. Included in here are: - more tty cleanups from Jiri - loads of 8250 driver cleanups from Andy - max310x driver updates - samsung serial driver updates - uart_prepare_sysrq_char() updates for many drivers - platform driver remove callback void cleanups - stm32 driver updates - other small tty/serial driver updates All of these have been in linux-next for a long time with no reported issues" * tag 'tty-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) dt-bindings: serial: stm32: add power-domains property serial: 8250_dw: Replace ACPI device check by a quirk serial: Lock console when calling into driver before registration serial: 8250_uniphier: Switch to use uart_read_port_properties() serial: 8250_tegra: Switch to use uart_read_port_properties() serial: 8250_pxa: Switch to use uart_read_port_properties() serial: 8250_omap: Switch to use uart_read_port_properties() serial: 8250_of: Switch to use uart_read_port_properties() serial: 8250_lpc18xx: Switch to use uart_read_port_properties() serial: 8250_ingenic: Switch to use uart_read_port_properties() serial: 8250_dw: Switch to use uart_read_port_properties() serial: 8250_bcm7271: Switch to use uart_read_port_properties() serial: 8250_bcm2835aux: Switch to use uart_read_port_properties() serial: 8250_aspeed_vuart: Switch to use uart_read_port_properties() serial: port: Introduce a common helper to read properties serial: core: Add UPIO_UNKNOWN constant for unknown port type serial: core: Move struct uart_port::quirks closer to possible values serial: sh-sci: Call sci_serial_{in,out}() directly serial: core: only stop transmit when HW fifo is empty serial: pch: Use uart_prepare_sysrq_char(). ...
2024-03-21Merge tag 'linux-can-fixes-for-6.9-20240319' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2024-03-20 this is a pull request of 1 patch for net/master. Martin Jocić contributes a fix for the kvaser_pciefd driver, so that up to 8 channels on the Xilinx-based adapters can be used. This issue has been introduced in net-next for v6.9. linux-can-fixes-for-6.9-20240319 * tag 'linux-can-fixes-for-6.9-20240319' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: kvaser_pciefd: Add additional Xilinx interrupts ==================== Link: https://lore.kernel.org/r/20240320112144.582741-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21net: dsa: mt7530: fix handling of all link-local framesArınç ÜNAL
Currently, the MT753X switches treat frames with :01-0D and :0F MAC DAs as regular multicast frames, therefore flooding them to user ports. On page 205, section "8.6.3 Frame filtering" of the active standard, IEEE Std 802.1Q™-2022, it is stated that frames with 01:80:C2:00:00:00-0F as MAC DA must only be propagated to C-VLAN and MAC Bridge components. That means VLAN-aware and VLAN-unaware bridges. On the switch designs with CPU ports, these frames are supposed to be processed by the CPU (software). So we make the switch only forward them to the CPU port. And if received from a CPU port, forward to a single port. The software is responsible of making the switch conform to the latter by setting a single port as destination port on the special tag. This switch intellectual property cannot conform to this part of the standard fully. Whilst the REV_UN frame tag covers the remaining :04-0D and :0F MAC DAs, it also includes :22-FF which the scope of propagation is not supposed to be restricted for these MAC DAs. Set frames with :01-03 MAC DAs to be trapped to the CPU port(s). Add a comment for the remaining MAC DAs. Note that the ingress port must have a PVID assigned to it for the switch to forward untagged frames. A PVID is set by default on VLAN-aware and VLAN-unaware ports. However, when the network interface that pertains to the ingress port is attached to a vlan_filtering enabled bridge, the user can remove the PVID assignment from it which would prevent the link-local frames from being trapped to the CPU port. I am yet to see a way to forward link-local frames while preventing other untagged frames from being forwarded too. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21net: dsa: mt7530: fix link-local frames that ingress vlan filtering portsArınç ÜNAL
Whether VLAN-aware or not, on every VID VLAN table entry that has the CPU port as a member of it, frames are set to egress the CPU port with the VLAN tag stacked. This is so that VLAN tags can be appended after hardware special tag (called DSA tag in the context of Linux drivers). For user ports on a VLAN-unaware bridge, frame ingressing the user port egresses CPU port with only the special tag. For user ports on a VLAN-aware bridge, frame ingressing the user port egresses CPU port with the special tag and the VLAN tag. This causes issues with link-local frames, specifically BPDUs, because the software expects to receive them VLAN-untagged. There are two options to make link-local frames egress untagged. Setting CONSISTENT or UNTAGGED on the EG_TAG bits on the relevant register. CONSISTENT means frames egress exactly as they ingress. That means egressing with the VLAN tag they had at ingress or egressing untagged if they ingressed untagged. Although link-local frames are not supposed to be transmitted VLAN-tagged, if they are done so, when egressing through a CPU port, the special tag field will be broken. BPDU egresses CPU port with VLAN tag egressing stacked, received on software: 00:01:25.104821 AF Unknown (382365846), length 106: | STAG | | VLAN | 0x0000: 0000 6c27 614d 4143 0001 0000 8100 0001 ..l'aMAC........ 0x0010: 0026 4242 0300 0000 0000 0000 6c27 614d .&BB........l'aM 0x0020: 4143 0000 0000 0000 6c27 614d 4143 0000 AC......l'aMAC.. 0x0030: 0000 1400 0200 0f00 0000 0000 0000 0000 ................ BPDU egresses CPU port with VLAN tag egressing untagged, received on software: 00:23:56.628708 AF Unknown (25215488), length 64: | STAG | 0x0000: 0000 6c27 614d 4143 0001 0000 0026 4242 ..l'aMAC.....&BB 0x0010: 0300 0000 0000 0000 6c27 614d 4143 0000 ........l'aMAC.. 0x0020: 0000 0000 6c27 614d 4143 0000 0000 1400 ....l'aMAC...... 0x0030: 0200 0f00 0000 0000 0000 0000 ............ BPDU egresses CPU port with VLAN tag egressing tagged, received on software: 00:01:34.311963 AF Unknown (25215488), length 64: | Mess | 0x0000: 0000 6c27 614d 4143 0001 0001 0026 4242 ..l'aMAC.....&BB 0x0010: 0300 0000 0000 0000 6c27 614d 4143 0000 ........l'aMAC.. 0x0020: 0000 0000 6c27 614d 4143 0000 0000 1400 ....l'aMAC...... 0x0030: 0200 0f00 0000 0000 0000 0000 ............ To prevent confusing the software, force the frame to egress UNTAGGED instead of CONSISTENT. This way, frames can't possibly be received TAGGED by software which would have the special tag field broken. VLAN Tag Egress Procedure For all frames, one of these options set the earliest in this order will apply to the frame: - EG_TAG in certain registers for certain frames. This will apply to frame with matching MAC DA or EtherType. - EG_TAG in the address table. This will apply to frame at its incoming port. - EG_TAG in the PVC register. This will apply to frame at its incoming port. - EG_CON and [EG_TAG per port] in the VLAN table. This will apply to frame at its outgoing port. - EG_TAG in the PCR register. This will apply to frame at its outgoing port. EG_TAG in certain registers for certain frames: PPPoE Discovery_ARP/RARP: PPP_EG_TAG and ARP_EG_TAG in the APC register. IGMP_MLD: IGMP_EG_TAG and MLD_EG_TAG in the IMC register. BPDU and PAE: BPDU_EG_TAG and PAE_EG_TAG in the BPC register. REV_01 and REV_02: R01_EG_TAG and R02_EG_TAG in the RGAC1 register. REV_03 and REV_0E: R03_EG_TAG and R0E_EG_TAG in the RGAC2 register. REV_10 and REV_20: R10_EG_TAG and R20_EG_TAG in the RGAC3 register. REV_21 and REV_UN: R21_EG_TAG and RUN_EG_TAG in the RGAC4 register. With this change, it can be observed that a bridge interface with stp_state and vlan_filtering enabled will properly block ports now. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-20octeontx2-af: Use separate handlers for interruptsSubbaraya Sundeep
For PF to AF interrupt vector and VF to AF vector same interrupt handler is registered which is causing race condition. When two interrupts are raised to two CPUs at same time then two cores serve same event corrupting the data. Fixes: 7304ac4567bc ("octeontx2-af: Add mailbox IRQ and msg handlers") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-20octeontx2-pf: Send UP messages to VF only when VF is up.Subbaraya Sundeep
When PF sending link status messages to VF, it is possible that by the time link_event_task work function is executed VF might have brought down. Hence before sending VF link status message check whether VF is up to receive it. Fixes: ad513ed938c9 ("octeontx2-vf: Link event notification support") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-20octeontx2-pf: Use default max_active works instead of oneSubbaraya Sundeep
Only one execution context for the workqueue used for PF and VFs mailbox communication is incorrect since multiple works are queued simultaneously by all the VFs and PF link UP messages. Hence use default number of execution contexts by passing zero as max_active to alloc_workqueue function. With this fix in place, modify UP messages also to wait until completion. Fixes: d424b6c02415 ("octeontx2-pf: Enable SRIOV and added VF mbox handling") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-20octeontx2-pf: Wait till detach_resources msg is completeSubbaraya Sundeep
During VF driver remove, a message is sent to detach VF resources to PF but VF is not waiting until message is complete. Also mailbox interrupts need to be turned off after the detach resource message is complete. This patch fixes that problem. Fixes: 05fcc9e08955 ("octeontx2-pf: Attach NIX and NPA block LFs") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-20octeontx2: Detect the mbox up or down message via registerSubbaraya Sundeep
A single line of interrupt is used to receive up notifications and down reply messages from AF to PF (similarly from PF to its VF). PF acts as bridge and forwards VF messages to AF and sends respsones back from AF to VF. When an async event like link event is received by up message when PF is in middle of forwarding VF message then mailbox errors occur because PF state machine is corrupted. Since VF is a separate driver or VF driver can be in a VM it is not possible to serialize from the start of communication at VF. Hence to differentiate between type of messages at PF this patch makes sender to set mbox data register with distinct values for up and down messages. Sender also checks whether previous interrupt is received before triggering current interrupt by waiting for mailbox data register to become zero. Fixes: 5a6d7c9daef3 ("octeontx2-pf: Mailbox communication with AF") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-19net/bnx2x: Prevent access to a freed page in page_poolThinh Tran
Fix race condition leading to system crash during EEH error handling During EEH error recovery, the bnx2x driver's transmit timeout logic could cause a race condition when handling reset tasks. The bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(), which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload() SGEs are freed using bnx2x_free_rx_sge_range(). However, this could overlap with the EEH driver's attempt to reset the device using bnx2x_io_slot_reset(), which also tries to free SGEs. This race condition can result in system crashes due to accessing freed memory locations in bnx2x_free_rx_sge() 799 static inline void bnx2x_free_rx_sge(struct bnx2x *bp, 800 struct bnx2x_fastpath *fp, u16 index) 801 { 802 struct sw_rx_page *sw_buf = &fp->rx_page_ring[index]; 803 struct page *page = sw_buf->page; .... where sw_buf was set to NULL after the call to dma_unmap_page() by the preceding thread. EEH: Beginning: 'slot_reset' PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset() bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing... bnx2x 0011:01:00.0: enabling device (0140 -> 0142) bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc0080000025065fc Oops: Kernel access of bad area, sig: 11 [#1] ..... Call Trace: [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable) [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0 [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550 [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60 [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170 [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0 [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64 To solve this issue, we need to verify page pool allocations before freeing. Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element") Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240315205535.1321-1-thinhtr@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - Per vq sizes in vdpa - Info query for block devices support in vdpa - DMA sync callbacks in vduse - Fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (35 commits) virtio_net: rename free_old_xmit_skbs to free_old_xmit virtio_net: unify the code for recycling the xmit ptr virtio-net: add cond_resched() to the command waiting loop virtio-net: convert rx mode setting to use workqueue virtio: packed: fix unmap leak for indirect desc table vDPA: report virtio-blk flush info to user space vDPA: report virtio-block read-only info to user space vDPA: report virtio-block write zeroes configuration to user space vDPA: report virtio-block discarding configuration to user space vDPA: report virtio-block topology info to user space vDPA: report virtio-block MQ info to user space vDPA: report virtio-block max segments in a request to user space vDPA: report virtio-block block-size to user space vDPA: report virtio-block max segment size to user space vDPA: report virtio-block capacity to user space virtio: make virtio_bus const vdpa: make vdpa_bus const vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min vDPA/ifcvf: get_max_vq_size to return max size virtio_vdpa: create vqs with the actual size ...
2024-03-19net: phy: fix phy_read_poll_timeout argument type in genphy_loopbackNikita Kiryushin
read_poll_timeout inside phy_read_poll_timeout can set val negative in some cases (for example, __mdiobus_read inside phy_read can return -EOPNOTSUPP). Supposedly, commit 4ec732951702 ("net: phylib: fix phy_read*_poll_timeout()") should fix problems with wrong-signed vals, but I do not see how as val is sent to phy_read as is and __val = phy_read (not val) is checked for sign. Change val type for signed to allow better error handling as done in other phy_read_poll_timeout callers. This will not fix any error handling by itself, but allows, for example, to modify cond with appropriate sign check or check resulting val separately. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration") Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240315175052.8049-1-kiryushin@ancud.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19can: kvaser_pciefd: Add additional Xilinx interruptsMartin Jocić
Since Xilinx-based adapters now support up to eight CAN channels, the TX interrupt mask array must have eight elements. Signed-off-by: Martin Jocic <martin.jocic@kvaser.com> Link: https://lore.kernel.org/all/2ab3c0585c3baba272ede0487182a423a420134b.camel@kvaser.com Fixes: 9b221ba452aa ("can: kvaser_pciefd: Add support for Kvaser PCIe 8xCAN") [mkl: replace Link by Fixes tag] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-03-19wireguard: netlink: access device through ctx instead of peerJason A. Donenfeld
The previous commit fixed a bug that led to a NULL peer->device being dereferenced. It's actually easier and faster performance-wise to instead get the device from ctx->wg. This semantically makes more sense too, since ctx->wg->peer_allowedips.seq is compared with ctx->allowedips_seq, basing them both in ctx. This also acts as a defence in depth provision against freed peers. Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19wireguard: netlink: check for dangling peer via is_dead instead of empty listJason A. Donenfeld
If all peers are removed via wg_peer_remove_all(), rather than setting peer_list to empty, the peer is added to a temporary list with a head on the stack of wg_peer_remove_all(). If a netlink dump is resumed and the cursored peer is one that has been removed via wg_peer_remove_all(), it will iterate from that peer and then attempt to dump freed peers. Fix this by instead checking peer->is_dead, which was explictly created for this purpose. Also move up the device_update_lock lockdep assertion, since reading is_dead relies on that. It can be reproduced by a small script like: echo "Setting config..." ip link add dev wg0 type wireguard wg setconf wg0 /big-config ( while true; do echo "Showing config..." wg showconf wg0 > /dev/null done ) & sleep 4 wg setconf wg0 <(printf "[Peer]\nPublicKey=$(wg genkey)\n") Resulting in: BUG: KASAN: slab-use-after-free in __lock_acquire+0x182a/0x1b20 Read of size 8 at addr ffff88811956ec70 by task wg/59 CPU: 2 PID: 59 Comm: wg Not tainted 6.8.0-rc2-debug+ #5 Call Trace: <TASK> dump_stack_lvl+0x47/0x70 print_address_description.constprop.0+0x2c/0x380 print_report+0xab/0x250 kasan_report+0xba/0xf0 __lock_acquire+0x182a/0x1b20 lock_acquire+0x191/0x4b0 down_read+0x80/0x440 get_peer+0x140/0xcb0 wg_get_device_dump+0x471/0x1130 Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Reported-by: Lillian Berry <lillian@star-ark.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19wireguard: device: remove generic .ndo_get_stats64Breno Leitao
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19wireguard: device: leverage core stats allocatorBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in this driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19wireguard: receive: annotate data-race around receiving_counter.counterNikita Zhandarovich
Syzkaller with KCSAN identified a data-race issue when accessing keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE() annotations to mark the data race as intentional. BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0: counter_validate drivers/net/wireguard/receive.c:321 [inline] wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461 __napi_poll+0x60/0x3b0 net/core/dev.c:6536 napi_poll net/core/dev.c:6605 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6738 __do_softirq+0xc4/0x279 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499 process_one_work kernel/workqueue.c:2633 [inline] ... read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1: decrypt_packet drivers/net/wireguard/receive.c:252 [inline] wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 ... Fixes: a9e90d9931f3 ("wireguard: noise: separate receive counter from send counter") Reported-by: syzbot+d1de830e4ecdaac83d89@syzkaller.appspotmail.com Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19virtio_net: rename free_old_xmit_skbs to free_old_xmitXuan Zhuo
Since free_old_xmit_skbs not only deals with skb, but also xdp frame and subsequent added xsk, so change the name of this function to free_old_xmit. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19virtio_net: unify the code for recycling the xmit ptrXuan Zhuo
There are two completely similar and independent implementations. This is inconvenient for the subsequent addition of new types. So extract a function from this piece of code and call this function uniformly to recover old xmit ptr. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19virtio-net: add cond_resched() to the command waiting loopJason Wang
Adding cond_resched() to the command waiting loop for a better co-operation with the scheduler. This allows to give CPU a breath to run other task(workqueue) instead of busy looping when preemption is not allowed on a device whose CVQ might be slow. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230720083839.481487-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
2024-03-19virtio-net: convert rx mode setting to use workqueueJason Wang
This patch convert rx mode setting to be done in a workqueue, this is a must for allow to sleep when waiting for the cvq command to response since current code is executed under addr spin lock. Note that we need to disable and flush the workqueue during freeze, this means the rx mode setting is lost after resuming. This is not the bug of this patch as we never try to restore rx mode setting during resume. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230720083839.481487-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>