summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2023-12-11octeontx2-af: Fix pause frame configurationHariprasad Kelam
The current implementation's default Pause Forward setting is causing unnecessary network traffic. This patch disables Pause Forward to address this issue. Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11octeontx2-af: Update RSS algorithm indexHariprasad Kelam
The RSS flow algorithm is not set up correctly for promiscuous or all multi MCAM entries. This has an impact on flow distribution. This patch fixes the issue by updating flow algorithm index in above mentioned MCAM entries. Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11octeontx2-pf: Fix promisc mcam entry actionHariprasad Kelam
Current implementation is such that, promisc mcam entry action is set as multicast even when there are no trusted VFs. multicast action causes the hardware to copy packet data, which reduces the performance. This patch fixes this issue by setting the promisc mcam entry action to unicast instead of multicast when there are no trusted VFs. The same change is made for the 'allmulti' mcam entry action. Fixes: ffd2f89ad05c ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11octeon_ep: explicitly test for firmware ready valueShinas Rasheed
The firmware ready value is 1, and get firmware ready status function should explicitly test for that value. The firmware ready value read will be 2 after driver load, and on unbind till firmware rewrites the firmware ready back to 0, the value seen by driver will be 2, which should be regarded as not ready. Fixes: 10c073e40469 ("octeon_ep: defer probe if firmware not ready") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-10octeontx2-af: fix a use-after-free in rvu_nix_register_reportersZhipeng Lu
The rvu_dl will be freed in rvu_nix_health_reporters_destroy(rvu_dl) after the create_workqueue fails, and after that free, the rvu_dl will be translate back through the following call chain: rvu_nix_health_reporters_destroy |-> rvu_nix_health_reporters_create |-> rvu_health_reporters_create |-> rvu_register_dl (label err_dl_health) Finally. in the err_dl_health label, rvu_dl being freed again in rvu_health_reporters_destroy(rvu) by rvu_nix_health_reporters_destroy. In the second calls of rvu_nix_health_reporters_destroy, however, it uses rvu_dl->rvu_nix_health_reporter, which is already freed at the end of rvu_nix_health_reporters_destroy in the first call. So this patch prevents the first destroy by instantly returning -ENONMEN when create_workqueue fails. In addition, since the failure of create_workqueue is the only entrence of label err, it has been integrated into the error-handling path of create_workqueue. Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-10net: fec: correct queue selectionRadu Bulie
The old implementation extracted VLAN TCI info from the payload before the VLAN tag has been pushed in the payload. Another problem was that the VLAN TCI was extracted even if the packet did not have VLAN protocol header. This resulted in invalid VLAN TCI and as a consequence a random queue was computed. This patch fixes the above issues and use the VLAN TCI from the skb if it is present or VLAN TCI from payload if present. If no VLAN header is present queue 0 is selected. Fixes: 52c4a1a85f4b ("net: fec: add ndo_select_queue to fix TX bandwidth fluctuations") Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-09atm: solos-pci: Fix potential deadlock on &tx_queue_lockChengfeng Ye
As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d38912 ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-09atm: solos-pci: Fix potential deadlock on &cli_queue_lockChengfeng Ye
As &card->cli_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #1> console_show() --> spin_lock(&card->cli_queue_lock) <interrupt> --> solos_bh() --> spin_lock(&card->cli_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on the card->cli_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 9c54004ea717 ("atm: Driver for Solos PCI ADSL2+ card.") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logicMichael Chan
When the chip is configured to timestamp all receive packets, the timestamp in the RX completion is only valid if the metadata present flag is not set for packets received on the wire. In addition, internal loopback packets will never have a valid timestamp and the timestamp field will always be zero. We must exclude any 0 value in the timestamp field because there is no way to determine if it is a loopback packet or not. Add a new function bnxt_rx_ts_valid() to check for all timestamp valid conditions. Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08bnxt_en: Fix wrong return value check in bnxt_close_nic()Kalesh AP
The wait_event_interruptible_timeout() function returns 0 if the timeout elapsed, -ERESTARTSYS if it was interrupted by a signal, and the remaining jiffies otherwise if the condition evaluated to true before the timeout elapsed. Driver should have checked for zero return value instead of a positive value. MChan: Print a warning for -ERESTARTSYS. The close operation will proceed anyway when wait_event_interruptible_timeout() returns for any reason. Since we do the close no matter what, we should not return this error code to the caller. Change bnxt_close_nic() to a void function and remove all error handling from some of the callers. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08bnxt_en: Fix skb recycling logic in bnxt_deliver_skb()Sreekanth Reddy
Receive SKBs can go through the VF-rep path or the normal path. skb_mark_for_recycle() is only called for the normal path. Fix it to do it for both paths to fix possible stalled page pool shutdown errors. Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08bnxt_en: Clear resource reservation during resumeSomnath Kotur
We are issuing HWRM_FUNC_RESET cmd to reset the device including all reserved resources, but not clearing the reservations within the driver struct. As a result, when the driver re-initializes as part of resume, it believes that there is no need to do any resource reservation and goes ahead and tries to allocate rings which will eventually fail beyond a certain number pre-reserved by the firmware. Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08qca_spi: Fix reset behaviorStefan Wahren
In case of a reset triggered by the QCA7000 itself, the behavior of the qca_spi driver was not quite correct: - in case of a pending RX frame decoding the drop counter must be incremented and decoding state machine reseted - also the reset counter must always be incremented regardless of sync state Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08qca_debug: Fix ethtool -G iface tx behaviorStefan Wahren
After calling ethtool -g it was not possible to adjust the TX ring size again: # ethtool -g eth1 Ring parameters for eth1: Pre-set maximums: RX: 4 RX Mini: n/a RX Jumbo: n/a TX: 10 Current hardware settings: RX: 4 RX Mini: n/a RX Jumbo: n/a TX: 10 # ethtool -G eth1 tx 8 netlink error: Invalid argument The reason for this is that the readonly setting rx_pending get initialized and after that the range check in qcaspi_set_ringparam() fails regardless of the provided parameter. So fix this by accepting the exposed RX defaults. Instead of adding another magic number better use a new define here. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08qca_debug: Prevent crash on TX ring changesStefan Wahren
The qca_spi driver stop and restart the SPI kernel thread (via ndo_stop & ndo_open) in case of TX ring changes. This is a big issue because it allows userspace to prevent restart of the SPI kernel thread (via signals). A subsequent change of TX ring wrongly assume a valid spi_thread pointer which result in a crash. So prevent this by stopping the network traffic handling and temporary park the SPI thread. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08octeon_ep: initialise control mbox tasks before using APIsShinas Rasheed
Initialise various workqueue tasks and queue interrupt poll task before the first invocation of any control net APIs. Since octep_ctrl_net_get_info was called before the control net receive work task was initialised or even the interrupt poll task was queued, the function call wasn't returning actual firmware info queried from Octeon. Fixes: 8d6198a14e2b ("octeon_ep: support to fetch firmware info") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20231206135228.2591659-1-srasheed@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08team: Fix use-after-free when an option instance allocation failsFlorent Revest
In __team_options_register, team_options are allocated and appended to the team's option_list. If one option instance allocation fails, the "inst_rollback" cleanup path frees the previously allocated options but doesn't remove them from the team's option_list. This leaves dangling pointers that can be dereferenced later by other parts of the team driver that iterate over options. This patch fixes the cleanup path to remove the dangling pointers from the list. As far as I can tell, this uaf doesn't have much security implications since it would be fairly hard to exploit (an attacker would need to make the allocation of that specific small object fail) but it's still nice to fix. Cc: stable@vger.kernel.org Fixes: 80f7c6683fe0 ("team: add support for per-port options") Signed-off-by: Florent Revest <revest@chromium.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08Merge tag 'mlx5-fixes-2023-12-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-12-04 This series provides bug fixes to mlx5 driver. V1->V2: - Drop commit #9 ("net/mlx5e: Forbid devlink reload if IPSec rules are offloaded"), we are working on a better fix Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-07Merge tag 'net-6.7-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. Current release - regressions: - veth: fix packet segmentation in veth_convert_skb_to_xdp_buff Current release - new code bugs: - tcp: assorted fixes to the new Auth Option support Older releases - regressions: - tcp: fix mid stream window clamp - tls: fix incorrect splice handling - ipv4: ip_gre: handle skb_pull() failure in ipgre_xmit() - dsa: mv88e6xxx: restore USXGMII support for 6393X - arcnet: restore support for multiple Sohard Arcnet cards Older releases - always broken: - tcp: do not accept ACK of bytes we never sent - require admin privileges to receive packet traces via netlink - packet: move reference count in packet_sock to atomic_long_t - bpf: - fix incorrect branch offset comparison with cpu=v4 - fix prog_array_map_poke_run map poke update - netfilter: - three fixes for crashes on bad admin commands - xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref - nf_tables: fix 'exist' matching on bigendian arches - leds: netdev: fix RTNL handling to prevent potential deadlock - eth: tg3: prevent races in error/reset handling - eth: r8169: fix rtl8125b PAUSE storm when suspended - eth: r8152: improve reset and surprise removal handling - eth: hns: fix race between changing features and sending - eth: nfp: fix sleep in atomic for bonding offload" * tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning net/smc: fix missing byte order conversion in CLC handshake net: dsa: microchip: provide a list of valid protocols for xmit handler drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group psample: Require 'CAP_NET_ADMIN' when joining "packets" group bpf: sockmap, updating the sg structure should also update curr net: tls, update curr on splice as well nfp: flower: fix for take a mutex lock in soft irq context and rcu lock net: dsa: mv88e6xxx: Restore USXGMII support for 6393X tcp: do not accept ACK of bytes we never sent selftests/bpf: Add test for early update in prog_array_map_poke_run bpf: Fix prog_array_map_poke_run map poke update netfilter: xt_owner: Fix for unsafe access of sk->sk_socket netfilter: nf_tables: validate family when identifying table via handle netfilter: nf_tables: bail out on mismatching dynset and set expressions netfilter: nf_tables: fix 'exist' matching on bigendian arches netfilter: nft_set_pipapo: skip inactive elements during set walk netfilter: bpf: fix bad registration on nf_defrag leds: trigger: netdev: fix RTNL handling to prevent potential deadlock octeontx2-af: Update Tx link register range ...
2023-12-07Merge tag 'regmap-fix-v6.7-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "An incremental fix for the fix introduced during the merge window for caching of the selector for windowed register ranges. We were incorrectly leaking an error code in the case where the last selector accessed was for some reason not cached" * tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix bogus error on regcache_sync success
2023-12-07Merge tag 'devicetree-fixes-for-6.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix dt-extract-compatibles for builds with in tree build directory - Drop Xinlei Lee <xinlei.lee@mediatek.com> bouncing email - Fix the of_reconfig_get_state_change() return value documentation - Add missing #power-domain-cells property to QCom MPM - Fix warnings in i.MX LCDIF and adi,adv7533 * tag 'devicetree-fixes-for-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: display: adi,adv75xx: Document #sound-dai-cells dt-bindings: lcdif: Properly describe the i.MX23 interrupts dt-bindings: interrupt-controller: Allow #power-domain-cells of: dynamic: Fix of_reconfig_get_state_change() return value documentation dt-bindings: display: mediatek: dsi: remove Xinlei's mail dt: dt-extract-compatibles: Don't follow symlinks when walking tree
2023-12-07Merge tag 'platform-drivers-x86-v6.7-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - Fix i8042 filter resource handling, input, and suspend issues in asus-wmi - Skip zero instance WMI blocks to avoid issues with some laptops - Differentiate dev/production keys in mlxbf-bootctl - Correct surface serdev related return value to avoid leaking errno into userspace - Error checking fixes * tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/mellanox: Check devm_hwmon_device_register_with_groups() return value platform/mellanox: Add null pointer checks for devm_kasprintf() mlxbf-bootctl: correctly identify secure boot with development keys platform/x86: wmi: Skip blocks with zero instances platform/surface: aggregator: fix recv_buf() return value platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
2023-12-07net: dsa: microchip: provide a list of valid protocols for xmit handlerSean Nyekjaer
Provide a list of valid protocols for which the driver will provide it's deferred xmit handler. When using DSA_TAG_PROTO_KSZ8795 protocol, it does not provide a "connect" method, therefor ksz_connect() is not allocating ksz_tagger_data. This avoids the following null pointer dereference: ksz_connect_tag_protocol from dsa_register_switch+0x9ac/0xee0 dsa_register_switch from ksz_switch_register+0x65c/0x828 ksz_switch_register from ksz_spi_probe+0x11c/0x168 ksz_spi_probe from spi_probe+0x84/0xa8 spi_probe from really_probe+0xc8/0x2d8 Fixes: ab32f56a4100 ("net: dsa: microchip: ptp: add packet transmission timestamping") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20231206071655.1626479-1-sean@geanix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07nfp: flower: fix for take a mutex lock in soft irq context and rcu lockHui Zhou
The neighbour event callback call the function nfp_tun_write_neigh, this function will take a mutex lock and it is in soft irq context, change the work queue to process the neighbour event. Move the nfp_tun_write_neigh function out of range rcu_read_lock/unlock() in function nfp_tunnel_request_route_v4 and nfp_tunnel_request_route_v6. Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload") CC: stable@vger.kernel.org # 6.2+ Signed-off-by: Hui Zhou <hui.zhou@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-06Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-05 (ice, i40e, iavf) This series contains updates to ice, i40e and iavf drivers. Michal fixes incorrect usage of VF MSIX value and index calculation for ice. Marcin restores disabling of Rx VLAN filtering which was inadvertently removed for ice. Ivan Vecera corrects improper messaging of MFS port for i40e. Jake fixes incorrect checking of coalesce values on iavf. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero i40e: Fix unexpected MFS warning message ice: Restore fix disabling RX VLAN filtering ice: change vfs.num_msix_per to vf->num_msix ==================== Link: https://lore.kernel.org/r/20231205211918.2123019-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06net: dsa: mv88e6xxx: Restore USXGMII support for 6393XTobias Waldekranz
In 4a56212774ac, USXGMII support was added for 6393X, but this was lost in the PCS conversion (the blamed commit), most likely because these efforts where more or less done in parallel. Restore this feature by porting Michal's patch to fit the new implementation. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Michal Smulski <michal.smulski@ooma.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Fixes: e5b732a275f5 ("net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20231205221359.3926018-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06leds: trigger: netdev: fix RTNL handling to prevent potential deadlockHeiner Kallweit
When working on LED support for r8169 I got the following lockdep warning. Easiest way to prevent this scenario seems to be to take the RTNL lock before the trigger_data lock in set_device_name(). ====================================================== WARNING: possible circular locking dependency detected 6.7.0-rc2-next-20231124+ #2 Not tainted ------------------------------------------------------ bash/383 is trying to acquire lock: ffff888103aa1c68 (&trigger_data->lock){+.+.}-{3:3}, at: netdev_trig_notify+0xec/0x190 [ledtrig_netdev] but task is already holding lock: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __mutex_lock+0x9b/0xb50 mutex_lock_nested+0x16/0x20 rtnl_lock+0x12/0x20 set_device_name+0xa9/0x120 [ledtrig_netdev] netdev_trig_activate+0x1a1/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 -> #0 (&trigger_data->lock){+.+.}-{3:3}: __lock_acquire+0x1459/0x25a0 lock_acquire+0xc8/0x2d0 __mutex_lock+0x9b/0xb50 mutex_lock_nested+0x16/0x20 netdev_trig_notify+0xec/0x190 [ledtrig_netdev] call_netdevice_register_net_notifiers+0x5a/0x100 register_netdevice_notifier+0x85/0x120 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&trigger_data->lock); lock(rtnl_mutex); lock(&trigger_data->lock); *** DEADLOCK *** 8 locks held by bash/383: #0: ffff888103ff33f0 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf0 #1: ffff888103aa1e88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210 #2: ffff8881036f1890 (kn->active#82){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11d/0x210 #3: ffff888108e2c358 (&led_cdev->led_access){+.+.}-{3:3}, at: led_trigger_write+0x30/0x140 #4: ffffffff8cdd9e10 (triggers_list_lock){++++}-{3:3}, at: led_trigger_write+0x75/0x140 #5: ffff888108e2c270 (&led_cdev->trigger_lock){++++}-{3:3}, at: led_trigger_write+0xe3/0x140 #6: ffffffff8cdde3d0 (pernet_ops_rwsem){++++}-{3:3}, at: register_netdevice_notifier+0x1c/0x120 #7: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20 stack backtrace: CPU: 0 PID: 383 Comm: bash Not tainted 6.7.0-rc2-next-20231124+ #2 Hardware name: Default string Default string/Default string, BIOS ADLN.M6.SODIMM.ZB.CY.015 08/08/2023 Call Trace: <TASK> dump_stack_lvl+0x5c/0xd0 dump_stack+0x10/0x20 print_circular_bug+0x2dd/0x410 check_noncircular+0x131/0x150 __lock_acquire+0x1459/0x25a0 lock_acquire+0xc8/0x2d0 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] __mutex_lock+0x9b/0xb50 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] ? __this_cpu_preempt_check+0x13/0x20 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] ? __cancel_work_timer+0x11c/0x1b0 ? __mutex_lock+0x123/0xb50 mutex_lock_nested+0x16/0x20 ? mutex_lock_nested+0x16/0x20 netdev_trig_notify+0xec/0x190 [ledtrig_netdev] call_netdevice_register_net_notifiers+0x5a/0x100 register_netdevice_notifier+0x85/0x120 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 ? preempt_count_add+0x49/0xc0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f269055d034 Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 35 c3 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48 RSP: 002b:00007ffddb7ef748 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f269055d034 RDX: 0000000000000007 RSI: 000055bf5f4af3c0 RDI: 0000000000000001 RBP: 000055bf5f4af3c0 R08: 0000000000000073 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000007 R13: 00007f26906325c0 R14: 00007f269062ff20 R15: 0000000000000000 </TASK> Fixes: d5e01266e7f5 ("leds: trigger: netdev: add additional specific link speed mode") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/fb5c8294-2a10-4bf5-8f10-3d2b77d2757e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06octeontx2-af: Update Tx link register rangeRahul Bhansali
On new silicons the TX channels for transmit level has increased. This patch fixes the respective register offset range to configure the newly added channels. Fixes: b279bbb3314e ("octeontx2-af: NIX Tx scheduler queue config support") Signed-off-by: Rahul Bhansali <rbhansali@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06octeontx2-af: Add missing mcs flr handler callGeetha sowjanya
If mcs resources are attached to PF/VF. These resources need to be freed on FLR. This patch add missing mcs flr call on PF FLR. Fixes: bd69476e86fc ("octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06octeontx2-af: Fix mcs stats register addressGeetha sowjanya
This patch adds the miss mcs stats register for mcs supported platforms. Fixes: 9312150af8da ("octeontx2-af: cn10k: mcs: Support for stats collection") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06octeontx2-af: Fix mcs sa cam entries sizeGeetha sowjanya
On latest silicon versions SA cam entries increased to 256. This patch fixes the datatype of sa_entries in mcs_hw_info struct to u16 to hold 256 entries. Fixes: 080bbd19c9dd ("octeontx2-af: cn10k: mcs: Add mailboxes for port related operations") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06octeontx2-af: Adjust Tx credits when MCS external bypass is disabledNithin Dabilpuram
When MCS external bypass is disabled, MCS returns additional 2 credits(32B) for every packet Tx'ed on LMAC. To account for these extra credits, NIX_AF_TX_LINKX_NORM_CREDIT.CC_MCS_CNT needs to be configured as otherwise NIX Tx credits would overflow and will never be returned to idle state credit count causing issues with credit control and MTU change. This patch fixes the same by configuring CC_MCS_CNT at probe time for MCS enabled SoC's Fixes: bd69476e86fc ("octeontx2-af: cn10k: mcs: Install a default TCAM for normal traffic") Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06net: hns: fix fake link up on xge portYonglong Liu
If a xge port just connect with an optical module and no fiber, it may have a fake link up because there may be interference on the hardware. This patch adds an anti-shake to avoid the problem. And the time of anti-shake is base on tests. Fixes: b917078c1c10 ("net: hns: Add ACPI support to check SFP present") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06net: hns: fix wrong head when modify the tx feature when sending packetsYonglong Liu
Upon changing the tx feature, the hns driver will modify the maybe_stop_tx() and fill_desc() functions, if the modify happens during packet sending, will cause the hardware and software pointers do not match, and the port can not work anymore. This patch deletes the maybe_stop_tx() and fill_desc() functions modification when setting tx feature, and use the skb_is_gro() to determine which functions to use in the tx path. Fixes: 38f616da1c28 ("net:hns: Add support of ethtool TSO set option for Hip06 in HNS") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06net: atlantic: Fix NULL dereference of skb pointer inDaniil Maximov
If is_ptp_ring == true in the loop of __aq_ring_xdp_clean function, then a timestamp is stored from a packet in a field of skb object, which is not allocated at the moment of the call (skb == NULL). Generalize aq_ptp_extract_ts and other affected functions so they don't work with struct sk_buff*, but with struct skb_shared_hwtstamps*. Found by Linux Verification Center (linuxtesting.org) with SVACE Fixes: 26efaef759a1 ("net: atlantic: Implement xdp data plane") Signed-off-by: Daniil Maximov <daniil31415it@gmail.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Link: https://lore.kernel.org/r/20231204085810.1681386-1-daniil31415it@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06r8152: add vendor/device ID pair for ASUS USB-C2500Kelly Kane
The ASUS USB-C2500 is an RTL8156 based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Kelly Kane <kelly@hawknetworks.com> Link: https://lore.kernel.org/r/20231203011712.6314-1-kelly@hawknetworks.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05ionic: Fix dim work handling in split interrupt modeBrett Creeley
Currently ionic_dim_work() is incorrect when in split interrupt mode. This is because the interrupt rate is only being changed for the Rx side even for dim running on Tx. Fix this by using the qcq from the container_of macro. Also, introduce some local variables for a bit of cleanup. Fixes: a6ff85e0a2d9 ("ionic: remove intr coalesce update from napi") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231204192234.21017-3-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05ionic: fix snprintf format length warningShannon Nelson
Our friendly kernel test robot has reminded us that with a new check we have a warning about a potential string truncation. In this case it really doesn't hurt anything, but it is worth addressing especially since there really is no reason to reserve so many bytes for our queue names. It seems that cutting the queue name buffer length in half stops the complaint. Fixes: c06107cabea3 ("ionic: more ionic name tweaks") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311300201.lO8v7mKU-lkp@intel.com/ Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231204192234.21017-2-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05net: bnxt: fix a potential use-after-free in bnxt_init_tcDinghao Liu
When flow_indr_dev_register() fails, bnxt_init_tc will free bp->tc_info through kfree(). However, the caller function bnxt_init_one() will ignore this failure and call bnxt_shutdown_tc() on failure of bnxt_dl_register(), where a use-after-free happens. Fix this issue by setting bp->tc_info to NULL after kfree(). Fixes: 627c89d00fb9 ("bnxt_en: flow_offload: offload tunnel decap rules via indirect callbacks") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://lore.kernel.org/r/20231204024004.8245-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05net: veth: fix packet segmentation in veth_convert_skb_to_xdp_buffLorenzo Bianconi
Based on the previous allocated packet, page_offset can be not null in veth_convert_skb_to_xdp_buff routine. Take into account page fragment offset during the skb paged area copy in veth_convert_skb_to_xdp_buff(). Fixes: 2d0de67da51a ("net: veth: use newly added page pool API for veth with xdp") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/eddfe549e7e626870071930964ac3c38a1dc8068.1701702000.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zeroJacob Keller
In __iavf_set_coalesce, the driver checks both ec->rx_coalesce_usecs and ec->tx_coalesce_usecs for validity. It does this via a chain if if/else-if blocks. If every single branch of the series of if statements exited, this would be fine. However, the rx_coalesce_usecs is checked against zero to print an informative message if use_adaptive_rx_coalesce is enabled. If this check is true, it short circuits the entire chain of statements, preventing validation of the tx_coalesce_usecs field. Indeed, since commit e792779e6b63 ("iavf: Prevent changing static ITR values if adaptive moderation is on") the iavf driver actually rejects any change to the tx_coalesce_usecs or rx_coalesce_usecs when use_adaptive_tx_coalesce or use_adaptive_rx_coalesce is enabled, making this checking a bit redundant. Fix this error by removing the unnecessary and redundant checks for use_adaptive_rx_coalesce and use_adaptive_tx_coalesce. Since zero is a valid value, and since the tx_coalesce_usecs and rx_coalesce_usecs fields are already unsigned, remove the minimum value check. This allows assigning an ITR value ranging from 0-8160 as described by the printed message. Fixes: 65e87c0398f5 ("i40evf: support queue-specific settings for interrupt moderation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05i40e: Fix unexpected MFS warning messageIvan Vecera
Commit 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") added a warning message that reports unexpected size of port's MFS (max frame size) value. This message use for the port number local variable 'i' that is wrong. In i40e_probe() this 'i' variable is used only to iterate VSIs to find FDIR VSI: <code> ... /* if FDIR VSI was set up, start it now */ for (i = 0; i < pf->num_alloc_vsi; i++) { if (pf->vsi[i] && pf->vsi[i]->type == I40E_VSI_FDIR) { i40e_vsi_open(pf->vsi[i]); break; } } ... </code> So the warning message use for the port number index of FDIR VSI if this exists or pf->num_alloc_vsi if not. Fix the message by using 'pf->hw.port' for the port number. Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05ice: Restore fix disabling RX VLAN filteringMarcin Szycik
Fix setting dis_rx_filtering depending on whether port vlan is being turned on or off. This was originally fixed in commit c793f8ea15e3 ("ice: Fix disabling Rx VLAN filtering with port VLAN enabled"), but while refactoring ice_vf_vsi_init_vlan_ops(), the fix has been lost. Restore the fix along with the original comment from that change. Also delete duplicate lines in ice_port_vlan_on(). Fixes: 2946204b3fa8 ("ice: implement bridge port vlan") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05ice: change vfs.num_msix_per to vf->num_msixMichal Swiatkowski
vfs::num_msix_per should be only used as default value for vf->num_msix. For other use cases vf->num_msix should be used, as VF can have different MSI-X amount values. Fix incorrect register index calculation. vfs::num_msix_per and pf->sriov_base_vector shouldn't be used after implementation of changing MSI-X amount on VFs. Instead vf->first_vector_idx should be used, as it is storing value for first irq index. Fixes: fe1c5ca2fe76 ("ice: implement num_msix field per VF") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05octeontx2-af: fix a use-after-free in rvu_npa_register_reportersZhipeng Lu
The rvu_dl will be freed in rvu_npa_health_reporters_destroy(rvu_dl) after the create_workqueue fails, and after that free, the rvu_dl will be translate back through rvu_npa_health_reporters_create, rvu_health_reporters_create, and rvu_register_dl. Finally it goes to the err_dl_health label, being freed again in rvu_health_reporters_destroy(rvu) by rvu_npa_health_reporters_destroy. In the second calls of rvu_npa_health_reporters_destroy, however, it uses rvu_dl->rvu_npa_health_reporter, which is already freed at the end of rvu_npa_health_reporters_destroy in the first call. So this patch prevents the first destroy by instantly returning -ENONMEN when create_workqueue fails. In addition, since the failure of create_workqueue is the only entrence of label err, it has been integrated into the error-handling path of create_workqueue. Fixes: f1168d1e207c ("octeontx2-af: Add devlink health reporters for NPA") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Geethasowjanya Akula <gakula@marvell.com> Link: https://lore.kernel.org/r/20231202095902.3264863-1-alexious@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-04net/mlx5: Fix a NULL vs IS_ERR() checkDan Carpenter
The mlx5_esw_offloads_devlink_port() function returns error pointers, not NULL. Fixes: 7bef147a6ab6 ("net/mlx5: Don't skip vport check") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-04net/mlx5e: Check netdev pointer before checking its net nsGavin Li
Previously, when comparing the net namespaces, the case where the netdev doesn't exist wasn't taken into account, and therefore can cause a crash. In such a case, the comparing function should return false, as there is no netdev->net to compare the devlink->net to. Furthermore, this will result in an attempt to enter switchdev mode without a netdev to fail, and which is the desired result as there is no meaning in switchdev mode without a net device. Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-04net/mlx5: Nack sync reset request when HotPlug is enabledMoshe Shemesh
Current sync reset flow is not supported when PCIe bridge connected directly to mlx5 device has HotPlug interrupt enabled and can be triggered on link state change event. Return nack on reset request in such case. Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-04net/mlx5e: TC, Don't offload post action rule if not supportedChris Mi
If post action is not supported, eg. ignore_flow_level is not supported, don't offload post action rule. Otherwise, will hit panic [1]. Fix it by checking if post action table is valid or not. [1] [445537.863880] BUG: unable to handle page fault for address: ffffffffffffffb1 [445537.864617] #PF: supervisor read access in kernel mode [445537.865244] #PF: error_code(0x0000) - not-present page [445537.865860] PGD 70683a067 P4D 70683a067 PUD 70683c067 PMD 0 [445537.866497] Oops: 0000 [#1] PREEMPT SMP NOPTI [445537.867077] CPU: 19 PID: 248742 Comm: tc Kdump: loaded Tainted: G O 6.5.0+ #1 [445537.867888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [445537.868834] RIP: 0010:mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core] [445537.869635] Code: c0 0d 00 00 e8 20 96 c6 d3 48 85 c0 0f 84 e5 00 00 00 c7 83 b0 01 00 00 00 00 00 00 49 89 c5 31 c0 31 d2 66 89 83 b4 01 00 00 <49> 8b 44 24 10 83 23 df 83 8b d8 01 00 00 04 48 89 83 c0 01 00 00 [445537.871318] RSP: 0018:ffffb98741cef428 EFLAGS: 00010246 [445537.871962] RAX: 0000000000000000 RBX: ffff8df341167000 RCX: 0000000000000001 [445537.872704] RDX: 0000000000000000 RSI: ffffffff954844e1 RDI: ffffffff9546e9cb [445537.873430] RBP: ffffb98741cef448 R08: 0000000000000020 R09: 0000000000000246 [445537.874160] R10: 0000000000000000 R11: ffffffff943f73ff R12: ffffffffffffffa1 [445537.874893] R13: ffff8df36d336c20 R14: ffffffffffffffa1 R15: ffff8df341167000 [445537.875628] FS: 00007fcd6564f800(0000) GS:ffff8dfa9ea00000(0000) knlGS:0000000000000000 [445537.876425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [445537.877090] CR2: ffffffffffffffb1 CR3: 00000003b5884001 CR4: 0000000000770ee0 [445537.877832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [445537.878564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [445537.879300] PKRU: 55555554 [445537.879797] Call Trace: [445537.880263] <TASK> [445537.880713] ? show_regs+0x6e/0x80 [445537.881232] ? __die+0x29/0x70 [445537.881731] ? page_fault_oops+0x85/0x160 [445537.882276] ? search_exception_tables+0x65/0x70 [445537.882852] ? kernelmode_fixup_or_oops+0xa2/0x120 [445537.883432] ? __bad_area_nosemaphore+0x18b/0x250 [445537.884019] ? bad_area_nosemaphore+0x16/0x20 [445537.884566] ? do_kern_addr_fault+0x8b/0xa0 [445537.885105] ? exc_page_fault+0xf5/0x1c0 [445537.885623] ? asm_exc_page_fault+0x2b/0x30 [445537.886149] ? __kmem_cache_alloc_node+0x1df/0x2a0 [445537.886717] ? mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core] [445537.887431] ? mlx5e_tc_post_act_add+0x30/0x130 [mlx5_core] [445537.888172] alloc_flow_post_acts+0xfb/0x1c0 [mlx5_core] [445537.888849] parse_tc_actions+0x582/0x5c0 [mlx5_core] [445537.889505] parse_tc_fdb_actions+0xd7/0x1f0 [mlx5_core] [445537.890175] __mlx5e_add_fdb_flow+0x1ab/0x2b0 [mlx5_core] [445537.890843] mlx5e_add_fdb_flow+0x56/0x120 [mlx5_core] [445537.891491] ? debug_smp_processor_id+0x1b/0x30 [445537.892037] mlx5e_tc_add_flow+0x79/0x90 [mlx5_core] [445537.892676] mlx5e_configure_flower+0x305/0x450 [mlx5_core] [445537.893341] mlx5e_rep_setup_tc_cls_flower+0x3d/0x80 [mlx5_core] [445537.894037] mlx5e_rep_setup_tc_cb+0x5c/0xa0 [mlx5_core] [445537.894693] tc_setup_cb_add+0xdc/0x220 [445537.895177] fl_hw_replace_filter+0x15f/0x220 [cls_flower] [445537.895767] fl_change+0xe87/0x1190 [cls_flower] [445537.896302] tc_new_tfilter+0x484/0xa50 Fixes: f0da4daa3413 ("net/mlx5e: Refactor ct to use post action infrastructure") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Automatic Verification <verifier@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shachar Kagan <skagan@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2023-12-04net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_workMoshe Shemesh
Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below). To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow. While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0 but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2); *** DEADLOCK *** 4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: c90005b5f75c ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>