2022-07-28 amt: fix typo in comment (Ruffalo Lavoisier)

Correct the spelling of 'non-existent' in a comment.

Signed-off-by: Ruffalo Lavoisier <RuffaloLavoisier@gmail.com>
Link: https://lore.kernel.org/r/20220728032854.151180-1-RuffaloLavoisier@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 mlxsw: core_linecards: Remove duplicated include in core_linecard_dev.c (Yang Li)

Fix the following includecheck warning:
./drivers/net/ethernet/mellanox/mlxsw/core_linecard_dev.c: linux/err.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220727233801.23781-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 ax25: fix incorrect dev_tracker usage (Eric Dumazet)

While investigating a separate rose issue [1], and enabling
CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25
issue [2].

An ax25_dev can be used by one (or many) struct ax25_cb. We thus need
a different dev_tracker, one per struct ax25_cb.

After this patch is applied, we are able to focus on rose.

[1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/

[2]
[  205.798723] reference already released.
[  205.798732] allocated in:
[  205.798734] ax25_bind+0x1a2/0x230 [ax25]
[  205.798747] __sys_bind+0xea/0x110
[  205.798753] __x64_sys_bind+0x18/0x20
[  205.798758] do_syscall_64+0x5c/0x80
[  205.798763] entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798768] freed in:
[  205.798770] ax25_release+0x115/0x370 [ax25]
[  205.798778] __sock_release+0x42/0xb0
[  205.798782] sock_close+0x15/0x20
[  205.798785] __fput+0x9f/0x260
[  205.798789] ____fput+0xe/0x10
[  205.798792] task_work_run+0x64/0xa0
[  205.798798] exit_to_user_mode_prepare+0x18b/0x190
[  205.798804] syscall_exit_to_user_mode+0x26/0x40
[  205.798808] do_syscall_64+0x69/0x80
[  205.798812] entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798827] ------------[ cut here ]------------
[  205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
[  205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
[  205.798948] mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
[  205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
[  205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[  205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
[  205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
[  205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
[  205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
[  205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
[  205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
[  205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
[  205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
[  205.799024] FS:  0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
[  205.799028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
[  205.799033] Call Trace:
[  205.799035]  <TASK>
[  205.799038]  ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799047]  ? ax25_device_event+0x9f/0x270 [ax25]
[  205.799055]  ? raw_notifier_call_chain+0x49/0x60
[  205.799060]  ? call_netdevice_notifiers_info+0x52/0xa0
[  205.799065]  ? dev_close_many+0xc8/0x120
[  205.799070]  ? unregister_netdevice_many+0x13d/0x890
[  205.799073]  ? unregister_netdevice_queue+0x90/0xe0
[  205.799076]  ? unregister_netdev+0x1d/0x30
[  205.799080]  ? mkiss_close+0x7c/0xc0 [mkiss]
[  205.799084]  ? tty_ldisc_close+0x2e/0x40
[  205.799089]  ? tty_ldisc_hangup+0x137/0x210
[  205.799092]  ? __tty_hangup.part.0+0x208/0x350
[  205.799098]  ? tty_vhangup+0x15/0x20
[  205.799103]  ? pty_close+0x127/0x160
[  205.799108]  ? tty_release+0x139/0x5e0
[  205.799112]  ? __fput+0x9f/0x260
[  205.799118]  ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799126]  ax25_device_event+0x9f/0x270 [ax25]
[  205.799135]  raw_notifier_call_chain+0x49/0x60
[  205.799140]  call_netdevice_notifiers_info+0x52/0xa0
[  205.799146]  dev_close_many+0xc8/0x120
[  205.799152]  unregister_netdevice_many+0x13d/0x890
[  205.799157]  unregister_netdevice_queue+0x90/0xe0
[  205.799161]  unregister_netdev+0x1d/0x30
[  205.799165]  mkiss_close+0x7c/0xc0 [mkiss]
[  205.799170]  tty_ldisc_close+0x2e/0x40
[  205.799173]  tty_ldisc_hangup+0x137/0x210
[  205.799178]  __tty_hangup.part.0+0x208/0x350
[  205.799184]  tty_vhangup+0x15/0x20
[  205.799188]  pty_close+0x127/0x160
[  205.799193]  tty_release+0x139/0x5e0
[  205.799199]  __fput+0x9f/0x260
[  205.799203]  ____fput+0xe/0x10
[  205.799208]  task_work_run+0x64/0xa0
[  205.799213]  do_exit+0x33b/0xab0
[  205.799217]  ? __handle_mm_fault+0xc4f/0x15f0
[  205.799224]  do_group_exit+0x35/0xa0
[  205.799228]  __x64_sys_exit_group+0x18/0x20
[  205.799232]  do_syscall_64+0x5c/0x80
[  205.799238]  ? handle_mm_fault+0xba/0x290
[  205.799242]  ? debug_smp_processor_id+0x17/0x20
[  205.799246]  ? fpregs_assert_state_consistent+0x26/0x50
[  205.799251]  ? exit_to_user_mode_prepare+0x49/0x190
[  205.799256]  ? irqentry_exit_to_user_mode+0x9/0x20
[  205.799260]  ? irqentry_exit+0x33/0x40
[  205.799263]  ? exc_page_fault+0x87/0x170
[  205.799268]  ? asm_exc_page_fault+0x8/0x30
[  205.799273]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.799277] RIP: 0033:0x7ff6b80eaca1
[  205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
[  205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
[  205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[  205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
[  205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
[  205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
[  205.799304]  </TASK>

Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Bernard F6BVP <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
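To illustrate the shape of the fix: each ax25_cb carries its own
tracker slot instead of reusing the one embedded in ax25_dev. A
minimal sketch, assuming the 5.18-era dev_hold_track()/dev_put_track()
helpers; the struct and function names here are invented for
illustration:

    #include <linux/netdevice.h>

    /* Sketch only: one tracker slot per socket, not per device. */
    struct ax25_cb_sketch {
        struct net_device *dev;
        netdevice_tracker dev_tracker;  /* owned by this ax25_cb */
    };

    static void ax25_cb_hold_dev_sketch(struct ax25_cb_sketch *ax25,
                                        struct net_device *dev)
    {
        ax25->dev = dev;
        /* Each ax25_cb records its own reference... */
        dev_hold_track(dev, &ax25->dev_tracker, GFP_ATOMIC);
    }

    static void ax25_cb_put_dev_sketch(struct ax25_cb_sketch *ax25)
    {
        /* ...and releases it through the same tracker slot, so the
         * tracker never sees one slot released twice. */
        dev_put_track(ax25->dev, &ax25->dev_tracker);
    }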
2022-07-28 selftests: net: dsa: Add a Makefile which installs the selftests (Martin Blumenstingl)

Add a Makefile which takes care of installing the selftests in
tools/testing/selftests/drivers/net/dsa. This can be used to install
all DSA specific selftests and forwarding.config using the same
approach as for the selftests in tools/testing/selftests/net/forwarding.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220727191642.480279-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 Merge branch 'take-devlink-lock-on-mlx4-and-mlx5-callbacks' (Jakub Kicinski)

Moshe Shemesh says:

====================
Take devlink lock on mlx4 and mlx5 callbacks

Prepare mlx4 and mlx5 drivers to have all devlink callbacks called
with the devlink instance locked.

Change mlx4 driver to use devl_ API where needed to have devlink
reload callbacks locked. Change mlx5 driver to use devl_ API where
needed to have devlink reload and devlink health callbacks locked.

As mlx5 is the only driver which needed changes to enable calling
health callbacks with devlink instance locked, this patchset also
removes the DEVLINK_NL_FLAG_NO_LOCK flag from devlink health callbacks.

This patchset will be followed by a patchset that will remove the
DEVLINK_NL_FLAG_NO_LOCK flag from devlink and will remove
devlink_mutex.
====================

Link: https://lore.kernel.org/r/1659023630-32006-1-git-send-email-moshe@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 devlink: Hold the instance lock in health callbacks (Moshe Shemesh)

Let the core take the devlink instance lock around health callbacks
and remove the now-redundant locking in the drivers.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
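In other words, the generic devlink pre/post-doit handling now
brackets health callbacks with the instance lock. A simplified sketch
of the idea, not the exact devlink core code:

    #include <net/devlink.h>

    /* Sketch: the core holds the instance lock across the driver's
     * health callback instead of each driver locking internally. */
    static int devlink_health_doit_sketch(struct devlink *devlink,
                                          int (*recover)(struct devlink *))
    {
        int err;

        devl_lock(devlink);     /* held across the callback */
        err = recover(devlink);
        devl_unlock(devlink);
        return err;
    }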
2022-07-28 net/mlx5: Lock mlx5 devlink health recovery callback (Moshe Shemesh)

Change devlink instance locks in mlx5 driver to have the devlink
health recovery callback locked, while keeping all driver paths which
lead to devl_ API functions called by the driver locked.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/mlx4: Lock mlx4 devlink reload callback (Moshe Shemesh)

Change devlink instance locks in mlx4 driver to have the devlink
reload callback locked, while keeping all driver paths which lead to
devl_ API functions called by the mlx4 driver locked.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/mlx4: Use devl_ API for devlink port register / unregister (Moshe Shemesh)

Use devl_ API to call devl_port_register() and devl_port_unregister()
instead of devlink_port_register() and devlink_port_unregister(). Add
devlink instance lock in mlx4 driver paths to these functions.

This will be used by the downstream patch to invoke mlx4 devlink
reload callbacks with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
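The devl_ variants assume the caller already holds the instance lock,
so the converted driver paths look roughly like this sketch (the
wrapper function name is invented; mlx4 specifics omitted):

    #include <net/devlink.h>

    static int mlx4_port_register_sketch(struct devlink *devlink,
                                         struct devlink_port *dl_port,
                                         unsigned int port_index)
    {
        int err;

        devl_lock(devlink);
        /* devl_port_register() expects the instance lock to be held. */
        err = devl_port_register(devlink, dl_port, port_index);
        devl_unlock(devlink);
        return err;
    }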
2022-07-28 net/mlx4: Use devl_ API for devlink region create / destroy (Moshe Shemesh)

Use devl_ API to call devl_region_create() and devl_region_destroy()
instead of devlink_region_create() and devlink_region_destroy(). Add
devlink instance lock in mlx4 driver paths to these functions.

This will be used by the downstream patch to invoke mlx4 devlink
reload callbacks with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/mlx5: Lock mlx5 devlink reload callbacks (Moshe Shemesh)

Change devlink instance locks in mlx5 driver to have devlink reload
callbacks locked, while keeping all driver paths which lead to devl_
API functions called by the driver locked.

Add mlx5_load_one_devl_locked() and mlx5_unload_one_devl_locked(),
which are used by the paths that are already locked, such as the
devlink reload callbacks.

This patch also makes the driver use the devl_ API for trap
registration, as these functions are called from driver paths parallel
to reload, which now require locking.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
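A sketch of the locked/unlocked split described above; the _devl_locked
signature and the wrapper body are assumptions based on the text, not
the exact mlx5 code:

    #include <net/devlink.h>

    /* Already-locked contexts (e.g. devlink reload callbacks) call the
     * _devl_locked variant directly; everyone else locks around it.
     * priv_to_devlink() is mlx5's accessor from core dev to devlink. */
    int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery);

    static int mlx5_load_one_sketch(struct mlx5_core_dev *dev, bool recovery)
    {
        struct devlink *devlink = priv_to_devlink(dev);
        int err;

        devl_lock(devlink);
        err = mlx5_load_one_devl_locked(dev, recovery);
        devl_unlock(devlink);
        return err;
    }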
2022-07-28 net/mlx5: Move fw reset unload to mlx5_fw_reset_complete_reload (Moshe Shemesh)

Refactor the fw reset code to have the unload-driver part done in
mlx5_fw_reset_complete_reload(), so that if it was called by the PF
which initiated the reload fw activate flow, the unload part will be
handled by the mlx5_devlink_reload_fw_activate() callback itself and
not by the reset event work.

This will be used by the downstream patch to invoke devlink reload
callbacks with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net: devlink: remove region snapshots list dependency on devlink->lock (Jiri Pirko)

After the mlx4 driver is converted to do locked reload,
devlink_region_snapshot_create() may be called from both locked and
unlocked contexts.

Note that in mlx4, region snapshots could be created on any command
failure. That can happen in any flow that involves commands to FW,
which means most of the driver flows.

So resolve this by removing the dependency on devlink->lock for
region snapshots list consistency and introduce a new mutex to
ensure it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
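Conceptually, snapshot list updates now serialize on their own mutex,
making them safe from both locked and unlocked call paths. A minimal
sketch; the struct and mutex names are assumptions:

    #include <linux/list.h>
    #include <linux/mutex.h>

    struct region_sketch {
        struct list_head snapshot_list;
        struct mutex snapshot_lock;     /* guards snapshot_list only */
    };

    static void region_add_snapshot_sketch(struct region_sketch *region,
                                           struct list_head *node)
    {
        /* Safe whether or not the caller holds devlink->lock. */
        mutex_lock(&region->snapshot_lock);
        list_add_tail(node, &region->snapshot_list);
        mutex_unlock(&region->snapshot_lock);
    }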
2022-07-28 net: devlink: remove region snapshot ID tracking dependency on devlink->lock (Jiri Pirko)

After the mlx4 driver is converted to do locked reload, functions to
get/put region snapshot IDs may be called from both locked and
unlocked contexts.

So resolve this by removing the dependency on devlink->lock for region
snapshot ID tracking, using the internal xa_lock() to maintain
snapshot_ids xarray consistency.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
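The xarray API already serializes internally, which is what makes this
work. A minimal sketch, assuming snapshot_ids is a plain allocating
xarray (function names invented for illustration):

    #include <linux/xarray.h>

    /* xa_alloc()/xa_erase() take the xarray's internal xa_lock, so no
     * devlink->lock is needed around snapshot ID get/put. */
    static int snapshot_id_get_sketch(struct xarray *snapshot_ids, u32 *id)
    {
        return xa_alloc(snapshot_ids, id, xa_mk_value(0),
                        xa_limit_32b, GFP_KERNEL);
    }

    static void snapshot_id_put_sketch(struct xarray *snapshot_ids, u32 id)
    {
        xa_erase(snapshot_ids, id);
    }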
2022-07-28 Merge branch 'add-framework-for-selftests-in-devlink' (Jakub Kicinski)

Vikas Gupta says:

====================
add framework for selftests in devlink

Add support for selftests in the devlink framework. Add callbacks
.selftests_check and .selftests_run to devlink_ops. Users can specify
a test suite which is subsequently passed to the driver, and the
driver can opt to run particular tests based on its capabilities. The
patchset adds a flash-based test for the bnxt_en driver.
====================

Link: https://lore.kernel.org/r/20220727165721.37959-1-vikas.gupta@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 bnxt_en: implement callbacks for devlink selftests (vikas)

Add callbacks
=============
.selftest_check: returns true for the flash selftest.
.selftest_run: runs a flash selftest.

Also, refactor the NVM APIs so that they can be used with both devlink
and ethtool.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 devlink: introduce framework for selftests (Vikas Gupta)

Add a framework for running selftests. The framework exposes devlink
commands that let the user execute tests and query which tests the
driver supports.

Below are the new entries in devlink_nl_ops:

devlink_nl_cmd_selftests_show_doit/dumpit: query the selftests
supported by the drivers.

devlink_nl_cmd_selftests_run: execute selftests. Users can provide a
test mask for executing group tests or standalone tests.

Documentation/networking/devlink/ is already part of MAINTAINERS and
the new files come under this path, hence no MAINTAINERS update is
needed.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
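For a driver, opting in amounts to filling two new devlink_ops
members. A sketch of the shape, with the bnxt-style flash test as the
only supported test; treat callback signatures and enum names as
illustrative of the merged API rather than authoritative:

    #include <net/devlink.h>

    static bool drv_selftest_check_sketch(struct devlink *devlink,
                                          unsigned int id,
                                          struct netlink_ext_ack *extack)
    {
        /* Advertise only the tests this driver can run. */
        return id == DEVLINK_ATTR_SELFTEST_ID_FLASH;
    }

    static enum devlink_selftest_status
    drv_selftest_run_sketch(struct devlink *devlink, unsigned int id,
                            struct netlink_ext_ack *extack)
    {
        if (id != DEVLINK_ATTR_SELFTEST_ID_FLASH)
            return DEVLINK_SELFTEST_STATUS_SKIP;
        /* ...execute the flash test here... */
        return DEVLINK_SELFTEST_STATUS_PASS;
    }

    static const struct devlink_ops drv_ops_sketch = {
        .selftest_check = drv_selftest_check_sketch,
        .selftest_run   = drv_selftest_run_sketch,
    };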
2022-07-28 Merge branch 'mlx5e-use-tls-tx-pool-to-improve-connection-rate' (Jakub Kicinski)

Tariq Toukan says:

====================
mlx5e use TLS TX pool to improve connection rate

To offload encryption operations, the mlx5 device maintains state and
keeps track of every kTLS device-offloaded connection.

Two HW objects are used per TX context of a kTLS offloaded connection:
a. Transport interface send (TIS) object, to reach the HW context.
b. Data Encryption Key (DEK) to perform the crypto operations.

These two objects are created and destroyed per TLS TX context, via FW
commands. In total, 4 FW commands are issued per TLS TX context, which
seriously limits the connection rate.

In this series, we aim to save creation and destruction of TIS objects
by recycling them. Upon recycling of a TIS, the HW still needs to be
notified of the re-mapping between a TIS and a context. This is done
by posting WQEs via an SQ, a significantly faster API than the FW
command interface.

A pool is used for recycling. The pool dynamically reacts to the load
and connection rate, growing and shrinking accordingly.

Saving the TIS FW commands per context increases connection rate by
~42%, from 11.6K to 16.5K connections per sec.

Connection rate is still limited by a FW bottleneck due to the
remaining per-context FW commands (DEK create/destroy). This will soon
be addressed in a follow-up series. By combining the two series, the
FW bottleneck will be released, and a significantly higher (about 100K
connections per sec) kTLS TX device-offloaded connection rate is
reached.
====================

Link: https://lore.kernel.org/r/20220727094346.10540-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/mlx5e: kTLS, Dynamically re-size TX recycling pool (Tariq Toukan)

Let the TLS TX recycle pool be more flexible in size, by continuously
and dynamically allocating and releasing HW resources in response to
changes in the connection rate and load.

Allocate and release pool entries in bulks (16). Use a workqueue to
release/allocate in the background. Allocate a new bulk when the pool
size goes lower than the low threshold (1K). A symmetric operation is
done when the pool size gets greater than the upper threshold (4K).

Every idle pool entry holds: 1 TIS, 1 DEK (HW resources), in addition
to ~100 bytes in host memory.

Start with an empty pool to minimize memory and HW resources waste for
non-TLS users that have the device-offload TLS enabled.

Upon a new request, in case the pool is empty, do not wait for a whole
bulk allocation to complete. Instead, trigger an instant allocation of
a single resource to reduce latency.

Performance tests:
Before: 11,684 CPS
After:  16,556 CPS

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
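A hypothetical sketch of the watermark logic: the thresholds and bulk
size come from the text above, but the struct, fields, and function
names are invented for illustration:

    #include <linux/workqueue.h>

    #define POOL_BULK_SZ    16
    #define POOL_LOW_WM     1024    /* grow below this */
    #define POOL_HIGH_WM    4096    /* shrink above this */

    struct tls_tx_pool_sketch {
        unsigned int size;              /* idle entries currently held */
        struct work_struct resize_work; /* allocs/frees one bulk of 16 */
        struct workqueue_struct *wq;
    };

    static void pool_touch_sketch(struct tls_tx_pool_sketch *pool)
    {
        /* Resize in the background; the fast path never waits for a
         * whole bulk, per the single-resource fallback above. */
        if (pool->size < POOL_LOW_WM || pool->size > POOL_HIGH_WM)
            queue_work(pool->wq, &pool->resize_work);
    }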
2022-07-28 net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections (Tariq Toukan)

The transport interface send (TIS) object is responsible for
performing all transport-related operations of the transmit side. The
ConnectX HW uses a TIS object to save and access the TLS crypto
information and state of an offloaded TX kTLS connection.

Before this patch, we used to create a new TIS per connection and
destroy it once it's closed. Every create and destroy of a TIS is a FW
command. The same applies to the private TLS context, which we used to
dynamically allocate and free per connection.

Resource recycling reduces the impact of the allocation/free
operations and helps speed up the connection rate.

In this feature we maintain a pool of TX objects and use it to recycle
the resources instead of re-creating them per connection.

A cached TIS popped from the pool is updated to serve the new
connection via the fast-path HW interface, updating the tls static and
progress params. This is a very fast operation, significantly faster
than FW commands.

On recycling, a WQE fence is required after the context params change.
This guarantees that the data is sent after the context has been
successfully updated in hardware, and that the context modification
doesn't interfere with existing traffic.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/mlx5e: kTLS, Take stats out of OOO handler (Tariq Toukan)

Let the caller of mlx5e_ktls_tx_handle_ooo() take care of updating the
stats, according to the returned value. As the switch/case blocks are
already there, this change saves unnecessary branches in the handler.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/mlx5e: kTLS, Introduce TLS-specific create TIS (Tariq Toukan)

TLS TIS objects have a defined role in mapping and reaching the HW TLS
contexts. Some standard TIS attributes (like LAG port affinity) are
not relevant for them. Use a dedicated TLS TIS create function instead
of the generic mlx5e_create_tis.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net/tls: Multi-threaded calls to TX tls_dev_del (Tariq Toukan)

Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.

This is not a sustainable behavior. This creates a rate gap between
add and del operations (addition rate outperforms the deletion rate).
When running for enough time, the TLS device resources could get
exhausted, failing to offload new connections.

Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
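The shape of the change, as a minimal sketch; the struct, field, and
workqueue names are assumptions for illustration:

    #include <linux/workqueue.h>

    static struct workqueue_struct *destruct_wq_sketch; /* dedicated wq */

    struct tls_offload_ctx_sketch {
        struct work_struct destruct_work;   /* one work per context */
    };

    static void
    tls_device_queue_ctx_destruct_sketch(struct tls_offload_ctx_sketch *ctx)
    {
        /* Per-context works can run on several CPUs in parallel,
         * unlike the old single gc work walking a global list. */
        queue_work(destruct_wq_sketch, &ctx->destruct_work);
    }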
2022-07-28 net/tls: Perform immediate device ctx cleanup when possible (Tariq Toukan)

TLS context destructor can be run in atomic context. Cleanup
operations for device-offloaded contexts could require access and
interaction with the device callbacks, which might sleep. Hence, the
cleanup of such contexts must be deferred and completed inside an
async work.

For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
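Roughly, as a sketch building on the per-context work sketch a few
commits up (the offload test and free path are simplified assumptions):

    #include <linux/slab.h>

    static void tls_ctx_free_sketch(struct tls_offload_ctx_sketch *ctx,
                                    bool device_offloaded)
    {
        if (device_offloaded)
            /* Device callbacks may sleep: defer to the workqueue. */
            tls_device_queue_ctx_destruct_sketch(ctx);
        else
            /* Purely-software ctx: cleanup is atomic, free right away. */
            kfree(ctx);
    }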
2022-07-28 tls: rx: Fix unsigned comparison with less than zero (Yang Li)

The return from the call to tls_rx_msg_size() is int, and it can be a
negative error code; however, it is being assigned to an unsigned long
variable 'sz', so make 'sz' an int.

This eliminates the following coccicheck warning:
./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
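The before/after, paraphrased; the call site is condensed for
illustration and is not the verbatim tls_strp.c code:

    /* Before: 'sz' was unsigned, so the error check was dead code. */
    size_t sz = tls_rx_msg_size(strp, strp->anchor);
    if (sz < 0)         /* never true for an unsigned type */
        goto err;

    /* After: an int preserves negative error codes from
     * tls_rx_msg_size(). */
    int sz = tls_rx_msg_size(strp, strp->anchor);
    if (sz < 0)
        goto err;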
2022-07-28 Merge branch 'tls-rx-follow-ups-to-rx-work' (Jakub Kicinski)

Jakub Kicinski says:

====================
tls: rx: follow ups to rx work

A selection of unrelated changes. First some selftest polishing. Next
a change to rcvtimeo handling for locking, based on an exchange with
Eric. Then a follow-up to Paolo's comments from yesterday. Last but
not least a fix for a false positive warning; turns out I've been
testing with DEBUG_NET=n this whole time.
====================

Link: https://lore.kernel.org/r/20220727031524.358216-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 tls: rx: fix the false positive warning (Jakub Kicinski)

I went too far in the accessor conversion; we can't use tls_strp_msg()
after decryption because the message may not be ready. What we care
about on this path is that the output skb is detached, i.e. we didn't
somehow just turn around and use the input skb with its TCP data still
attached. So look at the anchor directly.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 tls: strp: rename and multithread the workqueue (Jakub Kicinski)

Paolo points out that there seems to be no strong reason strparser
uses a single-threaded workqueue. Perhaps there were some performance
or pinning considerations? Since we don't know (and it's the slow
path) let's default to the most natural, multi-threaded choice.

Also rename the workqueue to "tls-".

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 tls: rx: don't consider sock_rcvtimeo() cumulative (Jakub Kicinski)

Eric indicates that restarting rcvtimeo on every wait may be fine. I
thought that we should consider it cumulative, and made
tls_rx_reader_lock() return the remaining timeo after acquiring the
reader lock. tls_rx_rec_wait() gets its timeout passed in by value so
it does not keep track of time previously spent.

Make the lock waiting consistent with tls_rx_rec_wait() - don't keep
track of time spent.

Read the timeo fresh in tls_rx_rec_wait(). It's unclear to me why
callers are supposed to cache the value.

Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 selftests: tls: handful of memrnd() and length checks (Jakub Kicinski)

Add a handful of memory randomizations and precise length checks.
Nothing is really broken here, I did this to increase confidence when
debugging. It does fix a GCC warning, tho. Apparently GCC recognizes
that memory needs to be initialized for send() but does not recognize
that for write().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 net: usb: delete extra space and tab in blank line (Xie Shaowen)

Delete an extra space and a tab in a blank line; there is no
functional change.

Signed-off-by: Xie Shaowen <studentxswpy@163.com>
Link: https://lore.kernel.org/r/20220727081253.3043941-1-studentxswpy@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 dm: verity-loadpin: Drop use of dm_table_get_num_targets() (Matthias Kaehlcke)

Commit 2aec377a2925 ("dm table: remove dm_table_get_num_targets()
wrapper") in linux-dm/for-next removed the function
dm_table_get_num_targets(), which is used by verity-loadpin.

Access table->num_targets directly instead of using the defunct
wrapper.

Fixes: b6c1c5745ccc ("dm: Add verity helpers for LoadPin")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220728085412.1.I242d21b378410eb6f9897a3160efb56e5608c59d@changeid
2022-07-28 Merge tag 'drm-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm (Linus Torvalds)

Pull drm fix from Dave Airlie:
 "Quiet extra week, just a single fix for i915 workaround with
  execlist backend.

  i915:
   - Further reset robustness improvements for execlists
     [Wa_22011802037]"

* tag 'drm-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend
2022-07-29 Merge tag 'drm-intel-fixes-2022-07-28-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes (Dave Airlie)

- Further reset robustness improvements for execlists [Wa_22011802037]
  (Umesh Nerlige Ramappa)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YuJIWaEbKcs/q0NY@tursulin-desk
2022-07-28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Jakub Kicinski)

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 libbpf: Support PPC in arch_specific_syscall_pfx (Daniel Müller)

Commit 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections
support for syscall kprobes") added the arch_specific_syscall_pfx()
function, which returns a string representing the architecture in use.
As it turns out, this function is currently not aware of PowerPC,
where NULL is returned.

That's being flagged by the libbpf CI system, which builds for ppc64le
and where the compiler sees a NULL pointer being passed in to a %s
format string.

With this change we add representations for two more architectures,
for PowerPC and PowerPC 64, and also adjust the string format logic to
handle NULL pointers gracefully, in an attempt to prevent similar
issues with other architectures in the future.

Fixes: 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net
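The gist of both changes, sketched from the description above; this is
a simplified rendition of the libbpf-internal helper, and the caller
fragment (buf, syscall_name) is hypothetical:

    /* Return the arch prefix used in __<pfx>_sys_<name> kernel symbols. */
    static const char *arch_specific_syscall_pfx(void)
    {
    #if defined(__x86_64__)
        return "x64";
    #elif defined(__powerpc64__)
        return "powerpc64";     /* newly handled */
    #elif defined(__powerpc__)
        return "powerpc";       /* newly handled */
    #else
        return NULL;            /* still-unknown arch */
    #endif
    }

    /* Callers now guard against NULL instead of feeding it to "%s":
     *   snprintf(buf, sizeof(buf), "__%s_sys_%s",
     *            arch_specific_syscall_pfx() ?: "", syscall_name);
     */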
2022-07-28 dm: fix dm-raid crash if md_handle_request() splits bio (Mike Snitzer)

Commit ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone")
introduced the optimization to _not_ perform bio_associate_blkg()'s
relatively costly work when DM core clones its bio. But in doing so it
exposed the possibility for DM's cloned bio to alter DM target
behavior (e.g. crash) if a target were to issue IO without first
calling bio_set_dev().

The DM raid target can trigger an MD crash due to its need to split
the DM bio that is passed to md_handle_request(). The split will
recurse to submit_bio_noacct() using a bio with an uninitialized
->bi_blkg. This NULL bio->bi_blkg causes blk_throtl_bio() to
dereference a NULL blkg_to_tg(bio->bi_blkg).

Fix this in DM core by adding a new 'needs_bio_set_dev' target flag
that will make alloc_tio() call bio_set_dev() on behalf of the target.
dm-raid is the only target that requires this flag. bio_set_dev()
initializes the DM cloned bio's ->bi_blkg, using bio_associate_blkg,
before passing the bio to md_handle_request().

The long-term fix would be to audit and refactor MD code to rely on
DM to split its bio, using dm_accept_partial_bio(), but there are MD
raid personalities (e.g. raid1 and raid10) whose implementations are
tightly coupled to handling the bio splitting inline.

Fixes: ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
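The flag's mechanics, sketched from the description above; the exact
placement inside DM core's alloc_tio() and the bdev argument are
assumptions:

    #include <linux/bio.h>
    #include <linux/device-mapper.h>

    /* Target ctr opts in, e.g. dm-raid: */
    static void raid_ctr_fragment_sketch(struct dm_target *ti)
    {
        ti->needs_bio_set_dev = true;   /* new flag from this commit */
    }

    /* DM core, while cloning (conceptually inside alloc_tio()): */
    static void clone_setup_sketch(struct dm_target *ti, struct bio *clone,
                                   struct block_device *md_bdev)
    {
        if (ti->needs_bio_set_dev)
            bio_set_dev(clone, md_bdev);    /* initializes ->bi_blkg */
    }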
2022-07-28 dm raid: fix address sanitizer warning in raid_resume (Mikulas Patocka)

There is a KASAN warning in raid_resume when running the lvm test
lvconvert-raid.sh. The reason for the warning is that
mddev->raid_disks is greater than rs->raid_disks, so the loop touches
one entry beyond the allocated length.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-28 dm raid: fix address sanitizer warning in raid_status (Mikulas Patocka)

There is this warning when using a kernel with the address sanitizer
and running this testsuite:
https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid

==================================================================
BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid]
Read of size 4 at addr ffff888079d2c7e8 by task lvcreate/13319
CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0x6a/0x9c
 print_address_description.constprop.0+0x1f/0x1e0
 print_report.cold+0x55/0x244
 kasan_report+0xc9/0x100
 raid_status+0x1747/0x2820 [dm_raid]
 dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod]
 table_load+0x35c/0x630 [dm_mod]
 ctl_ioctl+0x411/0x630 [dm_mod]
 dm_ctl_ioctl+0xa/0x10 [dm_mod]
 __x64_sys_ioctl+0x12a/0x1a0
 do_syscall_64+0x5b/0x80

The warning is caused by reading conf->max_nr_stripes in raid_status.
The code in raid_status reads mddev->private, casts it to struct
r5conf and reads the entry max_nr_stripes.

However, if we have a raid type other than 4/5/6, mddev->private
doesn't point to struct r5conf; it may point to struct r0conf, struct
r1conf, struct r10conf or struct mpconf. If we cast a pointer to one
of these structs to struct r5conf, we will be reading invalid memory
and KASAN warns about it.

Fix this bug by reading struct r5conf only if the raid type is 4, 5
or 6.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
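The guard, sketched; this assumes a drivers/md compilation context
(struct mddev, struct r5conf), and the helper name and the is_raid456
parameter are invented for illustration:

    /* Only raid4/5/6 store a struct r5conf in mddev->private. */
    static int rs_stripe_cache_size_sketch(struct mddev *mddev,
                                           bool is_raid456)
    {
        struct r5conf *conf = is_raid456 ? mddev->private : NULL;

        return conf ? conf->max_nr_stripes : 0;
    }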
2022-07-28 dm: Start pr_preempt from the same starting path (Mike Christie)

pr_preempt has a similar issue as reserve: for all the reservation
types except the All Registrants ones, the preempt can create a
reservation, and a follow-up reservation or release needs to go down
the same path the preempt did. This has pr_preempt work like reserve
and release, where we always start from the first path in the first
group.

This commit has been tested with windows failover clustering's
validation test and libiscsi's PGR tests to check for regressions.
They both don't have tests to verify this case, so I tested it
manually.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-28 dm: Fix PR release handling for non All Registrants (Mike Christie)

This commit fixes a bug where we are leaving the reservation in place
even though pr_release has run and returned success.

If we have a Write Exclusive, Exclusive Access, or Write/Exclusive
Registrants only reservation, the release must be sent down the path
that is the reservation holder. The problem is that
multipath_prepare_ioctl most likely selected path N for the
reservation, then later when we do the release,
multipath_prepare_ioctl will select a completely different path. The
device will then return success because the nvme and scsi specs say to
return success if there is no reservation or if the release is sent
down from a path that is not the holder. We then think we have
released the reservation.

This commit has us loop over each path and send a release so we can
make sure the release is executed on the correct path. It has been
tested with windows failover clustering's validation test, which
checks this case, and it has been tested manually (the libiscsi PGR
tests don't have a test case for this yet, but I will be adding one).

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-28 dm: Start pr_reserve from the same starting path (Mike Christie)

When an app does a pr_reserve it will go to whatever path we happen to
be using at the time. This can result in errors when the app does a
second pr_reserve call and expects success but gets a failure because
the reserve is not done on the holder's path. This commit has us
always start trying to do reserves from the first path in the first
group.

Windows failover clustering will produce the type of pattern above.
With this commit, we will now pass its validation test for this case.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-28 dm: Allow dm_call_pr to be used for path searches (Mike Christie)

The specs state that if you send a reserve down a path that is already
the holder, success must be returned, and if it goes down a path that
is not the holder, a reservation conflict must be returned. Windows
failover clustering will send a second reservation and expects that a
device returns success. The problem for multipathing is that for an
All Registrants reservation we can send the reserve down any path, but
for all other reservation types there is one path that is the holder.

To handle this we could add PR state to dm, but that can get nasty.
Look at target_core_pr.c for an example of the type of things we'd
have to track. It will also get more complicated because other
initiators can change the state, so we would have to add in async
event/sense handling.

This commit, and the 3 commits that follow, try to keep dm simple and
keep just doing passthrough. This commit modifies dm_call_pr to be
able to find the first usable path that can execute our pr_op and then
return. When dm_pr_reserve is converted to dm_call_pr in the next
commit, for the normal case we will use the same path for every
reserve.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
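A sketch of the reworked helper's contract; the callback signature is
DM's iterate_devices_callout_fn, but the struct name and the function
bodies here are assumptions based on the text:

    #include <linux/device-mapper.h>

    struct dm_pr_sketch {
        int ret;            /* result from the path that ran the op */
        bool fail_early;
    };

    /* Passed to ti->type->iterate_devices(); returning non-zero stops
     * the walk, so the PR op runs on the first usable path only, and
     * repeated reserves keep landing on that same (holder) path. */
    static int __dm_pr_op_sketch(struct dm_target *ti, struct dm_dev *dev,
                                 sector_t start, sector_t len, void *data)
    {
        struct dm_pr_sketch *pr = data;

        pr->ret = 0;        /* issue the PR op on dev->bdev here */
        return 1;           /* first usable path wins; stop iterating */
    }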
2022-07-28 dm: return early from dm_pr_call() if DM device is suspended (Mike Snitzer)

Otherwise PR ops may be issued while the broader DM device is being
reconfigured, etc.

Fixes: 9c72bad1f31a ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-28 net/mlx5e: Move mlx5e_init_l2_addr to en_main (Lama Kayal)

Move the function declaration of mlx5e_init_l2_addr to en/fs.h, rename
it to mlx5e_fs_init_l2_addr to align with the fs API naming
convention, and let it take mlx5e_flow_steering as an argument, while
keeping the implementation in the en_fs.c file.

This helps maintain clean driver code and avoids unnecessary
dependencies.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-28 net/mlx5e: Split en_fs ndo's and move to en_main (Lama Kayal)

Add inner callees for the ndos mlx5e_vlan_rx_add_vid and
mlx5e_vlan_rx_kill_vid, to separate the priv usage from other flow
steering flows.

Move the wrapper ndos into en_main, and split the rest of the
functionality into a separate part inside en_fs.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-28 net/mlx5e: Separate mlx5e_set_rx_mode_work and move caller to en_main (Lama Kayal)

Separate mlx5e_set_rx_mode into two parts, and move the caller to
en_main while keeping the implementation in en_fs in the newly
declared function mlx5e_fs_set_rx_mode. This is done to minimize the
coupling of flow_steering to priv.

Add a parallel boolean member vlan_strip_disable to
mlx5e_flow_steering that's updated in the same way as its counterpart
in priv, thus making it possible to adjust the rx_mode work handler to
the current changes.

Also, add a state_destroy boolean to the mlx5e_flow_steering struct
which replaces the old check: !test_bit(MLX5E_STATE_DESTROYING,
&priv->state). This state member is updated accordingly prior to
INIT_WORK(mlx5e_set_rx_mode_work). This is done for similar purposes
as mentioned earlier and to minimize argument passing.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-28 net/mlx5e: Add mdev to flow_steering struct (Lama Kayal)

Make the flow_steering struct contain mlx5_core_dev such that it
becomes self-contained and easier to decouple later in this series.
Let its values be initialized in mlx5e_fs_init().

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-28 net/mlx5e: Report flow steering errors with mdev err report API (Lama Kayal)

Let en_fs report errors via the mdev error reporting API, aka the
mlx5_core_* macros, replacing the netdev API reports. This minimizes
netdev coupling to the flow steering struct.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-28 net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer (Lama Kayal)

Make the mlx5e_flow_steering member of mlx5e_priv a pointer and
allocate it dynamically. Allocate fs for all profiles when
initializing the profile, and symmetrically deallocate it at profile
cleanup.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>