linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-05-17	net/mlx5: Remove unused argument	Eli Cohen
	Argument ndev is not used in mlx5_handle_changeupper_event() Remove it. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Lag, refactor lag state machine	Eli Cohen
	LAG state machine is implemented using bit flags. However, all these bit flags, except for MLX5_LAG_FLAG_HASH_BASED, are really mutual exclusive. In addition, MLX5_LAG_FLAG_READY is used by bonding to mark if we have our netdevices successfully added to lag and does not really belong in the same flags variable as the other flags. Rename MLX5_LAG_FLAG_READY to MLX5_LAG_FLAG_NDEVS_READY to better reflect its purpose and put it in a new flags variable. For the rest of the flags, we introduce a mode enum to hold the state of the LAG. Remove the shared fdb boolean flag from struct mlx5_lag and store this configuration as a mode flag. Change all flag related operations to use standard Linux APIs. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Add XDP SQs to uplink representors steering tables	Gal Pressman
	This patch adds the XDP SQs to the uplink representors steering tables in swichdev mode and enables XDP usage on them. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Correct the calculation of max channels for rep	Moshe Tal
	Correct the calculation of maximum channels of rep to better utilize the hardware resources and allow a larger scale of reps. This will allow creation of all virtual ports configured. Fixes: 473baf2e9e8c ("net/mlx5e: Allow profile-specific limitation on max num of channels") Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: CT: Add ct driver counters	Saeed Mahameed
	Connection offload is translated to multiple rules over several hardware flow tables. Unhandled end-cases may cause a hardware resource leak causing multiple system symptoms such as a host memory leak, decreased performance and other scale related issues. Export the current number of firmware FTEs related to the CT table as a debugfs counter. Also add a dropped packets counter to help debug packets dropped on restore failure. To show the offloaded count: cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/offloaded To show the dropped count: cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/rx_dropped Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Roi Dayan <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com>
2022-05-17	net/mlx5e: Allow relaxed ordering over VFs	Aya Levin
	By PCI spec, the config space of the VF always report relaxed ordering not supported while it inherits this property from its PF. Hence using pcie_relaxed_ordering_enable(), always disables the relaxed ordering on all VFs. Remove this check and rely on the firmware which queries the config space of the PF and set the capability bit accordingly. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marina Varshaver <marinav@nvidia.com> Reviewed-by: Gal Shalom <galshalom@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Support partial GSO for tunnels over vlans	Gal Pressman
	Offloading outer checksum on tunnels requires GSO partial, add it to 'vlan_features' to allow offloading tunnels over vlans. For example, running GENEVE over vlan & ipv6 (mandatory UDP checksum) now allows for hardware TSO instead of software segmentation in GSO only. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: IPoIB, Improve ethtool rxnfc callback structure in IPoIB	Gal Pressman
	Followup commit 79ce39be1d63 ("net/mlx5e: Improve ethtool rxnfc callback structure") and handle CONFIG_MLX5_EN_RXNFC enabled/disabled inside the fs layer so the ethtool callbacks are always available. The fs layer will provide stubs when CONFIG_MLX5_EN_RXNFC is compiled out. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Allocate virtually contiguous memory for reps structures	Tariq Toukan
	Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Allocate virtually contiguous memory for VLANs list	Tariq Toukan
	Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Allocate virtually contiguous memory in pci_irq.c	Tariq Toukan
	Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Allocate virtually contiguous memory in vport.c	Tariq Toukan
	Physical continuity is not necessary, and requested allocation size might be larger than PAGE_SIZE. Hence, use v-alloc/free API. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Inline db alloc API function	Tariq Toukan
	Take the wrapper version which picks default node into a header file. This reduces the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Add last command failure syndrome to debugfs	Moshe Shemesh
	Add syndrome of last command failure per command type to debugfs to ease debugging of such failure. last_failed_syndrome - last command failed syndrome returned by FW. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: sparse: error: context imbalance in 'mlx5_vf_get_core_dev'	Saeed Mahameed
	Removing the annotation resolves the issue for some reason. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Drain fw_reset when removing device	Shay Drory
	In case fw sync reset is called in parallel to device removal, device might stuck in the following deadlock: CPU 0 CPU 1 ----- ----- remove_one uninit_one (locks intf_state_mutex) mlx5_sync_reset_now_event() work in fw_reset->wq. mlx5_enter_error_state() mutex_lock (intf_state_mutex) cleanup_once fw_reset_cleanup() destroy_workqueue(fw_reset->wq) Drain the fw_reset WQ, and make sure no new work is being queued, before entering uninit_one(). The Drain is done before devlink_unregister() since fw_reset, in some flows, is using devlink API devlink_remote_reload_actions_performed(). Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: CT: Fix setting flow_source for smfs ct tuples	Paul Blakey
	Cited patch sets flow_source to ANY overriding the provided spec flow_source, avoiding the optimization done by commit c9c079b4deaa ("net/mlx5: CT: Set flow source hint from provided tuple device"). To fix the above, set the dr_rule flow_source from provided flow spec. Fixes: 3ee61ebb0df1 ("net/mlx5: CT: Add software steering ct flow steering provider") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: CT: Fix support for GRE tuples	Paul Blakey
	cited commit removed support for GRE tuples when software steering was enabled. To bring back support for GRE tuples, add GRE ipv4/ipv6 matchers. Fixes: 3ee61ebb0df1 ("net/mlx5: CT: Add software steering ct flow steering provider") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Remove HW-GRO from reported features	Gal Pressman
	We got reports of certain HW-GRO flows causing kernel call traces, which might be related to firmware. To be on the safe side, disable the feature for now and re-enable it once a driver/firmware fix is found. Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Properly block HW GRO when XDP is enabled	Maxim Mikityanskiy
	HW GRO is incompatible and mutually exclusive with XDP and XSK. However, the needed checks are only made when enabling XDP. If HW GRO is enabled when XDP is already active, the command will succeed, and XDP will be skipped in the data path, although still enabled. This commit fixes the bug by checking the XDP and XSK status in mlx5e_fix_features and disabling HW GRO if XDP is enabled. Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Properly block LRO when XDP is enabled	Maxim Mikityanskiy
	LRO is incompatible and mutually exclusive with XDP. However, the needed checks are only made when enabling XDP. If LRO is enabled when XDP is already active, the command will succeed, and XDP will be skipped in the data path, although still enabled. This commit fixes the bug by checking the XDP status in mlx5e_fix_features and disabling LRO if XDP is enabled. Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Block rx-gro-hw feature in switchdev mode	Aya Levin
	When the driver is in switchdev mode and rx-gro-hw is set, the RQ needs special CQE handling. Till then, block setting of rx-gro-hw feature in switchdev mode, to avoid failure while setting the feature due to failure while opening the RQ. Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5e: Wrap mlx5e_trap_napi_poll into rcu_read_lock	Maxim Mikityanskiy
	The body of mlx5e_napi_poll is wrapped into rcu_read_lock to be able to read the XDP program pointer using rcu_dereference. However, the trap RQ NAPI doesn't use rcu_read_lock, because the trap RQ works only in the non-linear mode, and mlx5e_skb_from_cqe_nonlinear, until recently, didn't support XDP and didn't call rcu_dereference. Starting from the cited commit, mlx5e_skb_from_cqe_nonlinear supports XDP and calls rcu_dereference, but mlx5e_trap_napi_poll doesn't wrap it into rcu_read_lock. It leads to RCU-lockdep warnings like this: WARNING: suspicious RCU usage This commit fixes the issue by adding an rcu_read_lock to mlx5e_trap_napi_poll, similarly to mlx5e_napi_poll. Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: DR, Ignore modify TTL on RX if device doesn't support it	Yevgeny Kliteynik
	When modifying TTL, packet's csum has to be recalculated. Due to HW issue in ConnectX-5, csum recalculation for modify TTL on RX is supported through a work-around that is specifically enabled by configuration. If the work-around isn't enabled, rather than adding an unsupported action the modify TTL action on RX should be ignored. Ignoring modify TTL action might result in zero actions, so in such cases we will not convert the match STE to modify STE, as it is done by FW in DMFS. This patch fixes an issue where modify TTL action was ignored both on RX and TX instead of only on RX. Fixes: 4ff725e1d4ad ("net/mlx5: DR, Ignore modify TTL if device doesn't support it") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: Initialize flow steering during driver probe	Shay Drory
	Currently, software objects of flow steering are created and destroyed during reload flow. In case a device is unloaded, the following error is printed during grace period: mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95): Driver is in error state. Unloading As a solution to fix use-after-free bugs, where we try to access these objects, when reading the value of flow_steering_mode devlink param[1], let's split flow steering creation and destruction into two routines: * init and cleanup: memory, cache, and pools allocation/free. * create and destroy: namespaces initialization and cleanup. While at it, re-order the cleanup function to mirror the init function. [1] Kasan trace: [ 385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] Read of size 4 at addr ffff888104b79308 by task bash/291 [ 385.119849 ] [ 385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ #2 [ 385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [ 385.119849 ] Call Trace: [ 385.119849 ] <TASK> [ 385.119849 ] dump_stack_lvl+0x6e/0x91 [ 385.119849 ] print_address_description.constprop.0+0x1f/0x160 [ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] kasan_report.cold+0x83/0xdf [ 385.119849 ] ? devlink_param_notify+0x20/0x190 [ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] mlx5_devlink_fs_mode_get+0x3b/0xa0 [ 385.119849 ] devlink_nl_param_fill+0x18a/0xa50 [ 385.119849 ] ? _raw_spin_lock_irqsave+0x8d/0xe0 [ 385.119849 ] ? devlink_flash_update_timeout_notify+0xf0/0xf0 [ 385.119849 ] ? __wake_up_common+0x4b/0x1e0 [ 385.119849 ] ? preempt_count_sub+0x14/0xc0 [ 385.119849 ] ? _raw_spin_unlock_irqrestore+0x28/0x40 [ 385.119849 ] ? __wake_up_common_lock+0xe3/0x140 [ 385.119849 ] ? __wake_up_common+0x1e0/0x1e0 [ 385.119849 ] ? __sanitizer_cov_trace_const_cmp8+0x27/0x80 [ 385.119849 ] ? __rcu_read_unlock+0x48/0x70 [ 385.119849 ] ? kasan_unpoison+0x23/0x50 [ 385.119849 ] ? __kasan_slab_alloc+0x2c/0x80 [ 385.119849 ] ? memset+0x20/0x40 [ 385.119849 ] ? __sanitizer_cov_trace_const_cmp4+0x25/0x80 [ 385.119849 ] devlink_param_notify+0xce/0x190 [ 385.119849 ] devlink_unregister+0x92/0x2b0 [ 385.119849 ] remove_one+0x41/0x140 [ 385.119849 ] pci_device_remove+0x68/0x140 [ 385.119849 ] ? pcibios_free_irq+0x10/0x10 [ 385.119849 ] __device_release_driver+0x294/0x3f0 [ 385.119849 ] device_driver_detach+0x82/0x130 [ 385.119849 ] unbind_store+0x193/0x1b0 [ 385.119849 ] ? subsys_interface_unregister+0x270/0x270 [ 385.119849 ] drv_attr_store+0x4e/0x70 [ 385.119849 ] ? drv_attr_show+0x60/0x60 [ 385.119849 ] sysfs_kf_write+0xa7/0xc0 [ 385.119849 ] kernfs_fop_write_iter+0x23a/0x2f0 [ 385.119849 ] ? sysfs_kf_bin_read+0x160/0x160 [ 385.119849 ] new_sync_write+0x311/0x430 [ 385.119849 ] ? new_sync_read+0x480/0x480 [ 385.119849 ] ? _raw_spin_lock+0x87/0xe0 [ 385.119849 ] ? __sanitizer_cov_trace_cmp4+0x25/0x80 [ 385.119849 ] ? security_file_permission+0x94/0xa0 [ 385.119849 ] vfs_write+0x4c7/0x590 [ 385.119849 ] ksys_write+0xf6/0x1e0 [ 385.119849 ] ? __x64_sys_read+0x50/0x50 [ 385.119849 ] ? fpregs_assert_state_consistent+0x99/0xa0 [ 385.119849 ] do_syscall_64+0x3d/0x90 [ 385.119849 ] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 385.119849 ] RIP: 0033:0x7fc36ef38504 [ 385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 [ 385.119849 ] RSP: 002b:00007ffde0ff3d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 385.119849 ] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc36ef38504 [ 385.119849 ] RDX: 000000000000000c RSI: 00007fc370521040 RDI: 0000000000000001 [ 385.119849 ] RBP: 00007fc370521040 R08: 00007fc36f00b8c0 R09: 00007fc36ee4b740 [ 385.119849 ] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc36f00a760 [ 385.119849 ] R13: 000000000000000c R14: 00007fc36f005760 R15: 000000000000000c [ 385.119849 ] </TASK> [ 385.119849 ] [ 385.119849 ] Allocated by task 65: [ 385.119849 ] kasan_save_stack+0x1e/0x40 [ 385.119849 ] __kasan_kmalloc+0x81/0xa0 [ 385.119849 ] mlx5_init_fs+0x11b/0x1160 [ 385.119849 ] mlx5_load+0x13c/0x220 [ 385.119849 ] mlx5_load_one+0xda/0x160 [ 385.119849 ] mlx5_recover_device+0xb8/0x100 [ 385.119849 ] mlx5_health_try_recover+0x2f9/0x3a1 [ 385.119849 ] devlink_health_reporter_recover+0x75/0x100 [ 385.119849 ] devlink_health_report+0x26c/0x4b0 [ 385.275909 ] mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0 [ 385.275909 ] process_one_work+0x520/0x970 [ 385.275909 ] worker_thread+0x378/0x950 [ 385.275909 ] kthread+0x1bb/0x200 [ 385.275909 ] ret_from_fork+0x1f/0x30 [ 385.275909 ] [ 385.275909 ] Freed by task 65: [ 385.275909 ] kasan_save_stack+0x1e/0x40 [ 385.275909 ] kasan_set_track+0x21/0x30 [ 385.275909 ] kasan_set_free_info+0x20/0x30 [ 385.275909 ] __kasan_slab_free+0xfc/0x140 [ 385.275909 ] kfree+0xa5/0x3b0 [ 385.275909 ] mlx5_unload+0x2e/0xb0 [ 385.275909 ] mlx5_unload_one+0x86/0xb0 [ 385.275909 ] mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf [ 385.275909 ] process_one_work+0x520/0x970 [ 385.275909 ] worker_thread+0x378/0x950 [ 385.275909 ] kthread+0x1bb/0x200 [ 385.275909 ] ret_from_fork+0x1f/0x30 [ 385.275909 ] [ 385.275909 ] The buggy address belongs to the object at ffff888104b79300 [ 385.275909 ] which belongs to the cache kmalloc-128 of size 128 [ 385.275909 ] The buggy address is located 8 bytes inside of [ 385.275909 ] 128-byte region [ffff888104b79300, ffff888104b79380) [ 385.275909 ] The buggy address belongs to the page: [ 385.275909 ] page:00000000de44dd39 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104b78 [ 385.275909 ] head:00000000de44dd39 order:1 compound_mapcount:0 [ 385.275909 ] flags: 0x8000000000010200(slab\|head\|zone=2) [ 385.275909 ] raw: 8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0 [ 385.275909 ] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 385.275909 ] page dumped because: kasan: bad access detected [ 385.275909 ] [ 385.275909 ] Memory state around the buggy address: [ 385.275909 ] ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc [ 385.275909 ] ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 385.275909 ] >ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 385.275909 ] ^ [ 385.275909 ] ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 385.275909 ] ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 385.275909 ]] Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table	Maor Dickman
	In order to support multiple destination FTEs with SW steering FW table is created with single FTE with multiple actions and SW steering rule forward to it. When creating this table, flow source isn't set according to the original FTE. Fix this by passing the original FTE flow source to the created FW table. Fixes: 34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17	scsi: target: Fix incorrect use of cpumask_t	Mingzhe Zou
	In commit d72d827f2f26, I used 'cpumask_t' incorrectly: void iscsit_thread_get_cpumask(struct iscsi_conn conn) { int ord, cpu; cpumask_t conn_allowed_cpumask; ...... } static ssize_t lio_target_wwn_cpus_allowed_list_store( struct config_item item, const char page, size_t count) { int ret; char orig; cpumask_t new_allowed_cpumask; ...... } The correct pattern should be as follows: cpumask_var_t mask; if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) return -ENOMEM; ... use 'mask' here ... free_cpumask_var(mask); Link: https://lore.kernel.org/r/20220516054721.1548-1-mingzhe.zou@easystack.cn Fixes: d72d827f2f26 ("scsi: target: Add iscsi/cpus_allowed_list in configfs") Reported-by: Test Bot <zgrieee@gmail.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-17	octeontx2-pf: Add support for adaptive interrupt coalescing	Suman Ghosh
	Added support for adaptive IRQ coalescing. It uses net_dim algorithm to find the suitable delay/IRQ count based on the current packet rate. Signed-off-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20220517044055.876158-1-sumang@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	ptp: ptp_clockmatrix: return -EBUSY if phase pull-in is in progress	Min Li
	Also removes PEROUT_ENABLE_OUTPUT_MASK Signed-off-by: Min Li <min.li.xe@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/1652712427-14703-2-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS support	Min Li
	Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY for gettime and settime exclusively. Before this change, TOD_READ_PRIMARY was used for both extts and gettime/settime, which would result in changing TOD read/write triggers between operations. Using TOD_READ_SECONDARY would make extts independent of gettime/settime operation Signed-off-by: Min Li <min.li.xe@renesas.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/1652712427-14703-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	net: smc911x: replace ternary operator with min()	Guo Zhengkui
	Fix the following coccicheck warning: drivers/net/ethernet/smsc/smc911x.c:483:20-22: WARNING opportunity for min() Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Link: https://lore.kernel.org/r/20220516115627.66363-1-guozhengkui@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	net: thunderx: remove null check after call container_of()	Haowen Bai
	container_of() will never return NULL, so remove useless code. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Link: https://lore.kernel.org/r/1652696212-17516-1-git-send-email-baihaowen@meizu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	octeontx2-pf: Use memset_startat() helper in otx2_stop()	Xiu Jianfeng
	Use memset_startat() helper to simplify the code, there is no functional change in this patch. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Link: https://lore.kernel.org/r/20220516092337.131653-1-xiujianfeng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	net/qla3xxx: Fix a test in ql_reset_work()	Christophe JAILLET
	test_bit() tests if one bit is set or not. Here the logic seems to check of bit QL_RESET_PER_SCSI (i.e. 4) OR bit QL_RESET_START (i.e. 3) is set. In fact, it checks if bit 7 (4 \| 3 = 7) is set, that is to say QL_ADAPTER_UP. This looks harmless, because this bit is likely be set, and when the ql_reset_work() delayed work is scheduled in ql3xxx_isr() (the only place that schedule this work), QL_RESET_START or QL_RESET_PER_SCSI is set. This has been spotted by smatch. Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/80e73e33f390001d9c0140ffa9baddf6466a41a2.1652637337.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17	Merge tag 'pci-v5.18-fixes-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Avoid putting Elo i2 PCIe Ports in D3cold because downstream devices are inaccessible after going back to D0 (Rafael J. Wysocki) - Qualcomm SM8250 has a ddrss_sf_tbu clock but SC8180X does not; make a SC8180X-specific config without the clock so it probes correctly (Bjorn Andersson) - Revert aardvark chained IRQ handler rewrite because it broke interrupt affinity (Pali Rohár) * tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler" PCI: qcom: Remove ddrss_sf_tbu clock from SC8180X PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
2022-05-17	Merge tag 'thermal-5.18-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Fix up a recent change in the int340x thermal driver that inadvertently broke thermal zone handling on some systems (Srinivas Pandruvada)" * tag 'thermal-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: int340x: Mode setting with new OS handshake
2022-05-17	PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits	Kuppuswamy Sathyanarayanan
	When a Root Port or Root Complex Event Collector receives an error Message e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status register and logs the Requester ID in the Error Source Identification register. If it receives a second ERR_COR Message before software clears PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the Requester ID is lost. In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared: - hardware receives ERR_COR message - hardware sets PCI_ERR_ROOT_COR_RCV - aer_irq() entered - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV - hardware receives second ERR_COR message - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status) - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set - aer_irq() entered again - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set - PCI_ERR_ROOT_MULTI_COR_RCV is still set The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV. Fix the problem by queueing an AER event and clearing the Root Error Status bits when any of these bits are set: PCI_ERR_ROOT_COR_RCV PCI_ERR_ROOT_UNCOR_RCV PCI_ERR_ROOT_MULTI_COR_RCV PCI_ERR_ROOT_MULTI_UNCOR_RCV See the bugzilla link for details from Eric about how to reproduce this problem. [bhelgaas: commit log, move repro details to bugzilla] Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992 Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com Reported-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
2022-05-17	drm/dp/mst: fix a possible memory leak in fetch_monitor_name()	Hangyu Hua
	drm_dp_mst_get_edid call kmemdup to create mst_edid. So mst_edid need to be freed after use. Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20220516032042.13166-1-hbh25y@gmail.com
2022-05-17	clk: at91: generated: consider range when calculating best rate	Codrin Ciubotariu
	clk_generated_best_diff() helps in finding the parent and the divisor to compute a rate closest to the required one. However, it doesn't take into account the request's range for the new rate. Make sure the new rate is within the required range. Fixes: 8a8f4bf0c480 ("clk: at91: clk-generated: create function to find best_diff") Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Link: https://lore.kernel.org/r/20220413071318.244912-1-codrin.ciubotariu@microchip.com Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2022-05-17	cpufreq: make interface functions and lock holding state clear	Schspa Shi
	cpufreq_offline() calls offline() and exit() under the policy rwsem But they are called outside the rwsem in cpufreq_online(). Make cpufreq_online() call offline() and exit() as well as online() and init() under the policy rwsem to achieve a clear lock relationship. All of the init() and online() implementations in the tree only initialize the policy object without attempting to acquire the policy rwsem and they won't call cpufreq APIs attempting to acquire it. Signed-off-by: Schspa Shi <schspa@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-17	cpufreq: Abort show()/store() for half-initialized policies	Schspa Shi
	If policy initialization fails after the sysfs files are created, there is a possibility to end up running show()/store() callbacks for half-initialized policies, which may have unpredictable outcomes. Abort show()/store() in such a case by making sure the policy is active. Also dectivate the policy on such failures. Signed-off-by: Schspa Shi <schspa@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-17	Input: cros-ec-keyb - allow skipping keyboard registration	Stephen Boyd
	If the device is a detachable (and therefore lacks full keyboard), we may still want to load this driver because the device might have some other buttons or switches (e.g. volume and power buttons or a tablet mode switch). In such case we do not want to register the "main" keyboard device to allow userspace detect when the detachable keyboard is disconnected and adjust the system behavior for the tablet mode. Originally it was suggested to simply skip keyboard registration if row and columns properties didn't exist, but that approach did not convey the intent strongly enough and also had a slight problem for migrating existing DTBs without updating the kernel first, so it was decided to introduce new google,cros-ec-keyb-switches to explicitly mark devices that only have axillary buttons and switches. Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20220516183452.942008-3-swboyd@chromium.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-05-17	of/fdt: Ignore disabled memory nodes	Andre Przywara
	When we boot a machine using a devicetree, the generic DT code goes through all nodes with a 'device_type = "memory"' property, and collects all memory banks mentioned there. However it does not check for the status property, so any nodes which are explicitly "disabled" will still be added as a memblock. This ends up badly for QEMU, when booting with secure firmware on arm/arm64 machines, because QEMU adds a node describing secure-only memory: =================== secram@e000000 { secure-status = "okay"; status = "disabled"; reg = <0x00 0xe000000 0x00 0x1000000>; device_type = "memory"; }; =================== The kernel will eventually use that memory block (which is located below the main DRAM bank), but accesses to that will be answered with an SError: =================== [ 0.000000] Internal error: synchronous external abort: 96000050 [#1] PREEMPT SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc6-00014-g10c8acb8b679 #524 [ 0.000000] Hardware name: linux,dummy-virt (DT) [ 0.000000] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.000000] pc : new_slab+0x190/0x340 [ 0.000000] lr : new_slab+0x184/0x340 [ 0.000000] sp : ffff80000a4b3d10 .... ================== The actual crash location and call stack will be somewhat random, and depend on the specific allocation of that physical memory range. As the DT spec[1] explicitly mentions standard properties, add a simple check to skip over disabled memory nodes, so that we only use memory that is meant for non-secure code to use. That fixes booting a QEMU arm64 VM with EL3 enabled ("secure=on"), when not using UEFI. In this case the QEMU generated DT will be handed on to the kernel, which will see the secram node. This issue is reproducible when using TF-A together with U-Boot as firmware, then booting with the "booti" command. When using U-Boot as an UEFI provider, the code there [2] explicitly filters for disabled nodes when generating the UEFI memory map, so we are safe. EDK/2 only reads the first bank of the first DT memory node [3] to learn about memory, so we got lucky there. [1] https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter3-devicenodes.rst#memory-node (after the table) [2] https://source.denx.de/u-boot/u-boot/-/blob/master/lib/fdtdec.c#L1061-1063 [3] https://github.com/tianocore/edk2/blob/master/ArmVirtPkg/PrePi/FdtParser.c Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220517101410.3493781-1-andre.przywara@arm.com
2022-05-17	ice: Fix interrupt moderation settings getting cleared	Michal Wilczynski
	Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17	ice: fix possible under reporting of ethtool Tx and Rx statistics	Paul Greenwalt
	The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17	ice: fix crash when writing timestamp on RX rings	Arkadiusz Kubalewski
	Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17	mtd: st_spi_fsm: add missing clk_disable_unprepare() in stfsm_remove()	Yang Yingliang
	Clock source is prepared and enabled by clk_prepare_enable() in probe function, but not disabled or unprepared in remove function. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220516092911.953066-1-yangyingliang@huawei.com
2022-05-17	pinctrl: cherryview: Use GPIO chip pointer in chv_gpio_irq_mask_unmask()	Andy Shevchenko
	The callers already have dereferenced pointer to GPIO chip, no need to do it again in chv_gpio_irq_mask_unmask(). Hence, replace IRQ data pointer by GPIO chip pointer. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2022-05-17	Merge remote-tracking branch 'regulator/for-5.19' into regulator-next	Mark Brown

2022-05-17	EDAC/i5100: Remove unused inline function i5100_nrecmema_dm_buf_id()	YueHaibing
	Commit a4972b1b9a04 ("edac: i5100_edac: Remove unused i5100_recmema_dm_buf_id") left this function unused. Remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220514080433.29944-1-yuehaibing@huawei.com