linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-07-28	net: fec: add MAC internal delayed clock feature support	Fugang Duan
	i.MX8QM ENET IP version support timing specification that MAC integrate clock delay in RGMII mode, the delayed TXC/RXC as an alternative option to work well with various PHYs. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: fec: add eee mode tx lpi support	Fugang Duan
	The i.MX8MQ ENET version support IEEE802.3az eee mode, add eee mode tx lpi enable to support ethtool interface. usage: 1. set sleep and wake timer to 5ms: ethtool --set-eee eth0 eee on tx-lpi on tx-timer 5000 2. check the eee mode: ~# ethtool --show-eee eth0 EEE Settings for eth0: EEE status: enabled - active Tx LPI: 5000 (us) Supported EEE link modes: 100baseT/Full 1000baseT/Full Advertised EEE link modes: 100baseT/Full 1000baseT/Full Link partner advertised EEE link modes: 100baseT/Full Note: For realtime case and IEEE1588 ptp case, it should disable EEE mode. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: fec: add imx8mq and imx8qm new versions support	Fugang Duan
	The ENET of imx8mq and imx8qm are basically the same as imx6sx, but they have new features support based on imx6sx, like: - imx8mq: supports IEEE 802.3az EEE standard. - imx8qm: supports RGMII mode delayed clock. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	dt-bindings: net: fsl,fec: add RGMII internal clock delay	Joakim Zhang
	Add RGMII internal clock delay for FEC controller. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	dt-bindings: net: fsl,fec: update compatible items	Joakim Zhang
	Add more compatible items for i.MX8/8M platforms. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	tc-testing: Add control-plane selftest for skbmod SKBMOD_F_ECN option	Peilin Ye
	Recently we added a new option, SKBMOD_F_ECN, to tc-skbmod(8). Add a control-plane selftest for it. Depends on kernel patch "net/sched: act_skbmod: Add SKBMOD_F_ECN option support", as well as iproute2 patch "tc/skbmod: Introduce SKBMOD_F_ECN option". Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net/sched: act_skbmod: Add SKBMOD_F_ECN option support	Peilin Ye
	Currently, when doing rate limiting using the tc-police(8) action, the easiest way is to simply drop the packets which exceed or conform the configured bandwidth limit. Add a new option to tc-skbmod(8), so that users may use the ECN [1] extension to explicitly inform the receiver about the congestion instead of dropping packets "on the floor". The 2 least significant bits of the Traffic Class field in IPv4 and IPv6 headers are used to represent different ECN states [2]: 0b00: "Non ECN-Capable Transport", Non-ECT 0b10: "ECN Capable Transport", ECT(0) 0b01: "ECN Capable Transport", ECT(1) 0b11: "Congestion Encountered", CE As an example: $ tc filter add dev eth0 parent 1: protocol ip prio 10 \ matchall action skbmod ecn Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT affect Non-ECT or non-IP packets. In the tc-police scenario mentioned above, users may pipe a tc-police action and a tc-skbmod "ecn" action together to achieve ECN-based rate limiting. For TCP connections, upon receiving a CE packet, the receiver will respond with an ECE packet, asking the sender to reduce their congestion window. However ECN also works with other L4 protocols e.g. DCCP and SCTP [2], and our implementation does not touch or care about L4 headers. The updated tc-skbmod SYNOPSIS looks like the following: tc ... action skbmod { set SETTABLE \| swap SWAPPABLE \| ecn } ... Only one of "set", "swap" or "ecn" shall be used in a single tc-skbmod command. Trying to use more than one of them at a time is considered undefined behavior; pipe multiple tc-skbmod commands together instead. "set" and "swap" only affect Ethernet packets, while "ecn" only affects IPv{4,6} packets. It is also worth mentioning that, in theory, the same effect could be achieved by piping a "police" action and a "bpf" action using the bpf_skb_ecn_set_ce() helper, but this requires eBPF programming from the user, thus impractical. Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets". [1] https://datatracker.ietf.org/doc/html/rfc3168 [2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	nfp: flower-ct: fix error return code in nfp_fl_ct_add_offload()	Yang Yingliang
	If nfp_tunnel_add_ipv6_off() fails, it should return error code in nfp_fl_ct_add_offload(). Fixes: 5a2b93041646 ("nfp: flower-ct: compile match sections of flow_payload") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: let flow have same hash in two directions	zhang kai
	using same source and destination ip/port for flow hash calculation within the two directions. Signed-off-by: zhang kai <zhangkaiheb@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	platform/x86: gigabyte-wmi: add support for B550 Aorus Elite V2	Thomas Weißschuh
	Reported as working here: https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-879398883 Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20210726153630.65213-1-linux@weissschuh.net Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-07-28	platform/x86: intel-hid: add Alder Lake ACPI device ID	Ping Bao
	Alder Lake has a new ACPI ID for Intel HID event filter device. Signed-off-by: Ping Bao <ping.a.bao@intel.com> Link: https://lore.kernel.org/r/20210721225615.20575-1-ping.a.bao@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-07-28	HID: wacom: Skip processing of touches with negative slot values	Jason Gerecke
	The `input_mt_get_slot_by_key` function may return a negative value if an error occurs (e.g. running out of slots). If this occurs we should really avoid reporting any data for the slot. Signed-off-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-07-28	HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT	Jason Gerecke
	Commit 670e90924bfe ("HID: wacom: support named keys on older devices") added support for sending named events from the soft buttons on the 24HDT and 27QHDT. In the process, however, it inadvertantly disabled the touchscreen of the 24HDT and 27QHDT by default. The `wacom_set_shared_values` function would normally enable touch by default but because it checks the state of the non-shared `has_mute_touch_switch` flag and `wacom_setup_touch_input_capabilities` sets the state of the /shared/ version, touch ends up being disabled by default. This patch sets the non-shared flag, letting `wacom_set_shared_values` take care of copying the value over to the shared version and setting the default touch state to "on". Fixes: 670e90924bfe ("HID: wacom: support named keys on older devices") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Reviewed-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-07-28	HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"	Colin Ian King
	There is a spelling mistake in the Kconfig text. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-07-28	HID: apple: Add support for Keychron K1 wireless keyboard	Haochen Tong
	The Keychron K1 wireless keyboard has a set of Apple-like function keys and an Fn key that works like on an Apple bluetooth keyboard. It identifies as an Apple Alu RevB ANSI keyboard (05ac:024f) over USB and BT. Use hid-apple for it so the Fn key and function keys work correctly. Signed-off-by: Haochen Tong <i@hexchain.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-07-28	HID: fix typo in Kconfig	Christophe JAILLET
	There is a missing space in "relyingon". Add it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2021-07-28	Merge branch 'devlink-register'	David S. Miller
	Leon Romanovsky says: ==================== Remove duplicated devlink registration check Changelog: v1: * Added two new patches that remove registration field from mlx5 and ti drivers. v0: https://lore.kernel.org/lkml/ed7bbb1e4c51dd58e6035a058e93d16f883b09ce.1627215829.git.leonro@nvidia.com -------------------------------------------------------------------- Both registered flag and devlink pointer are set at the same time and indicate the same thing - devlink/devlink_port are ready. Instead of checking ->registered use devlink pointer as an indication. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	devlink: Remove duplicated registration check	Leon Romanovsky
	Both registered flag and devlink pointer are set at the same time and indicate the same thing - devlink/devlink_port are ready. Instead of checking ->registered use devlink pointer as an indication. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net/mlx5: Don't rely on always true registered field	Leon Romanovsky
	Devlink is an integral part of mlx5 driver and all flows ensure that devlink_*_register() will success. That makes the ->registered check an obsolete. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: ti: am65-cpsw-nuss: fix wrong devlink release order	Leon Romanovsky
	The commit that introduced devlink support released devlink resources in wrong order, that made an unwind flow to be asymmetrical. In addition, the am65-cpsw-nuss used internal to devlink core field - registered. In order to fix the unwind flow and remove such access to the registered field, rewrite the code to call devlink_port_unregister only on registered ports. Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	nfc: nfcsim: fix use after free during module unload	Krzysztof Kozlowski
	There is a use after free memory corruption during module exit: - nfcsim_exit() - nfcsim_device_free(dev0) - nfc_digital_unregister_device() This iterates over command queue and frees all commands, - dev->up = false - nfcsim_link_shutdown() - nfcsim_link_recv_wake() This wakes the sleeping thread nfcsim_link_recv_skb(). - nfcsim_link_recv_skb() Wake from wait_event_interruptible_timeout(), call directly the deb->cb callback even though (dev->up == false), - digital_send_cmd_complete() Dereference of "struct digital_cmd" cmd which was freed earlier by nfc_digital_unregister_device(). This causes memory corruption shortly after (with unrelated stack trace): nfc nfc0: NFC: nfcsim_recv_wq: Device is down llcp: nfc_llcp_recv: err -19 nfc nfc1: NFC: nfcsim_recv_wq: Device is down BUG: unable to handle page fault for address: ffffffffffffffed Call Trace: fsnotify+0x54b/0x5c0 __fsnotify_parent+0x1fe/0x300 ? vfs_write+0x27c/0x390 vfs_write+0x27c/0x390 ksys_write+0x63/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae KASAN report: BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50 Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71 Workqueue: events nfcsim_recv_wq [nfcsim] Call Trace: dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x21/0x140 ? digital_send_cmd_complete+0x16/0x50 ? digital_send_cmd_complete+0x16/0x50 kasan_report.cold+0x7f/0x11b ? digital_send_cmd_complete+0x16/0x50 ? digital_dep_link_down+0x60/0x60 digital_send_cmd_complete+0x16/0x50 nfcsim_recv_wq+0x38f/0x3d5 [nfcsim] ? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim] ? lock_is_held_type+0x98/0x110 ? finish_wait+0x110/0x110 ? rcu_read_lock_sched_held+0x9c/0xd0 ? rcu_read_lock_bh_held+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x12e/0x1f0 This flow of calling digital_send_cmd_complete() callback on driver exit is specific to nfcsim which implements reading and sending work queues. Since the NFC digital device was unregistered, the callback should not be called. Fixes: 204bddcb508f ("NFC: nfcsim: Make use of the Digital layer") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	tulip: windbond-840: Fix missing pci_disable_device() in probe and remove	Wang Hai
	Replace pci_enable_device() with pcim_enable_device(), pci_disable_device() and pci_release_regions() will be called in release automatically. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	sctp: fix return value check in __sctp_rcv_asconf_lookup	Marcelo Ricardo Leitner
	As Ben Hutchings noticed, this check should have been inverted: the call returns true in case of success. Reported-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 0c5dc070ff3d ("sctp: validate from_addr_param return") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	nfc: s3fwrn5: fix undefined parameter values in dev_err()	Tang Bin
	In the function s3fwrn5_fw_download(), the 'ret' is not assigned, so the correct value should be given in dev_err function. Fixes: a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label") Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	Merge tag 'mlx5-fixes-2021-07-27' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-07-27 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not ↵	Peter Ujfalusi
	yet available If the router_xlate can not find the controller in the available DMA devices then it should return with -EPORBE_DEFER in a same way as the of_dma_request_slave_channel() does. The issue can be reproduced if the event router is registered before the DMA controller itself and a driver would request for a channel before the controller is registered. In of_dma_request_slave_channel(): 1. of_dma_find_controller() would find the dma_router 2. ofdma->of_dma_xlate() would fail and returned NULL 3. -ENODEV is returned as error code with this patch we would return in this case the correct -EPROBE_DEFER and the client can try to request the channel later. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20210717190021.21897-1-peter.ujfalusi@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-07-28	dmaengine: stm32-dmamux: Fix PM usage counter unbalance in stm32 dmamux ops	Zhang Qilong
	pm_runtime_get_sync will increment pm usage counter even it failed. Forgetting to putting operation will result in reference leak here. We fix it by replacing it with pm_runtime_resume_and_get to keep usage counter balanced. Fixes: 4f3ceca254e0f ("dmaengine: stm32-dmamux: Add PM Runtime support") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20210607064640.121394-3-zhangqilong3@huawei.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-07-28	dmaengine: stm32-dma: Fix PM usage counter imbalance in stm32 dma ops	Zhang Qilong
	pm_runtime_get_sync will increment pm usage counter even it failed. Forgetting to putting operation will result in reference leak here. We fix it by replacing it with pm_runtime_resume_and_get to keep usage counter balanced. Fixes: 48bc73ba14bcd ("dmaengine: stm32-dma: Add PM Runtime support") Fixes: 05f8740a0e6fc ("dmaengine: stm32-dma: add suspend/resume power management support") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20210607064640.121394-2-zhangqilong3@huawei.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-07-28	usb: gadget: f_hid: idle uses the highest byte for duration	Maxim Devaev
	SET_IDLE value must be shifted 8 bits to the right to get duration. This confirmed by USBCV test. Fixes: afcff6dc690e ("usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers") Cc: stable <stable@vger.kernel.org> Signed-off-by: Maxim Devaev <mdevaev@gmail.com> Link: https://lore.kernel.org/r/20210727185800.43796-1-mdevaev@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27	block: delay freeing the gendisk	Christoph Hellwig
	blkdev_get_no_open acquires a reference to the block_device through the block device inode and then tries to acquire a device model reference to the gendisk. But at this point the disk migh already be freed (although the race is free). Fix this by only freeing the gendisk from the whole device bdevs ->free_inode callback as well. Fixes: 22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210722075402.983367-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-27	blk-iocost: fix operation ordering in iocg_wake_fn()	Tejun Heo
	iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it wants the wq_entry to be always removed whether it ended up waking the task or not. finish_wait() tests whether wq_entry needs removal without grabbing the wait_queue lock and expects the waker to use list_del_init_careful() after all waking operations are complete, which iocg_wake_fn() didn't do. The operation order was wrong and the regular list_del_init() was used. The result is that if a waiter wakes up racing the waker, it can free pop the wq_entry off stack before the waker is still looking at it, which can lead to a backtrace like the following. [7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP ... [7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0 ... [7312084.858314] Call Trace: [7312084.863548] _raw_spin_lock_irqsave+0x22/0x30 [7312084.872605] try_to_wake_up+0x4c/0x4f0 [7312084.880444] iocg_wake_fn+0x71/0x80 [7312084.887763] __wake_up_common+0x71/0x140 [7312084.895951] iocg_kick_waitq+0xe8/0x2b0 [7312084.903964] ioc_rqos_throttle+0x275/0x650 [7312084.922423] __rq_qos_throttle+0x20/0x30 [7312084.930608] blk_mq_make_request+0x120/0x650 [7312084.939490] generic_make_request+0xca/0x310 [7312084.957600] submit_bio+0x173/0x200 [7312084.981806] swap_readpage+0x15c/0x240 [7312084.989646] read_swap_cache_async+0x58/0x60 [7312084.998527] swap_cluster_readahead+0x201/0x320 [7312085.023432] swapin_readahead+0x2df/0x450 [7312085.040672] do_swap_page+0x52f/0x820 [7312085.058259] handle_mm_fault+0xa16/0x1420 [7312085.066620] do_page_fault+0x2c6/0x5c0 [7312085.074459] page_fault+0x2f/0x40 Fix it by switching to list_del_init_careful() and putting it at the end. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rik van Riel <riel@surriel.com> Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-27	cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync	Tejun Heo
	0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq context. However, rstat paths use u64_stats_sync to synchronize access to 64bit stat counters on 32bit machines. u64_stats_sync is implemented using seq_lock and trying to read from an irq context can lead to A-A deadlock if the irq happens to interrupt the stat update. Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and u64_stats_update_end_irqrestore() - in the update paths. Note that none of this matters on 64bit machines. All these are just for 32bit SMP setups. Note that the interface was introduced way back, its first and currently only use was recently added by 2d146aa3aa84 ("mm: memcontrol: switch to rstat"). Stable tagging targets this commit. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rik van Riel <riel@surriel.com> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat") Cc: stable@vger.kernel.org # v5.13+
2021-07-27	net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32	Chris Mi
	The offending refactor commit uses u16 chain wrongly. Actually, it should be u32. Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure") CC: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()	Dima Chumak
	The result of __dev_get_by_index() is not checked for NULL and then gets dereferenced immediately. Also, __dev_get_by_index() must be called while holding either RTNL lock or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or its callers. This makes the underlying hlist_for_each_entry() loop not safe, and can have adverse effects in itself. Fix by using dev_get_by_index() and handling nullptr return value when ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to check for possible PTR_ERR() result. Fixes: 77ab67b7f0f9 ("net/mlx5e: Basic setup of hairpin object") Addresses-Coverity: ("Dereference null return value") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5: Unload device upon firmware fatal error	Aya Levin
	When fw_fatal reporter reports an error, the firmware in not responding. Unload the device to ensure that the driver closes all its resources, even if recovery is not due (user disabled auto-recovery or reporter is in grace period). On successful recovery the device is loaded back up. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: Fix page allocation failure for ptp-RQ over SF	Aya Levin
	Set the correct pci-device pointer to the ptp-RQ. This allows access to dma_mask and avoids allocation request with wrong pci-device. Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: Fix page allocation failure for trap-RQ over SF	Aya Levin
	Set the correct device pointer to the trap-RQ, to allow access to dma_mask and avoid allocation request with the wrong pci-dev. WARNING: CPU: 1 PID: 12005 at kernel/dma/mapping.c:151 dma_map_page_attrs+0x139/0x1c0 ... all Trace: <IRQ> ? __page_pool_alloc_pages_slow+0x5a/0x210 mlx5e_post_rx_wqes+0x258/0x400 [mlx5_core] mlx5e_trap_napi_poll+0x44/0xc0 [mlx5_core] __napi_poll+0x24/0x150 net_rx_action+0x22b/0x280 __do_softirq+0xc7/0x27e do_softirq+0x61/0x80 </IRQ> __local_bh_enable_ip+0x4b/0x50 mlx5e_handle_action_trap+0x2dd/0x4d0 [mlx5_core] blocking_notifier_call_chain+0x5a/0x80 mlx5_devlink_trap_action_set+0x8b/0x100 [mlx5_core] Fixes: 5543e989fe5e ("net/mlx5e: Add trap entity to ETH driver") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: Consider PTP-RQ when setting RX VLAN stripping	Aya Levin
	Add PTP-RQ to the loop when setting rx-vlan-offload feature via ethtool. On PTP-RQ's creation, set rx-vlan-offload into its parameters. Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: Add NETIF_F_HW_TC to hw_features when HTB offload is available	Maxim Mikityanskiy
	If a feature flag is only present in features, but not in hw_features, the user can't reset it. Although hw_features may contain NETIF_F_HW_TC by the point where the driver checks whether HTB offload is supported, this flag is controlled by another condition that may not hold. Set it explicitly to make sure the user can disable it. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: RX, Avoid possible data corruption when relaxed ordering and LRO ↵	Tariq Toukan
	combined When HW aggregates packets for an LRO session, it writes the payload of two consecutive packets of a flow contiguously, so that they usually share a cacheline. The first byte of a packet's payload is written immediately after the last byte of the preceding packet. In this flow, there are two consecutive write requests to the shared cacheline: 1. Regular write for the earlier packet. 2. Read-modify-write for the following packet. In case of relaxed-ordering on, these two writes might be re-ordered. Using the end padding optimization (to avoid partial write for the last cacheline of a packet) becomes problematic if the two writes occur out-of-order, as the padding would overwrite payload that belongs to the following packet, causing data corruption. Avoid this by disabling the end padding optimization when both LRO and relaxed-ordering are enabled. Fixes: 17347d5430c4 ("net/mlx5e: Add support for PCI relaxed ordering") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5: E-Switch, handle devcom events only for ports on the same device	Roi Dayan
	This is the same check as LAG mode checks if to enable lag. This will fix adding peer miss rules if lag is not supported and even an incorrect rules in socket direct mode. Also fix the incorrect comment on mlx5_get_next_phys_dev() as flow #1 doesn't exists. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5: E-Switch, Set destination vport vhca id only when merged eswitch ↵	Maor Dickman
	is supported Destination vport vhca id is valid flag is set only merged eswitch isn't supported. Change destination vport vhca id value to be set also only when merged eswitch is supported. Fixes: e4ad91f23f10 ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5e: Disable Rx ntuple offload for uplink representor	Maor Dickman
	Rx ntuple offload is not supported in switchdev mode. Tryng to enable it cause kernel panic. BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000001065a5067 P4D 80000001065a5067 PUD 106594067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 1089 Comm: ethtool Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_23_16_44 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_arfs_enable+0x70/0xd0 [mlx5_core] Code: 44 24 10 00 00 00 00 48 c7 44 24 18 00 00 00 00 49 63 c4 48 89 e2 44 89 e6 48 69 c0 20 08 00 00 48 89 ef 48 03 85 68 ac 00 00 <48> 8b 40 08 48 89 44 24 08 e8 d2 aa fd ff 48 83 05 82 96 18 00 01 RSP: 0018:ffff8881047679e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000004000000000 RCX: 0000004000000000 RDX: ffff8881047679e0 RSI: 0000000000000000 RDI: ffff888115100880 RBP: ffff888115100880 R08: ffffffffa00f6cb0 R09: ffff888104767a18 R10: ffff8881151000a0 R11: ffff888109479540 R12: 0000000000000000 R13: ffff888104767bb8 R14: ffff888115100000 R15: ffff8881151000a0 FS: 00007f41a64ab740(0000) GS:ffff8882f5dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000104cbc005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: set_feature_arfs+0x1e/0x40 [mlx5_core] mlx5e_handle_feature+0x43/0xa0 [mlx5_core] mlx5e_set_features+0x139/0x1b0 [mlx5_core] __netdev_update_features+0x2b3/0xaf0 ethnl_set_features+0x176/0x3a0 ? __nla_parse+0x22/0x30 genl_family_rcv_msg_doit+0xe2/0x140 genl_rcv_msg+0xde/0x1d0 ? features_reply_size+0xe0/0xe0 ? genl_get_cmd+0xd0/0xd0 netlink_rcv_skb+0x4e/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2b0 netlink_sendmsg+0x225/0x450 sock_sendmsg+0x33/0x40 __sys_sendto+0xd4/0x120 ? __sys_recvmsg+0x4e/0x90 ? exc_page_fault+0x219/0x740 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f41a65b0cba Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007ffd8d688358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000010f42a0 RCX: 00007f41a65b0cba RDX: 0000000000000058 RSI: 00000000010f43b0 RDI: 0000000000000003 RBP: 000000000047ae60 R08: 00007f41a667c000 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 00000000010f4340 R13: 00000000010f4350 R14: 00007ffd8d688400 R15: 00000000010f42a0 Modules linked in: mlx5_vdpa vhost_iotlb vdpa xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core ptp pps_core fuse CR2: 0000000000000008 ---[ end trace c66523f2aba94b43 ]--- Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-27	net/mlx5: Fix flow table chaining	Maor Gottlieb
	Fix a bug when flow table is created in priority that already has other flow tables as shown in the below diagram. If the new flow table (FT-B) has the lowest level in the priority, we need to connect the flow tables from the previous priority (p0) to this new table. In addition when this flow table is destroyed (FT-B), we need to connect the flow tables from the previous priority (p0) to the next level flow table (FT-C) in the same priority of the destroyed table (if exists). --------- \|root_ns\| --------- \| -------------------------------- \| \| \| ---------- ---------- --------- \|p(prio)-x\| \| p-y \| \| p-n \| ---------- ---------- --------- \| \| ---------------- ------------------ \|ns(e.g bypass)\| \|ns(e.g. kernel) \| ---------------- ------------------ \| \| \| ------- ------ ---- \| p0 \| \| p1 \| \|p2\| ------- ------ ---- \| \| \ -------- ------- ------ \| FT-A \| \|FT-B \| \|FT-C\| -------- ------- ------ Fixes: f90edfd279f3 ("net/mlx5_core: Connect flow tables") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-28	Merge branch 'ipa-clock-refs'	David S. Miller
	Alex Elder says: ==================== net: ipa: add clock references This series continues preparation for implementing runtime power management for IPA. We need to ensure that the IPA core clock and interconnects are operational whenever IPA hardware is accessed. And in particular this means that any external entry point that can lead to accessing IPA hardware must guarantee the hardware is "up" when it is accessed. The first four patches in this series take IPA clock references when needed by such external entry points, dropping those references in those same functions when they are no longer required. The last patch is a bit different, though it too prepares for enabling runtime power management. It avoids suspending/resuming endpoints if setup is not complete. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: ipa: don't suspend endpoints if setup not complete	Alex Elder
	Until we complete the setup stage of initialization, GSI is not initialized and therefore endpoints aren't usable. So avoid suspending endpoints during system suspend unless setup is complete. Clear the setup_complete flag at the top of ipa_teardown() to reflect the fact that things are no longer in setup state. Get rid of a misplaced (and superfluous) comment. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: ipa: add a clock reference for netdev operations	Alex Elder
	The IPA network device can be opened at any time, and an opened network device can be stopped any time. Both of these callback functions require access to the hardware, and therefore they need the IPA clock to be operational. Take an IPA clock reference in both the ->open and ->stop callback functions, dropping the reference when they are done accessing hardware. The ->start_xmit callback requires a little different handling, and that will be added separately. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: ipa: add clock reference for remoteproc SSR	Alex Elder
	The remoteproc SSR callback function for the modem requires hardware access when handling a modem crash or shutdown. Take and later release an IPA clock reference in ipa_modem_crashed(), to ensure the hardware is operational. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: ipa: get another clock for ipa_setup()	Alex Elder
	Two places call ipa_setup(). The first, ipa_probe(), holds an IPA clock reference when calling ipa_setup() (if the AP is responsible for IPA firmware loading). But if the modem is loading IPA firmware, ipa_smp2p_modem_setup_ready_isr() calls ipa_setup() after the modem has signaled the hardware is ready. This can happen at any time, and there is no guarantee the hardware is active. Have ipa_smp2p_modem_setup() take an IPA clock reference before it calls ipa_setup(), and release it once setup is complete. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-28	net: ipa: get clock in ipa_probe()	Alex Elder
	Any entry point that leads to IPA hardware access must ensure the hardware is operational (clocked). Currently we ensure this by taking an extra clock reference during setup that is not released until we receive a system suspend request. But this extra reference will soon go away. When the platform driver ->probe function is called, we first need hardware access in ipa_config(). Although ipa_config() takes an IPA clock reference, it the special reference taken to prevent suspending the hardware. Have ipa_probe() take a reference before calling ipa_config(), so that the "no-suspend" reference can eventually go away. Drop this reference before ipa_probe() returns. Similarly, the driver ->remove function can be called at any time. Take an IPA clock reference at the beginning of that function, and drop it again after the deconfig stage has completed (at which point hardware access is no longer needed). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>