summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2022-05-13Merge tag 'hwmon-for-v5.18-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Restrict ltq-cputemp to SOC_XWAY to fix build failure - Add OF device ID table to tmp401 driver to enable auto-load * tag 'hwmon-for-v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (ltq-cputemp) restrict it to SOC_XWAY hwmon: (tmp401) Add OF device ID table
2022-05-13Merge tag 'drm-fixes-2022-05-13' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Pretty quiet week on the fixes front, 4 amdgpu and one i915 fix. I think there might be a few misc fbdev ones outstanding, but I'll see if they are necessary and pass them on if so. amdgpu: - Disable ASPM for VI boards on ADL platforms - S0ix DCN3.1 display fix - Resume regression fix - Stable pstate fix i915: - fix for kernel memory corruption when running a lot of OpenCL tests in parallel" * tag 'drm-fixes-2022-05-13' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu/ctx: only reset stable pstate if the user changed it (v2) Revert "drm/amd/pm: keep the BACO feature enabled for suspend" drm/i915: Fix race in __i915_vma_remove_closed drm/amd/display: undo clearing of z10 related function pointers drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems
2022-05-13Merge tag 'amd-drm-fixes-5.18-2022-05-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.18-2022-05-11: amdgpu: - Disable ASPM for VI boards on ADL platforms - S0ix DCN3.1 display fix - Resume regression fix - Stable pstate fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220511174422.5769-1-alexander.deucher@amd.com
2022-05-12Merge tag 'net-5.18-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless, and bluetooth. No outstanding fires. Current release - regressions: - eth: atlantic: always deep reset on pm op, fix null-deref Current release - new code bugs: - rds: use maybe_get_net() when acquiring refcount on TCP sockets [refinement of a previous fix] - eth: ocelot: mark traps with a bool instead of guessing type based on list membership Previous releases - regressions: - net: fix skipping features in for_each_netdev_feature() - phy: micrel: fix null-derefs on suspend/resume and probe - bcmgenet: check for Wake-on-LAN interrupt probe deferral Previous releases - always broken: - ipv4: drop dst in multicast routing path, prevent leaks - ping: fix address binding wrt vrf - net: fix wrong network header length when BPF protocol translation is used on skbs with a fraglist - bluetooth: fix the creation of hdev->name - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing - wifi: ath11k: reduce the wait time of 11d scan and hw scan while adding an interface - mac80211: fix rx reordering with non explicit / psmp ack policy - mac80211: reset MBSSID parameters upon connection - nl80211: fix races in nl80211_set_tx_bitrate_mask() - tls: fix context leak on tls_device_down - sched: act_pedit: really ensure the skb is writable - batman-adv: don't skb_split skbuffs with frag_list - eth: ocelot: fix various issues with TC actions (null-deref; bad stats; ineffective drops; ineffective filter removal)" * tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) tls: Fix context leak on tls_device_down net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe() net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down() mlxsw: Avoid warning during ip6gre device removal net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral net: ethernet: mediatek: ppe: fix wrong size passed to memset() Bluetooth: Fix the creation of hdev->name i40e: i40e_main: fix a missing check on list iterator net/sched: act_pedit: really ensure the skb is writable s390/lcs: fix variable dereferenced before check s390/ctcm: fix potential memory leak s390/ctcm: fix variable dereferenced before check net: atlantic: verify hw_head_ lies within TX buffer ring net: atlantic: add check for MAX_SKB_FRAGS net: atlantic: reduce scope of is_rsc_complete net: atlantic: fix "frag[0] not initialized" net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe() net: phy: micrel: Fix incorrect variable type in micrel decnet: Use container_of() for struct dn_neigh casts ...
2022-05-12net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()Taehee Yoo
In the NIC ->probe() callback, ->mtd_probe() callback is called. If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too. In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and initializes mtd partiion. But mtd partition for sfc is shared data. So that allocated mtd partition data from last called efx_ef10_mtd_probe() will not be used. Therefore it must be freed. But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe(). kmemleak reports: unreferenced object 0xffff88811ddb0000 (size 63168): comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120 [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250 [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc] [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc] [<ffffffffa414192c>] local_pci_probe+0xdc/0x170 [<ffffffffa4145df5>] pci_device_probe+0x235/0x680 [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0 [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460 [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120 [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320 [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190 [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560 [<ffffffffa4440b1e>] driver_register+0x18e/0x310 [<ffffffffc02e2055>] 0xffffffffc02e2055 [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450 [<ffffffffa33ca574>] do_init_module+0x1b4/0x700 Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()Florian Fainelli
After commit 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link") the interface suspend path would call our mac_link_down() call back which would forcibly set the link down, thus preventing Wake-on-LAN packets from reaching our management port. Fix this by looking at whether the port is enabled for Wake-on-LAN and not clearing the link status in that case to let packets go through. Fixes: 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220512021731.2494261-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12mlxsw: Avoid warning during ip6gre device removalAmit Cohen
IPv6 addresses which are used for tunnels are stored in a hash table with reference counting. When a new GRE tunnel is configured, the driver is notified and configures it in hardware. Currently, any change in the tunnel is not applied in the driver. It means that if the remote address is changed, the driver is not aware of this change and the first address will be used. This behavior results in a warning [1] in scenarios such as the following: # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit # ip link delete gre1 The change of the address is not applied in the driver. Currently, the driver uses the remote address which is stored in the 'parms' of the overlay device. When the tunnel is removed, the new IPv6 address is used, the driver tries to release it, but as it is not aware of the change, this address is not configured and it warns about releasing non existing IPv6 address. Fix it by using the IPv6 address which is cached in the IPIP entry, this address is the last one that the driver used, so even in cases such the above, the first address will be released, without any warning. [1]: WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... Call Trace: <TASK> mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum] mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum] mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum] raw_notifier_call_chain+0x3c/0x50 call_netdevice_notifiers_info+0x2f/0x80 unregister_netdevice_many+0x311/0x6d0 rtnl_dellink+0x136/0x360 rtnetlink_rcv_msg+0x12f/0x380 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x233/0x340 netlink_sendmsg+0x202/0x440 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12net: bcmgenet: Check for Wake-on-LAN interrupt probe deferralFlorian Fainelli
The interrupt controller supplying the Wake-on-LAN interrupt line maybe modular on some platforms (irq-bcm7038-l1.c) and might be probed at a later time than the GENET driver. We need to specifically check for -EPROBE_DEFER and propagate that error to ensure that we eventually fetch the interrupt descriptor. Fixes: 9deb48b53e7f ("bcmgenet: add WOL IRQ check") Fixes: 5b1f0e62941b ("net: bcmgenet: Avoid touching non-existent interrupt") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12net: ethernet: mediatek: ppe: fix wrong size passed to memset()Yang Yingliang
'foe_table' is a pointer, the real size of struct mtk_foe_entry should be pass to memset(). Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-11Merge tag 'wireless-2022-05-11' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v5.18 Second set of fixes for v5.18 and hopefully the last one. We have a new iwlwifi maintainer, a fix to rfkill ioctl interface and important fixes to both stack and two drivers. * tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition nl80211: fix locking in nl80211_set_tx_bitrate_mask() mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection mac80211_hwsim: fix RCU protected chanctx access mailmap: update Kalle Valo's email mac80211: Reset MBSSID parameters upon connection cfg80211: retrieve S1G operating channel number nl80211: validate S1G channel width mac80211: fix rx reordering with non explicit / psmp ack policy ath11k: reduce the wait time of 11d scan and hw scan while add interface MAINTAINERS: update iwlwifi driver maintainer iwlwifi: iwl-dbg: Use del_timer_sync() before freeing ==================== Link: https://lore.kernel.org/r/20220511154535.A1A12C340EE@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11i40e: i40e_main: fix a missing check on list iteratorXiaomeng Tong
The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to a invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, while use the origin variable 'ch' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11drm/amdgpu/ctx: only reset stable pstate if the user changed it (v2)Alex Deucher
Check if the requested stable pstate matches the current one before changing it. This avoids changing the stable pstate on context destroy if the user never changed it in the first place via the IOCTL. v2: compare the current and requested rather than setting a flag (Lijo) Fixes: 8cda7a4f96e435 ("drm/amdgpu/UAPI: add new CTX OP to get/set stable pstates") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-11Revert "drm/amd/pm: keep the BACO feature enabled for suspend"Alex Deucher
This reverts commit eaa090538e8d21801c6d5f94590c3799e6a528b5. Commit ebc002e3ee78 ("drm/amdgpu: don't use BACO for reset in S3") stops using BACO for reset during suspend, so it's no longer necessary to leave BACO enabled during suspend. This fixes resume from suspend on the navy flounder dGPU in the ASUS ROG Strix G513QY. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1982 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-11s390/lcs: fix variable dereferenced before checkAlexandra Winter
smatch complains about drivers/s390/net/lcs.c:1741 lcs_get_control() warn: variable dereferenced before check 'card->dev' (see line 1739) Fixes: 27eb5ac8f015 ("[PATCH] s390: lcs driver bug fixes and improvements [1/2]") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11s390/ctcm: fix potential memory leakAlexandra Winter
smatch complains about drivers/s390/net/ctcm_mpc.c:1210 ctcmpc_unpack_skb() warn: possible memory leak of 'mpcginfo' mpc_action_discontact() did not free mpcginfo. Consolidate the freeing in ctcmpc_unpack_skb(). Fixes: 293d984f0e36 ("ctcm: infrastructure for replaced ctc driver") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11s390/ctcm: fix variable dereferenced before checkAlexandra Winter
Found by cppcheck and smatch. smatch complains about drivers/s390/net/ctcm_sysfs.c:43 ctcm_buffer_write() warn: variable dereferenced before check 'priv' (see line 42) Fixes: 3c09e2647b5e ("ctcm: rename READ/WRITE defines to avoid redefinitions") Reported-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11net: atlantic: verify hw_head_ lies within TX buffer ringGrant Grundler
Bounds check hw_head index provided by NIC to verify it lies within the TX buffer ring. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11net: atlantic: add check for MAX_SKB_FRAGSGrant Grundler
Enforce that the CPU can not get stuck in an infinite loop. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11net: atlantic: reduce scope of is_rsc_completeGrant Grundler
Don't defer handling the err case outside the loop. That's pointless. And since is_rsc_complete is only used inside this loop, declare it inside the loop to reduce it's scope. Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11net: atlantic: fix "frag[0] not initialized"Grant Grundler
In aq_ring_rx_clean(), if buff->is_eop is not set AND buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to buff->len and skb_add_rx_frag(xxx, *0*, ...) is not called. The loop following this code starts calling skb_add_rx_frag() starting with i=1 and thus frag[0] is never initialized. Since i is initialized to zero at the top of the primary loop, we can just reference and post-increment i instead of hardcoding the 0 when calling skb_add_rx_frag() the first time. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-10net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()Yang Yingliang
Switch to using pcim_enable_device() to avoid missing pci_disable_device(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220510031316.1780409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10net: phy: micrel: Fix incorrect variable type in micrelWan Jiabing
In lanphy_read_page_reg, calling __phy_read() might return a negative error code. Use 'int' to check the error code. Fixes: 7c2dcfa295b1 ("net: phy: micrel: Add support for LAN8804 PHY") Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220509144519.2343399-1-wanjiabing@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10net: atlantic: always deep reset on pm op, fixing up my null deref regressionManuel Ullmann
The impact of this regression is the same for resume that I saw on thaw: the kernel hangs and nothing except SysRq rebooting can be done. Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs"), where I disabled deep pm resets in suspend and resume, trying to make sense of the atl_resume_common() deep parameter in the first place. It turns out, that atlantic always has to deep reset on pm operations. Even though I expected that and tested resume, I screwed up by kexec-rebooting into an unpatched kernel, thus missing the breakage. This fixup obsoletes the deep parameter of atl_resume_common, but I leave the cleanup for the maintainers to post to mainline. Suspend and hibernation were successfully tested by the reporters. Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ Reported-by: Jordan Leppert <jordanleppert@protonmail.com> Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com> Tested-by: Jordan Leppert <jordanleppert@protonmail.com> Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com> CC: <stable@vger.kernel.org> # 5.10+ Signed-off-by: Manuel Ullmann <labre@posteo.de> Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-09Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-05-06 This series contains updates to ice driver only. Ivan Vecera fixes a race with aux plug/unplug by delaying setting adev until initialization is complete and adding locking. Anatolii ensures VF queues are completely disabled before attempting to reconfigure them. Michal ensures stale Tx timestamps are cleared from hardware. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: fix PTP stale Tx timestamps cleanup ice: clear stale Tx queue settings before configuring ice: Fix race during aux device (un)plugging ==================== Link: https://lore.kernel.org/r/20220506174129.4976-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09net: phy: Fix race condition on link status changeFrancesco Dolcini
This fixes the following error caused by a race condition between phydev->adjust_link() and a MDIO transaction in the phy interrupt handler. The issue was reproduced with the ethernet FEC driver and a micrel KSZ9031 phy. [ 146.195696] fec 2188000.ethernet eth0: MDIO read timeout [ 146.201779] ------------[ cut here ]------------ [ 146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c [ 146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug [ 146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9 [ 146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 146.236257] unwind_backtrace from show_stack+0x10/0x14 [ 146.241640] show_stack from dump_stack_lvl+0x58/0x70 [ 146.246841] dump_stack_lvl from __warn+0xb4/0x24c [ 146.251772] __warn from warn_slowpath_fmt+0x5c/0xd4 [ 146.256873] warn_slowpath_fmt from phy_error+0x24/0x6c [ 146.262249] phy_error from kszphy_handle_interrupt+0x40/0x48 [ 146.268159] kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78 [ 146.274417] irq_thread_fn from irq_thread+0xf0/0x1dc [ 146.279605] irq_thread from kthread+0xe4/0x104 [ 146.284267] kthread from ret_from_fork+0x14/0x28 [ 146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8) [ 146.294448] 1fa0: 00000000 00000000 00000000 00000000 [ 146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 146.318262] irq event stamp: 12325 [ 146.321780] hardirqs last enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60 [ 146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60 [ 146.338259] softirqs last enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624 [ 146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178 [ 146.354447] ---[ end trace 0000000000000000 ]--- With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link() calls fec_stop()/fec_restart() and both these function reset and temporary disable the FEC disrupting any MII transaction that could be happening at the same time. fec_enet_adjust_link() and phy_read() can be running at the same time when we have one additional interrupt before the phy_state_machine() is able to terminate. Thread 1 (phylib WQ) | Thread 2 (phy interrupt) | | phy_interrupt() <-- PHY IRQ | handle_interrupt() | phy_read() | phy_trigger_machine() | --> schedule phylib WQ | | phy_state_machine() | phy_check_link_status() | phy_link_change() | phydev->adjust_link() | fec_enet_adjust_link() | --> FEC reset | phy_interrupt() <-- PHY IRQ | phy_read() | Fix this by acquiring the phydev lock in phy_interrupt(). Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/ Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices") cc: <stable@vger.kernel.org> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220506060815.327382-1-francesco.dolcini@toradex.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09hwmon: (ltq-cputemp) restrict it to SOC_XWAYRandy Dunlap
Building with SENSORS_LTQ_CPUTEMP=y with SOC_FALCON=y causes build errors since FALCON does not support the same features as XWAY. Change this symbol to depend on SOC_XWAY since that provides the necessary interfaces. Repairs these build errors: ../drivers/hwmon/ltq-cputemp.c: In function 'ltq_cputemp_enable': ../drivers/hwmon/ltq-cputemp.c:23:9: error: implicit declaration of function 'ltq_cgu_w32'; did you mean 'ltq_ebu_w32'? [-Werror=implicit-function-declaration] 23 | ltq_cgu_w32(ltq_cgu_r32(CGU_GPHY1_CR) | CGU_TEMP_PD, CGU_GPHY1_CR); ../drivers/hwmon/ltq-cputemp.c:23:21: error: implicit declaration of function 'ltq_cgu_r32'; did you mean 'ltq_ebu_r32'? [-Werror=implicit-function-declaration] 23 | ltq_cgu_w32(ltq_cgu_r32(CGU_GPHY1_CR) | CGU_TEMP_PD, CGU_GPHY1_CR); ../drivers/hwmon/ltq-cputemp.c: In function 'ltq_cputemp_probe': ../drivers/hwmon/ltq-cputemp.c:92:31: error: 'SOC_TYPE_VR9_2' undeclared (first use in this function) 92 | if (ltq_soc_type() != SOC_TYPE_VR9_2) Fixes: 7074d0a92758 ("hwmon: (ltq-cputemp) add cpu temp sensor driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Florian Eckert <fe@dev.tdt.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jean Delvare <jdelvare@suse.com> Cc: linux-hwmon@vger.kernel.org Link: https://lore.kernel.org/r/20220509234740.26841-1-rdunlap@infradead.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-05-09ptp: ocp: Use DIV64_U64_ROUND_UP for rounding.Jonathan Lemon
The initial code used roundup() to round the starting time to a multiple of a period. This generated an error on 32-bit systems, so was replaced with DIV_ROUND_UP_ULL(). However, this truncates to 32-bits on a 64-bit system. Replace with DIV64_U64_ROUND_UP() instead. Fixes: b325af3cfab9 ("ptp: ocp: Add signal generators and update sysfs nodes") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220506223739.1930-2-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one()Yang Yingliang
Fix the missing pci_disable_device() before return from tulip_init_one() in the error handling case. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220506094250.3630615-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09ionic: fix missing pci_release_regions() on error in ionic_probe()Yang Yingliang
If ionic_map_bars() fails, pci_release_regions() need be called. Fixes: fbfb8031533c ("ionic: Add hardware init and device commands") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220506034040.2614129-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09Merge tag 'platform-drivers-x86-v5.18-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: - thinkpad_acpi AMD suspend/resume + fan detection fixes - two other small fixes - one hardware-id addition * tag 'platform-drivers-x86-v5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/surface: aggregator: Fix initialization order when compiling as builtin module platform/surface: gpe: Add support for Surface Pro 8 platform/x86/intel: Fix 'rmmod pmt_telemetry' panic platform/x86: thinkpad_acpi: Correct dual fan probe platform/x86: thinkpad_acpi: Add a s2idle resume quirk for a number of laptops platform/x86: thinkpad_acpi: Convert btusb DMI list to quirks
2022-05-09mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protectionJohannes Berg
This is needed since it might use (and pass out) pointers to e.g. keys protected by RCU. Can't really happen here as the frames aren't encrypted, but we need to still adhere to the rules. Fixes: cacfddf82baf ("mac80211_hwsim: initialize ieee80211_tx_info at hw_scan_work") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220505230421.5f139f9de173.I77ae111a28f7c0e9fd1ebcee7f39dbec5c606770@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-05-09mac80211_hwsim: fix RCU protected chanctx accessJohannes Berg
We need to RCU protect the chanctx_conf access, so do that. Fixes: 585625c955b1 ("mac80211_hwsim: check TX and STA bandwidth") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220505230421.fb8055c081a2.Ic6da3307c77a909bd61a0ea25dc2a4b08fe1b03f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-05-09net: sfc: fix memory leak due to ptp channelTaehee Yoo
It fixes memory leak in ring buffer change logic. When ring buffer size is changed(ethtool -G eth0 rx 4096), sfc driver works like below. 1. stop all channels and remove ring buffers. 2. allocates new buffer array. 3. allocates rx buffers. 4. start channels. While the above steps are working, it skips some steps if the channel doesn't have a ->copy callback function. Due to ptp channel doesn't have ->copy callback, these above steps are skipped for ptp channel. It eventually makes some problems. a. ptp channel's ring buffer size is not changed, it works only 1024(default). b. memory leak. The reason for memory leak is to use the wrong ring buffer values. There are some values, which is related to ring buffer size. a. efx->rxq_entries - This is global value of rx queue size. b. rx_queue->ptr_mask - used for access ring buffer as circular ring. - roundup_pow_of_two(efx->rxq_entries) - 1 c. rx_queue->max_fill - efx->rxq_entries - EFX_RXD_HEAD_ROOM These all values should be based on ring buffer size consistently. But ptp channel's values are not. a. efx->rxq_entries - This is global(for sfc) value, always new ring buffer size. b. rx_queue->ptr_mask - This is always 1023(default). c. rx_queue->max_fill - This is new ring buffer size - EFX_RXD_HEAD_ROOM. Let's assume we set 4096 for rx ring buffer, normal channel ptp channel efx->rxq_entries 4096 4096 rx_queue->ptr_mask 4095 1023 rx_queue->max_fill 4086 4086 sfc driver allocates rx ring buffers based on these values. When it allocates ptp channel's ring buffer, 4086 ring buffers are allocated then, these buffers are attached to the allocated array. But ptp channel's ring buffer array size is still 1024(default) and ptr_mask is still 1023 too. So, 3062 ring buffers will be overwritten to the array. This is the reason for memory leak. Test commands: ethtool -G <interface name> rx 4096 while : do ip link set <interface name> up ip link set <interface name> down done In order to avoid this problem, it adds ->copy callback to ptp channel type. So that rx_queue->ptr_mask value will be updated correctly. Fixes: 7c236c43b838 ("sfc: Add support for IEEE-1588 PTP") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-09drm/i915: Fix race in __i915_vma_remove_closedKarol Herbst
i915_vma_reopen checked if the vma is closed before without taking the lock. So multiple threads could attempt removing the vma. Instead the lock needs to be taken before actually checking. v2: move struct declaration Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.3+ Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5732 Signed-off-by: Karol Herbst <kherbst@redhat.com> Fixes: 155ab8836caa ("drm/i915: Move object close under its own lock") Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220420095720.3331609-1-kherbst@redhat.com (cherry picked from commit 1df1c79cbb7ac9bf148930be3418973c76ba8dde) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-05-08Merge tag 'sound-5.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became slightly larger as I've been off in the last weeks. The majority of changes here is about ASoC, fixes for dmaengine and for addressing issues reported by CI, as well as other device-specific small fixes. Also, fixes for FireWire core stack and the usual HD-audio quirks are included" * tag 'sound-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits) ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback ASoC: ops: Validate input values in snd_soc_put_volsw_range() ASoC: dmaengine: Restore NULL prepare_slave_config() callback ASoC: atmel: mchp-pdmc: set prepare_slave_config ASoC: max98090: Generate notifications on changes for custom control ASoC: max98090: Reject invalid values in custom control put() ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers firewire: core: extend card->lock in fw_core_handle_bus_reset firewire: remove check of list iterator against head past the loop body firewire: fix potential uaf in outbound_phy_packet_callback() ASoC: rt9120: Correct the reg 0x09 size to one byte ALSA: hda/realtek: Enable mute/micmute LEDs support for HP Laptops ALSA: hda/realtek: Fix mute led issue on thinkpad with cs35l41 s-codec ASoC: meson: axg-card: Fix nonatomic links ASoC: meson: axg-tdm-interface: Fix formatters in trigger" ASoC: soc-ops: fix error handling ASoC: meson: Fix event generation for G12A tohdmi mux ASoC: meson: Fix event generation for AUI CODEC mux ASoC: meson: Fix event generation for AUI ACODEC mux ...
2022-05-08ataflop: use a statically allocated error countersWilly Tarreau
This is the last driver making use of fd_request->error_count, which is easy to get wrong as was shown in floppy.c. We don't need to keep it there, it can be moved to the atari_floppy_struct instead, so let's do this. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Minh Yuan <yuanmingbuaa@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-08floppy: use a statically allocated error counterWilly Tarreau
Interrupt handler bad_flp_intr() may cause a UAF on the recently freed request just to increment the error count. There's no point keeping that one in the request anyway, and since the interrupt handler uses a static pointer to the error which cannot be kept in sync with the pending request, better make it use a static error counter that's reset for each new request. This reset now happens when entering redo_fd_request() for a new request via set_next_request(). One initial concern about a single error counter was that errors on one floppy drive could be reported on another one, but this problem is not real given that the driver uses a single drive at a time, as that PC-compatible controllers also have this limitation by using shared signals. As such the error count is always for the "current" drive. Reported-by: Minh Yuan <yuanmingbuaa@gmail.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Tested-by: Denis Efremov <efremov@linux.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-07Merge tag 'gpio-fixes-for-v5.18-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix the bounds check for the 'gpio-reserved-ranges' device property in gpiolib-of - drop the assignment of the pwm base number in gpio-mvebu (this was missed by the patch doing it globally for all pwm drivers) - fix the fwnode assignment (use own fwnode, not the parent's one) for the GPIO irqchip in gpio-visconti - update the irq_stat field before checking the trigger field in gpio-pca953x - update GPIO entry in MAINTAINERS * tag 'gpio-fixes-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) gpio: visconti: Fix fwnode of GPIO IRQ MAINTAINERS: update the GPIO git tree entry gpio: mvebu: drop pwm base assignment gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
2022-05-07Merge tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A single revert for a change that isn't needed in 5.18, and a small series for s390/dasd" * tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block: s390/dasd: Use kzalloc instead of kmalloc/memset s390/dasd: Fix read inconsistency for ESE DASD devices s390/dasd: Fix read for ESE with blksize < 4k s390/dasd: prevent double format of tracks for ESE devices s390/dasd: fix data corruption for ESE devices Revert "block: release rq qos structures for queue without disk"
2022-05-06net: chelsio: cxgb4: Avoid potential negative array offsetKees Cook
Using min_t(int, ...) as a potential array index implies to the compiler that negative offsets should be allowed. This is not the case, though. Replace "int" with "unsigned int". Fixes the following warning exposed under future CONFIG_FORTIFY_SOURCE improvements: In file included from include/linux/string.h:253, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/rcupdate.h:29, from include/linux/rculist.h:11, from include/linux/pid.h:5, from include/linux/sched.h:14, from include/linux/delay.h:23, from drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:35: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function 't4_get_raw_vpd_params': include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 29 and size [2147483648, 4294967295] [-Warray-bounds] 46 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy' 388 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk' 433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2796:9: note: in expansion of macro 'memcpy' 2796 | memcpy(p->id, vpd + id, min_t(int, id_len, ID_LEN)); | ^~~~~~ include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [2147483648, 4294967295] [-Warray-bounds] 46 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy' 388 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk' 433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2798:9: note: in expansion of macro 'memcpy' 2798 | memcpy(p->sn, vpd + sn, min_t(int, sn_len, SERNUM_LEN)); | ^~~~~~ Additionally remove needless cast from u8[] to char * in last strim() call. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202205031926.FVP7epJM-lkp@intel.com Fixes: fc9279298e3a ("cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()") Fixes: 24c521f81c30 ("cxgb4: Use pci_vpd_find_id_string() to find VPD ID string") Cc: Raju Rangoju <rajur@chelsio.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220505233101.1224230-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06ice: fix PTP stale Tx timestamps cleanupMichal Michalik
Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: clear stale Tx queue settings before configuringAnatolii Gerasymenko
The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: Fix race during aux device (un)pluggingIvan Vecera
Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A few recent regressions in rxe's multicast code, and some old driver bugs: - Error case unwind bug in rxe for rkeys - Dot not call netdev functions under a spinlock in rxe multicast code - Use the proper BH lock type in rxe multicast code - Fix idrma deadlock and crash - Add a missing flush to drain irdma QPs when in error - Fix high userspace latency in irdma during destroy due to synchronize_rcu() - Rare race in siw MPA processing" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/rxe: Change mcg_lock to a _bh lock RDMA/rxe: Do not call dev_mc_add/del() under a spinlock RDMA/siw: Fix a condition race issue in MPA request processing RDMA/irdma: Fix possible crash due to NULL netdev in notifier RDMA/irdma: Reduce iWARP QP destroy time RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state RDMA/rxe: Recheck the MR in when generating a READ reply RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core() RDMA/rxe: Fix "Replace mr by rkey in responder resources"
2022-05-06Merge tag 'mmc-v5.18-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull mmc fixes from Ulf Hansson: "MMC core: - Fix initialization for eMMC's HS200/HS400 mode MMC host: - sdhci-msm: Reset GCC_SDCC_BCR register to prevent timeout issues - sunxi-mmc: Fix DMA descriptors allocated above 32 bits" * tag 'mmc-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits mmc: core: Set HS clock speed before sending HS CMD13
2022-05-06Merge tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "A pretty quiet week, one fbdev, msm, kconfig, and two amdgpu fixes, about what I'd expect for rc6. fbdev: - hotunplugging fix amdgpu: - Fix a xen dom0 regression on APUs - Fix a potential array overflow if a receiver were to send an erroneous audio channel count msm: - lockdep fix. it6505: - kconfig fix" * tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT drm/amdgpu: do not use passthrough mode in Xen dom0 drm/bridge: ite-it6505: add missing Kconfig option select fbdev: Make fb_release() return -ENODEV if fbdev was unregistered drm/msm/dp: remove fail safe mode related code
2022-05-06gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)Puyou Lu
When one port's input state get inverted (eg. from low to hight) after pca953x_irq_setup but before setting irq_mask (by some other driver such as "gpio-keys"), the next inversion of this port (eg. from hight to low) will not be triggered any more (because irq_stat is not updated at the first time). Issue should be fixed after this commit. Fixes: 89ea8bbe9c3e ("gpio: pca953x.c: add interrupt handling capability") Signed-off-by: Puyou Lu <puyou.lu@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-06drm/amd/display: undo clearing of z10 related function pointersEric Yang
[Why] Z10 and S0i3 have some shared path. Previous code clean up , incorrectly removed these pointers, which breaks s0i3 restore [How] Do not clear the function pointers based on Z10 disable. Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Eric Yang <Eric.Yang2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-06drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systemsRichard Gong
Active State Power Management (ASPM) feature is enabled since kernel 5.14. There are some AMD Volcanic Islands (VI) GFX cards, such as the WX3200 and RX640, that do not work with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as video/display output, Intel Alder Lake based systems will freeze after suspend/resume. The issue was originally reported on one system (Dell Precision 3660 with BIOS version 0.14.81), but was later confirmed to affect at least 4 pre-production Alder Lake based systems. Add an extra check to disable ASPM on Intel Alder Lake based systems with the problematic AMD Volcanic Islands GFX cards. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885 Signed-off-by: Richard Gong <richard.gong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-06platform/surface: aggregator: Fix initialization order when compiling as ↵Maximilian Luz
builtin module When building the Surface Aggregator Module (SAM) core, registry, and other SAM client drivers as builtin modules (=y), proper initialization order is not guaranteed. Due to this, client driver registration (triggered by device registration in the registry) races against bus initialization in the core. If any attempt is made at registering the device driver before the bus has been initialized (i.e. if bus initialization fails this race) driver registration will fail with a message similar to: Driver surface_battery was unable to register with bus_type surface_aggregator because the bus was not initialized Switch from module_init() to subsys_initcall() to resolve this issue. Note that the serdev subsystem uses postcore_initcall() so we are still able to safely register the serdev device driver for the core. Fixes: c167b9c7e3d6 ("platform/surface: Add Surface Aggregator subsystem") Reported-by: Blaž Hrastnik <blaz@mxxn.io> Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Link: https://lore.kernel.org/r/20220429195738.535751-1-luzmaximilian@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>