summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-15hwmon: (drivetemp) Set scsi command timeout to 10sRussell Harmon
There's at least one drive (MaxDigitalData OOS14000G) such that if it receives a large amount of I/O while entering an idle power state will first exit idle before responding, including causing SMART temperature requests to be delayed. This causes the drivetemp request to exceed its timeout of 1 second. Signed-off-by: Russell Harmon <russ@har.mn> Link: https://lore.kernel.org/r/20250115131340.3178988-1-russ@har.mn Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-01-15hwmon: (acpi_power_meter) Fix a check for the return value of ↵Kazuhiro Abe
read_domain_devices(). After commit fabb1f813ec0 ("hwmon: (acpi_power_meter) Fix fail to load module on platform without _PMD method"), the acpi_power_meter driver fails to load if the platform has _PMD method. To address this, add a check for successful read_domain_devices(). Tested on Nvidia Grace machine. Fixes: fabb1f813ec0 ("hwmon: (acpi_power_meter) Fix fail to load module on platform without _PMD method") Signed-off-by: Kazuhiro Abe <fj1078ii@aa.jp.fujitsu.com> Link: https://lore.kernel.org/r/20250115073532.3211000-1-fj1078ii@aa.jp.fujitsu.com [groeck: Dropped unnecessary () from expression] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-01-15Merge tag 'reset-fixes-for-v6.13' of git://git.pengutronix.de/pza/linux into ↵Arnd Bergmann
arm/fixes Reset controller fixes for v6.13 * Fix rzg2l-usb-vbus-regulator lookup by assigning the proper of node to the allocated platform device in the rzg2l-usbphy-ctrl driver. * tag 'reset-fixes-for-v6.13' of git://git.pengutronix.de/pza/linux: reset: rzg2l-usbphy-ctrl: Assign proper of node to the allocated device Link: https://lore.kernel.org/r/20250113163642.1757160-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-01-15drm/xe/oa: Add missing VISACTL mux registersAshutosh Dixit
Add missing VISACTL mux registers required for some OA config's (e.g. RenderPipeCtrl). Fixes: cdf02fe1a94a ("drm/xe/oa/uapi: Add/remove OA config perf ops") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250111021539.2920346-1-ashutosh.dixit@intel.com (cherry picked from commit c26f22dac3449d8a687237cdfc59a6445eb8f75a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe: make change ccs_mode a synchronous actionMaciej Patelczyk
If ccs_mode is being modified via /sys/class/drm/cardX/device/tileY/gtY/ccs_mode the asynchronous reset is triggered and the write returns immediately. With that some test receive false information about number of CCS engines or even fail if they proceed without delay after changing the ccs_mode. Changing the ccs_mode change from async to sync to prevent failures in tests. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211111727.1481476-3-maciej.patelczyk@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 480fb9806e2e073532f7786166287114c696b340) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe: introduce xe_gt_reset and xe_gt_wait_for_resetMaciej Patelczyk
Add synchronous version gt reset as there are few places where it is expected. Also add a wait helper to wait until gt reset is done. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211111727.1481476-2-maciej.patelczyk@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 155c77f45f63dd58a37eeb0896b0b140ab785836) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe/guc: Adding steering info support for GuC register listsJesus Narvaez
The guc_mmio_reg interface supports steering, but it is currently not implemented. This will allow the GuC to control steering of MMIO registers after save-restore and avoid reading from fused off MCR register instances. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com (cherry picked from commit ee5a1321df90891d59d83b7c9d5b6c5b755d059d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/bridge: fix documentation for the hdmi_audio_prepare() callbackDmitry Baryshkov
Fix c&p error and change linuxdoc comment for the hdmi_audio_prepare() callback from drm_bridge_funcs to mention the callback name instead of the original prepare() callback. Fixes: 0beba3f9d366 ("drm/bridge: connector: add support for HDMI codec framework") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250106174645.463927e0@canb.auug.org.au/ Acked-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250107-drm-bridge-fix-docs-v1-1-84e539e6f348@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2025-01-15irqchip: Plug a OF node reference leak in platform_irqchip_probe()Joe Hattori
platform_irqchip_probe() leaks a OF node when irq_init_cb() fails. Fix it by declaring par_np with the __free(device_node) cleanup construct. This bug was found by an experimental static analysis tool that I am developing. Fixes: f8410e626569 ("irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241215033945.3414223-1-joe@pf.is.s.u-tokyo.ac.jp
2025-01-15selftests: net: Adapt ethtool mq tests to fix in qdisc graftVictor Nogueira
Because of patch[1] the graft behaviour changed So the command: tcq replace parent 100:1 handle 204: Is no longer valid and will not delete 100:4 added by command: tcq replace parent 100:4 handle 204: pfifo_fast So to maintain the original behaviour, this patch manually deletes 100:4 and grafts 100:1 Note: This change will also work fine without [1] [1] https://lore.kernel.org/netdev/20250111151455.75480-1-jhs@mojatatu.com/T/#u Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-15doc/cgroup: Fix title underline lengthMaxime Ripard
Commit b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") introduced a new documentation file, with a shorter than expected underline. This results in a documentation build warning. Fix that underline length. Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20250113154611.624256bf@canb.auug.org.au/ Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250113092608.1349287-4-mripard@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-15drm/doc: Include new drm-compute documentationMaxime Ripard
Commit b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") introduced a new documentation file, but didn't link it anywhere. It was thus triggering a documentation build warning. Make sure it's included as part of the DRM documentation. Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20250113155000.4a99e7b0@canb.auug.org.au/ Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250113092608.1349287-3-mripard@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-15cgroup/dmem: Fix parameters documentationMaxime Ripard
During the dmem cgroup development, the parameters to the dmem_cgroup_state_evict_valuable() and dmem_cgroup_try_charge() were changed, but the documentation wasn't adjusted accordingly. This results in a documentation build warning. Adjust the documentation to reflect what the final functions parameters are. Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20250113160334.1f09f881@canb.auug.org.au/ Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250113092608.1349287-2-mripard@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-15cgroup/dmem: Select PAGE_COUNTERMaxime Ripard
The dmem cgroup the page counting API implemented behing the PAGE_COUNTER kconfig option. However, it doesn't select it, resulting in potential build breakages. Select PAGE_COUNTER. Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501111330.3VuUx8vf-lkp@intel.com/ Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20250113092608.1349287-1-mripard@kernel.org Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-15irqchip/sunxi-nmi: Add missing SKIP_WAKE flagPhilippe Simons
Some boards with Allwinner SoCs connect the PMIC's IRQ pin to the SoC's NMI pin instead of a normal GPIO. Since the power key is connected to the PMIC, and people expect to wake up a suspended system via this key, the NMI IRQ controller must stay alive when the system goes into suspend. Add the SKIP_WAKE flag to prevent the sunxi NMI controller from going to sleep, so that the power key can wake up those systems. [ tglx: Fixed up coding style ] Signed-off-by: Philippe Simons <simons.philippe@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250112123402.388520-1-simons.philippe@gmail.com
2025-01-15irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()Tomas Krcka
The following call-chain leads to enabling interrupts in a nested interrupt disabled section: irq_set_vcpu_affinity() irq_get_desc_lock() raw_spin_lock_irqsave() <--- Disable interrupts its_irq_set_vcpu_affinity() guard(raw_spinlock_irq) <--- Enables interrupts when leaving the guard() irq_put_desc_unlock() <--- Warns because interrupts are enabled This was broken in commit b97e8a2f7130, which replaced the original raw_spin_[un]lock() pair with guard(raw_spinlock_irq). Fix the issue by using guard(raw_spinlock). [ tglx: Massaged change log ] Fixes: b97e8a2f7130 ("irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()") Signed-off-by: Tomas Krcka <krckatom@amazon.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241230150825.62894-1-krckatom@amazon.de
2025-01-15irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctlyYogesh Lal
When a CPU attempts to enter low power mode, it disables the redistributor and Group 1 interrupts and reinitializes the system registers upon wakeup. If the transition into low power mode fails, then the CPU_PM framework invokes the PM notifier callback with CPU_PM_ENTER_FAILED to allow the drivers to undo the state changes. The GIC V3 driver ignores CPU_PM_ENTER_FAILED, which leaves the GIC in disabled state. Handle CPU_PM_ENTER_FAILED in the same way as CPU_PM_EXIT to restore normal operation. [ tglx: Massage change log, add Fixes tag ] Fixes: 3708d52fc6bb ("irqchip: gic-v3: Implement CPU PM notifier") Signed-off-by: Yogesh Lal <quic_ylal@quicinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241220093907.2747601-1-quic_ylal@quicinc.com
2025-01-15kernel/cgroup: Remove the unused variable climitJiapeng Chong
Variable climit is not effectively used, so delete it. kernel/cgroup/dmem.c:302:23: warning: variable ‘climit’ set but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=13512 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250114062804.5092-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-15drm/bridge: ite-it6263: Prevent error pointer dereference in probe()Dan Carpenter
If devm_i2c_new_dummy_device() fails then we were supposed to return an error code, but instead the function continues and will crash on the next line. Add the missing return statement. Fixes: 049723628716 ("drm/bridge: Add ITE IT6263 LVDS to HDMI converter") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/804a758b-f2e7-4116-b72d-29bc8905beed@stanley.mountain
2025-01-14net: fec: handle page_pool_dev_alloc_pages errorKevin Groeneveld
The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did not handle the case when it returned NULL. There was a WARN_ON(!new_page) but it would still proceed to use the NULL pointer and then crash. This case does seem somewhat rare but when the system is under memory pressure it can happen. One case where I can duplicate this with some frequency is when writing over a smbd share to a SATA HDD attached to an imx6q. Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve the problem for my test case. But it still seems wrong that the fec driver ignores the memory allocation error and can crash. This commit handles the allocation error by dropping the current packet. Fixes: 95698ff6177b5 ("net: fec: using page pool to manage RX buffers") Signed-off-by: Kevin Groeneveld <kgroeneveld@lenbrook.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250113154846.1765414-1-kgroeneveld@lenbrook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: netpoll: ensure skb_pool list is always initializedJohn Sperbeck
When __netpoll_setup() is called directly, instead of through netpoll_setup(), the np->skb_pool list head isn't initialized. If skb_pool_flush() is later called, then we hit a NULL pointer in skb_queue_purge_reason(). This can be seen with this repro, when CONFIG_NETCONSOLE is enabled as a module: ip tuntap add mode tap tap0 ip link add name br0 type bridge ip link set dev tap0 master br0 modprobe netconsole netconsole=4444@10.0.0.1/br0,9353@10.0.0.2/ rmmod netconsole The backtrace is: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page ... ... ... Call Trace: <TASK> __netpoll_free+0xa5/0xf0 br_netpoll_cleanup+0x43/0x50 [bridge] do_netpoll_cleanup+0x43/0xc0 netconsole_netdev_event+0x1e3/0x300 [netconsole] unregister_netdevice_notifier+0xd9/0x150 cleanup_module+0x45/0x920 [netconsole] __se_sys_delete_module+0x205/0x290 do_syscall_64+0x70/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Move the skb_pool list setup and initial skb fill into __netpoll_setup(). Fixes: 221a9c1df790 ("net: netpoll: Individualize the skb pool") Signed-off-by: John Sperbeck <jsperbeck@google.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250114011354.2096812-1-jsperbeck@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: xilinx: axienet: Fix IRQ coalescing packet count overflowSean Anderson
If coalesce_count is greater than 255 it will not fit in the register and will overflow. This can be reproduced by running # ethtool -C ethX rx-frames 256 which will result in a timeout of 0us instead. Fix this by checking for invalid values and reporting an error. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://patch.msgid.link/20250113163001.2335235-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14hwmon: (tmp513) Fix division of negative numbersDavid Lechner
Fix several issues with division of negative numbers in the tmp513 driver. The docs on the DIV_ROUND_CLOSEST macro explain that dividing a negative value by an unsigned type is undefined behavior. The driver was doing this in several places, i.e. data->shunt_uohms has type of u32. The actual "undefined" behavior is that it converts both values to unsigned before doing the division, for example: int ret = DIV_ROUND_CLOSEST(-100, 3U); results in ret == 1431655732 instead of -33. Furthermore the MILLI macro has a type of unsigned long. Multiplying a signed long by an unsigned long results in an unsigned long. So, we need to cast both MILLI and data data->shunt_uohms to long when using the DIV_ROUND_CLOSEST macro. Fixes: f07f9d2467f4 ("hwmon: (tmp513) Use SI constants from units.h") Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20250114-fix-si-prefix-macro-sign-bugs-v1-1-696fd8d10f00@baylibre.com [groeck: Drop some continuation lines] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-01-14nfp: bpf: prevent integer overflow in nfp_bpf_event_output()Dan Carpenter
The "sizeof(struct cmsg_bpf_event) + pkt_size + data_size" math could potentially have an integer wrapping bug on 32bit systems. Check for this and return an error. Fixes: 9816dd35ecec ("nfp: bpf: perf event output helpers support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/6074805b-e78d-4b8a-bf05-e929b5377c28@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14x86/fred: Fix the FRED RSP0 MSR out of sync with its per-CPU cacheXin Li (Intel)
The FRED RSP0 MSR is only used for delivering events when running userspace. Linux leverages this property to reduce expensive MSR writes and optimize context switches. The kernel only writes the MSR when about to run userspace *and* when the MSR has actually changed since the last time userspace ran. This optimization is implemented by maintaining a per-CPU cache of FRED RSP0 and then checking that against the value for the top of current task stack before running userspace. However cpu_init_fred_exceptions() writes the MSR without updating the per-CPU cache. This means that the kernel might return to userspace with MSR_IA32_FRED_RSP0==0 when it needed to point to the top of current task stack. This would induce a double fault (#DF), which is bad. A context switch after cpu_init_fred_exceptions() can paper over the issue since it updates the cached value. That evidently happens most of the time explaining how this bug got through. Fix the bug through resynchronizing the FRED RSP0 MSR with its per-CPU cache in cpu_init_fred_exceptions(). Fixes: fe85ee391966 ("x86/entry: Set FRED RSP0 on return to userspace instead of context switch") Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250110174639.1250829-1-xin%40zytor.com
2025-01-14Merge tag 'seccomp-v6.13-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fix from Kees Cook: "Fix a randconfig failure: - Unconditionally define stub for !CONFIG_SECCOMP (Linus Walleij)" * tag 'seccomp-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Stub for !CONFIG_SECCOMP
2025-01-14Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Fix E825 initialization Grzegorz Nitka says: E825 products have incorrect initialization procedure, which may lead to initialization failures and register values. Fix E825 products initialization by adding correct sync delay, checking the PHY revision only for current PHY and adding proper destination device when reading port/quad. In addition, E825 uses PF ID for indexing per PF registers and as a primary PHY lane number, which is incorrect. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Add correct PHY lane assignment ice: Fix ETH56G FC-FEC Rx offset value ice: Fix quad registers read on E825 ice: Fix E825 initialization ==================== Link: https://patch.msgid.link/20250113182840.3564250-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14Merge branch 'mptcp-fixes-for-connect-selftest-flakes'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: fixes for connect selftest flakes Last week, Jakub reported [1] that the MPTCP Connect selftest was unstable. It looked like it started after the introduction of some fixes [2]. After analysis from Paolo, these patches revealed existing bugs, that should be fixed by the following patches. - Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In some corner cases, the other peer was not notified when more data could be sent. A fix for v5.11, but depending on a feature introduced in v5.19. - Patch 2: Fix spurious wake-up under memory pressure. In this situation, the userspace could be invited to read data not being there yet. A fix for v6.7. - Patch 3: Fix a false positive error when running the MPTCP Connect selftest with the "disconnect" cases. The userspace could disconnect the socket too soon, which would reset (MP_FASTCLOSE) the connection, interpreted as an error by the test. A fix for v5.17. Link: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org [1] Link: https://lore.kernel.org/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434ceb@kernel.org [2] ==================== Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-0-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14selftests: mptcp: avoid spurious errors on disconnectPaolo Abeni
The disconnect test-case generates spurious errors: INFO: disconnect INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI 01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 140ms) [FAIL] file received by server does not match (in, out): Unexpected revents: POLLERR/POLLNVAL(19) -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect Trailing bytes are: ��\����R���!8��u2��5N% -rw------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1 Trailing bytes are: ��\����R���!8��u2��5N% 02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ] 03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ] 04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ] [FAIL] Tests of the full disconnection have failed Time: 2 seconds The root cause is actually in the user-space bits: the test program currently disconnects as soon as all the pending data has been spooled, generating an FASTCLOSE. If such option reaches the peer before the latter has reached the closed status, the msk socket will report an error to the user-space, as per protocol specification, causing the above failure. Address the issue explicitly waiting for all the relevant sockets to reach a closed status before performing the disconnect. Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-3-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14mptcp: fix spurious wake-up on under memory pressurePaolo Abeni
The wake-up condition currently implemented by mptcp_epollin_ready() is wrong, as it could mark the MPTCP socket as readable even when no data are present and the system is under memory pressure. Explicitly check for some data being available in the receive queue. Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14mptcp: be sure to send ack when mptcp-level window re-opensPaolo Abeni
mptcp_cleanup_rbuf() is responsible to send acks when the user-space reads enough data to update the receive windows significantly. It tries hard to avoid acquiring the subflow sockets locks by checking conditions similar to the ones implemented at the TCP level. To avoid too much code duplication - the MPTCP protocol can't reuse the TCP helpers as part of the relevant status is maintained into the msk socket - and multiple costly window size computation, mptcp_cleanup_rbuf uses a rough estimate for the most recently advertised window size: the MPTCP receive free space, as recorded as at last-ack time. Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect a zero to non-zero win change in some corner cases, skipping the tcp_cleanup_rbuf call and leaving the peer stuck. After commit ea66758c1795 ("tcp: allow MPTCP to update the announced window"), MPTCP has actually cheap access to the announced window value. Use it in mptcp_cleanup_rbuf() for a more accurate ack generation. Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-1-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14drm/v3d: Ensure job pointer is set to NULL after job completionMaíra Canal
After a job completes, the corresponding pointer in the device must be set to NULL. Failing to do so triggers a warning when unloading the driver, as it appears the job is still active. To prevent this, assign the job pointer to NULL after completing the job, indicating the job has finished. Fixes: 14d1d1908696 ("drm/v3d: Remove the bad signaled() implementation.") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113154741.67520-1-mcanal@igalia.com
2025-01-14cpufreq: Move endif to the end of Kconfig fileViresh Kumar
It is possible to enable few cpufreq drivers, without the framework being enabled. This happened due to a bug while moving the entries earlier. Fix it. Fixes: 7ee1378736f0 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-14Merge tag 'pci-v6.13-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fix from Bjorn Helgaas: - Prevent bwctrl NULL pointer dereference that caused hangs on shutdown on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner) * tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/bwctrl: Fix NULL pointer deref on unbind and bind
2025-01-14Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "One iscsi driver fix and one core fix. The core fix is an important one because a retry efficiency update is now causing some USB devices to get the wrong size on discovery (it upset their retry logic for READ_CAPACITY_16)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request scsi: core: Fix command pass through retry regression
2025-01-14drm/vmwgfx: Add new keep_resv BO paramIan Forbes
Adds a new BO param that keeps the reservation locked after creation. This removes the need to re-reserve the BO after creation which is a waste of cycles. This also fixes a bug in vmw_prime_import_sg_table where the imported reservation is unlocked twice. Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export") Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110185335.15301-1-ian.forbes@broadcom.com
2025-01-14drm/vmwgfx: Remove busy_placesIan Forbes
Unused since commit a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250108201355.2521070-1-ian.forbes@broadcom.com
2025-01-14drm/vmwgfx: Unreserve BO on errorIan Forbes
Unlock BOs in reverse order. Add an acquire context so that lockdep doesn't complain. Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210195535.2074918-1-ian.forbes@broadcom.com
2025-01-14Merge tag 'sound-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Hopefully the last PR for 6.13. This became bigger than wished due to the timing after holiday breaks. The only large LOC is the additional document for Cirrus codec which is nice for users (and absolutely safe). All the rest are small fixes in ASoC Rcar and codecs as well as HD-audio quirks (And no fix for USB guitar pedals seen yet :)" * tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5 ALSA: hda/realtek: fixup ASUS H7606W ALSA: hda/realtek: fixup ASUS GA605W ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA ASoC: rsnd: check rsnd_adg_clk_enable() return value ASoC: cs42l43: Add codec force suspend/resume ops ALSA: doc: Add codecs/index.rst to top-level index ALSA: doc: cs35l56: Add information about Cirrus Logic CS35L54/56/57 ASoC: samsung: Add missing depends on I2C MAINTAINERS: add missing maintainers for Simple Audio Card ASoC: samsung: Add missing selects for MFD_WM8994 ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK ASoC: wm8994: Add depends on MFD core ASoC: tas2781: Fix occasional calibration failture ASoC: codecs: ES8326: Adjust ANA_MICBIAS to reduce pop noise
2025-01-14drm/display: hdmi: Do not read EDID on disconnected connectorsCristian Ciocaltea
The recently introduced hotplug event handler in the HDMI Connector framework attempts to unconditionally read the EDID data, leading to a bunch of non-harmful, yet quite annoying DDC/I2C related errors being reported. Ensure the operation is done only for connectors having the status connected or unknown. Additionally, perform an explicit reset of the connector information when dealing with a disconnected status. Fixes: ab716b74dc9d ("drm/display/hdmi: implement hotplug functions") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113-hdmi-conn-edid-read-fix-v2-1-d2a0438a44ab@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/tests: hdmi: Add connector disablement testLiu Ying
Atomic check should succeed when disabling a connector. Add a test case drm_test_check_disabling_connector() to make sure of this. Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110084821.3239518-3-victor.liu@nxp.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/connector: hdmi: Do atomic check when necessaryLiu Ying
It's ok to pass atomic check successfully if an atomic commit tries to disable the display pipeline which the connector belongs to. That is, when the crtc or the best_encoder pointers in struct drm_connector_state are NULL, drm_atomic_helper_connector_hdmi_check() should return 0. Without the check against the NULL pointers, drm_default_rgb_quant_range() called by drm_atomic_helper_connector_hdmi_check() would dereference the NULL pointer to_match in drm_match_cea_mode(). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: drm_default_rgb_quant_range+0x0/0x4c (P) drm_bridge_connector_atomic_check+0x20/0x2c drm_atomic_helper_check_modeset+0x488/0xc78 drm_atomic_helper_check+0x20/0xa4 drm_atomic_check_only+0x4b8/0x984 drm_atomic_commit+0x48/0xc4 drm_framebuffer_remove+0x44c/0x530 drm_mode_rmfb_work_fn+0x7c/0xa0 process_one_work+0x150/0x294 worker_thread+0x2dc/0x3dc kthread+0x130/0x204 ret_from_fork+0x10/0x20 Fixes: 8ec116ff21a9 ("drm/display: bridge_connector: provide atomic_check for HDMI bridges") Fixes: 84e541b1e58e ("drm/sun4i: use drm_atomic_helper_connector_hdmi_check()") Fixes: 65548c8ff0ab ("drm/rockchip: inno_hdmi: Switch to HDMI connector") Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250110084821.3239518-2-victor.liu@nxp.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14Merge drm/drm-next into drm-misc-next-fixesMaxime Ripard
drm-next has the dmem cgroup patches we need to merge fixes for. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}Gui Chengming
FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62952a38d9bcf357d5ffc97615c48b12c9cd627c) Cc: stable@vger.kernel.org # 6.12.x
2025-01-14drm/amdgpu: always sync the GFX pipe on ctx switchChristian König
That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org
2025-01-14drm/amdgpu: disable gfxoff with the compute workload on gfx12Kenneth Feng
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707) Cc: stable@vger.kernel.org # 6.12.x
2025-01-14io_uring/rsrc: fixup io_clone_buffers() error handlingJens Axboe
Jann reports he can trigger a UAF if the target ring unregisters buffers before the clone operation is fully done. And additionally also an issue related to node allocation failures. Both of those stemp from the fact that the cleanup logic puts the buffers manually, rather than just relying on io_rsrc_data_free() doing it. Hence kill the manual cleanup code and just let io_rsrc_data_free() handle it, it'll put the nodes appropriately. Reported-by: Jann Horn <jannh@google.com> Fixes: 3597f2786b68 ("io_uring/rsrc: unify file and buffer resource tables") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-14drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX IsolationSrinivasan Shanmugam
This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] *** DEADLOCK *** [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0b6b2dd38336d5fd49214f0e4e6495e658e3ab44) Cc: stable@vger.kernel.org
2025-01-14platform/x86: lenovo-yoga-tab2-pro-1380-fastcharger: fix serdev raceChenyuan Yang
The yt2_1380_fc_serdev_probe() function calls devm_serdev_device_open() before setting the client ops via serdev_device_set_client_ops(). This ordering can trigger a NULL pointer dereference in the serdev controller's receive_buf handler, as it assumes serdev->ops is valid when SERPORT_ACTIVE is set. This is similar to the issue fixed in commit 5e700b384ec1 ("platform/chrome: cros_ec_uart: properly fix race condition") where devm_serdev_device_open() was called before fully initializing the device. Fix the race by ensuring client ops are set before enabling the port via devm_serdev_device_open(). Note, serdev_device_set_baudrate() and serdev_device_set_flow_control() calls should be after the devm_serdev_device_open() call. Fixes: b2ed33e8d486 ("platform/x86: Add lenovo-yoga-tab2-pro-1380-fastcharger driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250111180951.2277757-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-14platform/x86: dell-uart-backlight: fix serdev raceChenyuan Yang
The dell_uart_bl_serdev_probe() function calls devm_serdev_device_open() before setting the client ops via serdev_device_set_client_ops(). This ordering can trigger a NULL pointer dereference in the serdev controller's receive_buf handler, as it assumes serdev->ops is valid when SERPORT_ACTIVE is set. This is similar to the issue fixed in commit 5e700b384ec1 ("platform/chrome: cros_ec_uart: properly fix race condition") where devm_serdev_device_open() was called before fully initializing the device. Fix the race by ensuring client ops are set before enabling the port via devm_serdev_device_open(). Note, serdev_device_set_baudrate() and serdev_device_set_flow_control() calls should be after the devm_serdev_device_open() call. Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250111180118.2274516-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>