summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-14PM: sleep: convert comment from kernel-doc to plain commentRandy Dunlap
Modify a non-kernel-doc comment to begin with /* instead of /** so that it does not cause a kernel-doc warning. power.h:114: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Auxiliary structure used for reading the snapshot image data and power.h:114: warning: missing initial short description on line: * Auxiliary structure used for reading the snapshot image data and Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Pavel Machek <pavel@ucw.cz> Link: https://patch.msgid.link/20250111063107.910825-1-rdunlap@infradead.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-14cpufreq: ACPI: Fix max-frequency computationGautham R. Shenoy
Commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") introduced an assumption in acpi_cpufreq_cpu_init() that the first entry in the P-state table was the nominal frequency. This assumption is incorrect. The frequency corresponding to the P0 P-State need not be the same as the nominal frequency advertised via CPPC. Since the driver is using the CPPC.highest_perf and CPPC.nominal_perf to compute the boost-ratio, it makes sense to use CPPC.nominal_freq to compute the max-frequency. CPPC.nominal_freq is advertised on platforms supporting CPPC revisions 3 or higher. Hence, fallback to using the first entry in the P-State table only on platforms that do not advertise CPPC.nominal_freq. Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") Tested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20250113044107.566-1-gautham.shenoy@amd.com [ rjw: Retain reverse X-mas tree ordering of local variable declarations ] [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-14cpufreq: Move endif to the end of Kconfig fileViresh Kumar
It is possible to enable few cpufreq drivers, without the framework being enabled. This happened due to a bug while moving the entries earlier. Fix it. Fixes: 7ee1378736f0 ("cpufreq: Move CPPC configs to common Kconfig and add RISC-V") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/84ac7a8fa72a8fe20487bb0a350a758bce060965.1736488384.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-14Merge tag 'pci-v6.13-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fix from Bjorn Helgaas: - Prevent bwctrl NULL pointer dereference that caused hangs on shutdown on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner) * tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/bwctrl: Fix NULL pointer deref on unbind and bind
2025-01-14clk: clk-loongson2: Fix the number count of clk providerBinbin Zhou
Since commit 02fb4f008433 ("clk: clk-loongson2: Fix potential buffer overflow in flexible-array member access"), the clk provider register is failed. The count of `clks_num` is shown below: for (p = data; p->name; p++) clks_num++; In fact, `clks_num` represents the number of SoC clocks and should be expressed as the maximum value of the clock binding id in use (p->id + 1). Now we fix it to avoid the following error when trying to register a clk provider: [ 13.409595] of_clk_hw_onecell_get: invalid index 17 Cc: stable@vger.kernel.org Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Fixes: 02fb4f008433 ("clk: clk-loongson2: Fix potential buffer overflow in flexible-array member access") Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Link: https://lore.kernel.org/r/82e43d89a9a6791129cf8ea14f4eeb666cd87be4.1736856470.git.zhoubinbin@loongson.cn Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2025-01-14clk: clk-loongson2: Switch to use devm_clk_hw_register_fixed_rate_parent_data()Binbin Zhou
Since commit 706ae6446494 ("clk: fixed-rate: add devm_clk_hw_register_fixed_rate_parent_data()"), we can use the devm_clk_hw_register_fixed_rate_parent_data() helper and from then on there is no need to manually unregister the fixed rate hw. Since clk_hw_unregister_fixed_rate() was not called before, we also fix the memory leak that was present. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Link: https://lore.kernel.org/r/8733a7485619bdb791de25201a3d7984d1849c9f.1736856470.git.zhoubinbin@loongson.cn Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2025-01-14Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-01-14 The following pull-request contains mlx5 IFC updates. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add nic_cap_reg and vhca_icm_ctrl registers net/mlx5: SHAMPO: Introduce new SHAMPO specific HCA caps net/mlx5: Add support for MRTCQ register net/mlx5: Update mlx5_ifc to support FEC for 200G per lane link modes ==================== Link: https://patch.msgid.link/20250114055700.1928736-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14clk: starfive: Make _clk_get become a common helper functionChanghuang Liang
Introduce num_reg to store the number of clocks, this helps to make _clk_get become a common helper function which called jh71x0_clk_get(). With this, it helps to simplify the code and extend the code in the future. Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com> Link: https://lore.kernel.org/r/20250114081300.36600-1-changhuang.liang@starfivetech.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2025-01-14Merge tag 'v6.14-rockchip-clk1' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-rockchip Pull Rockchip clk driver updates from Heiko Stuebner: Real handling of the linked clocks (clocks of the interconnect port a peripheral is connected to) on rk3588 using pm-clocks, allowing us to stop marking them as critical and one more rk3588 critical clock, that the kernel cannot handle otherwise right now. * tag 'v6.14-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: clk: rockchip: rk3588: make refclko25m_ethX critical clk: rockchip: rk3588: drop RK3588_LINKED_CLK clk: rockchip: implement linked gate clock support clk: rockchip: expose rockchip_clk_set_lookup clk: rockchip: rk3588: register GATE_LINK later clk: rockchip: support clocks registered late
2025-01-14Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "One iscsi driver fix and one core fix. The core fix is an important one because a retry efficiency update is now causing some USB devices to get the wrong size on discovery (it upset their retry logic for READ_CAPACITY_16)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request scsi: core: Fix command pass through retry regression
2025-01-14drm/vmwgfx: Add new keep_resv BO paramIan Forbes
Adds a new BO param that keeps the reservation locked after creation. This removes the need to re-reserve the BO after creation which is a waste of cycles. This also fixes a bug in vmw_prime_import_sg_table where the imported reservation is unlocked twice. Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export") Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110185335.15301-1-ian.forbes@broadcom.com
2025-01-14drm/vmwgfx: Remove busy_placesIan Forbes
Unused since commit a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250108201355.2521070-1-ian.forbes@broadcom.com
2025-01-14drm/vmwgfx: Unreserve BO on errorIan Forbes
Unlock BOs in reverse order. Add an acquire context so that lockdep doesn't complain. Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210195535.2074918-1-ian.forbes@broadcom.com
2025-01-14io_uring/rsrc: require cloned buffers to share accounting contextsJann Horn
When IORING_REGISTER_CLONE_BUFFERS is used to clone buffers from uring instance A to uring instance B, where A and B use different MMs for accounting, the accounting can go wrong: If uring instance A is closed before uring instance B, the pinned memory counters for uring instance B will be decremented, even though the pinned memory was originally accounted through uring instance A; so the MM of uring instance B can end up with negative locked memory. Cc: stable@vger.kernel.org Closes: https://lore.kernel.org/r/CAG48ez1zez4bdhmeGLEFxtbFADY4Czn3CV0u9d_TMcbvRA01bg@mail.gmail.com Fixes: 7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20250114-uring-check-accounting-v1-1-42e4145aa743@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-14Merge tag 'sound-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Hopefully the last PR for 6.13. This became bigger than wished due to the timing after holiday breaks. The only large LOC is the additional document for Cirrus codec which is nice for users (and absolutely safe). All the rest are small fixes in ASoC Rcar and codecs as well as HD-audio quirks (And no fix for USB guitar pedals seen yet :)" * tag 'sound-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5 ALSA: hda/realtek: fixup ASUS H7606W ALSA: hda/realtek: fixup ASUS GA605W ALSA: hda/realtek: Add support for Ayaneo System using CS35L41 HDA ASoC: rsnd: check rsnd_adg_clk_enable() return value ASoC: cs42l43: Add codec force suspend/resume ops ALSA: doc: Add codecs/index.rst to top-level index ALSA: doc: cs35l56: Add information about Cirrus Logic CS35L54/56/57 ASoC: samsung: Add missing depends on I2C MAINTAINERS: add missing maintainers for Simple Audio Card ASoC: samsung: Add missing selects for MFD_WM8994 ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK ASoC: wm8994: Add depends on MFD core ASoC: tas2781: Fix occasional calibration failture ASoC: codecs: ES8326: Adjust ANA_MICBIAS to reduce pop noise
2025-01-14gfs2: Truncate address space when flipping GFS2_DIF_JDATA flagAndreas Gruenbacher
Truncate an inode's address space when flipping the GFS2_DIF_JDATA flag: depending on that flag, the pages in the address space will either use buffer heads or iomap_folio_state structs, and we cannot mix the two. Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-01-14drm/display: hdmi: Do not read EDID on disconnected connectorsCristian Ciocaltea
The recently introduced hotplug event handler in the HDMI Connector framework attempts to unconditionally read the EDID data, leading to a bunch of non-harmful, yet quite annoying DDC/I2C related errors being reported. Ensure the operation is done only for connectors having the status connected or unknown. Additionally, perform an explicit reset of the connector information when dealing with a disconnected status. Fixes: ab716b74dc9d ("drm/display/hdmi: implement hotplug functions") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113-hdmi-conn-edid-read-fix-v2-1-d2a0438a44ab@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/tests: hdmi: Add connector disablement testLiu Ying
Atomic check should succeed when disabling a connector. Add a test case drm_test_check_disabling_connector() to make sure of this. Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110084821.3239518-3-victor.liu@nxp.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/connector: hdmi: Do atomic check when necessaryLiu Ying
It's ok to pass atomic check successfully if an atomic commit tries to disable the display pipeline which the connector belongs to. That is, when the crtc or the best_encoder pointers in struct drm_connector_state are NULL, drm_atomic_helper_connector_hdmi_check() should return 0. Without the check against the NULL pointers, drm_default_rgb_quant_range() called by drm_atomic_helper_connector_hdmi_check() would dereference the NULL pointer to_match in drm_match_cea_mode(). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: drm_default_rgb_quant_range+0x0/0x4c (P) drm_bridge_connector_atomic_check+0x20/0x2c drm_atomic_helper_check_modeset+0x488/0xc78 drm_atomic_helper_check+0x20/0xa4 drm_atomic_check_only+0x4b8/0x984 drm_atomic_commit+0x48/0xc4 drm_framebuffer_remove+0x44c/0x530 drm_mode_rmfb_work_fn+0x7c/0xa0 process_one_work+0x150/0x294 worker_thread+0x2dc/0x3dc kthread+0x130/0x204 ret_from_fork+0x10/0x20 Fixes: 8ec116ff21a9 ("drm/display: bridge_connector: provide atomic_check for HDMI bridges") Fixes: 84e541b1e58e ("drm/sun4i: use drm_atomic_helper_connector_hdmi_check()") Fixes: 65548c8ff0ab ("drm/rockchip: inno_hdmi: Switch to HDMI connector") Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250110084821.3239518-2-victor.liu@nxp.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14APEI: GHES: Have GHES honor the panic= settingBorislav Petkov
The GHES driver overrides the panic= setting by force-rebooting the system after a fatal hw error has been reported. The intent being that such an error would be reported earlier. However, this is not optimal when a hard-to-debug issue requires long time to reproduce and when that happens, the box will get rebooted after 30 seconds and thus destroy the whole hw context of when the error happened. So rip out the default GHES panic timeout and honor the global one. In the panic disabled (panic=0) case, the error will still be logged to dmesg for later inspection and if panic after a hw error is really required, then that can be controlled the usual way - use panic= on the cmdline or set it in the kernel .config's CONFIG_PANIC_TIMEOUT. Reported-by: Feng Tang <feng.tang@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250113125224.GFZ4UMiNtWIJvgpveU@fat_crate.local Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-14ACPI: PRM: Fix missing guid_t declaration in linux/prmt.hRobert Richter
Seen the following build error: ./include/linux/prmt.h:5:27: error: unknown type name ‘guid_t’ 5 | int acpi_call_prm_handler(guid_t handler_guid, void *param_buffer); | ^~~~~~ The include file uses guid_t but it is not declared. Include linux/uuid.h to fix this. Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://patch.msgid.link/20250107161923.3387552-1-rrichter@amd.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-14Merge drm/drm-next into drm-misc-next-fixesMaxime Ripard
drm-next has the dmem cgroup patches we need to merge fixes for. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14blk-mq: Move more error handling into blk_mq_submit_bio()Bart Van Assche
The error handling code in blk_mq_get_new_requests() cannot be understood without knowing that this function is only called by blk_mq_submit_bio(). Hence move the code for handling blk_mq_get_new_requests() failures into blk_mq_submit_bio(). Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241218212246.1073149-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-14block: Reorder the request allocation code in blk_mq_submit_bio()Bart Van Assche
Help the CPU branch predictor in case of a cache hit by handling the cache hit scenario first. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241218212246.1073149-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-14drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}Gui Chengming
FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62952a38d9bcf357d5ffc97615c48b12c9cd627c) Cc: stable@vger.kernel.org # 6.12.x
2025-01-14drm/amdgpu: always sync the GFX pipe on ctx switchChristian König
That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org
2025-01-14drm/amdgpu: disable gfxoff with the compute workload on gfx12Kenneth Feng
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707) Cc: stable@vger.kernel.org # 6.12.x
2025-01-14io_uring/rsrc: fixup io_clone_buffers() error handlingJens Axboe
Jann reports he can trigger a UAF if the target ring unregisters buffers before the clone operation is fully done. And additionally also an issue related to node allocation failures. Both of those stemp from the fact that the cleanup logic puts the buffers manually, rather than just relying on io_rsrc_data_free() doing it. Hence kill the manual cleanup code and just let io_rsrc_data_free() handle it, it'll put the nodes appropriately. Reported-by: Jann Horn <jannh@google.com> Fixes: 3597f2786b68 ("io_uring/rsrc: unify file and buffer resource tables") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-14drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX IsolationSrinivasan Shanmugam
This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] *** DEADLOCK *** [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0b6b2dd38336d5fd49214f0e4e6495e658e3ab44) Cc: stable@vger.kernel.org
2025-01-14platform/x86: lenovo-yoga-tab2-pro-1380-fastcharger: fix serdev raceChenyuan Yang
The yt2_1380_fc_serdev_probe() function calls devm_serdev_device_open() before setting the client ops via serdev_device_set_client_ops(). This ordering can trigger a NULL pointer dereference in the serdev controller's receive_buf handler, as it assumes serdev->ops is valid when SERPORT_ACTIVE is set. This is similar to the issue fixed in commit 5e700b384ec1 ("platform/chrome: cros_ec_uart: properly fix race condition") where devm_serdev_device_open() was called before fully initializing the device. Fix the race by ensuring client ops are set before enabling the port via devm_serdev_device_open(). Note, serdev_device_set_baudrate() and serdev_device_set_flow_control() calls should be after the devm_serdev_device_open() call. Fixes: b2ed33e8d486 ("platform/x86: Add lenovo-yoga-tab2-pro-1380-fastcharger driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250111180951.2277757-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-14platform/x86: dell-uart-backlight: fix serdev raceChenyuan Yang
The dell_uart_bl_serdev_probe() function calls devm_serdev_device_open() before setting the client ops via serdev_device_set_client_ops(). This ordering can trigger a NULL pointer dereference in the serdev controller's receive_buf handler, as it assumes serdev->ops is valid when SERPORT_ACTIVE is set. This is similar to the issue fixed in commit 5e700b384ec1 ("platform/chrome: cros_ec_uart: properly fix race condition") where devm_serdev_device_open() was called before fully initializing the device. Fix the race by ensuring client ops are set before enabling the port via devm_serdev_device_open(). Note, serdev_device_set_baudrate() and serdev_device_set_flow_control() calls should be after the devm_serdev_device_open() call. Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250111180118.2274516-1-chenyuan0y@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-14bcachefs: Document issue with bch_stripe layoutKent Overstreet
We've got a problem with bch_stripe that is going to take an on disk format rev to fix - we can't access the block sector counts if the checksum type is unknown. Document it for now, there are a few other things to fix as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14bcachefs: Fix self healing on read errorKent Overstreet
We were incorrectly checking if there'd been an io error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14bcachefs: Pop all the transactions from the abort oneAlan Huang
The transaction is going to abort, so there will be no cycle involving this transaction anymore. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14ftrace: Document that multiple function_graph tracing may have different timesSteven Rostedt
The function graph tracer now calculates the calltime internally and for each instance. If there are two instances that are running function graph tracer and are tracing the same functions, the timings of the length of those functions may be slightly different: # trace-cmd record -B foo -p function_graph -B bar -p function_graph sleep 5 # trace-cmd report [..] bar: sleep-981 [000] ...1. 1101.109027: funcgraph_entry: 0.764 us | mutex_unlock(); (ret=0xffff8abcc256c300) foo: sleep-981 [000] ...1. 1101.109028: funcgraph_entry: 0.748 us | mutex_unlock(); (ret=0xffff8abcc256c300) bar: sleep-981 [000] ..... 1101.109029: funcgraph_exit: 2.456 us | } (ret=0xffff8abcc256c300) foo: sleep-981 [000] ..... 1101.109029: funcgraph_exit: 2.403 us | } (ret=0xffff8abcc256c300) bar: sleep-981 [000] d..1. 1101.109031: funcgraph_entry: 0.844 us | fpregs_assert_state_consistent(); (ret=0x0) foo: sleep-981 [000] d..1. 1101.109032: funcgraph_entry: 0.803 us | fpregs_assert_state_consistent(); (ret=0x0) Link: https://lore.kernel.org/all/20250114101806.b2778cb01f34f5be9d23ad98@kernel.org/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/20250114101202.02e7bc68@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-14bcachefs: Only abort the transactions in the cycleAlan Huang
When the cycle doesn't involve the initiator of the cycle detection, we might choose a transaction that is not involved in the cycle to abort. It shouldn't be that since it won't break the cycle, this patch therefore chooses the transaction in the cycle to abort. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14bcachefs: Introduce lock_graph_pop_fromAlan Huang
This patch introduces a helper function called lock_graph_pop_from, it pops the graph from i. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14bcachefs: Convert open-coded lock_graph_pop_all to helperAlan Huang
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14bcachefs: Do not allow no fail lock request to failAlan Huang
If the transaction chose itself as a victim before and restarted, it might request a no fail lock request this time. But it might be added to others' lock graph and be chose as the victim again, it's no longer safe without additional check. We can also convert the cycle detector to be fully RCU-based to solve that unsoundness, but the latency added to trans_put and additional memory required may not worth it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14bcachefs: Merge the condition to avoid additional invocationAlan Huang
If the lock has been acquired and unlocked, we don't have to do clear and wakeup again, though harmless since we hold the intent lock. Merge the condition might be clearer. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14Revert "bcachefs: Fix bch2_btree_node_upgrade()"Alan Huang
This reverts commit 62448afee714354a26db8a0f3c644f58628f0792. six_lock_tryupgrade fails only if there is an intent lock held, it won't fail no matter how many read locks are held. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14spi-nand/spi-mem DTR supportMark Brown
Merge series from Miquel Raynal <miquel.raynal@bootlin.com>: Here is a (big) series supposed to bring DTR support in SPI-NAND.
2025-01-14spi: ti-qspi: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250111185400.183760-1-krzysztof.kozlowski@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-01-14spi: amd: Fix -Wuninitialized in amd_spi_exec_mem_op()Nathan Chancellor
After commit e6204f39fe3a ("spi: amd: Drop redundant check"), clang warns (or errors with CONFIG_WERROR=y): drivers/spi/spi-amd.c:695:9: error: variable 'ret' is uninitialized when used here [-Werror,-Wuninitialized] 695 | return ret; | ^~~ drivers/spi/spi-amd.c:673:9: note: initialize the variable 'ret' to silence this warning 673 | int ret; | ^ | = 0 1 error generated. ret is no longer set on anything other than the default switch path. Replace ret with a direct return of 0 at the end of the function and -EOPNOTSUPP in the default case to resolve the warning. Fixes: e6204f39fe3a ("spi: amd: Drop redundant check") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501112315.ugYQ7Ce7-lkp@intel.com/ Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://patch.msgid.link/20250111-spi-amd-fix-uninitialized-ret-v1-1-c66ab9f6a23d@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-01-14btrfs: selftests: add a selftest for deleting two out of three extentsJohannes Thumshirn
Add a selftest creating three extents and then deleting two out of the three extents. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-14btrfs: selftests: add test for punching a hole into 3 RAID stripe-extentsJohannes Thumshirn
Test creating a range of three RAID stripe-extents and then punch a hole in the middle, deleting all of the middle extents and partially deleting the "book ends". Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-14btrfs: selftests: add selftest for punching holes into the RAID stripe extentsJohannes Thumshirn
Add a selftest for punching a hole into a RAID stripe extent. The test create an 1M extent and punches a 64k bytes long hole at offset of 32k from the start of the extent. Afterwards it verifies the start and length of both resulting new extents "left" and "right" as well as the absence of the hole. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-14btrfs: selftests: test RAID stripe-tree deletion spanning two itemsJohannes Thumshirn
Add a selftest for RAID stripe-tree deletion with a delete range spanning two items, so that we're punching a hole into two adjacent RAID stripe extents truncating the first and "moving" the second to the right. The following diagram illustrates the operation: |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| |----- keep -----|--- drop ---|----- keep ----| Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-14btrfs: selftests: don't split RAID extents in halfJohannes Thumshirn
The selftests for partially deleting the start or tail of RAID stripe-extents split these extents in half. This can hide errors in the calculation, so don't split the RAID stripe-extents in half but delete the first or last 16K of the 64K extents. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-14btrfs: selftests: check for correct return value of failed lookupJohannes Thumshirn
Commit 5e72aabc1fff ("btrfs: return ENODATA in case RST lookup fails") changed btrfs_get_raid_extent_offset()'s return value to ENODATA in case the RAID stripe-tree lookup failed. Adjust the test cases which check for absence of a given range to check for ENODATA as return value in this case. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>