linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-05-07	drm/xe: Use copy_from_user() instead of __copy_from_user()	Harish Chegondi
	copy_from_user() has more checks and is more safer than __copy_from_user() Suggested-by: Kees Cook <kees@kernel.org> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/acabf20aa8621c7bc8de09b1bffb8d14b5376484.1746126614.git.harish.chegondi@intel.com
2025-05-07	drm/i915/irq: move i915->irq_lock to display->irq.lock	Jani Nikula
	Observe that i915->irq_lock is no longer used to protect anything outside of display. Make it a display thing. This allows us to remove the ugly #define irq_lock irq.lock hack from xe compat header. Note that this is slightly more subtle than it first looks. For i915, there's no functional change here. The lock is moved. However, for xe, we'll now have two locks, xe->irq.lock and display->irq.lock. These should protect different things, though. Indeed, nesting in the past would've lead to a deadlock because they were the same lock. With the i915 references gone, we can make a handful more files independent of i915_drv.h. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/6d8d2ce0f34a9c7361a5e2fcf96bb32a34c57e76.1746536745.git.jani.nikula@intel.com [Jani: Fixed a comment while applying.] Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-07	drm/i915/rps: refactor display rps support	Jani Nikula
	Make the gt rps code and display irq code interact via intel_display_rps.[ch], instead of direct access. Add no-op static inline stubs for xe instead of having a separate build unit doing nothing. All of this clarifies the interfaces between i915 core and display. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/ef2a46dc8f30b72282494f54e98cb5fed7523b58.1746536745.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-05	drm/xe/gsc: do not flush the GSC worker from the reset path	Daniele Ceraolo Spurio
	The workqueue used for the reset worker is marked as WQ_MEM_RECLAIM, while the GSC one isn't (and can't be as we need to do memory allocations in the gsc worker). Therefore, we can't flush the latter from the former. The reason why we had such a flush was to avoid interrupting either the GSC FW load or in progress GSC proxy operations. GSC proxy operations fall into 2 categories: 1) GSC proxy init: this only happens once immediately after GSC FW load and does not support being interrupted. The only way to recover from an interruption of the proxy init is to do an FLR and re-load the GSC. 2) GSC proxy request: this can happen in response to a request that the driver sends to the GSC. If this is interrupted, the GSC FW will timeout and the driver request will be failed, but overall the GSC will keep working fine. Flushing the work allowed us to avoid interruption in both cases (unless the hang came from the GSC engine itself, in which case we're toast anyway). However, a failure on a proxy request is tolerable if we're in a scenario where we're triggering a GT reset (i.e., something is already gone pretty wrong), so what we really need to avoid is interrupting the init flow, which we can do by polling on the register that reports when the proxy init is complete (as that ensure us that all the load and init operations have been completed). Note that during suspend we still want to do a flush of the worker to make sure it completes any operations involving the HW before the power is cut. v2: fix spelling in commit msg, rename waiter function (Julia) Fixes: dd0e89e5edc2 ("drm/xe/gsc: GSC FW load") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4830 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://lore.kernel.org/r/20250502155104.2201469-1-daniele.ceraolospurio@intel.com
2025-05-02	drm/i915/hdcp: pass struct drm_device to driver specific HDCP GSC code	Jani Nikula
	The driver specific HDCP GSC code will eventually be part of the driver cores rather than display. Remove the struct intel_display references from them, and pass struct drm_device instead. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/bf9aa8e44e18eef41e3077a2966935b4e2649b62.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: simplify HDCP GSC firmware usage selection	Jani Nikula
	Just localize the GSC decision inside intel_hdcp.c, and deduplicate the conditions. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/a1d031bfbff7073e576dfe8d3d3d5a28d7bb2c15.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: switch the HDCP GSC message interface from u8* to void*	Jani Nikula
	The in/out buffers are just opaque data, and don't need to be considered u8. Switching to void lets us drop a ton of unnecessary casts. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/ea005adb713e85b797d83204c80de0a2a8e5ab47.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: pass the context to the HDCP GSC message interface	Jani Nikula
	The opaque HDCP GSC context nicely abstracts the differences between drivers. Pass that instead of struct drm_i915_private or struct xe_device to intel_hdcp_gsc_msg_send(). We can store the driver specific data in the context. This lets us drop the dependency on i915_drv.h from intel_hdcp_gsc_message.c. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/df1653212f9014e717701b017e78e0017884b870.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: rename HDCP GSC context alloc/free functions	Jani Nikula
	Name the functions intel_hdcp_gsc_context_alloc() and intel_hdcp_gsc_context_free() for consistency. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/c6e25686ed20b5fdea9a59faf6a64a7312a075b0.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: rename struct intel_hdcp_gsc_message to intel_hdcp_gsc_context	Jani Nikula
	It's really about the context more than about the message. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/ca0a802a81ba4e96e7c40646a32386d4351d6ff4.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: split HDCP GSC message alloc/save responsibilities	Jani Nikula
	Allocate and initialize the HDCP GSC message in intel_hdcp_gsc_hdcp2_init() as before, but store the pointer to display->hdcp.hdcp_message in the caller. Similarly, pass in the pointer to intel_hdcp_gsc_free_message(). Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/a74fcc941126bf92d12115b5faf4f75099e26242.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-02	drm/i915/hdcp: deduplicate and refactor HDCP GSC ops initialization	Jani Nikula
	The gsc_hdcp_ops is duplicated and initialized exactly the same way in two different places (for i915 and xe), and requires forward declarations for all the hooks. Deduplicate, and make the functions static. There are slight differences in the i915 and xe implementations of intel_hdcp_gsc_init() and intel_hdcp_gsc_fini(). Take the best of both, and improve. We need to expose intel_hdcp_gsc_hdcp2_init() and intel_hdcp_gsc_free_message() for this, and create the latter for xe. Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/21e7871b35d4c7d13f016b5ecb4f10e5be72c531.1745524803.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-01	drm/xe: Do not print timedout job message on killed exec queues	Matthew Brost
	If a user ctrl-c an app while something is running on the GPU, jobs are expected to timeout. Do not spam dmesg with timedout job messages in this case. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250428175505.935694-1-matthew.brost@intel.com
2025-05-01	drm/xe/eustall: Do not support EU stall on SRIOV VF	Harish Chegondi
	EU stall sampling is not supported on SRIOV VF. Do not initialize or open EU stall stream on SRIOV VF. Fixes: 9a0b11d4cf3b ("drm/xe/eustall: Add support to init, enable and disable EU stall sampling") Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/10db5d1c7e17aadca7078ff74575b7ffc0d5d6b8.1745215022.git.harish.chegondi@intel.com (cherry picked from commit 6ed20625a4b8189a1bd6598aa58e03147ce378ee) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-01	drm/xe/eustall: Resolve a possible circular locking dependency	Harish Chegondi
	Use a separate lock in the polling function eu_stall_data_buf_poll() instead of eu_stall->stream_lock. This would prevent a possible circular locking dependency leading to a deadlock as described below. This would also require additional locking with the new lock in the read function. <4> [787.192986] ====================================================== <4> [787.192988] WARNING: possible circular locking dependency detected <4> [787.192991] 6.14.0-rc7-xe+ #1 Tainted: G U <4> [787.192993] ------------------------------------------------------ <4> [787.192994] xe_eu_stall/20093 is trying to acquire lock: <4> [787.192996] ffff88819847e2c0 ((work_completion) (&(&stream->buf_poll_work)->work)), at: __flush_work+0x1f8/0x5e0 <4> [787.193005] but task is already holding lock: <4> [787.193007] ffff88814ce83ba8 (&gt->eu_stall->stream_lock){3:3}, at: xe_eu_stall_stream_ioctl+0x41/0x6a0 [xe] <4> [787.193090] which lock already depends on the new lock. <4> [787.193093] the existing dependency chain (in reverse order) is: <4> [787.193095] -> #1 (&gt->eu_stall->stream_lock){+.+.}-{3:3}: <4> [787.193099] __mutex_lock+0xb4/0xe40 <4> [787.193104] mutex_lock_nested+0x1b/0x30 <4> [787.193106] eu_stall_data_buf_poll_work_fn+0x44/0x1d0 [xe] <4> [787.193155] process_one_work+0x21c/0x740 <4> [787.193159] worker_thread+0x1db/0x3c0 <4> [787.193161] kthread+0x10d/0x270 <4> [787.193164] ret_from_fork+0x44/0x70 <4> [787.193168] ret_from_fork_asm+0x1a/0x30 <4> [787.193172] -> #0 ((work_completion)(&(&stream->buf_poll_work)->work)){+.+.}-{0:0}: <4> [787.193176] __lock_acquire+0x1637/0x2810 <4> [787.193180] lock_acquire+0xc9/0x300 <4> [787.193183] __flush_work+0x219/0x5e0 <4> [787.193186] cancel_delayed_work_sync+0x87/0x90 <4> [787.193189] xe_eu_stall_disable_locked+0x9a/0x260 [xe] <4> [787.193237] xe_eu_stall_stream_ioctl+0x5b/0x6a0 [xe] <4> [787.193285] __x64_sys_ioctl+0xa4/0xe0 <4> [787.193289] x64_sys_call+0x131e/0x2650 <4> [787.193292] do_syscall_64+0x91/0x180 <4> [787.193295] entry_SYSCALL_64_after_hwframe+0x76/0x7e <4> [787.193299] other info that might help us debug this: <4> [787.193302] Possible unsafe locking scenario: <4> [787.193304] CPU0 CPU1 <4> [787.193305] ---- ---- <4> [787.193306] lock(&gt->eu_stall->stream_lock); <4> [787.193308] lock((work_completion) (&(&stream->buf_poll_work)->work)); <4> [787.193311] lock(&gt->eu_stall->stream_lock); <4> [787.193313] lock((work_completion) (&(&stream->buf_poll_work)->work)); <4> [787.193315] * DEADLOCK * Fixes: 760edec939685 ("drm/xe/eustall: Add support to read() and poll() EU stall data") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4598 Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/c896932fca84f79db2df5942911997ed77b2b9b6.1744934656.git.harish.chegondi@intel.com (cherry picked from commit c2b1f1b8641372bb2e563c49eb25632623a860fc) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-01	drm/xe: fix devcoredump chunk alignmnent calculation	Arnd Bergmann
	The device core dumps are copied in 1.5GB chunks, which leads to a link-time error on 32-bit builds because of the 64-bit division not getting trivially turned into mask and shift operations: ERROR: modpost: "__moddi3" [drivers/gpu/drm/xe/xe.ko] undefined! On top of this, I noticed that the ALIGN_DOWN() usage here cannot work because that is only defined for power-of-two alignments. Change ALIGN_DOWN into an explicit div_u64_rem() that avoids the link error and hopefully produces the right results. Doing a 1.5GB kvmalloc() does seem a bit suspicious as well, e.g. this will clearly fail on any 32-bit platform and is also likely to run out of memory on 64-bit systems under memory pressure, so using a much smaller power-of-two chunk size might be a good idea instead. v2: - Always call div_u64_rem (Matt) Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504251238.JsNgFeFc-lkp@intel.com/ Fixes: c4a2e5f865b7 ("drm/xe: Add devcoredump chunking") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250501012545.1045247-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-04-29	drm/xe/vf: Fix guc_info debugfs for VFs	Daniele Ceraolo Spurio
	The guc_info debugfs attempts to read a bunch of registers that the VFs doesn't have access to, so fix it by skipping the reads. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4775 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250423173908.1571412-1-daniele.ceraolospurio@intel.com
2025-04-29	drm/xe/tests/mocs: Hold XE_FORCEWAKE_ALL for LNCF regs	Tejas Upadhyay
	LNCF registers report wrong values when XE_FORCEWAKE_GT only is held. Holding XE_FORCEWAKE_ALL ensures correct operations on LNCF regs. V2(Himal): - Use xe_force_wake_ref_has_domain Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1999 Fixes: a6a4ea6d7d37 ("drm/xe: Add mocs kunit") Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250428082357.1730068-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-04-28	drm/xe/guc: Fix capture of steering registers	John Harrison
	The list of registers to capture on a GPU hang includes some that require steering. Unfortunately, the flag to say this was being wiped to due a missing OR on the assignment of the next flag field. Fix that. Fixes: b170d696c1e2 ("drm/xe/guc: Add XE_LP steered register lists") Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Link: https://lore.kernel.org/r/20250417195215.3002210-2-John.C.Harrison@Intel.com (cherry picked from commit 532da44b54a10d50ebad14a8a02bd0b78ec23e8b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-28	drm/xe/svm: fix dereferencing error pointer in drm_gpusvm_range_alloc()	Harshit Mogalapalli
	xe_svm_range_alloc() returns ERR_PTR(-ENOMEM) on failure and there is a dereference of "range" after that: --> range->gpusvm = gpusvm; In xe_svm_range_alloc(), when memory allocation fails return NULL instead to handle this situation. Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/adaef4dd-5866-48ca-bc22-4a1ddef20381@stanley.mountain/ Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250323124907.3946370-1-harshit.m.mogalapalli@oracle.com (cherry picked from commit 7a0322122cfdd9a6f10fc7701023d75c98eb3d22) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-28	Merge drm/drm-next into drm-xe-next	Thomas Hellström
	Additional backmerge to avoid excessive diffstats when sending PR. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-04-27	drm/xe: Drop force_alloc from xe_bo_evict in selftests	Matthew Brost
	The force_alloc flag was removed from TTM / Xe but updating the selftests to new function interfaces was missed. Remove argument from xe_bo_evict in selftests. v2: - Fix dma-buf, migrate selftests (CI) Fixes: 55df7c0c62c1 ("drm/ttm/xe: drop unused force_alloc flag") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://lore.kernel.org/r/20250428022318.877860-1-matthew.brost@intel.com
2025-04-25	drm/xe/eustall: Do not support EU stall on SRIOV VF	Harish Chegondi
	EU stall sampling is not supported on SRIOV VF. Do not initialize or open EU stall stream on SRIOV VF. Fixes: 9a0b11d4cf3b ("drm/xe/eustall: Add support to init, enable and disable EU stall sampling") Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/10db5d1c7e17aadca7078ff74575b7ffc0d5d6b8.1745215022.git.harish.chegondi@intel.com
2025-04-25	drm/xe/eustall: Resolve a possible circular locking dependency	Harish Chegondi
	Use a separate lock in the polling function eu_stall_data_buf_poll() instead of eu_stall->stream_lock. This would prevent a possible circular locking dependency leading to a deadlock as described below. This would also require additional locking with the new lock in the read function. <4> [787.192986] ====================================================== <4> [787.192988] WARNING: possible circular locking dependency detected <4> [787.192991] 6.14.0-rc7-xe+ #1 Tainted: G U <4> [787.192993] ------------------------------------------------------ <4> [787.192994] xe_eu_stall/20093 is trying to acquire lock: <4> [787.192996] ffff88819847e2c0 ((work_completion) (&(&stream->buf_poll_work)->work)), at: __flush_work+0x1f8/0x5e0 <4> [787.193005] but task is already holding lock: <4> [787.193007] ffff88814ce83ba8 (&gt->eu_stall->stream_lock){3:3}, at: xe_eu_stall_stream_ioctl+0x41/0x6a0 [xe] <4> [787.193090] which lock already depends on the new lock. <4> [787.193093] the existing dependency chain (in reverse order) is: <4> [787.193095] -> #1 (&gt->eu_stall->stream_lock){+.+.}-{3:3}: <4> [787.193099] __mutex_lock+0xb4/0xe40 <4> [787.193104] mutex_lock_nested+0x1b/0x30 <4> [787.193106] eu_stall_data_buf_poll_work_fn+0x44/0x1d0 [xe] <4> [787.193155] process_one_work+0x21c/0x740 <4> [787.193159] worker_thread+0x1db/0x3c0 <4> [787.193161] kthread+0x10d/0x270 <4> [787.193164] ret_from_fork+0x44/0x70 <4> [787.193168] ret_from_fork_asm+0x1a/0x30 <4> [787.193172] -> #0 ((work_completion)(&(&stream->buf_poll_work)->work)){+.+.}-{0:0}: <4> [787.193176] __lock_acquire+0x1637/0x2810 <4> [787.193180] lock_acquire+0xc9/0x300 <4> [787.193183] __flush_work+0x219/0x5e0 <4> [787.193186] cancel_delayed_work_sync+0x87/0x90 <4> [787.193189] xe_eu_stall_disable_locked+0x9a/0x260 [xe] <4> [787.193237] xe_eu_stall_stream_ioctl+0x5b/0x6a0 [xe] <4> [787.193285] __x64_sys_ioctl+0xa4/0xe0 <4> [787.193289] x64_sys_call+0x131e/0x2650 <4> [787.193292] do_syscall_64+0x91/0x180 <4> [787.193295] entry_SYSCALL_64_after_hwframe+0x76/0x7e <4> [787.193299] other info that might help us debug this: <4> [787.193302] Possible unsafe locking scenario: <4> [787.193304] CPU0 CPU1 <4> [787.193305] ---- ---- <4> [787.193306] lock(&gt->eu_stall->stream_lock); <4> [787.193308] lock((work_completion) (&(&stream->buf_poll_work)->work)); <4> [787.193311] lock(&gt->eu_stall->stream_lock); <4> [787.193313] lock((work_completion) (&(&stream->buf_poll_work)->work)); <4> [787.193315] * DEADLOCK * Fixes: 760edec939685 ("drm/xe/eustall: Add support to read() and poll() EU stall data") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4598 Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/c896932fca84f79db2df5942911997ed77b2b9b6.1744934656.git.harish.chegondi@intel.com
2025-04-26	Merge tag 'drm-xe-next-2025-04-17' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/xe/kernel into drm-next Core Changes: Fix drm_gpusvm kernel-doc (Lucas) Driver Changes: - Release guc ids before cancelling work (Tejas) - Remove a duplicated pc_start_call (Rodrigo) - Fix an incorrect assert in previous userptr fixes (Thomas) - Remove gen11 assertions and prefixes (Lucas) - Drop sentinels from arg to xe_rtp_process_to_src (Lucas) - Temporarily disable D3Cold on BMG (Rodrigo) - Fix MOCS debugfs LNCF readout (Tvrtko) - Some ring flush cleanups (Tvrtko) - Use unsigned int for alignment in fb pinning code (Tvrtko) - Retry and wait longer for GuC PC start (Rodrigo) - Recognize 3DSTATE_COARSE_PIXEL in LRC dumps (Matt Roper) - Remove reduntant check in xe_vm_create_ioctl() (Xin) - A bunch of SRIOV updates (Michal) - Add stats for SVM page-faults (Francois) - Fix an UAF (Harish) - Expose fan speed (Raag) - Fix exporting xe buffer objects multiple times (Tomasz) - Apply a workaround (Vinay) - Simplify pinned bo iteration (Thomas) - Remove an incorrect "static" keywork (Lucas) - Add support for separate firmware files on each GT (Lucas) - Survivability handling fixes (Lucas) - Allow to inject error in early probe (Lucas) - Fix unmet direct dependencies warning (Yue Haibing) - More error injection during probe (Francois) - Coding style fix (Maarten) - Additional stats support (Riana) - Add fault injection for xe_oa_alloc_regs (Nakshrtra) - Add a BMG PCI ID (Matt Roper) - Some SVM fixes and preliminary SVM multi-device work (Thomas) - Switch the migrate code from drm managed to dev managed (Aradhya) - Fix an out-of-bounds shift when invalidating TLB (Thomas) - Ensure fixed_slice_mode gets set after ccs_mode change (Niranjana) - Use local fence in error path of xe_migrate_clear (Matthew Brost) - More Workarounds (Julia) - Define sysfs_ops on all directories (Tejas) - Set power state to D3Cold during s2idle/s3 (Badal) - Devcoredump output fix (John) - Avoid plain 64-bit division (Arnd Bergmann) - Reword a debug message (John) - Don't print a hwconfig error message when forcing execlists (Stuart) - Restore an error code to avoid a smatch warning (Rodrigo) - Invalidate L3 read-only cachelines for geometry streams too (Kenneth) - Make PPHWSP size explicit in xe_gt_lrc_size() (Gustavo) - Add GT frequency events (Vinay) - Fix xe_pt_stage_bind_walk kerneldoc (Thomas) - Add a workaround (Aradhya) - Rework pinned save/restore (Matthew Auld, Matthew Brost) - Allow non-contig VRAM kernel BO (Matthew Auld) - Support non-contig VRAM provisioning for SRIOV (Matthew Auld) - Allow scratch-pages for unmapped parts of page-faulting VMs. (Oak) - Ensure XE_BO_FLAG_CPU_ADDR_MIRROR had a unique value (Matt Roper) - Fix taking an invalid lock on wedge (Lucas) - Configs and documentation for survivability mode (Riana) - Remove an unused macro (Shuicheng) - Work around a page-fault full error (Matt Brost) - Enable a SRIOV workaround (John) - Bump the recommended GuC version (John) - Allow to drop VRAM resizing (Lucas) - Don't expose privileged debugfs files if VF (Michal) - Don't show GGTT/LMEM debugfs files under media GT (Michal) - Adjust ring-buffer emission for maximum possible size (Tvrtko) - Fix notifier vs folio lock deadlock (Matthew Auld) - Stop relying on placement for dma-buf unmap Matthew Auld) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/aADWaEFKVmxSnDLo@fedora
2025-04-24	drm/xe: Abort printing coredump in VM printer output if full	Matthew Brost
	Abort printing coredump in VM printer output if full. Helps speedup large coredumps which need to walked multiple times in xe_devcoredump_read. v2: - s/drm_printer_is_full/drm_coredump_printer_is_full (Jani) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-5-matthew.brost@intel.com
2025-04-24	drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access	Matthew Brost
	Add migrate layer functions to access VRAM and update xe_ttm_access_memory to use for non-visible access and large (more than 16k) BO access. 8G devcoreump on BMG observed 3 minute CPU copy time vs. 3s GPU copy time. v4: - Fix non-page aligned accesses - Add support for small / unaligned access - Update commit message indicating migrate used for large accesses (Auld) - Fix warning in xe_res_cursor for non-zero offset v5: - Fix 32 bit build (CI) v6: - Rebase and use SVM migration copy functions v7: - Fix build error (CI) v8: - Remove ifdef around VRAM copy functions (CI) - Use break statement in dma unmmaping (Jonathan) - Use if/else rather than goto (Jonathan) - Use single return point (Jonathan) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-3-matthew.brost@intel.com
2025-04-24	drm/xe: Add devcoredump chunking	Matthew Brost
	Chunk devcoredump into 1.5G pieces to avoid hitting the kvmalloc limit of 2G. Simple algorithm reads 1.5G at time in xe_devcoredump_read callback as needed. Some memory allocations are changed to GFP_ATOMIC as they done in xe_devcoredump_read which holds lock in the path of reclaim. The allocations are small, so in practice should never fail. v2: - Update commit message wrt gfp atomic (John H) v6: - Drop GFP_ATOMIC change for hwconfig (John H) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-2-matthew.brost@intel.com
2025-04-24	Merge drm/drm-next into drm-xe-next	Thomas Hellström
	Backmerge to bring in linux 6.15-rc. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-04-24	drm/ttm/xe: drop unused force_alloc flag	Dave Airlie
	This flag used to be used in the old memory tracking code, that code got migrated into the vmwgfx driver[1], and then got removed from the tree[2], but this piece got left behind. [1] f07069da6b4c ("drm/ttm: move memory accounting into vmwgfx v4") [2] 8aadeb8ad874 ("drm/vmwgfx: Remove the dedicated memory accounting") Cleanup the dead code. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-04-23	PCI: Add CONFIG_MMU dependency	Arnd Bergmann
	It turns out that there are no platforms that have PCI but don't have an MMU, so adding a Kconfig dependency on CONFIG_PCI simplifies build testing kernels for those platforms a lot, and avoids a lot of inadvertent build regressions. Add a dependency for CONFIG_PCI and remove all the ones for PCI specific device drivers that are currently marked not having it. There are a few platforms that have an optional MMU, but they usually cannot have PCI at all. The one exception is Coldfire MCF54xx, but this is mainly for historic reasons, and anyone using those chips should really use the MMU these days. Link: https://lore.kernel.org/lkml/a41f1b20-a76c-43d8-8c36-f12744327a54@app.fastmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20250423202215.3315550-1-arnd@kernel.org
2025-04-23	drm/xe: Fix CFI violation when accessing sysfs files	Jeevaka Prabu Badrappan
	When an attribute group is created with sysfs_create_group() or sysfs_create_files() the ->sysfs_ops() callback is set to kobj_sysfs_ops, which sets the ->show() callback to kobj_attr_show(). kobj_attr_show() uses container_of() to get the ->show() callback from the attribute it was passed, meaning the ->show() callback needs to be the same type as the ->show() callback in 'struct kobj_attribute'. However, cur_freq_show() has the type of the ->show() callback in 'struct device_attribute', which causes a CFI violation when opening the 'id' sysfs node under gtidle/freq/throttle. This happens to work because the layout of 'struct kobj_attribute' and 'struct device_attribute' are the same, so the container_of() cast happens to allow the ->show() callback to still work. Changed the type of cur_freq_show() and few more functions to match the ->show() callback in 'struct kobj_attributes' to resolve the CFI violation. CFI failure seen while accessing sysfs files under /sys/class/drm/card0/device/tile0/gt/gtidle/ /sys/class/drm/card0/device/tile0/gt/freq0/ /sys/class/drm/card0/device/tile0/gt/freq0/throttle/ [ 2599.618075] RIP: 0010:__cfi_cur_freq_show+0xd/0x10 [xe] [ 2599.624452] Code: 44 c1 44 89 fa e8 03 95 39 f2 48 98 5b 41 5e 41 5f 5d c3 c9 [ 2599.646638] RSP: 0018:ffffbe438ead7d10 EFLAGS: 00010286 [ 2599.652823] RAX: ffff9f7d8b3845d8 RBX: ffff9f7dee8c95d8 RCX: 0000000000000000 [ 2599.661246] RDX: ffff9f7e6f439000 RSI: ffffffffc13ada30 RDI: ffff9f7d975d4b00 [ 2599.669669] RBP: ffffbe438ead7d18 R08: 0000000000001000 R09: ffff9f7e6f439000 [ 2599.678092] R10: 00000000e07304a6 R11: ffffffffc1241ca0 R12: ffffffffb4836ea0 [ 2599.688435] R13: ffff9f7e45fb1180 R14: ffff9f7d975d4b00 R15: ffff9f7e6f439000 [ 2599.696860] FS: 000076b02b66cfc0(0000) GS:ffff9f80ef400000(0000) knlGS:00000 [ 2599.706412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2599.713196] CR2: 00005f80d94641a9 CR3: 00000001e44ec006 CR4: 0000000100f72ef0 [ 2599.721618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2599.730041] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 2599.738464] PKRU: 55555554 [ 2599.741655] Call Trace: [ 2599.744541] <TASK> [ 2599.747017] ? __die_body+0x69/0xb0 [ 2599.751151] ? die+0xa9/0xd0 [ 2599.754548] ? do_trap+0x89/0x160 [ 2599.758476] ? __cfi_cur_freq_show+0xd/0x10 [xe b37985c94829727668bd7c5b33c1] [ 2599.768315] ? handle_invalid_op+0x69/0x90 [ 2599.773167] ? __cfi_cur_freq_show+0xd/0x10 [xe b37985c94829727668bd7c5b33c1] [ 2599.783010] ? exc_invalid_op+0x36/0x60 [ 2599.787552] ? fred_hwexc+0x123/0x1a0 [ 2599.791873] ? fred_entry_from_kernel+0x7b/0xd0 [ 2599.797219] ? asm_fred_entrypoint_kernel+0x45/0x70 [ 2599.802976] ? act_freq_show+0x70/0x70 [xe b37985c94829727668bd7c5b33c1d9998] [ 2599.812301] ? __cfi_cur_freq_show+0xd/0x10 [xe b37985c94829727668bd7c5b33c1] [ 2599.822137] ? __kmalloc_node_noprof+0x1f3/0x420 [ 2599.827594] ? __kvmalloc_node_noprof+0xcb/0x180 [ 2599.833045] ? kobj_attr_show+0x22/0x40 [ 2599.837571] sysfs_kf_seq_show+0xa8/0x110 [ 2599.842302] kernfs_seq_show+0x38/0x50 Signed-off-by: Jeevaka Prabu Badrappan <jeevaka.badrappan@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250422171852.85558-1-jeevaka.badrappan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-04-23	drm/xe: handle pinned memory in PM notifier	Matthew Auld
	Userspace is still alive and kicking at this point so actually moving pinned stuff here is tricky. However, we can instead pre-allocate the backup storage upfront from the notifier, such that we scoop up as much as we can, and then leave the final .suspend() to do the actual copy (or allocate anything that we missed). That way the bulk of our allocations will hopefully be done outside the more restrictive .suspend(). We do need to be extra careful though, since the pinned handling can now race with PM notifier, like something becoming unpinned after we prepare it from the notifier. v2 (Thomas): - Fix kernel doc and drop the pin as soon as we are done with the restore, instead of deferring to later. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250416150913.434369-8-matthew.auld@intel.com
2025-04-23	drm/xe: share bo dma-resv with backup object	Matthew Auld
	We end up needing to grab both locks together anyway and keep them held until we complete the copy or add the fence. Plus the backup_obj is short lived and tied to the parent object, so seems reasonable to share the same dma-resv. This will simplify the locking here, and in follow up patches. v2: - Hold reference to the parent bo to be sure the shared dma-resv can't go out of scope too soon. (Thomas) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250416150913.434369-7-matthew.auld@intel.com
2025-04-23	drm/xe: evict user memory in PM notifier	Matthew Auld
	In the case of VRAM we might need to allocate large amounts of GFP_KERNEL memory on suspend, however doing that directly in the driver .suspend()/.prepare() callback is not advisable (no swap for example). To improve on this we can instead hook up to the PM notifier framework which is invoked at an earlier stage. We effectively call the evict routine twice, where the notifier will have hopefully have cleared out most if not everything by the time we call it a second time when entering the .suspend() callback. For s4 we also get the added benefit of allocating the system pages before the hibernation image size is calculated, which looks more sensible. Note that the .suspend() hook is still responsible for dealing with all the pinned memory. Improving that is left to another patch. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1181 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4566 Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250416150913.434369-6-matthew.auld@intel.com
2025-04-22	drm/xe/guc: Cache DSS info when creating capture register list	John Harrison
	Calculating the DSS id (index of a steered register) currently requires reading state from the hwconfig table and that currently requires dynamically allocating memory. The GuC based register capture (for dev core dumps) includes this index as part of the register name in the dump. However, it was calculating said index at the time of the dump for every dump. That is wasteful. It also breaks anyone trying to do the dump at a time when memory allocations are not allowed. So rather than calculating on every print, just calculate at start of day when creating the register list in the first place. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250417213303.3021243-1-John.C.Harrison@Intel.com
2025-04-22	drm/xe/guc: Use the steering flag when printing registers	John Harrison
	The printing code was doing a test on which list a register was in to decide whether it is steered or not. That might be valid at this moment but there may be other reasons for extended lists in the future. Plus, there is a flag specifically for identifying steered registers. So, just use that instead - it is simpler and safer. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250417195215.3002210-3-John.C.Harrison@Intel.com
2025-04-22	drm/xe/guc: Fix capture of steering registers	John Harrison
	The list of registers to capture on a GPU hang includes some that require steering. Unfortunately, the flag to say this was being wiped to due a missing OR on the assignment of the next flag field. Fix that. Fixes: b170d696c1e2 ("drm/xe/guc: Add XE_LP steered register lists") Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Link: https://lore.kernel.org/r/20250417195215.3002210-2-John.C.Harrison@Intel.com
2025-04-22	drm/xe/svm: fix dereferencing error pointer in drm_gpusvm_range_alloc()	Harshit Mogalapalli
	xe_svm_range_alloc() returns ERR_PTR(-ENOMEM) on failure and there is a dereference of "range" after that: --> range->gpusvm = gpusvm; In xe_svm_range_alloc(), when memory allocation fails return NULL instead to handle this situation. Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/adaef4dd-5866-48ca-bc22-4a1ddef20381@stanley.mountain/ Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250323124907.3946370-1-harshit.m.mogalapalli@oracle.com
2025-04-17	drm/xe/pxp: do not queue unneeded terminations from debugfs	Daniele Ceraolo Spurio
	The PXP terminate debugfs currently unconditionally simulates a termination, no matter what the HW status is. This is unneeded if PXP is not in use and can cause errors if the HW init hasn't completed yet. To solve these issues, we can simply limit the terminations to the cases where PXP is fully initialized and in use. v2: s/pxp_status/ready/ to avoid confusion with pxp->status (John) Fixes: 385a8015b214 ("drm/xe/pxp: Add PXP debugfs support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4749 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250416201622.1295369-1-daniele.ceraolospurio@intel.com (cherry picked from commit ba1f62a0cac84757ca35f4217e3cd3a2654233ae) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-17	drm/xe/dma_buf: stop relying on placement in unmap	Matthew Auld
	The is_vram() is checking the current placement, however if we consider exported VRAM with dynamic dma-buf, it looks possible for the xe driver to async evict the memory, notifying the importer, however importer does not have to call unmap_attachment() immediately, but rather just as "soon as possible", like when the dma-resv idles. Following from this we would then pipeline the move, attaching the fence to the manager, and then update the current placement. But when the unmap_attachment() runs at some later point we might see that is_vram() is now false, and take the complete wrong path when dma-unmapping the sg, leading to explosions. To fix this check if the sgl was mapping a struct page. v2: - The attachment can be mapped multiple times it seems, so we can't really rely on encoding something in the attachment->priv. Instead see if the page_link has an encoded struct page. For vram we expect this to be NULL. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4563 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250410162716.159403-2-matthew.auld@intel.com (cherry picked from commit d755887f8e5a2a18e15e6632a5193e5feea18499) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-17	drm/xe/userptr: fix notifier vs folio deadlock	Matthew Auld
	User is reporting what smells like notifier vs folio deadlock, where migrate_pages_batch() on core kernel side is holding folio lock(s) and then interacting with the mappings of it, however those mappings are tied to some userptr, which means calling into the notifier callback and grabbing the notifier lock. With perfect timing it looks possible that the pages we pulled from the hmm fault can get sniped by migrate_pages_batch() at the same time that we are holding the notifier lock to mark the pages as accessed/dirty, but at this point we also want to grab the folio locks(s) to mark them as dirty, but if they are contended from notifier/migrate_pages_batch side then we deadlock since folio lock won't be dropped until we drop the notifier lock. Fortunately the mark_page_accessed/dirty is not really needed in the first place it seems and should have already been done by hmm fault, so just remove it. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4765 Fixes: 0a98219bcc96 ("drm/xe/hmm: Don't dereference struct page pointers without notifier lock") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.10+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250414132539.26654-2-matthew.auld@intel.com (cherry picked from commit bd7c0cb695e87c0e43247be8196b4919edbe0e85) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-17	drm/xe: Set LRC addresses before guc load	Lucas De Marchi
	The metadata saved in the ADS is read by GuC when it's initialized. Saving the addresses to the LRCs when they are populated is too late as GuC will keep using the old ones. This was causing GuC to use the RCS LRC for any engine class. It's not a big problem on a Linux-only scenario since the they are used by GuC only on media engines when the watchdog is triggered. However, in a virtualization scenario with Windows as the VF, it causes the wrong LRCs to be loaded as the watchdog is used for all engines. Fix it by letting guc_golden_lrc_init() initialize the metadata, like other *_init() functions, and later guc_golden_lrc_populate() to copy the LRCs to the right places. The former is called before the second GuC load, while the latter is called after LRCs have been recorded. Cc: Chee Yin Wong <chee.yin.wong@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.11+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Chee Yin Wong <chee.yin.wong@intel.com> Link: https://lore.kernel.org/r/20250409-fix-guc-ads-v1-1-494135f7a5d0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit c31a0b6402d15b530514eee9925adfcb8cfbb1c9) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-17	drm/xe: Introduce fault injection for guc CTB send/recv	Satyanarayana K V P
	Fault can be injected with below steps. FAILTYPE=fail_function FAILFUNC=xe_guc_ct_send_recv echo > /sys/kernel/debug/$FAILTYPE/inject echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject printf %#x -19 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250403120641.7258-3-satyanarayana.k.v.p@intel.com
2025-04-17	drm/xe: Introduce fault injection for guc mmio send/recv.	Satyanarayana K V P
	Fault can be injected with below steps. FAILTYPE=fail_function FAILFUNC=xe_guc_mmio_send_recv echo > /sys/kernel/debug/$FAILTYPE/inject echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject printf %#x -5 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250403120641.7258-2-satyanarayana.k.v.p@intel.com
2025-04-17	drm/xe: Use GT oriented message to report engine activity error	Michal Wajdeczko
	We are enabling/disabling engine activity on per-GT basis, so any errors should be also reported per GT, like: [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to enable engine activity function stats (-ENOSPC) [ ] xe 0000:00:02.0: [drm] GT1: PF: Failed to enable engine activity function stats (-ENOSPC) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250414202347.1909-2-michal.wajdeczko@intel.com
2025-04-17	drm/xe/guc: Fix out-of-bound while enabling engine activity stats	Michal Wajdeczko
	In the PF mode we allocate array of struct engine_activity_group that holds activity data split for the PF and all potential VFs. But while preparing data for use by VFs we ended with bad index. [ ] BUG: KASAN: slab-out-of-bounds in xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe] [ ] Call Trace: [ ] <TASK> [ ] dump_stack_lvl+0x91/0xf0 [ ] print_report+0xd1/0x680 [ ] ? __virt_addr_valid+0x23a/0x440 [ ] ? kasan_addr_to_slab+0xd/0xb0 [ ] kasan_report+0xe7/0x130 [ ] ? xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe] [ ] ? xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe] [ ] __asan_report_store8_noabort+0x17/0x30 [ ] xe_guc_engine_activity_function_stats+0x41e/0x4f0 [xe] [ ] pf_engine_activity_stats+0x1b6/0x7f0 [xe] [ ] ? kobject_put+0x5f/0x470 [ ] xe_pci_sriov_configure+0x28c9/0x3270 [xe] [ ] ? __pfx_dev_attr_store+0x10/0x10 [ ] ? kstrtoull+0x3b/0x70 [ ] ? __pfx___lock_acquire+0x10/0x10 [ ] ? kstrtou16+0x65/0xf0 [ ] sriov_numvfs_store+0x20c/0x400 [ ] ? __pfx_sriov_numvfs_store+0x10/0x10 [ ] ? __pfx__copy_from_iter+0x10/0x10 [ ] ? __pfx_dev_attr_store+0x10/0x10 [ ] dev_attr_store+0x3b/0x80 [ ] ? sysfs_file_ops+0x135/0x190 Fixes: 2de3f38fbf89 ("drm/xe: Add support for per-function engine activity") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250414202347.1909-1-michal.wajdeczko@intel.com
2025-04-17	drm/xe/pxp: do not queue unneeded terminations from debugfs	Daniele Ceraolo Spurio
	The PXP terminate debugfs currently unconditionally simulates a termination, no matter what the HW status is. This is unneeded if PXP is not in use and can cause errors if the HW init hasn't completed yet. To solve these issues, we can simply limit the terminations to the cases where PXP is fully initialized and in use. v2: s/pxp_status/ready/ to avoid confusion with pxp->status (John) Fixes: 385a8015b214 ("drm/xe/pxp: Add PXP debugfs support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4749 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250416201622.1295369-1-daniele.ceraolospurio@intel.com
2025-04-17	drm/xe/compat: clean up unused platform check macros	Jani Nikula
	Clean up unused platform check macros from compat i915_drv.h. Display no longer uses any of the IS_*() platform checks. The remaining users are part of the soc/ code. Note that in a comment. Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Link: https://lore.kernel.org/r/2f09b3c60223d9426049a28d3d06a3ec2c6ec348.1744222449.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-04-17	drm/i915/fb: convert intel_fbdev.[ch] and intel_fbdev_fb.[ch] to struct ↵	Jani Nikula
	intel_display Going forward, struct intel_display is the main display device data pointer. Convert intel_fbdev.[ch] and as much as possible of intel_fbdev_fb.[ch] to struct intel_display. Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Link: https://lore.kernel.org/r/49651754f3716041f97984e47c15d331851870a5.1744222449.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>