summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/xe/xe_pci.c
AgeCommit message (Collapse)Author
2025-07-10drm/xe/sriov: Mark BMG as SR-IOV capableMichal Wajdeczko
Enable SR-IOV support for BMG platforms. Note that as other flags from the platform descriptor, it only means it may have that capability: it still depends on runtime checks for the proper support in HW and firmware. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250710103040.375610-3-jakub1.kolakowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-08drm/xe/ptl: Drop force_probe requirementMatt Atwood
Panther Lake has proven to be stable through testing and use. Remove the force_probe requirement and enable the platform by default. Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-03Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"Matthew Brost
This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371. Seeing some unexplained random failures during LRC context switches with indirect ring state enabled. The failures were always there, but the repro rate increased with the addition of WA BB as a separate BO. Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to reduce the issues in the context switches, but didn't eliminate them completely. Indirect ring state is not required for any current features, so disable for now until failures can be root caused. Cc: stable@vger.kernel.org Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-02drm/xe: Assign GT IDs properly on multi-tile + multi-GT platformsMatt Roper
Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive characteristics on all of our platforms today, this may not always be true. Assign GT IDs according to xe->info.max_gt_per_tile in a way that should work even if future platforms have different configurations. This patch should not change the behavior of current platforms; it only future-proofs for potential future designs. v2: - Re-calculate gt_count if tile count gets reduced by MTCFG. (PVC CI) Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-02drm/xe/tests/pci: Ensure all platforms have a valid GT/tile countMatt Roper
Add a simple kunit test to ensure each platform's GT per tile count is non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition. We need to move 'struct xe_subplatform_desc' from the .c file to the types header to ensure it is accessible from the kunit test. v2: - Rebase on latest xe_pci test rework from Michal and convert to a parameterized test that runs on each PCI ID supported by the driver. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-07-02drm/xe: Track maximum GTs per tile on a per-platform basisMatt Roper
Today all of our platforms fall into one of three cases: * Single tile platforms with a single (primary) GT * Single tile platforms with two GTs (primary + media) * Two-tile platforms with a single GT (primary) in each Our numbering of GTs has been a bit inconsistent between platforms (e.g., GT1 is the media GT on some platforms, but the second tile's primary GT on others). In the future we'll likely have platforms that are both multi-tile and multi-GT, which will make the situation more confusing. We could also wind up with more than just two types of GTs at some point in the future. Going forward we should standardize the way we assign uapi GT IDs to internal GT structures. Let's declare that for userspace GT ID n, GT[n]'s tile = n / (max gt per tile) GT[n]'s slot within tile = n % (max gt per tile) We don't want the GT numbering to change for any of our current platforms since the current IDs are part of our ABI contract with userspace so this means we should track the 'max gt per tile' value on a per-platform basis rather than just using a single value across the driver. Encode this into device descriptors in xe_pci.c and use the per-platform number for various checks in the code. Constant XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms for easy of sizing array allocations. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-23drm/xe/nvm: add on-die non-volatile memory deviceAlexander Usyskin
Enable access to internal non-volatile memory on DGFX with GSC/CSC devices via a child device. The nvm child device is exposed via auxiliary bus. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-18drm/xe/xe3: Add support for media IP version 30.02Matt Roper
Media version 30.02 should be treated the same as other Xe3 IP, but will have a slightly different set of workarounds. -v2: Extend the range in existing WA entry (Bala) -v3: Revert v2, Do not extend the range for the time being(Matt) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250613193146.3549862-4-dnyaneshwar.bhadane@intel.com
2025-06-18drm/xe/xe3: Add support for graphics IP version 30.03Matt Roper
Graphics version 30.03 should be treated the same as other Xe3 IP, but will have a slightly different set of workarounds. -v2: Merge and extend the WA onto existing entry (Bala) -v3: Revert v2's feedback changes and keep entry saparate (Matt). Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@inte.com> Link: https://lore.kernel.org/r/20250613193146.3549862-3-dnyaneshwar.bhadane@intel.com
2025-06-11drm/xe/xe2_hpg: Define additional Xe2_HPG GMD_IDShekhar Chauhan
Add another GMD_ID for Xe2_HPG Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: James Ausmus <james.ausmus@intel.com> Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250605190804.1287289-4-dnyaneshwar.bhadane@intel.com
2025-05-30drm/xe/hwmon: Add support to manage power limits though mailboxKarthik Poosa
Add support to manage power limits using pcode mailbox commands for supported platforms. v2: - Address review comments. (Badal) - Use mailbox commands instead of registers to manage power limits for BMG. - Clamp the maximum power limit to GPU firmware default value. v3: - Clamp power limit in write also for platforms with mailbox support. v4: - Remove unnecessary debug prints. (Badal) v5: - Update description of variable pl1_on_boot to fix kernel-doc error. v6: - Improve commit message, refer to BIOS as GPU firmware. - Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW. - Rectify drm_warn to drm_info. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: e90f7a58e659 ("drm/xe/hwmon: Add HWMON support for BMG") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250529163458.2354509-2-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-12drm/xe: Add WA BB to capture active context utilizationUmesh Nerlige Ramappa
Context Timestamp (CTX_TIMESTAMP) in the LRC accumulates the run ticks of the context, but only gets updated when the context switches out. In order to check how long a context has been active before it switches out, two things are required: (1) Determine if the context is running: To do so, we program the WA BB to set an initial value for CTX_TIMESTAMP in the LRC. The value chosen is 1 since 0 is the initial value when the LRC is initialized. During a query, we just check for this value to determine if the context is active. If the context switched out, it would overwrite this location with the actual CTX_TIMESTAMP MMIO value. Note that WA BB runs as the last part of the context restore, so reusing this LRC location will not clobber anything. (2) Calculate the time that the context has been active for: The CTX_TIMESTAMP ticks only when the context is active. If a context is active, we just use the CTX_TIMESTAMP MMIO as the new value of utilization. While doing so, we need to read the CTX_TIMESTAMP MMIO for the specific engine instance. Since we do not know which instance the context is running on until it is scheduled, we also read the ENGINE_ID MMIO in the WA BB and store it in the PPHSWP. Using the above 2 instructions in a WA BB, capture active context utilization. v2: (Matt Brost) - This breaks TDR, fix it by saving the CTX_TIMESTAMP register "drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value" - Drop tile from LRC if using gt "drm/xe: Save the gt pointer in LRC and drop the tile" v3: - Remove helpers for bb_per_ctx_ptr (Matt) - Add define for context active value (Matt) - Use 64 bit CTX TIMESTAMP for platforms that support it. For platforms that don't, live with the rare race. (Matt, Lucas) - Convert engine id to hwe and get the MMIO value (Lucas) - Correct commit message on when WA BB runs (Lucas) v4: - s/GRAPHICS_VER(...)/xe->info.has_64bit_timestamp/ (Matt) - Drop support for active utilization on a VF (CI failure) - In xe_lrc_init ensure the lrc value is 0 to begin with (CI regression) v5: - Minor checkpatch fix - Squash into previous commit and make TDR use 32-bit time - Update code comment to match commit msg Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4532 Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-8-umesh.nerlige.ramappa@intel.com
2025-04-23drm/xe: evict user memory in PM notifierMatthew Auld
In the case of VRAM we might need to allocate large amounts of GFP_KERNEL memory on suspend, however doing that directly in the driver .suspend()/.prepare() callback is not advisable (no swap for example). To improve on this we can instead hook up to the PM notifier framework which is invoked at an earlier stage. We effectively call the evict routine twice, where the notifier will have hopefully have cleared out most if not everything by the time we call it a second time when entering the .suspend() callback. For s4 we also get the added benefit of allocating the system pages before the hibernation image size is calculated, which looks more sensible. Note that the .suspend() hook is still responsible for dealing with all the pinned memory. Improving that is left to another patch. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1181 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4566 Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250416150913.434369-6-matthew.auld@intel.com
2025-04-08drm/xe: Enable configfs support for survivability modeRiana Tauro
Enable survivability mode if supported and configfs attribute is set. Enabling survivability mode manually is useful in cases where pcode does not detect failure, validation and for IFR (in-field-repair). To set configfs survivability mode attribute for a device echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode The card enters survivability mode if supported v2: add a log if survivability mode is enabled for unsupported platforms (Rodrigo) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250407051414.1651616-4-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-07drm/xe: Introduced needs_scratch bit in device descriptorOak Zeng
On some platform, scratch page is needed for out of bound prefetch to work. Introduce a bit in device descriptor to specify whether this device needs scratch page to work. v2: introduce a needs_scratch bit in device info (Thomas, Jonathan) v3: drop NEEDS_SCRATCH macro (Matt) Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250403165328.2438690-2-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-03-28drm/xe/d3cold: Set power state to D3Cold during s2idle/s3Badal Nilawar
According to pci core guidelines, pci_save_config is recommended when the driver explicitly needs to set the pci power state. As of now xe kmd is only doing pci_save_config while entering to s2idle/s3 state, which makes pci core think that device driver has already applied required pci power state. This leads to GPU remain in D0 state. To fix the issue setting the pci power state to D3Cold. Fixes:dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250327161914.432552-1-badal.nilawar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-21drm/xe: Move survivability back to xeLucas De Marchi
Commit d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci") moved the survivability handling to be done entirely in the xe_pci layer. However there are some issues with that approach: 1) Survivability mode needs at least the mmio initialized, otherwise it can't really read a register to decide if it should enter that state 2) SR-IOV mode should be initialized, otherwise it's not possible to check if it's VF Besides, as pointed by Riana the check for xe_survivability_mode_enable() was wrong in xe_pci_probe() since it's not a bool return. Fix that by moving the initialization to be entirely in the xe_device layer, with the correct dependencies handled: only after mmio and sriov initialization, and not triggering it on error from wait_for_lmem_ready(). This restores the trigger behavior before that commit. The xe_pci layer now only checks for "is it enabled?", like it's doing in xe_pci_suspend()/xe_pci_remove(), etc. Cc: Riana Tauro <riana.tauro@intel.com> Fixes: d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-1-fdb3559ea965@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-14drm/xe/hwmon: expose fan speedRaag Jadav
Add hwmon support for fan1_input, fan2_input and fan3_input attributes, which will expose fan speed of respective channels in RPM when supported by hardware. With this in place we can monitor fan speed using lm-sensors tool. v2: Rely on platform checks instead of mailbox error (Aravind, Rodrigo) v3: Introduce has_fan_control flag (Rodrigo) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250312085909.755073-1-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-05drm/xe: Simplify setting release info in xe->infoGustavo Sousa
Now that we have all IPs being described via struct xe_ip, where release information (version and name) is represented in a single struct type, we can extract duplicated logic from handle_pre_gmdid() and handle_gmdid() and apply it in the body of xe_info_init(). With this change, there is no point in keeping handle_pre_gmdid() anymore, so we just remove it and inline the assignment of {graphics,media}_ip. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-7-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Re-use feature descriptors for pre-GMDID IPsGustavo Sousa
Now that pre-GMDID IPs are described via struct xe_ip, it is possible to re-use the feature descriptors that have exact match with ones from previous releases. Do that. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-6-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Convert pre-GMDID IPs to struct xe_ipGustavo Sousa
We have now a struct xe_ip to fully describe an IP, but we are only using that for GMDID-based IPs. For pre-GMDID IPs, we still describe release info (version and name) via feature descriptors (struct xe_{graphics,media}_desc). Let's convert those to use struct xe_ip. With this, we have a uniform way of describing IPs in the xe driver instead of having different approaches based on whether the IPs use GMDIDs or not. A nice side-effect of this change is that now we have an easy way to lookup, in the source code, mappings between versions, names and features for all supported IPs. v2: - Store pointers to struct xe_ip instead xe_{graphics,media}_desc in struct xe_device_desc. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-5-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Define xe_ip instances before xe_device_descGustavo Sousa
We will soon update the code so that pre-GMDID IPs are also defined with struct xe_ip. Since we will need to refer to them in instances of struct xe_device_desc, let's move up the current instances of xe_ip (GMDID-based) so that all IP descriptors are kept together. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-4-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Rename gmdid_map to xe_ipGustavo Sousa
If we pay closer attention to struct gmdid_map, we will realize that it is actually fully describing an IP (graphics or media): it contains "release info" and "features info". The former is comprised of fields "ver" and "name"; and the latter is done via member "ip", which is a pointer to either struct xe_graphics_desc or xe_media_desc, and can be reused across releases. As such let's: * Rename struct gmdid_map to xe_ip. * Rename the field ver to verx100 to be consistent with the naming of members using that encoding of the version. * Rename the field "ip" to "desc" to make it clear that it is a pointer to a descriptor of features for the IP, since it will not contain *all* info (i.e. features + release info). We sill have release info mapped into struct xe_{graphics,media}_desc for pre-GMDID IPs. In an upcoming change we will handle that so that we make a clear separation between "release info" and "feature info". Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-3-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Disambiguate GMDID-based IP namesGustavo Sousa
The name of an IP is a function of its version. As such, given an IP version, it should be clear to identify the name of that IP release. With the current code, we keep that mapping clear for pre-GMDID IPs, but ambiguous for GMDID-based ones. That causes two types of inconveniences: 1. The end user, who might not have all the necessary mapping at hand, might be confused when seeing different possible IP names in the dmesg log. 2. It makes a developer who is not familiar with the "IP version" to "Release name" need to resort to looking at the specs to understand see what version maps to what. While the specs should be the authority on the mapping, we should make our lives easier by reflecting that mapping in the source code. Thus, since the IP name is tied to the version, let's remove the ambiguity by using a "name" field in struct gmdid_map instead of accumulating names in the descriptor instances. This does result in the code having IP name being defined in different structs (gmdid_map, xe_graphics_desc, xe_media_desc), but that will be resolved in upcoming changes. A side-effect of this change is that media_xe2 exactly matches media_xelpmp now, so we just re-use the latter. v2: - Drop media_xe2 and re-use media_xelpmp. (Matt) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-2-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-03-05drm/xe: Set IP names in functions handling IP versionGustavo Sousa
In an upcoming change, we will handle setting graphics_name and media_name differently for GMDID-based IPs. As such, let's make both handle_pre_gmdid() and handle_gmdid() functions responsible for initializing those fields. While now we have both doing essentially the same thing with respect to those fields, handle_pre_gmdid() will diverge soon. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-1-5bc0c6d0c13f@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-02-25drm/xe: Move survivability entirely to xe_pciLucas De Marchi
There's an odd split between xe_pci.c and xe_device.c wrt xe_survivability: it's initialized by xe_device, but then finalized by xe_pci. Move it entirely to the outer layer, xe_pci, so it controls the flow entirely. This also allows to stop ignoring some of the errors. E.g.: if there's an -ENOMEM, it shouldn't continue as if it survivability had been enabled. One change worth mentioning is that if "wait for lmem" fails, it will also check the pcode status to decide if it should enter or not in survivability mode, which it was not doing before. The bit from pcode for that decision should remain the same after lmem failed initialization, so it should be fine. Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-9-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-25drm/xe: Drop remove callback supportLucas De Marchi
Now that devres supports component driver cleanup during driver removal cleanup, the xe custom support for removal callbacks is not needed anymore. Drop it. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-25drm/xe: Stop setting drvdata to NULLLucas De Marchi
PCI subsystem is not supposed to call the remove() function when probe fails and doesn't need a protection for that. The only places checking for NULL drvdata, is on 2 sysfs files and they shouldn't be needed since the files are removed and reads on open fds just return an error. For this protection the core driver implementation in drivers/base/dd.c:device_unbind_cleanup() already sets it to NULL, after the release of dev resources. Remove the setting to NULL so it's possible to obtain the xe pointer from callbacks like the component unbind from device_unbind_cleanup(), i.e. after xe_pci_remove() already finished. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-14drm/xe: Add callback support for driver removeLucas De Marchi
xe device probe uses devm cleanup in most places. However there are a few cases where this is not possible: when the driver interacts with component add/del. In that case, the resource group would be cleanup while the entire device resources are in the process of cleanup. One example is the xe_gsc_proxy and display using that to interact with mei and audio. Add a callback-based remove so the exception doesn't make the probe use multiple error handling styles. v2: Change internal API to mimic the devm API. This will make it easier to migrate in future when devm can be used. Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-07drm/xe: Enable SR-IOV for PTLMichal Wajdeczko
We should now have sufficient changes in the driver to run it on PTL platforms in the SR-IOV Physical Function (PF) mode, that would allow us to enable SR-IOV Virtual Functions (VFs), and successfully probe our driver in the VF mode on enabled VF devices. To unblock SR-IOV PF mode you need to load xe with modparam: xe.max_vfs=7 Then to enable VFs it is sufficient to use: $ echo 7 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs Note that in default auto-provisioning all VFs are allocated with some amount of shared resources (like unlimited GPU execution and preemption times, fair GGTT space, fair GuC context IDs range, ...) However with CONFIG_DEBUG_FS enabled it is possible to tweak most of the SR-IOV configuration parameters using attributes like: /sys/kernel/debug/dri/0000:00:02.0/gt0/ ├── pf │   ├── contexts_spare │   ├── doorbells_spare │   ├── exec_quantum_ms │   ├── ggtt_spare │   ├── preempt_timeout_us │   ├── sched_priority │   └── ... ├── vf1 │   ├── contexts_quota │   ├── doorbells_quota │   ├── exec_quantum_ms │   ├── ggtt_quota │   ├── preempt_timeout_us │   ├── sched_priority │   └── ... ├── vf2 │   └── ... : Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Jakub Kolakowski <jakub1.kolakowski@intel.com> Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250206214545.940-1-michal.wajdeczko@intel.com
2025-02-03drm/xe: Refactor max_remote_tilesSai Teja Pottumuttu
max_remote_tiles is more related to the platform than the GT IP. Thus move it to platform descriptor from graphics descriptor. Note that the FIXME is no more required, thus it can be dropped. v2: Rebase v3: Change the position of comment (MattR) Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-3-sai.teja.pottumuttu@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-02-03drm/xe: Refactor dma_mask_sizeSai Teja Pottumuttu
dma_mask_size is more related to the platform than the GT IP. Thus move it to platform descriptors. v2: - Rebase Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-2-sai.teja.pottumuttu@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-02-03drm/xe/pxp: Enable PXP for MTL and LNLDaniele Ceraolo Spurio
Now that are the pieces are there, we can turn the feature on. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-14-daniele.ceraolospurio@intel.com
2025-02-03drm/xe/pxp: Initialize PXP structure and KCR regDaniele Ceraolo Spurio
As the first step towards adding PXP support, hook in the PXP init function, allocate the PXP structure and initialize the KCR register to allow PXP HWDRM sessions. v2: remove unneeded includes, free PXP memory on error (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-2-daniele.ceraolospurio@intel.com
2025-01-28drm/xe: Enable Boot Survivability modeRiana Tauro
Enable boot survivability mode if pcode initialization fails and if boot status indicates a failure. In this mode, drm card is not exposed and driver probe returns success after loading the bare minimum to allow firmware to be flashed via mei. v2: abstract survivability mode variable add BMG check inside function (Jani, Rodrigo) v3: return -EBUSY during system suspend (Anshuman) check survivability mode in pci probe only on error Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-01-18drm/xe/vf: Perform early GT MMIO initialization to read GMDIDMichal Wajdeczko
VFs need to communicate with the GuC to obtain the GMDID value and existing GuC functions used for that assume that the GT has it's MMIO members already setup. However, due to recent refactoring the gt->mmio is initialized later, and any attempt by the VF to use xe_mmio_read|write() from GuC functions will lead to NPD crash due to unset MMIO register address: [] xe 0000:00:02.1: [drm] Running in SR-IOV VF mode [] xe 0000:00:02.1: [drm] GT0: sending H2G MMIO 0x5507 [] BUG: unable to handle page fault for address: 0000000000190240 Since we are already tweaking the id and type of the primary GT to mimic it's a Media GT before initializing the GuC communication, we can also call xe_gt_mmio_init() to perform early setup of the gt->mmio which will make those GuC functions work again. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250114211347.1083-1-michal.wajdeczko@intel.com
2025-01-14drm/xe: Remove unused "mmio_ext" codeMatt Roper
The "mmio_ext" and 'REG_EXT" code is currently unused on any existing platform. Going forward, this also isn't the design we want to use for any future platforms/features either, so we should just go ahead and remove the dead code to avoid confusion. mmio_ext was originally added in an attempt to hack around the early (mis)design of the Xe driver, which used xe_gt as the target for all register MMIO access, even those completely unrelated to the GT subunit of the hardware. With the introduction of commit 34953ee349dd ("drm/xe: Create dedicated xe_mmio structure") and its follow-up patches, that misdesign has been corrected and access to register MMIO regions specific to hardware units is now done through xe_mmio structures which encapsulate an iomap, region size, and some other metadata. Although all of the registers used by the driver today happen to fall within one specific PCI BAR region, and thus re-use a single device-wide iomap, there's no requirement that this stay true for future platforms or features. I.e., if a future platform adds a new 'foo' hardware unit that exists at a different area in the BAR, or even in a completely different BAR, then that would be handled by doing a separate iomap of that unit's register region and wrapping it in its own 'struct xe_mmio foo_regs' structure. The pointer to the new 'foo_regs' could be placed within the xe_device, xe_tile, xe_gt, etc., according to where the new hardware unit falls within the current hardware hierarchy. This effectively reverts the following commits, although parts of these commits had already vanished or changed with the earlier xe_mmio refactor work: - commit 399a13323f0d ("drm/xe: add 28-bit address support in struct xe_reg") - commit fdef72e02e20 ("drm/xe: add a flag to bypass multi-tile config from MTCFG reg") - commit 866b2b176434 ("drm/xe: add MMIO extension support flags") - commit ef29b390c734 ("drm/xe: map MMIO BAR according to the num of tiles in device desc") - commit a4e2f3a299ea ("drm/xe: refactor xe_mmio_probe_tiles to support MMIO extension") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Koby Elbaz <kelbaz@habana.ai> Acked-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106234312.2986065-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-01-09drm/xe: Fix all typos in xeNitin Gote
Fix all typos in files of xe, reported by codespell tool. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106102646.1400146-2-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2024-11-28drm/xe: Update xe2_graphics name stringMatt Roper
Since both Xe2 and Xe3 platforms currently use the same set of graphics IP feature flags, we associate the "graphics_xe2" structure with both IPs. Update the name string on that IP structure to clarify this and avoid confusion as Xe3 platforms start going into public CI. Fixes: 800d75bf20ae ("drm/xe/xe3: Define Xe3 feature flags") Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241125194838.1190599-2-matthew.d.roper@intel.com (cherry picked from commit 4fe70f664a105391321c85b2af241001e8118d24) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-10-29drm/xe/ptl: Enable PTL displayHaridhar Kalvala
At this point we should have enough support landed to turn on and start basic testing of display functionality. Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Signed-off-by: Clint Taylor <Clinton.A.Taylor@intel.com> Acked-by: Jani Saarinen <jani.saarinen@intel.com> Tested-by: Jani Saarinen<jani.saarinen@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241028193015.3241858-10-clinton.a.taylor@intel.com
2024-10-29drm/xe: switch to common PCI ID macrosJani Nikula
Switch to the shared PCI ID macros in drm/intel/pciids.h. Remove xe_pciids.h. Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/84e08172184bdc6409cf6dd13f6c52971c647dbb.1729590029.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-10-08drm/xe/ptl: Add PTL platform definitionHaridhar Kalvala
PTL is an integrated GPU based on the Xe3 architecture. v2: explicitly turn off display until display patches land. Bspec: 72574 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-6-matthew.s.atwood@intel.com
2024-10-08drm/xe/xe3: Define Xe3 feature flagsHaridhar Kalvala
Define a common set of Xe3 feature flags and definitions that will be used for all platforms in this family. The feature flags are inherited unchanged from the Xe2 (XE2_FEATURES) platform. Following B-spec details inherited from Xe2 feature flag definition commit. v2: reuse graphics_xe2 definition Bspec: 58695 - dma_mask_size remains 46 (not documented in bspec) - supports_usm=1 (Bspec 59651) - has_flatccs=1 (Bspec 58797) - has_4tile=1 (Bspec 58788) - has_asid=1 (Bspec 59654, 59265, 60288) - has_range_tlb_invalidate=1 (Bspec 71126) - five-level page table (Bspec 59505) - 1 VD + 1 VE + 1 SFC (Bspec 67103, 70819) - platform engine mask (Bspec 60149) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-3-matthew.s.atwood@intel.com
2024-10-03drm/xe: Use fault injection infrastructure to find issues at probe timeFrancois Dugast
The kernel fault injection infrastructure is used to test proper error handling during probe. The return code of the functions using ALLOW_ERROR_INJECTION() can be conditionnally modified at runtime by tuning some debugfs entries. This requires CONFIG_FUNCTION_ERROR_INJECTION (among others). One way to use fault injection at probe time by making each of those functions fail one at a time is: FAILTYPE=fail_function DEVICE="0000:00:08.0" # depends on the system ERRNO=-12 # -ENOMEM, can depend on the function echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 100 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose modprobe xe echo $DEVICE > /sys/bus/pci/drivers/xe/unbind grep -oP "^.* \[xe\]" /sys/kernel/debug/$FAILTYPE/injectable | \ cut -d ' ' -f 1 | while read -r FUNCTION ; do echo "Injecting fault in $FUNCTION" echo "" > /sys/kernel/debug/$FAILTYPE/inject echo $FUNCTION > /sys/kernel/debug/$FAILTYPE/inject printf %#x $ERRNO > /sys/kernel/debug/$FAILTYPE/$FUNCTION/retval echo $DEVICE > /sys/bus/pci/drivers/xe/bind done rmmod xe It will also be integrated into IGT for systematic execution by CI. v2: Wrappers are not needed in the cases covered by this patch, so remove them and use ALLOW_ERROR_INJECTION() directly. v3: Document the use of fault injection at probe time in xe_pci_probe and refer to it where ALLOW_ERROR_INJECTION() is used. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927151207.399354-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-09-18drm/xe: Defer gt->mmio initialization until after multi-tile setupMatt Roper
With the recent xe_mmio redesign, tiles and GTs each have their own MMIO accessor, with the GT inheriting some of the information (such as the iomap pointer) from their containing tile. Given that non-root tiles get initialized later than the root tile (and currently after the point at which GT MMIO is initialized for _all_ GTs), we wind up incorrectly inheriting uninitialized pointers for the initialization of GT MMIO for GTs that reside on non-root tiles. This causes a driver crash on multi-tile PVC platforms. With the general xe_mmio redesign, it's now only necessary to do the GT-level MMIO setup before the point we start reading/writing GT registers. Move initialization of gt->mmio out of xe_info_init (which runs before non-root tiles are initialized) and to the beginning of where we start actually accessing the GTs themselves. The high-level initialization flow now boils down to: - General device init, software-only setup - (no register access possible yet) - Root tile initialization - (access to device/tile0 registers possible via xe_root_tile_mmio()) - Initialization of non-root tiles - (access to any tile's registers possible via tile->mmio) - GT MMIO initialization, inheriting iomap from each GT's tile - (access to any GT's registers possible via gt->mmio) Fixes: fa599b8c95a7 ("drm/xe: Populate GT's mmio iomap from tile during init") Reported-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240917221615.875962-2-matthew.d.roper@intel.com
2024-09-18drm/xe: Restore pci state upon resumeRodrigo Vivi
The pci state was saved, but not restored. Restore right after the power state transition request like every other driver. v2: Use right fixes tag, since this was there initialy, but accidentally removed. Fixes: f6761c68c0ac ("drm/xe/display: Improve s2idle handling.") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912214507.456897-1-rodrigo.vivi@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-09-11drm/xe/pci: Convert register access to use xe_mmioMatt Roper
Stop using GT pointers for register access. v2: - Clarify comment about manual GSI offset handling. (Rodrigo) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-63-matthew.d.roper@intel.com
2024-09-11drm/xe: Adjust mmio code to pass VF substructure to SRIOV codeMatt Roper
Although we want to break the GT-centric nature of the MMIO code in the general driver, the SRIOV handling still relies on data in a VF substructure of the GT. So add a GT backpointer, but name it sriov_vf_gt to make it clear that it's only for this one specific special case and will not be set or usable for anything else. v2: - Store backpointer to the GT itself rather than the SRIOV-specific substructure. (Michal) Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> # v1 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-53-matthew.d.roper@intel.com
2024-09-11drm/xe: Add xe_tile backpointer to xe_mmioMatt Roper
Once MMIO operations stop being (incorrectly) tied to a GT, we'll still need a backpointer for feature checks, message logging, and tracepoints. Use a tile backpointer since that may allow the most useful debugging output, while also providing access to the xe_device. v2: - Make backpointer an xe_tile instead of xe_device. (Michal) Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> # v1 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-52-matthew.d.roper@intel.com
2024-09-11drm/xe: Populate GT's mmio iomap from tile during initMatt Roper
Each GT should share the same register iomap as its parent tile. Future patches will switch to access the iomap through the GT's mmio substruct rather than through the tile. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-50-matthew.d.roper@intel.com