Age | Commit message (Collapse) | Author |
|
Enable SR-IOV support for BMG platforms. Note that as other flags from
the platform descriptor, it only means it may have that capability: it
still depends on runtime checks for the proper support in HW and
firmware.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250710103040.375610-3-jakub1.kolakowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Panther Lake has proven to be stable through testing and use.
Remove the force_probe requirement and enable the platform by default.
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250707201959.319406-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This reverts commit fe0154cf8222d9e38c60ccc124adb2f9b5272371.
Seeing some unexplained random failures during LRC context switches with
indirect ring state enabled. The failures were always there, but the
repro rate increased with the addition of WA BB as a separate BO.
Commit 3a1edef8f4b5 ("drm/xe: Make WA BB part of LRC BO") helped to
reduce the issues in the context switches, but didn't eliminate them
completely.
Indirect ring state is not required for any current features, so disable
for now until failures can be root caused.
Cc: stable@vger.kernel.org
Fixes: fe0154cf8222 ("drm/xe/xe2: Enable Indirect Ring State support for Xe2")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250702035846.3178344-1-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Although "multi-tile" and "multiple GTs per tile" are mutually-exclusive
characteristics on all of our platforms today, this may not always be
true. Assign GT IDs according to xe->info.max_gt_per_tile in a way that
should work even if future platforms have different configurations.
This patch should not change the behavior of current platforms; it only
future-proofs for potential future designs.
v2:
- Re-calculate gt_count if tile count gets reduced by MTCFG. (PVC CI)
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-14-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Add a simple kunit test to ensure each platform's GT per tile count is
non-zero and does not exceed the global XE_MAX_GT_PER_TILE definition.
We need to move 'struct xe_subplatform_desc' from the .c file to the
types header to ensure it is accessible from the kunit test.
v2:
- Rebase on latest xe_pci test rework from Michal and convert to
a parameterized test that runs on each PCI ID supported by the
driver.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Today all of our platforms fall into one of three cases:
* Single tile platforms with a single (primary) GT
* Single tile platforms with two GTs (primary + media)
* Two-tile platforms with a single GT (primary) in each
Our numbering of GTs has been a bit inconsistent between platforms
(e.g., GT1 is the media GT on some platforms, but the second tile's
primary GT on others). In the future we'll likely have platforms that
are both multi-tile and multi-GT, which will make the situation more
confusing. We could also wind up with more than just two types of GTs
at some point in the future.
Going forward we should standardize the way we assign uapi GT IDs to
internal GT structures. Let's declare that for userspace GT ID n,
GT[n]'s tile = n / (max gt per tile)
GT[n]'s slot within tile = n % (max gt per tile)
We don't want the GT numbering to change for any of our current
platforms since the current IDs are part of our ABI contract with
userspace so this means we should track the 'max gt per tile' value on a
per-platform basis rather than just using a single value across the
driver. Encode this into device descriptors in xe_pci.c and use the
per-platform number for various checks in the code. Constant
XE_MAX_GT_PER_TILE will remain just as the maximum across all platforms
for easy of sizing array allocations.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250701201320.2514369-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Enable access to internal non-volatile memory on DGFX
with GSC/CSC devices via a child device.
The nvm child device is exposed via auxiliary bus.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Media version 30.02 should be treated the same as other Xe3 IP, but
will have a slightly different set of workarounds.
-v2: Extend the range in existing WA entry (Bala)
-v3: Revert v2, Do not extend the range for the time being(Matt)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-4-dnyaneshwar.bhadane@intel.com
|
|
Graphics version 30.03 should be treated the same as other Xe3 IP, but
will have a slightly different set of workarounds.
-v2: Merge and extend the WA onto existing entry (Bala)
-v3: Revert v2's feedback changes and keep entry saparate (Matt).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@inte.com>
Link: https://lore.kernel.org/r/20250613193146.3549862-3-dnyaneshwar.bhadane@intel.com
|
|
Add another GMD_ID for Xe2_HPG
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250605190804.1287289-4-dnyaneshwar.bhadane@intel.com
|
|
Add support to manage power limits using pcode mailbox commands
for supported platforms.
v2:
- Address review comments. (Badal)
- Use mailbox commands instead of registers to manage power limits
for BMG.
- Clamp the maximum power limit to GPU firmware default value.
v3:
- Clamp power limit in write also for platforms with mailbox support.
v4:
- Remove unnecessary debug prints. (Badal)
v5:
- Update description of variable pl1_on_boot to fix kernel-doc error.
v6:
- Improve commit message, refer to BIOS as GPU firmware.
- Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW.
- Rectify drm_warn to drm_info.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: e90f7a58e659 ("drm/xe/hwmon: Add HWMON support for BMG")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-2-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Context Timestamp (CTX_TIMESTAMP) in the LRC accumulates the run ticks
of the context, but only gets updated when the context switches out. In
order to check how long a context has been active before it switches
out, two things are required:
(1) Determine if the context is running:
To do so, we program the WA BB to set an initial value for CTX_TIMESTAMP
in the LRC. The value chosen is 1 since 0 is the initial value when the
LRC is initialized. During a query, we just check for this value to
determine if the context is active. If the context switched out, it
would overwrite this location with the actual CTX_TIMESTAMP MMIO value.
Note that WA BB runs as the last part of the context restore, so reusing
this LRC location will not clobber anything.
(2) Calculate the time that the context has been active for:
The CTX_TIMESTAMP ticks only when the context is active. If a context is
active, we just use the CTX_TIMESTAMP MMIO as the new value of
utilization. While doing so, we need to read the CTX_TIMESTAMP MMIO
for the specific engine instance. Since we do not know which instance
the context is running on until it is scheduled, we also read the
ENGINE_ID MMIO in the WA BB and store it in the PPHSWP.
Using the above 2 instructions in a WA BB, capture active context
utilization.
v2: (Matt Brost)
- This breaks TDR, fix it by saving the CTX_TIMESTAMP register
"drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value"
- Drop tile from LRC if using gt
"drm/xe: Save the gt pointer in LRC and drop the tile"
v3:
- Remove helpers for bb_per_ctx_ptr (Matt)
- Add define for context active value (Matt)
- Use 64 bit CTX TIMESTAMP for platforms that support it. For platforms
that don't, live with the rare race. (Matt, Lucas)
- Convert engine id to hwe and get the MMIO value (Lucas)
- Correct commit message on when WA BB runs (Lucas)
v4:
- s/GRAPHICS_VER(...)/xe->info.has_64bit_timestamp/ (Matt)
- Drop support for active utilization on a VF (CI failure)
- In xe_lrc_init ensure the lrc value is 0 to begin with (CI regression)
v5:
- Minor checkpatch fix
- Squash into previous commit and make TDR use 32-bit time
- Update code comment to match commit msg
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4532
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250509161159.2173069-8-umesh.nerlige.ramappa@intel.com
|
|
In the case of VRAM we might need to allocate large amounts of
GFP_KERNEL memory on suspend, however doing that directly in the driver
.suspend()/.prepare() callback is not advisable (no swap for example).
To improve on this we can instead hook up to the PM notifier framework
which is invoked at an earlier stage. We effectively call the evict
routine twice, where the notifier will have hopefully have cleared out
most if not everything by the time we call it a second time when
entering the .suspend() callback. For s4 we also get the added benefit
of allocating the system pages before the hibernation image size is
calculated, which looks more sensible.
Note that the .suspend() hook is still responsible for dealing with all
the pinned memory. Improving that is left to another patch.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1181
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4566
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250416150913.434369-6-matthew.auld@intel.com
|
|
Enable survivability mode if supported and configfs attribute is set.
Enabling survivability mode manually is useful in cases where pcode does
not detect failure, validation and for IFR (in-field-repair).
To set configfs survivability mode attribute for a device
echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode
The card enters survivability mode if supported
v2: add a log if survivability mode is enabled for unsupported
platforms (Rodrigo)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250407051414.1651616-4-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
On some platform, scratch page is needed for out of bound prefetch
to work. Introduce a bit in device descriptor to specify whether
this device needs scratch page to work.
v2: introduce a needs_scratch bit in device info (Thomas, Jonathan)
v3: drop NEEDS_SCRATCH macro (Matt)
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250403165328.2438690-2-oak.zeng@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
|
|
According to pci core guidelines, pci_save_config is recommended when the
driver explicitly needs to set the pci power state. As of now xe kmd is
only doing pci_save_config while entering to s2idle/s3 state, which makes
pci core think that device driver has already applied required pci power
state. This leads to GPU remain in D0 state. To fix the issue setting
the pci power state to D3Cold.
Fixes:dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250327161914.432552-1-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Commit d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci")
moved the survivability handling to be done entirely in the xe_pci
layer. However there are some issues with that approach:
1) Survivability mode needs at least the mmio initialized, otherwise it
can't really read a register to decide if it should enter that state
2) SR-IOV mode should be initialized, otherwise it's not possible to
check if it's VF
Besides, as pointed by Riana the check for
xe_survivability_mode_enable() was wrong in xe_pci_probe() since it's
not a bool return.
Fix that by moving the initialization to be entirely in the xe_device
layer, with the correct dependencies handled: only after mmio and sriov
initialization, and not triggering it on error from
wait_for_lmem_ready(). This restores the trigger behavior before that
commit. The xe_pci layer now only checks for "is it enabled?",
like it's doing in xe_pci_suspend()/xe_pci_remove(), etc.
Cc: Riana Tauro <riana.tauro@intel.com>
Fixes: d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-1-fdb3559ea965@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Add hwmon support for fan1_input, fan2_input and fan3_input attributes,
which will expose fan speed of respective channels in RPM when supported
by hardware. With this in place we can monitor fan speed using lm-sensors
tool.
v2: Rely on platform checks instead of mailbox error (Aravind, Rodrigo)
v3: Introduce has_fan_control flag (Rodrigo)
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250312085909.755073-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Now that we have all IPs being described via struct xe_ip, where release
information (version and name) is represented in a single struct type,
we can extract duplicated logic from handle_pre_gmdid() and
handle_gmdid() and apply it in the body of xe_info_init().
With this change, there is no point in keeping handle_pre_gmdid()
anymore, so we just remove it and inline the assignment of
{graphics,media}_ip.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-7-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Now that pre-GMDID IPs are described via struct xe_ip, it is possible to
re-use the feature descriptors that have exact match with ones from
previous releases. Do that.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-6-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
We have now a struct xe_ip to fully describe an IP, but we are only
using that for GMDID-based IPs.
For pre-GMDID IPs, we still describe release info (version and name) via
feature descriptors (struct xe_{graphics,media}_desc). Let's convert
those to use struct xe_ip.
With this, we have a uniform way of describing IPs in the xe driver
instead of having different approaches based on whether the IPs use
GMDIDs or not.
A nice side-effect of this change is that now we have an easy way to
lookup, in the source code, mappings between versions, names and
features for all supported IPs.
v2:
- Store pointers to struct xe_ip instead xe_{graphics,media}_desc in
struct xe_device_desc.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-5-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
We will soon update the code so that pre-GMDID IPs are also defined with
struct xe_ip. Since we will need to refer to them in instances of
struct xe_device_desc, let's move up the current instances of xe_ip
(GMDID-based) so that all IP descriptors are kept together.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-4-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
If we pay closer attention to struct gmdid_map, we will realize that it
is actually fully describing an IP (graphics or media): it contains
"release info" and "features info". The former is comprised of fields
"ver" and "name"; and the latter is done via member "ip", which is a
pointer to either struct xe_graphics_desc or xe_media_desc, and can be
reused across releases.
As such let's:
* Rename struct gmdid_map to xe_ip.
* Rename the field ver to verx100 to be consistent with the naming of
members using that encoding of the version.
* Rename the field "ip" to "desc" to make it clear that it is a
pointer to a descriptor of features for the IP, since it will not
contain *all* info (i.e. features + release info).
We sill have release info mapped into struct xe_{graphics,media}_desc
for pre-GMDID IPs. In an upcoming change we will handle that so that we
make a clear separation between "release info" and "feature info".
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-3-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
The name of an IP is a function of its version. As such, given an IP
version, it should be clear to identify the name of that IP release.
With the current code, we keep that mapping clear for pre-GMDID IPs, but
ambiguous for GMDID-based ones. That causes two types of inconveniences:
1. The end user, who might not have all the necessary mapping at hand,
might be confused when seeing different possible IP names in the
dmesg log.
2. It makes a developer who is not familiar with the "IP version" to
"Release name" need to resort to looking at the specs to understand
see what version maps to what. While the specs should be the
authority on the mapping, we should make our lives easier by
reflecting that mapping in the source code.
Thus, since the IP name is tied to the version, let's remove the
ambiguity by using a "name" field in struct gmdid_map instead of
accumulating names in the descriptor instances.
This does result in the code having IP name being defined in
different structs (gmdid_map, xe_graphics_desc, xe_media_desc), but that
will be resolved in upcoming changes.
A side-effect of this change is that media_xe2 exactly matches
media_xelpmp now, so we just re-use the latter.
v2:
- Drop media_xe2 and re-use media_xelpmp. (Matt)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-2-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
In an upcoming change, we will handle setting graphics_name and
media_name differently for GMDID-based IPs. As such, let's make both
handle_pre_gmdid() and handle_gmdid() functions responsible for
initializing those fields. While now we have both doing essentially the
same thing with respect to those fields, handle_pre_gmdid() will diverge
soon.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250221-xe-unify-ip-descriptors-v2-1-5bc0c6d0c13f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
There's an odd split between xe_pci.c and xe_device.c wrt
xe_survivability: it's initialized by xe_device, but then finalized by
xe_pci. Move it entirely to the outer layer, xe_pci, so it controls
the flow entirely.
This also allows to stop ignoring some of the errors. E.g.: if there's
an -ENOMEM, it shouldn't continue as if it survivability had been
enabled.
One change worth mentioning is that if "wait for lmem" fails, it will
also check the pcode status to decide if it should enter or not in
survivability mode, which it was not doing before. The bit from pcode
for that decision should remain the same after lmem failed
initialization, so it should be fine.
Cc: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Now that devres supports component driver cleanup during driver removal
cleanup, the xe custom support for removal callbacks is not needed
anymore. Drop it.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
PCI subsystem is not supposed to call the remove() function when probe
fails and doesn't need a protection for that. The only places checking
for NULL drvdata, is on 2 sysfs files and they shouldn't be needed since
the files are removed and reads on open fds just return an error.
For this protection the core driver implementation in
drivers/base/dd.c:device_unbind_cleanup() already sets it to NULL, after
the release of dev resources.
Remove the setting to NULL so it's possible to obtain the xe pointer
from callbacks like the component unbind from device_unbind_cleanup(),
i.e. after xe_pci_remove() already finished.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250222001051.3012936-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
xe device probe uses devm cleanup in most places. However there are a
few cases where this is not possible: when the driver interacts with
component add/del. In that case, the resource group would be cleanup
while the entire device resources are in the process of cleanup. One
example is the xe_gsc_proxy and display using that to interact with mei
and audio.
Add a callback-based remove so the exception doesn't make the probe
use multiple error handling styles.
v2: Change internal API to mimic the devm API. This will make it easier
to migrate in future when devm can be used.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
We should now have sufficient changes in the driver to run it on
PTL platforms in the SR-IOV Physical Function (PF) mode, that would
allow us to enable SR-IOV Virtual Functions (VFs), and successfully
probe our driver in the VF mode on enabled VF devices.
To unblock SR-IOV PF mode you need to load xe with modparam:
xe.max_vfs=7
Then to enable VFs it is sufficient to use:
$ echo 7 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
Note that in default auto-provisioning all VFs are allocated with
some amount of shared resources (like unlimited GPU execution and
preemption times, fair GGTT space, fair GuC context IDs range, ...)
However with CONFIG_DEBUG_FS enabled it is possible to tweak most
of the SR-IOV configuration parameters using attributes like:
/sys/kernel/debug/dri/0000:00:02.0/gt0/
├── pf
│ ├── contexts_spare
│ ├── doorbells_spare
│ ├── exec_quantum_ms
│ ├── ggtt_spare
│ ├── preempt_timeout_us
│ ├── sched_priority
│ └── ...
├── vf1
│ ├── contexts_quota
│ ├── doorbells_quota
│ ├── exec_quantum_ms
│ ├── ggtt_quota
│ ├── preempt_timeout_us
│ ├── sched_priority
│ └── ...
├── vf2
│ └── ...
:
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250206214545.940-1-michal.wajdeczko@intel.com
|
|
max_remote_tiles is more related to the platform than the GT IP. Thus
move it to platform descriptor from graphics descriptor. Note that the
FIXME is no more required, thus it can be dropped.
v2: Rebase
v3: Change the position of comment (MattR)
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-3-sai.teja.pottumuttu@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
dma_mask_size is more related to the platform than the GT IP. Thus
move it to platform descriptors.
v2:
- Rebase
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-2-sai.teja.pottumuttu@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Now that are the pieces are there, we can turn the feature on.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-14-daniele.ceraolospurio@intel.com
|
|
As the first step towards adding PXP support, hook in the PXP init
function, allocate the PXP structure and initialize the KCR register to
allow PXP HWDRM sessions.
v2: remove unneeded includes, free PXP memory on error (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-2-daniele.ceraolospurio@intel.com
|
|
Enable boot survivability mode if pcode initialization fails and
if boot status indicates a failure. In this mode, drm card is not
exposed and driver probe returns success after loading the bare minimum
to allow firmware to be flashed via mei.
v2: abstract survivability mode variable
add BMG check inside function (Jani, Rodrigo)
v3: return -EBUSY during system suspend (Anshuman)
check survivability mode in pci probe only
on error
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-3-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
VFs need to communicate with the GuC to obtain the GMDID value
and existing GuC functions used for that assume that the GT has
it's MMIO members already setup. However, due to recent refactoring
the gt->mmio is initialized later, and any attempt by the VF to use
xe_mmio_read|write() from GuC functions will lead to NPD crash due
to unset MMIO register address:
[] xe 0000:00:02.1: [drm] Running in SR-IOV VF mode
[] xe 0000:00:02.1: [drm] GT0: sending H2G MMIO 0x5507
[] BUG: unable to handle page fault for address: 0000000000190240
Since we are already tweaking the id and type of the primary GT to
mimic it's a Media GT before initializing the GuC communication,
we can also call xe_gt_mmio_init() to perform early setup of the
gt->mmio which will make those GuC functions work again.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250114211347.1083-1-michal.wajdeczko@intel.com
|
|
The "mmio_ext" and 'REG_EXT" code is currently unused on any existing
platform. Going forward, this also isn't the design we want to use for
any future platforms/features either, so we should just go ahead and
remove the dead code to avoid confusion.
mmio_ext was originally added in an attempt to hack around the early
(mis)design of the Xe driver, which used xe_gt as the target for all
register MMIO access, even those completely unrelated to the GT subunit
of the hardware. With the introduction of commit 34953ee349dd ("drm/xe:
Create dedicated xe_mmio structure") and its follow-up patches, that
misdesign has been corrected and access to register MMIO regions
specific to hardware units is now done through xe_mmio structures which
encapsulate an iomap, region size, and some other metadata.
Although all of the registers used by the driver today happen to fall
within one specific PCI BAR region, and thus re-use a single device-wide
iomap, there's no requirement that this stay true for future platforms
or features. I.e., if a future platform adds a new 'foo' hardware unit
that exists at a different area in the BAR, or even in a completely
different BAR, then that would be handled by doing a separate iomap of
that unit's register region and wrapping it in its own 'struct xe_mmio
foo_regs' structure. The pointer to the new 'foo_regs' could be placed
within the xe_device, xe_tile, xe_gt, etc., according to where the new
hardware unit falls within the current hardware hierarchy.
This effectively reverts the following commits, although parts of these
commits had already vanished or changed with the earlier xe_mmio
refactor work:
- commit 399a13323f0d ("drm/xe: add 28-bit address support in struct
xe_reg")
- commit fdef72e02e20 ("drm/xe: add a flag to bypass multi-tile config
from MTCFG reg")
- commit 866b2b176434 ("drm/xe: add MMIO extension support flags")
- commit ef29b390c734 ("drm/xe: map MMIO BAR according to the num of
tiles in device desc")
- commit a4e2f3a299ea ("drm/xe: refactor xe_mmio_probe_tiles to support
MMIO extension")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Koby Elbaz <kelbaz@habana.ai>
Acked-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Reviewed-by: Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250106234312.2986065-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Fix all typos in files of xe, reported by codespell tool.
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250106102646.1400146-2-nitin.r.gote@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
|
|
Since both Xe2 and Xe3 platforms currently use the same set of graphics
IP feature flags, we associate the "graphics_xe2" structure with both IPs.
Update the name string on that IP structure to clarify this and avoid
confusion as Xe3 platforms start going into public CI.
Fixes: 800d75bf20ae ("drm/xe/xe3: Define Xe3 feature flags")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241125194838.1190599-2-matthew.d.roper@intel.com
(cherry picked from commit 4fe70f664a105391321c85b2af241001e8118d24)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
At this point we should have enough support landed to turn on and start
basic testing of display functionality.
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Acked-by: Jani Saarinen <jani.saarinen@intel.com>
Tested-by: Jani Saarinen<jani.saarinen@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241028193015.3241858-10-clinton.a.taylor@intel.com
|
|
Switch to the shared PCI ID macros in drm/intel/pciids.h. Remove
xe_pciids.h.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/84e08172184bdc6409cf6dd13f6c52971c647dbb.1729590029.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
PTL is an integrated GPU based on the Xe3 architecture.
v2: explicitly turn off display until display patches land.
Bspec: 72574
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-6-matthew.s.atwood@intel.com
|
|
Define a common set of Xe3 feature flags and definitions that will be
used for all platforms in this family.
The feature flags are inherited unchanged from the Xe2 (XE2_FEATURES)
platform.
Following B-spec details inherited from Xe2 feature flag definition
commit.
v2: reuse graphics_xe2 definition
Bspec: 58695
- dma_mask_size remains 46 (not documented in bspec)
- supports_usm=1 (Bspec 59651)
- has_flatccs=1 (Bspec 58797)
- has_4tile=1 (Bspec 58788)
- has_asid=1 (Bspec 59654, 59265, 60288)
- has_range_tlb_invalidate=1 (Bspec 71126)
- five-level page table (Bspec 59505)
- 1 VD + 1 VE + 1 SFC (Bspec 67103, 70819)
- platform engine mask (Bspec 60149)
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008013509.61233-3-matthew.s.atwood@intel.com
|
|
The kernel fault injection infrastructure is used to test proper error
handling during probe. The return code of the functions using
ALLOW_ERROR_INJECTION() can be conditionnally modified at runtime by
tuning some debugfs entries. This requires CONFIG_FUNCTION_ERROR_INJECTION
(among others).
One way to use fault injection at probe time by making each of those
functions fail one at a time is:
FAILTYPE=fail_function
DEVICE="0000:00:08.0" # depends on the system
ERRNO=-12 # -ENOMEM, can depend on the function
echo N > /sys/kernel/debug/$FAILTYPE/task-filter
echo 100 > /sys/kernel/debug/$FAILTYPE/probability
echo 0 > /sys/kernel/debug/$FAILTYPE/interval
echo -1 > /sys/kernel/debug/$FAILTYPE/times
echo 0 > /sys/kernel/debug/$FAILTYPE/space
echo 1 > /sys/kernel/debug/$FAILTYPE/verbose
modprobe xe
echo $DEVICE > /sys/bus/pci/drivers/xe/unbind
grep -oP "^.* \[xe\]" /sys/kernel/debug/$FAILTYPE/injectable | \
cut -d ' ' -f 1 | while read -r FUNCTION ; do
echo "Injecting fault in $FUNCTION"
echo "" > /sys/kernel/debug/$FAILTYPE/inject
echo $FUNCTION > /sys/kernel/debug/$FAILTYPE/inject
printf %#x $ERRNO > /sys/kernel/debug/$FAILTYPE/$FUNCTION/retval
echo $DEVICE > /sys/bus/pci/drivers/xe/bind
done
rmmod xe
It will also be integrated into IGT for systematic execution by CI.
v2: Wrappers are not needed in the cases covered by this patch, so
remove them and use ALLOW_ERROR_INJECTION() directly.
v3: Document the use of fault injection at probe time in xe_pci_probe
and refer to it where ALLOW_ERROR_INJECTION() is used.
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240927151207.399354-1-francois.dugast@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
With the recent xe_mmio redesign, tiles and GTs each have their own MMIO
accessor, with the GT inheriting some of the information (such as the
iomap pointer) from their containing tile. Given that non-root tiles
get initialized later than the root tile (and currently after the point
at which GT MMIO is initialized for _all_ GTs), we wind up incorrectly
inheriting uninitialized pointers for the initialization of GT MMIO for
GTs that reside on non-root tiles. This causes a driver crash on
multi-tile PVC platforms.
With the general xe_mmio redesign, it's now only necessary to do the
GT-level MMIO setup before the point we start reading/writing GT
registers. Move initialization of gt->mmio out of xe_info_init (which
runs before non-root tiles are initialized) and to the beginning of
where we start actually accessing the GTs themselves.
The high-level initialization flow now boils down to:
- General device init, software-only setup
- (no register access possible yet)
- Root tile initialization
- (access to device/tile0 registers possible via xe_root_tile_mmio())
- Initialization of non-root tiles
- (access to any tile's registers possible via tile->mmio)
- GT MMIO initialization, inheriting iomap from each GT's tile
- (access to any GT's registers possible via gt->mmio)
Fixes: fa599b8c95a7 ("drm/xe: Populate GT's mmio iomap from tile during init")
Reported-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240917221615.875962-2-matthew.d.roper@intel.com
|
|
The pci state was saved, but not restored. Restore
right after the power state transition request like
every other driver.
v2: Use right fixes tag, since this was there initialy, but
accidentally removed.
Fixes: f6761c68c0ac ("drm/xe/display: Improve s2idle handling.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240912214507.456897-1-rodrigo.vivi@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
Stop using GT pointers for register access.
v2:
- Clarify comment about manual GSI offset handling. (Rodrigo)
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-63-matthew.d.roper@intel.com
|
|
Although we want to break the GT-centric nature of the MMIO code in the
general driver, the SRIOV handling still relies on data in a VF
substructure of the GT. So add a GT backpointer, but name it
sriov_vf_gt to make it clear that it's only for this one specific
special case and will not be set or usable for anything else.
v2:
- Store backpointer to the GT itself rather than the SRIOV-specific
substructure. (Michal)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> # v1
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-53-matthew.d.roper@intel.com
|
|
Once MMIO operations stop being (incorrectly) tied to a GT, we'll still
need a backpointer for feature checks, message logging, and tracepoints.
Use a tile backpointer since that may allow the most useful debugging
output, while also providing access to the xe_device.
v2:
- Make backpointer an xe_tile instead of xe_device. (Michal)
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> # v1
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-52-matthew.d.roper@intel.com
|
|
Each GT should share the same register iomap as its parent tile. Future
patches will switch to access the iomap through the GT's mmio substruct
rather than through the tile.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-50-matthew.d.roper@intel.com
|