Age | Commit message (Collapse) | Author |
|
Currently the teardown/setup flow for driver probe/remove is quite
a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
One key piece that's missing are the calls to pci_alloc_irq_vectors()
and pci_free_irq_vectors(). The pcie reset case is calling
pci_free_irq_vectors() on reset_prepare, but not calling the
corresponding pci_alloc_irq_vectors() on reset_done. This is causing
unexpected/unwanted interrupt behavior due to the adminq interrupt
being accidentally put into legacy interrupt mode. Also, the
pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls
pci_alloc_irq_vectors() and get rid of the now unused
pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown()
since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
since pci_alloc_irq_vectors() will always be called in
pdsc_setup()->pdsc_dev_init() for both the probe/remove and
reset flows.
4. Make sure to only create the debugfs "identity" entry when it
doesn't already exist, which it will in the reset case because
it's already been created in the initial call to pdsc_dev_init().
Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During reset the BARs might be accessed when they are
unmapped. This can cause unexpected issues, so fix it by
clearing the cached BAR values so they are not accessed
until they are re-mapped.
Also, make sure any places that can access the BARs
when they are NULL are prevented.
Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are multiple paths that can result in using the pdsc's
adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(),
i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or
a fw_down/fw_up cycle due to bad PCIe state or bad device
state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens
after the adminq is already destroyed.
In order to fix this, add some further state checks and
implement reference counting for adminq uses. Reference
counting was used because multiple threads can attempt to
access the adminq at the same time via [1] or [2]. Additionally,
multiple clients (i.e. pds-vfio-pci) can be using [2]
at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been
allocated and is ready to use. Users/clients of the adminq
(i.e. [1] and [2]) will increment the refcnt when they are using
the adminq. When the driver goes into a fw_down cycle it will
set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
any further adminq_refcnt increments. Waiting for the
adminq_refcnt to hit 1 allows for any current users of the adminq
to finish before the driver frees the adminq. Once the
adminq_refcnt hits 1 the driver clears the refcnt to signify that
the adminq is deleted and cannot be used. On the fw_up cycle the
driver will once again initialize the adminq_refcnt to 1 allowing
the adminq to be used again.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The initial design for the adminq interrupt was done based
on client drivers having their own adminq and adminq
interrupt. So, each client driver's adminq isr would use
their specific adminqcq for the private data struct. For the
time being the design has changed to only use a single
adminq for all clients. So, instead use the struct pdsc for
the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq
to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
is set and the adminqcq has actually been cleared/freed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a small window where pdsc_work_thread()
calls pdsc_process_adminq() and pdsc_process_adminq()
passes the PDSC_S_STOPPING_DRIVER check and starts
to process adminq/notifyq work and then the driver
starts a fw_down cycle. This could cause some
undefined behavior if the notifyqcq/adminqcq are
free'd while pdsc_process_adminq() is running. Use
cancel_work_sync() on the adminqcq's work struct
to make sure any pending work items are cancelled
and any in progress work items are completed.
Also, make sure to not call cancel_work_sync() if
the work item has not be initialized. Without this,
traces will happen in cases where a reset fails and
teardown is called again or if reset fails and the
driver is removed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The PCIe reset handlers can run at the same time as the
health thread. This can cause the health thread to
stomp on the PCIe reset. Fix this by preventing the
health thread from running while a PCIe reset is happening.
As part of this use timer_shutdown_sync() during reset and
remove to make sure the timer doesn't ever get rearmed.
Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The fc transport logs the opcode and fctype on command timeout.
This is sufficient information to identify the command issued,
but not very human-readable. Use the nvme_fabrics_opcode_str()
helper to also log the name of the command, as rdma and tcp already do.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
nvme_opcode_str() currently supports admin, IO, and fabrics commands.
However, fabrics commands aren't allowed for the pci transport.
Currently the pci caller passes 0 as the fctype,
which means any fabrics command would be displayed as "Property Set".
Move fabrics command support into a function nvme_fabrics_opcode_str()
and remove the fctype argument to nvme_opcode_str().
This way, a fabrics command will display as "Unknown" for pci.
Convert the rdma and tcp transports to use nvme_fabrics_opcode_str().
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
In nvme_get_error_status_str(), the status code is already masked
with 0x7ff at the beginning of the function.
Don't bother masking it again when indexing nvme_statuses.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The functions in drivers/nvme/host/constants.c returning human-readable
status and opcode strings currently use type "const unsigned char *".
Typically string constants use type "const char *",
so remove "unsigned" from the return types.
This is a purely cosmetic change to clarify that the functions
return text strings instead of an array of bytes, for example.
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add MODULE_DESCRIPTION() in order to remove warnings & get clean build:-
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/common/nvme-auth.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/common/nvme-keyring.o
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Authentication commands might trigger a lengthy computation on the
controller or even a callout to an external entity.
In these cases the controller might return a status without the DNR
bit set, indicating that the command should be retried.
This patch enables retries for authentication commands by setting
NVME_SUBMIT_RETRY for __nvme_submit_sync_cmd().
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Combine the two arguments 'flags' and 'at_head' from __nvme_submit_sync_cmd()
into a single 'flags' argument and use function-specific values to indicate
what should be set within the function.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
No point in having macros just for a single function nvme_auth_submit().
Open-code them into the caller.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Not all mv88e6xxx device support C45 read/write operations. Those
which do not return -EOPNOTSUPP. However, when phylib scans the bus,
it considers this fatal, and the probe of the MDIO bus fails, which in
term causes the mv88e6xxx probe as a whole to fail.
When there is no device on the bus for a given address, the pull up
resistor on the data line results in the read returning 0xffff. The
phylib core code understands this when scanning for devices on the
bus. C45 allows multiple devices to be supported at one address, so
phylib will perform a few reads at each address, so although thought
not the most efficient solution, it is a way to avoid fatal
errors. Make use of this as a minimal fix for stable to fix the
probing problems.
Follow up patches will rework how C45 operates to make it similar to
C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to
using this.
Cc: stable@vger.kernel.org
Fixes: 743a19e38d02 ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions")
Reported-by: Tim Menninger <tmenninger@purestorage.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allows us to detect subsequent IH ring buffer overflows as well.
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Beside, current set function callbacks are empty with no real effect.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
No need to set GC golden settings in driver from gfx 11.5.0 onwards.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix a warning.
v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.
v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)
[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992] <TASK>
[ 41.708996] ? show_regs+0x6c/0x80
[ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709008] ? __warn+0x93/0x190
[ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709024] ? report_bug+0x1f9/0x210
[ 41.709035] ? handle_bug+0x46/0x80
[ 41.709041] ? exc_invalid_op+0x1d/0x80
[ 41.709048] ? asm_exc_invalid_op+0x1f/0x30
[ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709337] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu]
[ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[ 41.709945] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709949] ? tomoyo_file_ioctl+0x20/0x30
[ 41.709959] __x64_sys_ioctl+0x9c/0xd0
[ 41.709967] do_syscall_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'
Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.
To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.
This would ensure that i + 1 is always a valid index in the array.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12
Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang <peichen.huang@amd.com>
Cc: Aric Cyr <aric.cyr@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a NULL check for the kzalloc call that allocates memory for
dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
if kzalloc failed to allocate memory and returned NULL, the code would
attempt to use the NULL pointer.
The fix is to check if kzalloc returns NULL, and if so, log an error
message and skip the rest of the current loop iteration with the
continue statement. This prevents the code from attempting to use the
NULL pointer.
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202401300629.ICnCt983-lkp@intel.com/
Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size")
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The same calls are made directly above, but conditional on the firmware
loading and validating successfully.
Cc: stable@vger.kernel.org
Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland <corngood@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]
This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler
v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139
Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
odm calculation is missing for pipe split policy determination
and cause Underflow/Corruption issue.
[how]
Add the odm calculation.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
On 14us for exit latency time causes underflow for 8K monitor with HDR on.
Increasing the latency to 28us fixes the underflow.
[How]
Increase the latency to 28us. This workaround should be sufficient
before we figure out why SR exit so long.
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
MAX_SURFACES is per stream, while MAX_PLANES is per asic. The
mpc_combine is an array that records all the planes per asic. Therefore
MAX_PLANES should be used as the array size. Using MAX_SURFACES causes
array overflow when there are more than 3 planes.
[how]
Use the MAX_PLANES for the mpc_combine array size.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Secondary DP2 display fails to light up in some instances
[How]
Clock needs to be on when DPSTREAMCLK*_EN =1. This change
moves dtbclk_p enable/disable point to make sure this is
the case
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
BIOS's integration info table not following the original order
which is phy instance is ext_displaypath's array index.
[how]
Move them to follow the original order.
Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
Originally, PMFW said min FCLK is 300Mhz, but min DCFCLK can be increased
to 400Mhz because min FCLK is now 600Mhz so FCLK >= 1.5 * DCFCLK hardware
requirement will still be satisfied. Increasing min DCFCLK addresses
underflow issues (underflow occurs when phantom pipe is turned on for some
Sub-Viewport configs).
[how]
Increasing DCFCLK by raising the min_dcfclk_mhz
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Sohaib Nadeem <sohaib.nadeem@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Revert commit 885c71ad791c
("drm/amd/display: initialize all the dpm level's stutter latency")
Because it causes some regression
Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On GFX 9.4.3, for a given KFD node, fetch the correct drm device from
XCP manager when checking for cgroup permissions.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This instruction has no functional difference to S_ENDPGM
but allows performance counters to track save events correctly.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Partial migration to system memory should use migrate.addr, not
prange->start as virtual address to allocate system memory page.
Fixes: a546a2768440 ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In (e)poll mode, threads often depend on I/O events to determine when
data is ready for consumption. Within binder, a thread may initiate a
command via BINDER_WRITE_READ without a read buffer and then make use
of epoll_wait() or similar to consume any responses afterwards.
It is then crucial that epoll threads are signaled via wakeup when they
queue their own work. Otherwise, they risk waiting indefinitely for an
event leaving their work unhandled. What is worse, subsequent commands
won't trigger a wakeup either as the thread has pending work.
Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Steven Moreland <smoreland@google.com>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240131215347.1808751-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If CONFIG_OF_KOBJ is not set, a device_node does not contain a
kobj and attempts to access the embedded kobj via kref_read break
the compile.
Replace affected kref_read calls with a macro that reads the
refcount if it exists and returns 1 if there is no embedded kobj.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401291740.VP219WIz-lkp@intel.com/
Fixes: 4dde83569832 ("of: Fix double free in of_parse_phandle_with_args_map")
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Link: https://lore.kernel.org/r/20240129192556.403271-1-lk@c--e.de
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
This patch is to eliminate interrupt warning below:
"[drm] Fence fallback timer expired on ring sdma0.0".
An early vm pt clearing job is sent to SDMA ahead of interrupt enabled.
And re-locating the drm client creation following after drm_dev_register
looks like a more proper flow.
v2: wrap the drm client creation
Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
syzbot has found a type mismatch between a USB pipe and the transfer
endpoint, which is triggered by the bcm5974 driver[1].
This driver expects the device to provide input interrupt endpoints and
if that is not the case, the driver registration should terminate.
Repros are available to reproduce this issue with a certain setup for
the dummy_hcd, leading to an interrupt/bulk mismatch which is caught in
the USB core after calling usb_submit_urb() with the following message:
"BOGUS urb xfer, pipe 1 != type 3"
Some other device drivers (like the appletouch driver bcm5974 is mainly
based on) provide some checking mechanism to make sure that an IN
interrupt endpoint is available. In this particular case the endpoint
addresses are provided by a config table, so the checking can be
targeted to the provided endpoints.
Add some basic checking to guarantee that the endpoints available match
the expected type for both the trackpad and button endpoints.
This issue was only found for the trackpad endpoint, but the checking
has been added to the button endpoint as well for the same reasons.
Given that there was never a check for the endpoint type, this bug has
been there since the first implementation of the driver (f89bd95c5c94).
[1] https://syzkaller.appspot.com/bug?extid=348331f63b034f89b622
Fixes: f89bd95c5c94 ("Input: bcm5974 - add driver for Macbook Air and Pro Penryn touchpads")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reported-and-tested-by: syzbot+348331f63b034f89b622@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20231007-topic-bcm5974_bulk-v3-1-d0f38b9d2935@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six small fixes. Five are obvious and in drivers. The last one is a
core fix to remove the host lock acquisition and release, caused by a
dynamic check of host_busy, in the error handling loop which has been
reported to cause lockups"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: storvsc: Fix ring buffer size calculation
scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
scsi: MAINTAINERS: Update ibmvscsi_tgt maintainer
scsi: initio: Remove redundant variable 'rb'
scsi: virtio_scsi: Remove duplicate check if queue is broken
scsi: isci: Fix an error code problem in isci_io_request_build()
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the MediaTek mt76 drivers.
Here is a sorted list of descriptions. It might make the reviewing
process easier.
MODULE_DESCRIPTION("MediaTek MT7603E and MT76x8 wireless driver");
MODULE_DESCRIPTION("MediaTek MT7615E and MT7663E wireless driver");
MODULE_DESCRIPTION("MediaTek MT7615E MMIO helpers");
MODULE_DESCRIPTION("MediaTek MT7663 SDIO/USB helpers");
MODULE_DESCRIPTION("MediaTek MT7663S (SDIO) wireless driver");
MODULE_DESCRIPTION("MediaTek MT7663U (USB) wireless driver");
MODULE_DESCRIPTION("MediaTek MT76x02 helpers");
MODULE_DESCRIPTION("MediaTek MT76x02 MCU helpers");
MODULE_DESCRIPTION("MediaTek MT76x0E (PCIe) wireless driver");
MODULE_DESCRIPTION("MediaTek MT76x0U (USB) wireless driver");
MODULE_DESCRIPTION("MediaTek MT76x2 EEPROM helpers");
MODULE_DESCRIPTION("MediaTek MT76x2E (PCIe) wireless driver");
MODULE_DESCRIPTION("MediaTek MT76x2U (USB) wireless driver");
MODULE_DESCRIPTION("MediaTek MT76x connac layer helpers");
MODULE_DESCRIPTION("MediaTek MT76x EEPROM helpers");
MODULE_DESCRIPTION("MediaTek MT76x helpers");
MODULE_DESCRIPTION("MediaTek MT76x SDIO helpers");
MODULE_DESCRIPTION("MediaTek MT76x USB helpers");
MODULE_DESCRIPTION("MediaTek MT7915E MMIO helpers");
MODULE_DESCRIPTION("MediaTek MT7921 core driver");
MODULE_DESCRIPTION("MediaTek MT7921E (PCIe) wireless driver");
MODULE_DESCRIPTION("MediaTek MT7921S (SDIO) wireless driver");
MODULE_DESCRIPTION("MediaTek MT7921U (USB) wireless driver");
MODULE_DESCRIPTION("MediaTek MT7925 core driver");
MODULE_DESCRIPTION("MediaTek MT7925E (PCIe) wireless driver");
MODULE_DESCRIPTION("MediaTek MT7925U (USB) wireless driver");
MODULE_DESCRIPTION("MediaTek MT792x core driver");
MODULE_DESCRIPTION("MediaTek MT792x USB helpers");
MODULE_DESCRIPTION("MediaTek MT7996 MMIO helpers");
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-10-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Atmel WILC1000 SPI driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-9-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the TI WiLink 8 wireless driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-8-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Prism54 SPI wireless driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-7-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Qualcomm Atheros WCN3660/3680 wireless driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-6-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Atheros AR5523 wireless driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-5-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Broadcom FullMac WLAN drivers.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-4-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the TI wireless drivers wl12xx and wl1251.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-3-leitao@debian.org
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the TI WLAN wlcore drivers.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240130104243.3025393-2-leitao@debian.org
|
|
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:
============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
but task is already holding lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);
*** DEADLOCK ***
Call trace:
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.
Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.
Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 6.7
|
|
DMA buffers allocated from the CMA dma-buf heap get counted under
RssFile for processes that map them and trigger page faults. In
addition to the incorrect accounting reported to userspace, reclaim
behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
The system dma-buf heap does not suffer from this issue since
remap_pfn_range is used during the mmap of the buffer, which also sets
VM_PFNMAP on the VMA.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117181141.286383-1-tjmercier@google.com
|