linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-02-18	nvmet: pci-epf: Correctly initialize CSTS when enabling the controller	Damien Le Moal
	The function nvmet_pci_epf_poll_cc_work() sets the NVME_CSTS_RDY bit of the controller status register (CSTS) when nvmet_pci_epf_enable_ctrl() returns success. However, since this function can be called several times (e.g. if the host reboots), instead of setting the bit in ctrl->csts, initialize this field to only have NVME_CSTS_RDY set. Conversely, if nvmet_pci_epf_enable_ctrl() fails, make sure to clear all bits from ctrl->csts. To simplify nvmet_pci_epf_poll_cc_work(), initialize ctrl->csts to NVME_CSTS_RDY directly inside nvmet_pci_epf_enable_ctrl() and clear this field in that function as well in case of a failure. To be consistent, move clearing the NVME_CSTS_RDY bit from ctrl->csts when the controller is being disabled from nvmet_pci_epf_poll_cc_work() into nvmet_pci_epf_disable_ctrl(). Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18	nvmet-rdma: recheck queue state is LIVE in state lock in recv done	Ruozhu Li
	The queue state checking in nvmet_rdma_recv_done is not in queue state lock.Queue state can transfer to LIVE in cm establish handler between state checking and state lock here, cause a silent drop of nvme connect cmd. Recheck queue state whether in LIVE state in state lock to prevent this issue. Signed-off-by: Ruozhu Li <david.li@jaguarmicro.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18	nvmet: Fix crash when a namespace is disabled	Hannes Reinecke
	The namespace percpu counter protects pending I/O, and we can only safely diable the namespace once the counter drop to zero. Otherwise we end up with a crash when running blktests/nvme/058 (eg for loop transport): [ 2352.930426] [ T53909] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI [ 2352.930431] [ T53909] KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [ 2352.930434] [ T53909] CPU: 3 UID: 0 PID: 53909 Comm: kworker/u16:5 Tainted: G W 6.13.0-rc6 #232 [ 2352.930438] [ T53909] Tainted: [W]=WARN [ 2352.930440] [ T53909] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 2352.930443] [ T53909] Workqueue: nvmet-wq nvme_loop_execute_work [nvme_loop] [ 2352.930449] [ T53909] RIP: 0010:blkcg_set_ioprio+0x44/0x180 as the queue is already torn down when calling submit_bio(); So we need to init the percpu counter in nvmet_ns_enable(), and wait for it to drop to zero in nvmet_ns_disable() to avoid having I/O pending after the namespace has been disabled. Fixes: 74d16965d7ac ("nvmet-loop: avoid using mutex in IO hotpath") Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18	nvme-tcp: add basic support for the C2HTermReq PDU	Maurizio Lombardi
	Previously, the NVMe/TCP host driver did not handle the C2HTermReq PDU, instead printing "unsupported pdu type (3)" when received. This patch adds support for processing the C2HTermReq PDU, allowing the driver to print the Fatal Error Status field. Example of output: nvme nvme4: Received C2HTermReq (FES = Invalid PDU Header Field) Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18	nvme-pci: quirk Acer FA100 for non-uniqueue identifiers	Christopher Lentocha
	In order for two Acer FA100 SSDs to work in one PC (in the case of myself, a Lenovo Legion T5 28IMB05), and not show one drive and not the other, and sometimes mix up what drive shows up (randomly), these two lines of code need to be added, and then both of the SSDs will show up and not conflict when booting off of one of them. If you boot up your computer with both SSDs installed without this patch, you may also randomly get into a kernel panic (if the initrd is not set up) or stuck in the initrd "/init" process, it is set up, however, if you do apply this patch, there should not be problems with booting or seeing both contents of the drive. Tested with the btrfs filesystem with a RAID configuration of having the root drive '/' combined to make two 256GB Acer FA100 SSDs become 512GB in total storage. Kernel Logs with patch applied (`dmesg -t \| grep -i nvm`): ``` ... nvme 0000:04:00.0: platform quirk: setting simple suspend nvme nvme0: pci function 0000:04:00.0 nvme 0000:05:00.0: platform quirk: setting simple suspend nvme nvme1: pci function 0000:05:00.0 nvme nvme1: missing or invalid SUBNQN field. nvme nvme1: allocated 64 MiB host memory buffer. nvme nvme0: missing or invalid SUBNQN field. nvme nvme0: allocated 64 MiB host memory buffer. nvme nvme1: 8/0/0 default/read/poll queues nvme nvme1: Ignoring bogus Namespace Identifiers nvme nvme0: 8/0/0 default/read/poll queues nvme nvme0: Ignoring bogus Namespace Identifiers nvme0n1: p1 p2 ... ``` Kernel Logs with patch not applied (`dmesg -t \| grep -i nvm`): ``` ... nvme 0000:04:00.0: platform quirk: setting simple suspend nvme nvme0: pci function 0000:04:00.0 nvme 0000:05:00.0: platform quirk: setting simple suspend nvme nvme1: pci function 0000:05:00.0 nvme nvme0: missing or invalid SUBNQN field. nvme nvme1: missing or invalid SUBNQN field. nvme nvme0: allocated 64 MiB host memory buffer. nvme nvme1: allocated 64 MiB host memory buffer. nvme nvme0: 8/0/0 default/read/poll queues nvme nvme1: 8/0/0 default/read/poll queues nvme nvme1: globally duplicate IDs for nsid 1 nvme nvme1: VID:DID 1dbe:5216 model:Acer SSD FA100 256GB firmware:1.Z.J.2X nvme0n1: p1 p2 ... ``` Signed-off-by: Christopher Lentocha <christopherericlentocha@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18	gpiolib: protect gpio_chip with SRCU in array_info paths in multi get/set	Bartosz Golaszewski
	During the locking rework in GPIOLIB, we omitted one important use-case, namely: setting and getting values for GPIO descriptor arrays with array_info present. This patch does two things: first it makes struct gpio_array store the address of the underlying GPIO device and not chip. Next: it protects the chip with SRCU from removal in gpiod_get_array_value_complex() and gpiod_set_array_value_complex(). Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250215095655.23152-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-18	drm/bridge: nwl-dsi: Set bridge type	Alexander Stein
	This is a DSI bridge, so set the bridge type accordingly. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250120132135.554391-2-alexander.stein@ew.tq-group.com
2025-02-18	drm/bridge: ti-sn65dsi83: Set bridge type	Alexander Stein
	This is a DSI to LVDS bridge, so set the bridge type accordingly. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250120132135.554391-1-alexander.stein@ew.tq-group.com
2025-02-18	drm/bridge: analogix_dp: Use devm_platform_ioremap_resource()	Shixiong Ou
	Convert platform_get_resource(), devm_ioremap_resource() to a single call to devm_platform_ioremap_resource(). Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250128065645.27140-1-oushixiong1025@163.com
2025-02-18	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann
	Backmerging to get bugfixes from v6.14-rc2. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-02-17	ibmvnic: Don't reference skb after sending to VIOS	Nick Child
	Previously, after successfully flushing the xmit buffer to VIOS, the tx_bytes stat was incremented by the length of the skb. It is invalid to access the skb memory after sending the buffer to the VIOS because, at any point after sending, the VIOS can trigger an interrupt to free this memory. A race between reading skb->len and freeing the skb is possible (especially during LPM) and will result in use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ibmvnic_xmit+0x75c/0x1808 [ibmvnic] Read of size 4 at addr c00000024eb48a70 by task hxecom/14495 <...> Call Trace: [c000000118f66cf0] [c0000000018cba6c] dump_stack_lvl+0x84/0xe8 (unreliable) [c000000118f66d20] [c0000000006f0080] print_report+0x1a8/0x7f0 [c000000118f66df0] [c0000000006f08f0] kasan_report+0x128/0x1f8 [c000000118f66f00] [c0000000006f2868] __asan_load4+0xac/0xe0 [c000000118f66f20] [c0080000046eac84] ibmvnic_xmit+0x75c/0x1808 [ibmvnic] [c000000118f67340] [c0000000014be168] dev_hard_start_xmit+0x150/0x358 <...> Freed by task 0: kasan_save_stack+0x34/0x68 kasan_save_track+0x2c/0x50 kasan_save_free_info+0x64/0x108 __kasan_mempool_poison_object+0x148/0x2d4 napi_skb_cache_put+0x5c/0x194 net_tx_action+0x154/0x5b8 handle_softirqs+0x20c/0x60c do_softirq_own_stack+0x6c/0x88 <...> The buggy address belongs to the object at c00000024eb48a00 which belongs to the cache skbuff_head_cache of size 224 ================================================================== Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250214155233.235559-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17	s390/ism: add release function for struct device	Julian Ruess
	According to device_release() in /drivers/base/core.c, a device without a release function is a broken device and must be fixed. The current code directly frees the device after calling device_add() without waiting for other kernel parts to release their references. Thus, a reference could still be held to a struct device, e.g., by sysfs, leading to potential use-after-free issues if a proper release function is not set. Fixes: 8c81ba20349d ("net/smc: De-tangle ism and smc device initialization") Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Julian Ruess <julianr@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250214120137.563409-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18	drm/i915: Hook up display fault interrupts for VLV/CHV	Ville Syrjälä
	Hook up the display fault irq handlers for VLV/CHV. Unfortunately the actual hardware doesn't agree with the spec on how DPINVGTT should behave. The docs claim that the status bits can be cleared by writing '1' to them, but in reality there doesn't seem to be any way to clear them. So we must disable and ignore any fault we've already seen in the past. The entire register does reset when the display power well goes down, so we can just always re-enable all the bits in irq postinstall without having to track the state beyond that. v2: Use intel_display instead of dev_priv Move xe gen2_error_{init,reset}() out Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-9-ville.syrjala@linux.intel.com
2025-02-18	drm/i915: Un-invert {i9xx,i965}_error_mask()	Ville Syrjälä
	Make life a bit more straightforward by removing the bitwise not from {i9xx,i965}_error_mask() and instead do it when feeding the value to gen2_error_init(). Make life a bit easier I think. Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-8-ville.syrjala@linux.intel.com
2025-02-18	drm/i915: Introduce i915_error_regs	Ville Syrjälä
	Introduce i915_error_regs as the EIR/EMR counterpart to the IIR/IMR/IER i915_irq_regs, and update the irq reset/postingstall to utilize them accordingly. v2: Include xe compat versions Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-7-ville.syrjala@linux.intel.com Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-18	drm/i915: Hook in display GTT faults for ILK/SNB	Ville Syrjälä
	Hook up display GTT fault interrupts for ILK/SNB. Bspec: 8559 Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-6-ville.syrjala@linux.intel.com
2025-02-18	drm/i915: Hook in display GTT faults for IVB/HSW	Ville Syrjälä
	Dump out the display fault information from the IVB/HSW error interrupt handler. Bspec: 8203 Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-5-ville.syrjala@linux.intel.com
2025-02-18	drm/i915: Pimp display fault reporting	Ville Syrjälä
	Decode the display faults a bit more extensively so that one doesn't have to translate the bitmask to planes/etc. manually. Also for plane faults we can read out a bit of state from the relevant plane(s) and dump that out. Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-4-ville.syrjala@linux.intel.com
2025-02-18	drm/i915: Introduce a minimal plane error state	Ville Syrjälä
	I want to capture a little bit more information about the state of the plane upon faults. To that end introduce a small plane error state struct and provide per-plane vfuncs to read it out. For now we just stick the CTL, SURF, and SURFLIVE (if available) registers contents in there. v2: Use struct intel_display instead of dev_priv Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-3-ville.syrjala@linux.intel.com
2025-02-18	drm/i915: Add missing else to the if ladder in missing else	Ville Syrjälä
	The if ladder in gen8_de_pipe_fault_mask() was missing one else, add it. Doesn't actually matter since each if branch just returns directly. But the code is less confusing when you always do things the same way. Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250217070047.953-2-ville.syrjala@linux.intel.com
2025-02-17	irqchip/jcore-aic, clocksource/drivers/jcore: Fix jcore-pit interrupt request	Artur Rojek
	The jcore-aic irqchip does not have separate interrupt numbers reserved for cpu-local vs global interrupts. Therefore the device drivers need to request the given interrupt as per CPU interrupt. 69a9dcbd2d65 ("clocksource/drivers/jcore: Use request_percpu_irq()") converted the clocksource driver over to request_percpu_irq(), but failed to do add all the required changes, resulting in a failure to register PIT interrupts. Fix this by: 1) Explicitly mark the interrupt via irq_set_percpu_devid() in jcore_pit_init(). 2) Enable and disable the per CPU interrupt in the CPU hotplug callbacks. 3) Pass the correct per-cpu cookie to the irq handler by using handle_percpu_devid_irq() instead of handle_percpu_irq() in handle_jcore_irq(). [ tglx: Massage change log ] Fixes: 69a9dcbd2d65 ("clocksource/drivers/jcore: Use request_percpu_irq()") Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/all/20250216175545.35079-3-contact@artur-rojek.eu
2025-02-17	irqchip/gic-v3: Fix rk3399 workaround when secure interrupts are enabled	Marc Zyngier
	Christoph reports that their rk3399 system dies since commit 773c05f417fa1 ("irqchip/gic-v3: Work around insecure GIC integrations"). It appears that some rk3399 have secure payloads, and that the firmware sets SCR_EL3.FIQ==1. Obivously, disabling security in that configuration leads to even more problems. Revisit the workaround by: - making it rk3399 specific - checking whether Group-0 is available, which is a good proxy for SCR_EL3.FIQ being 0 - either apply the workaround if Group-0 is available, or disable pseudo-NMIs if not Note that this doesn't mean that the secure side is able to receive interrupts, as all interrupts are made non-secure anyway. Clearly, nobody ever tested secure interrupts on this platform. Fixes: 773c05f417fa1 ("irqchip/gic-v3: Work around insecure GIC integrations") Reported-by: Christoph Fritz <chf.fritz@googlemail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Christoph Fritz <chf.fritz@googlemail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250215185241.3768218-1-maz@kernel.org Closes: https://lore.kernel.org/r/b1266652fb64857246e8babdf268d0df8f0c36d9.camel@googlemail.com
2025-02-17	drm/amdgpu: Generate bad page threshold cper records	Xiang Liu
	Generate CPER record when bad page threshold exceed and commit to CPER ring. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Commit CPER entry	Xiang Liu
	Commit the CPER entry to the ring buffer. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: add mutex lock for cper ring	Tao Zhou
	Avoid the confliction between read and write of ring buffer. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amd/pm: Limit jpeg rings as per max for jpeg_v_4_0_3	Asad Kamal
	Since pmfw supports for smuv13_0_6 is limited to 8 jpeg rings per instance, which is the max for jpeg_v_4_0_3. Limit it to same to avoid out of bound access. Fixes: 568199a5c7a9 ("drm/amd/pm: Limit to 8 jpeg rings per instance") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: add data write function for CPER ring	Tao Zhou
	Old CPER data will be overwritten if ring buffer is full, and read pointer always points to CPER header. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: read CPER ring via debugfs	Tao Zhou
	We read CPER data from read pointer to write pointer without changing the pointers. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: add RAS CPER ring buffer	Tao Zhou
	And initialize it, this is a pure software ring to store RAS CPER data. v2: change ring size to 0x100000 v2: update the initialization of count_dw of cper ring, it's dword variable v3: skip VM inv eng for cper v3: init/fini when aca enabled Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Get timestamp from system time	Xiang Liu
	Get system local time and encode it to timestamp for CPER. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu/mes12: allocate hw_resource_1 buffer once	Alex Deucher
	Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu/mes11: allocate hw_resource_1 buffer once	Alex Deucher
	Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amd/display: Reapply 2fde4fdddc1f	Nathan Chancellor
	Commit 2563391e57b5 ("drm/amd/display: DML2.1 resynchronization") blew away the compiler warning fix from commit 2fde4fdddc1f ("drm/amd/display: Avoid -Wenum-float-conversion in add_margin_and_round_to_dfs_grainularity()"), causing the warning to reappear. drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c:183:58: error: arithmetic between enumeration type 'enum dentist_divider_range' and floating-point type 'double' [-Werror,-Wenum-float-conversion] 183 \| divider = (unsigned int)(DFS_DIVIDER_RANGE_SCALE_FACTOR * (vco_freq_khz / clock_khz)); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Apply the fix again to resolve the warning. Fixes: 1b30456150e5 ("drm/amd/display: DML21 Reintegration") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Generate cper records	Hawking Zhang
	Encode the error information in CPER format and commit to the cper ring Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdkfd: Fix user queue validation on Gfx7/8	Philip Yang
	To workaround queue full h/w issue on Gfx7/8, when application create AQL queue, the ring buffer bo allocate size is queue_size/2 and map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each attachment map size is queue_size/2, with same ring_bo backing memory. For Gfx7/8, user queue buffer validation should use queue_size/2 to verify ring_bo allocation and mapping size. Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers") Suggested-by: Tomáš Trnka <trnka@scm.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Introduce funcs for generating cper record	Hawking Zhang
	Introduce new functions that are used to generate cper ue or ce records. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Include ACA error type in aca bank	Hawking Zhang
	ACA error types managed by driver a direct 1:1 correspondence with those managed by firmware. To address this, for each ACA bank, include both the ACA error type and the ACA SMU type. This addition is useful for creating CPER records. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Optimize the enablement of GECC	Candice Li
	Enable GECC only when the default memory ECC mode or the module parameter amdgpu_ras_enable is activated. v2: Add kernel message to remind users explicitly set amdgpu_ras_enable=1 before driver loading to enable GECC and set amdgpu_ras_enable=0 to disable GECC when GECC is currently enabled if needed. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Introduce funcs for populating CPER	Hawking Zhang
	Introduce utility functions designed to assist in populating CPER records. v2: call cper_init/fini in device_ip_init/fini. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amd/include: Add amd cper header	Hawking Zhang
	AMD is using Common Platform Error Record (CPER) format to report all gpu hardware errors. v2: add program attribute Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu: Rename VCN clock gating function for consistency	Srinivasan Shanmugam
	Change the function name from vcn_v2_5_enable_clock_gating_inst to vcn_v2_5_enable_clock_gating to ensure consistency in naming. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:781: warning: expecting prototype for vcn_v2_5_enable_clock_gating_inst(). Prototype was for vcn_v2_5_enable_clock_gating() instead Cc: Leo Liu <leo.liu@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu/vcn4.0.3: drop dpm power helpers	Alex Deucher
	VCN 4.0.3 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu/vcn5.0.1: drop dpm power helpers	Alex Deucher
	VCN 5.0.1 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu/vcn5.0.1: use correct dpm helper	Alex Deucher
	The VCN and UVD helpers were split in commit ff69bba05f08 ("drm/amd/pm: add inst to dpm_set_powergating_by_smu") However, this happened in parallel to the vcn 5.0.1 development so it was missed there. Fixes: 346492f30ce3 ("drm/amdgpu: Add VCN_5_0_1 support") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sonny Jiang <sonjiang@amd.com> Cc: Boyuan Zhang <boyuan.zhang@amd.com>
2025-02-17	drm/amdgpu/umsch: tidy up the ucode name string handling	Alex Deucher
	Make the constant parts of the name part of the string we pass to amdgpu_ucode_request(). Only the version number varies from IP to IP. Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>
2025-02-17	drm/amdgpu/umsch: fix ucode check	Alex Deucher
	Return an error if the IP version doesn't match otherwise we end up passing a NULL string to amdgpu_ucode_request. We should never hit this in practice today since we only enable the umsch code on the supported IP versions, but add a check to be safe. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502130406.iWQ0eBug-lkp@intel.com/ Fixes: 020620424b27 ("drm/amd: Use a constant format string for amdgpu_ucode_request") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Lang Yu <Lang.Yu@amd.com>
2025-02-17	drm/amdgpu: Remove extra checks for CPX	Amber Lin
	As far as the number of XCCs, the number of compute partitions, and the number of memory partitions qualify, CPX is valid. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17	drm/amdgpu/umsch: declare umsch firmware	Alex Deucher
	Needed to be properly picked up for the initrd, etc. Fixes: 3488c79beafa ("drm/amdgpu: add initial support for UMSCH") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>
2025-02-17	drm/amdgpu/gfx: only call mes for enforce isolation if supported	Alex Deucher
	This should not be called on chips without MES so check if MES is enabled and if the cleaner shader is supported. Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES") Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2025-02-17	drm/amdgpu: Add ring reset callback for JPEG2_0_0	Sathishkumar S
	Add ring reset function callback for JPEG2_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>