summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2024-12-18zram: fix uninitialized ZRAM not releasing backing deviceKairui Song
Setting backing device is done before ZRAM initialization. If we set the backing device, then remove the ZRAM module without initializing the device, the backing device reference will be leaked and the device will be hold forever. Fix this by always reset the ZRAM fully on rmmod or reset store. Link: https://lkml.kernel.org/r/20241209165717.94215-3-ryncsn@gmail.com Fixes: 013bf95a83ec ("zram: add interface to specif backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Desheng Wu <deshengwu@tencent.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18zram: refuse to use zero sized block device as backing deviceKairui Song
Patch series "zram: fix backing device setup issue", v2. This series fixes two bugs of backing device setting: - ZRAM should reject using a zero sized (or the uninitialized ZRAM device itself) as the backing device. - Fix backing device leaking when removing a uninitialized ZRAM device. This patch (of 2): Setting a zero sized block device as backing device is pointless, and one can easily create a recursive loop by setting the uninitialized ZRAM device itself as its own backing device by (zram0 is uninitialized): echo /dev/zram0 > /sys/block/zram0/backing_dev It's definitely a wrong config, and the module will pin itself, kernel should refuse doing so in the first place. By refusing to use zero sized device we avoided misuse cases including this one above. Link: https://lkml.kernel.org/r/20241209165717.94215-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20241209165717.94215-2-ryncsn@gmail.com Fixes: 013bf95a83ec ("zram: add interface to specif backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Desheng Wu <deshengwu@tencent.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-19Merge tag 'v6.13-rc3' into drm-nextDave Airlie
Backmerge linux 6.13-rc3 as amd next has some dependencies on fixes in it. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-12-18Merge tag 'linux-can-fixes-for-6.13-20241218' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2024-12-18 There are 2 patches by Matthias Schiffer for the m_can_pci driver that handles the m_can cores found on the Intel Elkhart Lake processor. They fix the initialization and the interrupt handling under high CAN bus load. * tag 'linux-can-fixes-for-6.13-20241218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: m_can: fix missed interrupts with m_can_pci can: m_can: set init flag earlier in probe ==================== Link: https://patch.msgid.link/20241218121722.2311963-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-18net: usb: qmi_wwan: add Quectel RG255CMartin Hou
Add support for Quectel RG255C which is based on Qualcomm SDX35 chip. The composition is DM / NMEA / AT / QMI. T: Bus=01 Lev=01 Prnt=01 Port=04 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2c7c ProdID=0316 Rev= 5.15 S: Manufacturer=Quectel S: Product=RG255C-CN S: SerialNumber=c68192c1 C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=86(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Martin Hou <martin.hou@foxmail.com> Link: https://patch.msgid.link/tencent_17DDD787B48E8A5AB8379ED69E23A0CD9309@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-18net: phy: avoid undefined behavior in *_led_polarity_set()Arnd Bergmann
gcc runs into undefined behavior at the end of the three led_polarity_set() callback functions if it were called with a zero 'modes' argument and it just ends the function there without returning from it. This gets flagged by 'objtool' as a function that continues on to the next one: drivers/net/phy/aquantia/aquantia_leds.o: warning: objtool: aqr_phy_led_polarity_set+0xf: can't find jump dest instruction at .text+0x5d9 drivers/net/phy/intel-xway.o: warning: objtool: xway_gphy_led_polarity_set() falls through to next function xway_gphy_config_init() drivers/net/phy/mxl-gpy.o: warning: objtool: gpy_led_polarity_set() falls through to next function gpy_led_hw_control_get() There is no point to micro-optimize the behavior here to save a single-digit number of bytes in the kernel, so just change this to a "return -EINVAL" as we do when any unexpected bits are set. Fixes: 1758af47b98c ("net: phy: intel-xway: add support for PHY LEDs") Fixes: 9d55e68b19f2 ("net: phy: aquantia: correctly describe LED polarity override") Fixes: eb89c79c1b8f ("net: phy: mxl-gpy: correctly describe LED polarity") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241217081056.238792-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19power: supply: bq24190: Fix BQ24296 Vbus regulator supportHans de Goede
There are 2 issues with bq24296_set_otg_vbus(): 1. When writing the OTG_CONFIG bit it uses POC_CHG_CONFIG_SHIFT which should be POC_OTG_CONFIG_SHIFT. 2. When turning the regulator off it never turns charging back on. Note this must be done through bq24190_charger_set_charge_type(), to ensure that the charge_type property value of none/trickle/fast is honored. Resolve both issues to fix BQ24296 Vbus regulator support not working. Fixes: b150a703b56f ("power: supply: bq24190_charger: Add support for BQ24296") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20241116203648.169100-2-hdegoede@redhat.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-12-19Merge tag 'drm-intel-gt-next-2024-12-18' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next Driver Changes: - More accurate engine busyness metrics with GuC submission (Umesh) - Ensure partial BO segment offset never exceeds allowed max (Krzysztof) - Flush GuC CT receive tasklet during reset preparation (Zhanjun) - Code cleanups and refactoring (David, Lucas) - Debugging improvements (Jesus) - Selftest improvements (Sk) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z2KadNXgumx1aQMP@jlahtine-mobl.ger.corp.intel.com
2024-12-18Merge tag 'cxl-fixes-6.13-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Ira Weiny: - prevent probe failure when non-critical RAS unmasking fails - fix CXL 1.1 link status sysfs attribute - fix 4 way (and greater) switch interleave region creation * tag 'cxl-fixes-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix region creation for greater than x2 switches cxl/pci: Check dport->regs.rcd_pcie_cap availability before accessing cxl/pci: Fix potential bogus return value upon successful probing
2024-12-18drm/amdgpu/nbio7.0: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ec43fbece784215d3c4469973e4556d70bce915) Cc: stable@vger.kernel.org
2024-12-18drm/amd: Update strapping for NBIO 2.5.0Mario Limonciello
This helps to avoid a spurious PME event on hotplug to Azalia. Cc: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Reported-and-tested-by: ionut_n2001@yahoo.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215884 Tested-by: Gabriel Marcano <gabemarcano@yahoo.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211024414.7840-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3f6f237b9dd189e1fb85b8a3f7c97a8f27c1e49a) Cc: stable@vger.kernel.org
2024-12-18ACPI: EC: Enable EC support on LoongArch by defaultHuacai Chen
Commit a6021aa24f6417416d933 ("ACPI: EC: make EC support compile-time conditional") only enable ACPI_EC on X86 by default, but the embedded controller is also widely used on LoongArch laptops so we also enable ACPI_EC for LoongArch. The laptop driver cannot work without EC, so also update the dependency of LOONGSON_LAPTOP to let it depend on APCI_EC. Fixes: a6021aa24f6417416d933 ("ACPI: EC: make EC support compile-time conditional") Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn> Tested-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://patch.msgid.link/20241217073704.3339587-1-chenhuacai@loongson.cn [ rjw: Added Fixes: ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-18drm/xe: Force write completion of MI_STORE_DATA_IMMJosé Roberto de Souza
With Force write completion unset there is no guarantees of when the write will be globally visible what is not the behavior wanted. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241217160732.46280-1-jose.souza@intel.com
2024-12-18drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_updateMichel Dänzer
Third time's the charm, I hope? Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 695c2c745e5dff201b75da8a1d237ce403600d04) Cc: stable@vger.kernel.org
2024-12-18drm/amdgpu: fix amdgpu_coredumpChristian König
The VM pointer might already be outdated when that function is called. Use the PASID instead to gather the information instead. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 57f812d171af4ba233d3ed7c94dfa5b8e92dcc04) Cc: stable@vger.kernel.org
2024-12-18drm/amdgpu/smu14.0.2: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8f2cd1067afe68372a1723e05e19b68ed187676a) Cc: stable@vger.kernel.org
2024-12-18drm/amdgpu/gfx12: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f1fd1d0f40272948aa6ab82a3a82ecbbc76dff53) Cc: stable@vger.kernel.org
2024-12-18drm/amdgpu/mmhub4.1: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 63bfd24088b42c6f55c2096bfc41b50213d419b2) Cc: stable@vger.kernel.org
2024-12-18drm/amdgpu/nbio7.11: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2c8eeaaa0fe5841ccf07a0eb51b1426f34ef39f7) Cc: stable@vger.kernel.org
2024-12-18drm/amdgpu/nbio7.7: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 22b9555bc90df22b585bdd1f161b61584b13af51) Cc: stable@vger.kernel.org
2024-12-18wifi: cw1200: Fix potential NULL dereferenceLinus Walleij
A recent refactoring was identified by smatch to cause another potential NULL dereference: drivers/net/wireless/st/cw1200/cw1200_spi.c:440 cw1200_spi_disconnect() error: we previously assumed 'self' could be null (see line 433) Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202411271742.Xa7CNVh1-lkp@intel.com/ Fixes: 2719a9e7156c ("wifi: cw1200: Convert to GPIO descriptors") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20241217-cw1200-fix-v1-1-911e6b5823ec@linaro.org
2024-12-18drm/amdgpu: don't access invalid schedPierre-Eric Pelloux-Prayer
Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") accessing job->base.sched can produce unexpected results as the initialisation of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the memset. This commit fixes an issue when a CS would fail validation and would be rejected after job->num_ibs is incremented. In this case, amdgpu_ib_free(ring->adev, ...) will be called, which would crash the machine because the ring value is bogus. To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this because the device is actually not used in this function. The next commit will remove the ring argument completely. Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ae520cb12831d264ceb97c61f72c59d33c0dbd7)
2024-12-18drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCOMario Limonciello
If the kernel hasn't been compiled with PCIe hotplug support this can lead to problems with dGPUs that use BOCO because they effectively drop off the bus. To prevent issues, disable BOCO support when compiled without PCIe hotplug. Reported-by: Gabriel Marcano <gabemarcano@yahoo.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ad5bdc28bafa66db0f041cc6cdd278a80426aae)
2024-12-18Merge tag 'hyperv-fixes-signed-20241217' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf Hering, Vitaly Kuznetsov) - Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain) - Two bug fixes in the Hyper-V utility functions (Michael Kelley) - Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers (Easwar Hariharan) * tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: tools/hv: reduce resource usage in hv_kvp_daemon tools/hv: add a .gitignore file tools/hv: reduce resouce usage in hv_get_dns_info helper hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet Drivers: hv: util: Don't force error code to ENODEV in util_probe() tools/hv: terminate fcopy daemon if read from uio fails drivers: hv: Convert open-coded timeouts to secs_to_jiffies() tools: hv: change permissions of NetworkManager configuration file x86/hyperv: Fix hv tsc page based sched_clock for hibernation tools: hv: Fix a complier warning in the fcopy uio daemon
2024-12-18drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_updateMichel Dänzer
Third time's the charm, I hope? Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/admgpu: replace kmalloc() and memcpy() with kmemdup()Mirsad Todorovac
The static analyser tool gave the following advice: ./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup → 1266 tmp = kmalloc(used_size, GFP_KERNEL); 1267 if (!tmp) 1268 return -ENOMEM; 1269 → 1270 memcpy(tmp, &host_telemetry->body.error_count, used_size); Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics. Original code works without fault, so this is not a bug fix but proposed improvement. Link: https://lwn.net/Articles/198928/ Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Zhigang Luo <Zhigang.Luo@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Jack Xiao <Jack.Xiao@amd.com> Cc: Vignesh Chander <Vignesh.Chander@amd.com> Cc: Danijel Slivka <danijel.slivka@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amd/display: Fix NULL pointer dereference in dmub_tracebuffer_showSrinivasan Shanmugam
It corrects the issue by checking if 'adev->dm.dmub_srv' is NULL before accessing its 'meta_info' member. This ensures that we do not dereference a NULL pointer. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.c:917 dmub_tracebuffer_show() warn: address of 'adev->dm.dmub_srv->meta_info' is non-NULL drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.c 901 static int dmub_tracebuffer_show(struct seq_file *m, void *data) 902 { 903 struct amdgpu_device *adev = m->private; 904 struct dmub_srv_fb_info *fb_info = adev->dm.dmub_fb_info; 905 struct dmub_fw_meta_info *fw_meta_info = &adev->dm.dmub_srv->meta_info; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Even if adev->dm.dmub_srv is NULL, the address of ->meta_info can't be NULL 906 struct dmub_debugfs_trace_entry *entries; 907 uint8_t *tbuf_base; 908 uint32_t tbuf_size, max_entries, num_entries, first_entry, i; 909 910 if (!fb_info) 911 return 0; 912 913 tbuf_base = (uint8_t *)fb_info->fb[DMUB_WINDOW_5_TRACEBUFF].cpu_addr; 914 if (!tbuf_base) 915 return 0; 916 --> 917 tbuf_size = fw_meta_info ? fw_meta_info->trace_buffer_size : ^^^^^^^^^^^^ Always non-NULL 918 DMUB_TRACE_BUFFER_SIZE; 919 max_entries = (tbuf_size - sizeof(struct dmub_debugfs_trace_header)) / 920 sizeof(struct dmub_debugfs_trace_entry); 921 922 num_entries = v2: Initialize struct dmub_fw_meta_info *fw_meta_info to NULL (Dan Carpenter) Fixes: 5a498172c8d0 ("drm/amd/display: Make DMCUB tracebuffer debugfs chronological") Cc: Leo Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: Show warning message if IH ring overflowPhilip Yang
If IH primary ring and KFD ih fifo overflows, we may miss CP, SDMA interrupts and cause application soft hang. Show warning message with ring name if overflow happens. Add function to get ih ring name to avoid duplicating it. To keep warning message consistent between GPU generations, change all *_ih.c except ASICs older than Vega which has only one ih ring. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdkfd: Improve signal event slow pathPhilip Yang
If event slot is not signaled, kfd_signal_event_interrupt goes to slow path to scan all event slots to find the signaled event, this is needed for old ASICs that don't have the event ID or the event IDs are incorrect in the IH payload. There is case that GPU signal the same event twice, then driver process the first event interrupt, set_event and event slot is auto-reset, then for the second event interrupt, KFD goes to slow path as event is not signaled, just drop the second event interrupt because the application only need wakeup once. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdkfd: Queue interrupt work to different CPUPhilip Yang
For CPX mode, each KFD node has interrupt worker to process ih_fifo to send events to user space. Currently all interrupt workers of same adev queue to same CPU, all workers execution are actually serialized and this cause KFD ih_fifo overflow when CPU usage is high. Use per-GPU unbounded highpri queue with number of workers equals to number of partitions, let queue_work select the next CPU round robin among the local CPUs of same NUMA. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: Optimize gfx v9 GPU page fault handlingPhilip Yang
After GPU page fault, there are lots of page fault interrupts generated at short period even with CAM filter enabled because the fault address is different. Each page fault copy to KFD ih fifo to send event to user space by KFD interrupt worker, this could cause KFD ih fifo overflow while other processes generate events at same time. KFD process is aborted after GPU page fault, we only need one GPU page fault interrupt sent to KFD ih fifo to send memory exception event to user space. Incease KFD ih fifo size to 2 times of IH primary ring size, to handle the burst events case. This patch handle the gfx v9 path, cover retry on/off and CAM filter on/off cases. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdkfd: KFD interrupt access ih_fifo data in-placePhilip Yang
To handle 40000 to 80000 interrupts per second running CPX mode with 4 streams/queues per KFD node, KFD interrupt handler becomes the performance bottleneck. Remove the kfifo_out memcpy overhead by accessing ih_fifo data in-place and updating rptr with kfifo_skip_count. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: partially revert "reduce reset time"Christian König
This partially reverts commit 194eb174cbe4fe2b3376ac30acca2dc8c8beca00. This commit introduced a new state variable into adev without even remotely worrying about CPU barriers. Since we already have the amdgpu_in_reset() function exactly for this use case partially revert that. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepareChristian König
As soon as the prepare phase is completed the VM might be released, better set it to NULL. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: fix amdgpu_coredumpChristian König
The VM pointer might already be outdated when that function is called. Use the PASID instead to gather the information instead. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: Enable psp v14_0_3 RAS support for non-SRIOV configurations.Candice Li
Enable psp v14_0_3 RAS support for non-SRIOV configurations. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: Don't enable sdma 4.4.5 CTXEMPTY interruptPhilip Yang
The sdma context empty interrupt is dropped in amdgpu_irq_dispatch as unregistered interrupt src_id 243, this interrupt accounts to 1/3 of total interrupts and causes IH primary ring overflow when running stressful benchmark application. Disable this interrupt has no side effect found. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu: Fix potential integer overflow in scheduler mask calculationsKarol Przybylski
The use of 1 << i in scheduler mask calculations can result in an unintentional integer overflow due to the expression being evaluated as a 32-bit signed integer. This patch replaces 1 << i with 1ULL << i to ensure the operation is performed as a 64-bit unsigned integer, preventing overflow Discovered in coverity scan, CID 1636393, 1636175, 1636007, 1635853 Fixes: c5c63d9cb5d3 ("drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs") Signed-off-by: Karol Przybylski <karprzy7@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18wifi: iwlwifi: mvm: Fix __counted_by usage in cfg80211_wowlan_nd_*Kees Cook
Both struct cfg80211_wowlan_nd_match and struct cfg80211_wowlan_nd_info pre-allocate space for channels and matches, but then may end up using fewer that the full allocation. Shrink the associated counter (n_channels and n_matches) after counting the results. This avoids compile-time (and run-time) warnings from __counted_by. (The counter member needs to be updated _before_ accessing the array index.) Seen with coming GCC 15: drivers/net/wireless/intel/iwlwifi/mvm/d3.c: In function 'iwl_mvm_query_set_freqs': drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2877:66: warning: operation on 'match->n_channels' may be undefined [-Wsequence-point] 2877 | match->channels[match->n_channels++] = | ~~~~~~~~~~~~~~~~~^~ drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2885:66: warning: operation on 'match->n_channels' may be undefined [-Wsequence-point] 2885 | match->channels[match->n_channels++] = | ~~~~~~~~~~~~~~~~~^~ drivers/net/wireless/intel/iwlwifi/mvm/d3.c: In function 'iwl_mvm_query_netdetect_reasons': drivers/net/wireless/intel/iwlwifi/mvm/d3.c:2982:58: warning: operation on 'net_detect->n_matches' may be undefined [-Wsequence-point] 2982 | net_detect->matches[net_detect->n_matches++] = match; | ~~~~~~~~~~~~~~~~~~~~~^~ Cc: stable@vger.kernel.org Fixes: aa4ec06c455d ("wifi: cfg80211: use __counted_by where appropriate") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20240619211233.work.355-kees@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-18drm/amdgpu/smu14.0.2: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu/gfx12: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu/mmhub4.1: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu/nbio7.11: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu/nbio7.0: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amdgpu/nbio7.7: fix IP version checkAlex Deucher
Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amd/display: 3.2.314Aric Cyr
DC 3.2.314 contains some improvements as summarized below: * Update DML21 code. * Fixes for FAMS2 interface. * HDMI fixes. * Compilation warning fixes. Signed-off-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amd/display: Disable MPC rate control on ODM pipe updateGeorge Shen
[Why] Seamless boot skips MPC init for the active pipe, resulting in stale MPC rate control state being retained. This will cause issues since other logic assumes it is disabled (as DCN30 and newer does not need it). [How] Disable MPC rate control on ODM pipe update to cover the seamless boot case. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amd/display: Block Invalid TMDS operationChris Park
[Why] When sink type is TMDS, PHY programming does not block against pixel clock greater than 600MHz. [How] Based on sink type, block greater than 600MHz phy programming. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Chris Park <chris.park@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18drm/amd/display: Re-validate streams on commit_streamsDillon Varone
To prevent invalid HW programming, streams should be revalidated first before committing to HW. Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-18Revert "drm/amd/display: Fix green screen issue after suspend"Rodrigo Siqueira
This reverts commit 87b7ebc2e16c14d32a912f18206a4d6cc9abc3e8. A long time ago, we had an issue with the Raven system when it was connected to two displays: one with DP and another with HDMI. After the system woke up from suspension, we saw a solid green screen caused by an underflow generated by bad DCC metadata. To workaround this issue, the 'commit 87b7ebc2e16c ("drm/amd/display: Fix green screen issue after suspend")' was introduced to disable the DCC for a few frames after in the resume phase. However, in hindsight, this solution was probably a workaround at the kernel level for some issues from another part (probably other driver components or user space). After applying this patch and trying to reproduce the green issue in a similar hardware system but using the latest kernel and userspace, we cannot see the issue, which makes this workaround obsolete and creates extra unnecessary complexity to the code; for all of this reason, this commit reverts the original change. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>