summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-10platform/x86: dell_rbu: Stop overwriting data bufferStuart Hayes
The dell_rbu driver will use memset() to clear the data held by each packet when it is no longer needed (when the driver is unloaded, the packet size is changed, etc). The amount of memory that is cleared (before this patch) is the normal packet size. However, the last packet in the list may be smaller. Fix this to only clear the memory actually used by each packet, to prevent it from writing past the end of data buffer. Because the packet data buffers are allocated with __get_free_pages() (in page-sized increments), this bug could only result in a buffer being overwritten when a packet size larger than one page is used. The only user of the dell_rbu module should be the Dell BIOS update program, which uses a packet size of 4096, so no issues should be seen without the patch, it just blocks the possiblity. Fixes: 6c54c28e69f2 ("[PATCH] dell_rbu: new Dell BIOS update driver") Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Link: https://lore.kernel.org/r/20250609184659.7210-5-stuart.w.hayes@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-10platform/x86: dell_rbu: Fix list usageStuart Hayes
Pass the correct list head to list_for_each_entry*() when looping through the packet list. Without this patch, reading the packet data via sysfs will show the data incorrectly (because it starts at the wrong packet), and clearing the packet list will result in a NULL pointer dereference. Fixes: d19f359fbdc6 ("platform/x86: dell_rbu: don't open code list_for_each_entry*()") Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Link: https://lore.kernel.org/r/20250609184659.7210-3-stuart.w.hayes@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-10platform/x86: dell_rbu: Fix lock context warningStuart Hayes
Fix a sparse lock context warning. Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Link: https://lore.kernel.org/r/20250609184659.7210-2-stuart.w.hayes@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-10platform/x86/amd: pmf: Simplify error flow in amd_pmf_init_smart_pc()Mario Limonciello
commit 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()") adjusted the error handling flow to use a ladder but this isn't actually needed because work is only scheduled in amd_pmf_start_policy_engine() and with device managed cleanups pointers for allocations don't need to be freed. Adjust the error flow to a single call to amd_pmf_deinit_smart_pc() for the cases that need to clean up. Cc: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/20250512211154.2510397-4-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250522003457.1516679-4-superm1@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-10platform/x86/amd: pmf: Prevent amd_pmf_tee_deinit() from running twiceMario Limonciello
If any of the tee init fails, pass up the errors and clear the tee_ctx pointer. This will prevent cleaning up multiple times. Fixes: ac052d8c08f9d ("platform/x86/amd/pmf: Add PMF TEE interface") Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/20250512211154.2510397-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250522003457.1516679-3-superm1@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-10platform/x86/amd: pmf: Use device managed allocationsMario Limonciello
If setting up smart PC fails for any reason then this can lead to a double free when unloading amd-pmf. This is because dev->buf was freed but never set to NULL and is again freed in amd_pmf_remove(). To avoid subtle allocation bugs in failures leading to a double free change all allocations into device managed allocations. Fixes: 5b1122fc4995f ("platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()") Link: https://lore.kernel.org/r/20250512211154.2510397-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250522003457.1516679-2-superm1@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-06-10ALSA: sb: Force to disable DMAs once when DMA mode is changedTakashi Iwai
When the DMA mode is changed on the (still real!) SB AWE32 after playing a stream and closing, the previous DMA setup was still silently kept, and it can confuse the hardware, resulting in the unexpected noises. As a workaround, enforce the disablement of DMA setups when the DMA setup is changed by the kcontrol. https://bugzilla.kernel.org/show_bug.cgi?id=218185 Link: https://patch.msgid.link/20250610064322.26787-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-06-10ALSA: sb: Don't allow changing the DMA mode during operationsTakashi Iwai
When a PCM stream is already running, one shouldn't change the DMA mode via kcontrol, which may screw up the hardware. Return -EBUSY instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218185 Link: https://patch.msgid.link/20250610064322.26787-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-06-10ALSA: hda/realtek: Add quirk for Asus GU605CRichard Fitzgerald
The GU605C has similar audio hardware to the GU605M so apply the same quirk. Note that in the linked bugzilla there are two separate problems with the GU605C. This patch fixes one of the problems, so I haven't added a Closes: tag. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reported-by: Nick Karaolidis <nick@karaolidis.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=220152 Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250609102125.63196-1-rf@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-06-10ALSA: hda/realtek: Fix built-in mic on ASUS VivoBook X513EAChris Chiu
The built-in mic of ASUS VivoBook X513EA is broken recently by the fix of the pin sort. The fixup ALC256_FIXUP_ASUS_MIC_NO_PRESENCE is working for addressing the regression, too. Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort") Signed-off-by: Chris Chiu <chris.chiu@canonical.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250610035607.690771-1-chris.chiu@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-06-09perf: Fix libjvmti.c sign compare errorYuzhuo Jing
Fix the compile errors when compiling with -Werror=sign-compare. This is a follow-up patch to a previous patch series for a separate issue. Link: https://lore.kernel.org/lkml/aC9lXhPFcs5fkHWH@x1/ Signed-off-by: Yuzhuo Jing <yuzhuo@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604173632.2362759-1-yuzhuo@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf script: perf script tests fails with segfaultAditya Bodkhe
pert script tests fails with segmentation fault as below: 92: perf script tests: --- start --- test child forked, pid 103769 DB test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ] /usr/libexec/perf-core/tests/shell/script.sh: line 35: 103780 Segmentation fault (core dumped) perf script -i "${perfdatafile}" -s "${db_test}" --- Cleaning up --- ---- end(-1) ---- 92: perf script tests : FAILED! Backtrace pointed to : #0 0x0000000010247dd0 in maps.machine () #1 0x00000000101d178c in db_export.sample () #2 0x00000000103412c8 in python_process_event () #3 0x000000001004eb28 in process_sample_event () #4 0x000000001024fcd0 in machines.deliver_event () #5 0x000000001025005c in perf_session.deliver_event () #6 0x00000000102568b0 in __ordered_events__flush.part.0 () #7 0x0000000010251618 in perf_session.process_events () #8 0x0000000010053620 in cmd_script () #9 0x00000000100b5a28 in run_builtin () #10 0x00000000100b5f94 in handle_internal_command () #11 0x0000000010011114 in main () Further investigation reveals that this occurs in the `perf script tests`, because it uses `db_test.py` script. This script sets `perf_db_export_mode = True`. With `perf_db_export_mode` enabled, if a sample originates from a hypervisor, perf doesn't set maps for "[H]" sample in the code. Consequently, `al->maps` remains NULL when `maps__machine(al->maps)` is called from `db_export__sample`. As al->maps can be NULL in case of Hypervisor samples , use thread->maps because even for Hypervisor sample, machine should exist. If we don't have machine for some reason, return -1 to avoid segmentation fault. Reported-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Aditya Bodkhe <aditya.b1@linux.ibm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Link: https://lore.kernel.org/r/20250429065132.36839-1-adityab1@linux.ibm.com Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09Merge tag 'powerpc-6.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - a couple of fixes for out of bounds issues in memtrace and vas Thanks to Ritesh Harjani (IBM), Haren Myneni, and Jonathan Greental * tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vas: Return -EINVAL if the offset is non-zero in mmap() powerpc/powernv/memtrace: Fix out of bounds issue in memtrace mmap
2025-06-10powerpc/vas: Return -EINVAL if the offset is non-zero in mmap()Haren Myneni
The user space calls mmap() to map VAS window paste address and the kernel returns the complete mapped page for each window. So return -EINVAL if non-zero is passed for offset parameter to mmap(). See Documentation/arch/powerpc/vas-api.rst for mmap() restrictions. Co-developed-by: Jonathan Greental <yonatan02greental@gmail.com> Signed-off-by: Jonathan Greental <yonatan02greental@gmail.com> Reported-by: Jonathan Greental <yonatan02greental@gmail.com> Fixes: dda44eb29c23 ("powerpc/vas: Add VAS user space API") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250610021227.361980-2-maddy@linux.ibm.com
2025-06-10powerpc/powernv/memtrace: Fix out of bounds issue in memtrace mmapRitesh Harjani (IBM)
memtrace mmap issue has an out of bounds issue. This patch fixes the by checking that the requested mapping region size should stay within the allocated region size. Reported-by: Jonathan Greental <yonatan02greental@gmail.com> Fixes: 08a022ad3dfa ("powerpc/powernv/memtrace: Allow mmaping trace buffers") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250610021227.361980-1-maddy@linux.ibm.com
2025-06-09scsi: error: alua: I/O errors for ALUA state transitionsRajashekhar M A
When a host is configured with a few LUNs and I/O is running, injecting FC faults repeatedly leads to path recovery problems. The LUNs have 4 paths each and 3 of them come back active after say an FC fault which makes 2 of the paths go down, instead of all 4. This happens after several iterations of continuous FC faults. Reason here is that we're returning an I/O error whenever we're encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying. Signed-off-by: Rajashekhar M A <rajs@netapp.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20250606135924.27397-1-hare@kernel.org Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09scsi: storvsc: Increase the timeouts to storvsc_timeoutDexuan Cui
Currently storvsc_timeout is only used in storvsc_sdev_configure(), and 5s and 10s are used elsewhere. It turns out that rarely the 5s is not enough on Azure, so let's use storvsc_timeout everywhere. In case a timeout happens and storvsc_channel_init() returns an error, close the VMBus channel so that any host-to-guest messages in the channel's ringbuffer, which might come late, can be safely ignored. Add a "const" to storvsc_timeout. Cc: stable@kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1749243459-10419-1-git-send-email-decui@microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09drm/msm/adreno: Check for recognized GPU before bindRob Clark
If we have a newer dtb than kernel, we could end up in a situation where the GPU device is present in the dtb, but not in the drivers device table. We don't want this to prevent the display from probing. So check that we recognize the GPU before adding the GPU component. v2: use %pOF Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657701/
2025-06-09Merge tag 'for-net-2025-06-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - MGMT: Fix UAF on mgmt_remove_adv_monitor_complete - MGMT: Protect mgmt_pending list with its own lock - hci_core: fix list_for_each_entry_rcu usage - btintel_pcie: Increase the tx and rx descriptor count - btintel_pcie: Reduce driver buffer posting to prevent race condition - btintel_pcie: Fix driver not posting maximum rx buffers * tag 'for-net-2025-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: MGMT: Protect mgmt_pending list with its own lock Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete Bluetooth: btintel_pcie: Reduce driver buffer posting to prevent race condition Bluetooth: btintel_pcie: Increase the tx and rx descriptor count Bluetooth: btintel_pcie: Fix driver not posting maximum rx buffers Bluetooth: hci_core: fix list_for_each_entry_rcu usage ==================== Link: https://patch.msgid.link/20250605191136.904411-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-09net_sched: sch_sfq: fix a potential crash on gso_skb handlingEric Dumazet
SFQ has an assumption of always being able to queue at least one packet. However, after the blamed commit, sch->q.len can be inflated by packets in sch->gso_skb, and an enqueue() on an empty SFQ qdisc can be followed by an immediate drop. Fix sfq_drop() to properly clear q->tail in this situation. Tested: ip netns add lb ip link add dev to-lb type veth peer name in-lb netns lb ethtool -K to-lb tso off # force qdisc to requeue gso_skb ip netns exec lb ethtool -K in-lb gro on # enable NAPI ip link set dev to-lb up ip -netns lb link set dev in-lb up ip addr add dev to-lb 192.168.20.1/24 ip -netns lb addr add dev in-lb 192.168.20.2/24 tc qdisc replace dev to-lb root sfq limit 100 ip netns exec lb netserver netperf -H 192.168.20.2 -l 100 & netperf -H 192.168.20.2 -l 100 & netperf -H 192.168.20.2 -l 100 & netperf -H 192.168.20.2 -l 100 & Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback") Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Closes: https://lore.kernel.org/netdev/9da42688-bfaa-4364-8797-e9271f3bdaef@hetzner-cloud.de/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250606165127.3629486-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-09smb: client: disable path remapping with POSIX extensionsPhilipp Kerling
If SMB 3.1.1 POSIX Extensions are available and negotiated, the client should be able to use all characters and not remap anything. Currently, the user has to explicitly request this behavior by specifying the "nomapposix" mount option. Link: https://lore.kernel.org/4195bb677b33d680e77549890a4f4dd3b474ceaf.camel@rx2.rx-server.de Signed-off-by: Philipp Kerling <pkerling@casix.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-09drm/msm/adreno: Pass device_node to find_chipid()Rob Clark
We are going to want to re-use this before the component is bound, when we don't yet have the device pointer (but we do have the of node). v2: use %pOF Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657705/
2025-06-09drm/msm: Rename add_components_mdp()Rob Clark
To better match add_gpu_components(). Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657700/
2025-06-09drivers: gpu: drm: msm: registers: improve reproducibilityRyan Eatmon
The files generated by gen_header.py capture the source path to the input files and the date. While that can be informative, it varies based on where and when the kernel was built as the full path is captured. Since all of the files that this tool is run on is under the drivers directory, this modifies the application to strip all of the path before drivers. Additionally it prints <stripped> instead of the date. Signed-off-by: Ryan Eatmon <reatmon@ti.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com> Signed-off-by: Viswanath Kraleti <viswanath.kraleti@oss.qualcomm.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/655599/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm/a7xx: Call CP_RESET_CONTEXT_STATEConnor Abbott
Calling this packet is necessary when we switch contexts because there are various pieces of state used by userspace to synchronize between BR and BV that are persistent across submits and we need to make sure that they are in a "safe" state when switching contexts. Otherwise a userspace submission in one context could cause another context to function incorrectly and hang, effectively a denial of service (although without leaking data). This was missed during initial a7xx bringup. Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654924/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Fix CP_RESET_CONTEXT_STATE bitfield namesConnor Abbott
Based on kgsl. Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654922/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09Merge branch '6.16/scsi-queue' into 6.16/scsi-fixesMartin K. Petersen
Pull in remaining fixes from queue branch. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09scsi: s390: zfcp: Ensure synchronous unit_addPeter Oberparleiter
Improve the usability of the unit_add sysfs attribute by ensuring that the associated FCP LUN scan processing is completed synchronously. This enables configuration tooling to consistently determine the end of the scan process to allow for serialization of follow-on actions. While the scan process associated with unit_add typically completes synchronously, it is deferred to an asynchronous background process if unit_add is used before initial remote port scanning has completed. This occurs when unit_add is used immediately after setting the associated FCP device online. To ensure synchronous unit_add processing, wait for remote port scanning to complete before initiating the FCP LUN scan. Cc: stable@vger.kernel.org Reviewed-by: M Nikhil <nikh1092@linux.ibm.com> Reviewed-by: Nihar Panda <niharp@linux.ibm.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Nihar Panda <niharp@linux.ibm.com> Link: https://lore.kernel.org/r/20250603182252.2287285-2-niharp@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09scsi: iscsi: Fix incorrect error path labels for flashnode operationsAlok Tiwari
Correct the error handling goto labels used when host lookup fails in various flashnode-related event handlers: - iscsi_new_flashnode() - iscsi_del_flashnode() - iscsi_login_flashnode() - iscsi_logout_flashnode() - iscsi_logout_flashnode_sid() scsi_host_put() is not required when shost is NULL, so jumping to the correct label avoids unnecessary operations. These functions previously jumped to the wrong goto label (put_host), which did not match the intended cleanup logic. Use the correct exit labels (exit_new_fnode, exit_del_fnode, etc.) to ensure proper error handling. Also remove the unused put_host label under iscsi_new_flashnode() as it is no longer needed. No functional changes beyond accurate error path correction. Fixes: c6a4bb2ef596 ("[SCSI] scsi_transport_iscsi: Add flash node mgmt support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://lore.kernel.org/r/20250530193012.3312911-1-alok.a.tiwari@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09scsi: mvsas: Fix typos in per-phy comments and SAS cmd port registersAnkit Chauhan
Spelling fixes: Deocder --> Decoder Memroy --> Memory This is a non-functional change aimed at improving code clarity. Signed-off-by: Ankit Chauhan <ankitchauhan2065@gmail.com> Link: https://lore.kernel.org/r/20250528110604.59528-1-ankitchauhan2065@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09drm/msm: Temporarily disable stall-on-fault after a page faultConnor Abbott
When things go wrong, the GPU is capable of quickly generating millions of faulting translation requests per second. When that happens, in the stall-on-fault model each access will stall until it wins the race to signal the fault and then the RESUME register is written. This slows processing page faults to a crawl as the GPU can generate faults much faster than the CPU can acknowledge them. It also means that all available resources in the SMMU are saturated waiting for the stalled transactions, so that other transactions such as transactions generated by the GMU, which shares translation resources with the GPU, cannot proceed. This causes a GMU watchdog timeout, which leads to a failed reset because GX cannot collapse when there is a transaction pending and a permanently hung GPU. On older platforms with qcom,smmu-v2, it seems that when one transaction is stalled subsequent faulting transactions are terminated, which avoids this problem, but the MMU-500 follows the spec here. To work around these problems, disable stall-on-fault as soon as we get a page fault until a cooldown period after pagefaults stop. This allows the GMU some guaranteed time to continue working. We only use stall-on-fault to halt the GPU while we collect a devcoredump and we always terminate the transaction afterward, so it's fine to miss some subsequent page faults. We also keep it disabled so long as the current devcoredump hasn't been deleted, because in that case we likely won't capture another one if there's a fault. After this commit HFI messages still occasionally time out, because the crashdump handler doesn't run fast enough to let the GMU resume, but the driver seems to recover from it. This will probably go away after the HFI timeout is increased. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654891/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Delete resume_translation()Connor Abbott
Unused since the previous commit. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654890/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Don't use a worker to capture fault devcoredumpConnor Abbott
Now that we use a threaded IRQ, it should be safe to do this in the fault handler. We can also remove fault_info from struct msm_gpu and just pass it directly. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654889/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Fix another leak in the submit error pathRob Clark
put_unused_fd() doesn't free the installed file, if we've already done fd_install(). So we need to also free the sync_file. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/653583/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Fix a fence leak in submit error pathRob Clark
In error paths, we could unref the submit without calling drm_sched_entity_push_job(), so msm_job_free() will never get called. Since drm_sched_job_cleanup() will NULL out the s_fence, we can use that to detect this case. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/653584/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09perf test trace: Change the regex pattern in the struct testHoward Chu
Ian mentioned a reliably occurred failure in the trace_btf_general test where he obtained trace output of: sleep/279619 clock_nanosleep(0, 0, {1,1,}, 0x7ffcd47b6450) = 0 But the regex pattern used for verification is "^sleep/[0-9]+ clock_nanosleep\(0, 0, \{1,\}, ..." This lead to a mismatch. The reason is, different sleep commands use different timespec data to call clock_nanosleep, on my machine, the value of tv_nsec is 0. ~~~ $ sudo /tmp/perf/perf trace -e clock_nanosleep -- sleep 1 0.000 (1000.196 ms): sleep/54261 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7ffe13529550) = 0 ~~~ While Ian had this trace log: ~~~ $ sudo /tmp/perf/perf trace -e clock_nanosleep -- sleep 1 0.000 (1000.208 ms): sleep/1710732 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 1 }, rmtp: 0x7ffc091f4090) = 0 ~~~ Because sleep's behavior of setting 'tv_nsec' is not certain, and tv_sec is most definitely 1, this patch relaxes the key regex pattern to '\{1,.*\}' for a better chance of matching. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250528191148.89118-7-howardchu95@gmail.com Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf test trace: Use --sort-events in BTF general testsHoward Chu
Without the '--sort-events' flag, perf trace doesn't receive and process events based on their arrival time, thus PERF_RECORD_COMM event that assigns the correct comm to a PID, may be delivered and processed after regular samples, causing trace outputs not having a 'comm', e.g. 'mv', instead, having the default PID placeholder, e.g. ':14514'. Hopefully this answers Namhyung's question in [1]. You can simply justify the statement with this diff: [2]. Now, simply run this command multiple times: $ touch /tmp/file1 && sudo /tmp/perf trace -e renameat* -- mv /tmp/file1 /tmp/file2 && rm -f /tmp/file2 And you should see two types of results: $ touch /tmp/file1 && sudo /tmp/perf trace -e renameat* -- mv /tmp/file1 /tmp/file2 && rm -f /tmp/file2 [debug] deliver [debug] machine__process_comm_event [OVERRIDE] old :1221169 new mv str mv [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver 0.000 ( 0.013 ms): mv/1221169 renameat2(olddfd: CWD, oldname: "/tmp/file1", newdfd: CWD, newname: "/tmp/file2", flags: NOREPLACE) = 0 [debug] deliver $ touch /tmp/file1 && sudo /tmp/perf trace -e renameat* -- mv /tmp/file1 /tmp/file2 && rm -f /tmp/file2 [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver [debug] deliver 0.000 ( 0.014 ms): :1221398/1221398 renameat2(olddfd: CWD, oldname: "/tmp/file1", newdfd: CWD, newname: "/tmp/file2", flags: NOREPLACE) = 0 [debug] deliver [debug] deliver [debug] machine__process_comm_event [OVERRIDE] old :1221398 new mv str mv [debug] deliver [debug] deliver [debug] deliver Anyway, use --sort-events in BTF general tests to avoid :PID, a comm is preferred. [1]: https://lore.kernel.org/linux-perf-users/Z_AeswETE5xLcPT8@google.com/ [2]: https://gist.githubusercontent.com/Sberm/6b72b2a1cf1c62244f1f996481769baf/raw/529667bd74a2e7e1953bbd4be545bf875da8a3e7/unsorted.patch Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250528191148.89118-6-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf test trace: Remove set -e for BTF general testsHoward Chu
Remove set -e and print error messages in BTF general tests. Before: $ sudo /tmp/perf test btf -vv 108: perf trace BTF general tests: 108: perf trace BTF general tests : Running --- start --- test child forked, pid 889299 Checking if vmlinux BTF exists Testing perf trace's string augmentation String augmentation test failed ---- end(-1) ---- 108: perf trace BTF general tests : FAILED! After: $ sudo /tmp/perf test btf -vv 108: perf trace BTF general tests: 108: perf trace BTF general tests : Running --- start --- test child forked, pid 886551 Checking if vmlinux BTF exists Testing perf trace's string augmentation String augmentation test failed, output: :886566/886566 renameat2(CWD, "/tmp/file1_RcMa", CWD, "/tmp/file2_RcMa", NOREPLACE) = 0---- end(-1) ---- 108: perf trace BTF general tests : FAILED! Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250528191148.89118-5-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf test trace: Stop tracing hrtimer_setup event in trace enum testHoward Chu
The event 'timer:hrtimer_setup' is relatively new, for older kernels, perf trace enum tests won't run as the event 'timer:hrtimer_setup' cannot be found. It was originally called 'timer:hrtimer_init', before being renamed in: commit 244132c4e577 ("tracing/timers: Rename the hrtimer_init event to hrtimer_setup") Using timer:hrtimer_start should be enough for current testing, and hopefully 'start' won't be renamed in the future. Before: $ sudo /tmp/perf test enum -vv 107: perf trace enum augmentation tests: 107: perf trace enum augmentation tests : Running --- start --- test child forked, pid 786187 Checking if vmlinux exists Tracing syscall landlock_add_rule Tracing non-syscall tracepoint timer:hrtimer_setup,timer:hrtimer_start [tracepoint failure] Failed to trace timer:hrtimer_setup,timer:hrtimer_start tracepoint, output: event syntax error: 'timer:hrtimer_setup,timer:hrtimer_start' \___ unknown tracepoint Error: File /sys/kernel/tracing//events/timer/hrtimer_setup not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. Run 'perf list' for a list of valid events Usage: perf trace [<options>] [<command>] or: perf trace [<options>] -- <command> [<options>] or: perf trace record [<options>] [<command>] or: perf trace record [<options>] -- <command> [<options>] -e, --event <event> event/syscall selector. use 'perf list' to list available events ---- end(-1) ---- 107: perf trace enum augmentation tests : FAILED! After: $ sudo /tmp/perf test enum -vv 107: perf trace enum augmentation tests: 107: perf trace enum augmentation tests : Running --- start --- test child forked, pid 808547 Checking if vmlinux exists Tracing syscall landlock_add_rule Tracing non-syscall tracepoint timer:hrtimer_start ---- end(0) ---- 107: perf trace enum augmentation tests : Ok Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250528191148.89118-4-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf test trace: Remove set -e and print trace test's error messagesHoward Chu
Currently perf test utilizes the set -e option in shell that exit immediately if a command exits with a non-zero status, this prevents further error handling and introduces ambiguity. This patch removes set -e and prints the error message after invoking perf trace during perf tests. In my case, the command that exits with a non-zero status is perf trace instead of grep, because it can't find the 'timer:hrtimer_setup' tracepoint, see below. Before: $ sudo /tmp/perf test enum -vv 107: perf trace enum augmentation tests: 107: perf trace enum augmentation tests : Running --- start --- test child forked, pid 783533 Checking if vmlinux exists Tracing syscall landlock_add_rule Tracing non-syscall tracepoint syscall ---- end(-1) ---- 107: perf trace enum augmentation tests : FAILED! After: $ sudo /tmp/perf test enum -vv 107: perf trace enum augmentation tests: 107: perf trace enum augmentation tests : Running --- start --- test child forked, pid 851658 Checking if vmlinux exists Tracing syscall landlock_add_rule Tracing non-syscall tracepoint timer:hrtimer_setup,timer:hrtimer_start [tracepoint failure] Failed to trace tracepoint timer:hrtimer_setup,timer:hrtimer_start, output: event syntax error: 'timer:hrtimer_setup,timer:hrtimer_start' \___ unknown tracepoint Error: File /sys/kernel/tracing//events/timer/hrtimer_setup not found. Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?. Run 'perf list' for a list of valid events Usage: perf trace [<options>] [<command>] or: perf trace [<options>] -- <command> [<options>] or: perf trace record [<options>] [<command>] or: perf trace record [<options>] -- <command> [<options>] -e, --event <event> event/syscall selector. use 'perf list' to list available events---- end(-1) ---- 107: perf trace enum augmentation tests : FAILED! Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250528191148.89118-3-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf test trace: Use shell's -f flag to check if vmlinux existsHoward Chu
To match the style of the existing codebase, no functional changes were applied. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250528191148.89118-2-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf trace: Remove --map-dump documentationHoward Chu
The --map-dump option was removed in 5e6da6be3082 ("perf trace: Migrate BPF augmentation to use a skeleton"), this patch removes its remaining documentation. Fixes: 5e6da6be3082 ("perf trace: Migrate BPF augmentation to use a skeleton") Signed-off-by: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250601173252.717780-1-howardchu95@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09tools/build: Remove some unused libbpf pre-1.0 feature test logicIan Rogers
Commit 76a97cf2e169 ("perf build: Remove libbpf pre-1.0 feature tests") removed the libbpf feature test logic used by perf in favor of using LIBBPF_MAJOR_VERSION. Remove some build targets that should have been removed as part of that clean up. Fixes: 76a97cf2e169 ("perf build: Remove libbpf pre-1.0 feature tests") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250603221358.2562167-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf thread_map: Remove uid optionsIan Rogers
Now the target doesn't have a uid, it is handled through BPF filters, remove the uid options to thread_map creation. Tidy up the functions used in tests to avoid passing unused arguments. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf target: Remove uid from targetIan Rogers
Gathering threads with a uid by scanning /proc is inherently racy leading to perf_event_open failures that quit perf. All users of the functionality now use BPF filters, so remove uid and uid_str from target. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf bench evlist-open-close: Switch user option to use BPF filterIan Rogers
Finding user processes by scanning /proc is inherently racy and results in perf_event_open failures. Use a BPF filter to drop samples where the uid doesn't match. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf trace: Switch user option to use BPF filterIan Rogers
Finding user processes by scanning /proc is inherently racy and results in perf_event_open failures. Use a BPF filter to drop samples where the uid doesn't match. Ensure adding the BPF filter forces system-wide. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf top: Switch user option to use BPF filterIan Rogers
Finding user processes by scanning /proc is inherently racy and results in perf_event_open failures. Use a BPF filter to drop samples where the uid doesn't match. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf tests record: Add basic uid filtering testIan Rogers
Based on the system-wide test with changes around how failure is handled as BPF permissions are a bigger issue than perf event paranoia. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-06-09perf record: Switch user option to use BPF filterIan Rogers
Finding user processes by scanning /proc is inherently racy and results in perf_event_open failures. Use a BPF filter to drop samples where the uid doesn't match. Ensure adding the BPF filter forces system-wide. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250604174545.2853620-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>