summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-28rv: Replace tss and sncid monitors with more complete stsGabriele Monaco
The tss monitor currently guarantees task switches can happen only while scheduling, whereas the sncid monitor enforces scheduling occurs with interrupt disabled. Replace the monitors with a more comprehensive specification which implies both but also ensures that: * each scheduler call disable interrupts to switch * each task switch happens with interrupts disabled Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nam Cao <namcao@linutronix.de> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250728135022.255578-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28sched: Adapt sched tracepoints for RV task modelGabriele Monaco
Add the following tracepoint: * sched_set_need_resched(tsk, cpu, tif) Called when a task is set the need resched [lazy] flag Remove the unused ip parameter from sched_entry and sched_exit and alter sched_entry to have a value of preempt consistent with the one used in sched_switch. Also adapt all monitors using sched_{entry,exit} to avoid breaking build. These tracepoints are useful to describe the Linux task model and are adapted from the patches by Daniel Bristot de Oliveira (https://bristot.me/linux-task-model/). Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nam Cao <namcao@linutronix.de> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250728135022.255578-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28rv: Retry when da monitor detects race conditionsGabriele Monaco
DA monitor can be accessed from multiple cores simultaneously, this is likely, for instance when dealing with per-task monitors reacting on events that do not always occur on the CPU where the task is running. This can cause race conditions where two events change the next state and we see inconsistent values. E.g.: [62] event_srs: 27: sleepable x sched_wakeup -> running (final) [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable [63] error_srs: 27: event sched_switch_suspend not expected in the state running In this case the monitor fails because the event on CPU 62 wins against the one on CPU 63, although the correct state should have been sleepable, since the task get suspended. Detect if the current state was modified by using try_cmpxchg while storing the next value. If it was, try again reading the current state. After a maximum number of failed retries, react by calling a special tracepoint, print on the console and reset the monitor. Remove the functions da_monitor_curr_state() and da_monitor_set_state() as they only hide the underlying implementation in this case. Monitors where this type of condition can occur must be able to account for racing events in any possible order, as we cannot know the winner. Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250728135022.255578-6-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28rv: Adjust monitor dependenciesGabriele Monaco
RV monitors relying on the preemptirqs tracepoints are set as dependent on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do enable the tracepoints but are not the minimal configurations enabling them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable manually). Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for monitors. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250728135022.255578-5-gmonaco@redhat.com Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu monitors") Acked-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28rv: Use strings in da monitors tracepointsGabriele Monaco
Using DA monitors tracepoints with KASAN enabled triggers the following warning: BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0 Read of size 32 at addr ffffffffaada8980 by task ... Call Trace: <TASK> [...] do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0 ? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10 ? trace_event_sncid+0x83/0x200 trace_event_sncid+0x163/0x200 [...] The buggy address belongs to the variable: automaton_snep+0x4e0/0x5e0 This is caused by the tracepoints reading 32 bytes __array instead of __string from the automata definition. Such strings are literals and reading 32 bytes ends up in out of bound memory accesses (e.g. the next automaton's data in this case). The error is harmless as, while printing the string, we stop at the null terminator, but it should still be fixed. Use the __string facilities while defining the tracepoints to avoid reading out of bound memory. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250728135022.255578-4-gmonaco@redhat.com Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros") Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28rv: Remove trailing whitespace from tracepoint stringGabriele Monaco
RV event tracepoints print a line with the format: "event_xyz: S0 x event -> S1 " "event_xyz: S1 x event -> S0 (final)" While printing an event leading to a non-final state, the line has a trailing white space (visible above before the closing "). Adapt the format string not to print the trailing whitespace if we are not printing "(final)". Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250728135022.255578-3-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28rv: Add da_handle_start_run_event_ to per-task monitorsGabriele Monaco
The RV da_monitor API allows to start monitors in two ways: da_handle_start_event_NAME and da_handle_start_run_event_NAME. The former is used when the event is followed by the initial state of the module, so we ignore the event but we know the monitor is in the initial state and can start monitoring, the latter can be used if the event can only occur in the initial state, so we do handle the event as if the monitor was in the initial state. This latter API is defined for implicit monitors but not per-task ones. Define da_handle_start_run_event_NAME macro also for per-task monitors. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250728135022.255578-2-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28Merge tag 'vfs-6.17-rc1.mmap_prepare' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28drm/amdgpu: update mmhub 4.1.0 client id mappingsAlex Deucher
Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28Merge tag 'vfs-6.17-rc1.fallocate' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fallocate updates from Christian Brauner: "fallocate() currently supports creating preallocated files efficiently. However, on most filesystems fallocate() will preallocate blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified. The extent state must later be converted to a written state when the user writes data into this range, which can trigger numerous metadata changes and journal I/O. This may leads to significant write amplification and performance degradation in synchronous write mode. At the moment, the only method to avoid this is to create an empty file and write zero data into it (for example, using 'dd' with a large block size). However, this method is slow and consumes a considerable amount of disk bandwidth. Now that more and more flash-based storage devices are available it is possible to efficiently write zeros to SSDs using the unmap write zeroes command if the devices do not write physical zeroes to the media. For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support the DEAC bit[1], the write zeroes command does not write actual data to the device, instead, NVMe converts the zeroed range to a deallocated state, which works fast and consumes almost no disk write bandwidth. This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices. fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that subsequent writes to that range do not require further changes to the file mapping metadata. This flag is beneficial for subsequent pure overwriting within this range, as it can save on block allocation and, consequently, significant metadata changes" * tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ext4: add FALLOC_FL_WRITE_ZEROES support block: add FALLOC_FL_WRITE_ZEROES support block: factor out common part in blkdev_fallocate() fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate dm: clear unmap write zeroes limits when disabling write zeroes scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP nvmet: set WZDS and DRB if device enables unmap write zeroes operation nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
2025-07-28Merge tag 'vfs-6.17-rc1.async.dir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull async directory updates from Christian Brauner: "This contains preparatory changes for the asynchronous directory locking scheme. While the locking scheme is still very much controversial and we're still far away from landing any actual changes in that area the preparatory work that we've been upstreaming for a while now has been very useful. This is another set of minor changes and cleanups" * tag 'vfs-6.17-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: exportfs: use lookup_one_unlocked() coda: use iterate_dir() in coda_readdir() VFS: Minor fixes for porting.rst VFS: merge lookup_one_qstr_excl_raw() back into lookup_one_qstr_excl()
2025-07-28drm/amd/display: Allow DCN301 to clear update flagsIvan Lipski
[Why & How] Not letting DCN301 to clear after surface/stream update results in artifacts when switching between active overlay planes. The issue is known and has been solved initially. See below: (https://gitlab.freedesktop.org/drm/amd/-/issues/3441) Fixes: f354556e29f4 ("drm/amd/display: limit clear_update_flags t dcn32 and above") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amd/display: Pass up errors for reset GPU that fails to init HWMario Limonciello
[Why] If a GPU is in reset and the hardware fails to initialize the rest of the resume sequence shouldn't be run. [How] Pass error code up to caller of dm_resume(). Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amd/display: Only finalize atomic_obj if it was initializedMario Limonciello
[Why] If amdgpu_dm failed to initalize before amdgpu_dm_initialize_drm_device() completed then freeing atomic_obj will lead to list corruption. [How] Check if atomic_obj state is initialized before trying to free. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supportedMario Limonciello
[Why] If PSR-SU is disabled on the link, then configuring su_y granularity in mod_power_calc_psr_configs() can lead to assertions in psr_su_set_dsc_slice_height(). [How] Check the PSR version in amdgpu_dm_link_setup_psr() to determine whether or not to configure granularity. Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amd/display: Disable dsc_power_gate for dcn314 by defaultRoman Li
[Why] "REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line" warnings seen after resuming from s2idle. DCN314 has issues with DSC power gating that cause REG_WAIT timeouts when attempting to power down DSC blocks. [How] Disable dsc_power_gate for dcn314 by default. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14Frank Min
1. Add kicker firmwares loading for gfx12/smu14/psp14 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_12_0_1_rlc_kicker.bin - gc_12_0_1_imu_kicker.bin - psp_14_0_3_sos_kicker.bin - psp_14_0_3_ta_kicker.bin - smu_14_0_3_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Gui Chengming <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr accessYang Wang
Add lock protection for 'ring->wptr'/'ring->rptr' to ensure the correct execution. Fixes: 8652920d2c00 ("drm/amdgpu: add mutex lock for cper ring") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c'Srinivasan Shanmugam
Fix the comment style before cntl_stuck_hw_workaround() by replacing '/**' with '/*' since it is not a kdoc comment. Fixes the below with gcc W=1: display/dc/dce/dce_i2c_hw.c:380: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * If we boot without an HDMI display, the I2C engine does not get initialized Fixes: 04d57f4462a6 ("drm/amd/display: Workaround for stuck I2C arbitrage") Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Dominik Kaszewski <dominik.kaszewski@amd.com> Cc: Ivan Lipski <ivan.lipski@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amd/display: fix initial backlight brightness calculationLauri Tirkkonen
DIV_ROUND_CLOSEST(x, 100) returns either 0 or 1 if 0<x<=100, so the division needs to be performed after the multiplication and not the other way around, to properly scale the value. Fixes: 8b5f3a229a70 ("drm/amd/display: Fix default DC and AC levels") Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi> Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/aH2Q_HJvxKbW74vU@hacktheplanet.fi Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amdgpu: Avoid extra evict-restore process.Gang Ba
If vm belongs to another process, this is fclose after fork, wait may enable signaling KFD eviction fence and cause parent process queue evicted. [677852.634569] amdkfd_fence_enable_signaling+0x56/0x70 [amdgpu] [677852.634814] __dma_fence_enable_signaling+0x3e/0xe0 [677852.634820] dma_fence_wait_timeout+0x3a/0x140 [677852.634825] amddma_resv_wait_timeout+0x7f/0xf0 [amdkcl] [677852.634831] amdgpu_vm_wait_idle+0x2d/0x60 [amdgpu] [677852.635026] amdgpu_flush+0x34/0x50 [amdgpu] [677852.635208] filp_flush+0x38/0x90 [677852.635213] filp_close+0x14/0x30 [677852.635216] do_close_on_exec+0xdd/0x130 [677852.635221] begin_new_exec+0x1da/0x490 [677852.635225] load_elf_binary+0x307/0xea0 [677852.635231] ? srso_alias_return_thunk+0x5/0xfbef5 [677852.635235] ? ima_bprm_check+0xa2/0xd0 [677852.635240] search_binary_handler+0xda/0x260 [677852.635245] exec_binprm+0x58/0x1a0 [677852.635249] bprm_execve.part.0+0x16f/0x210 [677852.635254] bprm_execve+0x45/0x80 [677852.635257] do_execveat_common.isra.0+0x190/0x200 Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Gang Ba <Gang.Ba@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_propAlex Deucher
Used to to set the MQD appropriately for each queue type. Kernel queues have additional privileges. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.16.x
2025-07-28drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilitiesPeter Shkenev
HUBBUB structure is not initialized on DCE hardware, so check if it is NULL to avoid null dereference while accessing amdgpu_dm_capabilities file in debugfs. Signed-off-by: Peter Shkenev <mustela@erminea.space> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()Nathan Chancellor
After a recent change in clang to expose uninitialized warnings from const variables and pointers [1], there is a warning in imu_v12_0_program_rlc_ram() because data is passed uninitialized to program_imu_rlc_ram(): drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:374:30: error: variable 'data' is uninitialized when used here [-Werror,-Wuninitialized] 374 | program_imu_rlc_ram(adev, data, (const u32)size); | ^~~~ As this warning happens early in clang's frontend, it does not realize that due to the assignment of r to -EINVAL, program_imu_rlc_ram() is never actually called, and even if it were, data would not be dereferenced because size is 0. Just initialize data to NULL to silence the warning, as the commit that added program_imu_rlc_ram() mentioned it would eventually be used over the old method, at which point data can be properly initialized and used. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2107 Fixes: 56159fffaab5 ("drm/amdgpu: use new method to program rlc ram") Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-28drm/amd/display: Fix divide by zero when calculating min ODM factorDillon Varone
[WHY&HOW] If the debug option is set to disable_dsc the max slice width and/or dispclk can be zero. This causes a divide by zero when calculating the min ODM combine factor. Add a check to ensure they are valid first. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-07-28Merge tag 'vfs-6.17-rc1.nsfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains namespace updates. This time specifically for nsfs: - Userspace heavily relies on the root inode numbers for namespaces to identify the initial namespaces. That's already a hard dependency. So we cannot change that anymore. Move the initial inode numbers to a public header and align the only two namespaces that currently don't do that with all the other namespaces. - The root inode of /proc having a fixed inode number has been part of the core kernel ABI since its inception, and recently some userspace programs (mainly container runtimes) have started to explicitly depend on this behaviour. The main reason this is useful to userspace is that by checking that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO, they can then use openat2() together with RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a bind-mount that replaces some procfs file with a different one. This kind of attack has lead to security issues in container runtimes in the past (such as CVE-2019-19921) and libraries like libpathrs[1] use this feature of procfs to provide safe procfs handling functions" * tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: uapi: export PROCFS_ROOT_INO mntns: use stable inode number for initial mount ns netns: use stable inode number for initial mount ns nsfs: move root inode number to uapi
2025-07-28Merge tag 'vfs-6.17-rc1.ovl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull overlayfs updates from Christian Brauner: "This contains overlayfs updates for this cycle. The changes for overlayfs in here are primarily focussed on preparing for some proposed changes to directory locking. Overlayfs currently will sometimes lock a directory on the upper filesystem and do a few different things while holding the lock. This is incompatible with the new potential scheme. This series narrows the region of code protected by the directory lock, taking it multiple times when necessary. This theoretically opens up the possibilty of other changes happening on the upper filesytem between the unlock and the lock. To some extent the patches guard against that by checking the dentries still have the expect parent after retaking the lock. In general, concurrent changes to the upper and lower filesystems aren't supported properly anyway" * tag 'vfs-6.17-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) ovl: properly print correct variable ovl: rename ovl_cleanup_unlocked() to ovl_cleanup() ovl: change ovl_create_real() to receive dentry parent ovl: narrow locking in ovl_check_rename_whiteout() ovl: narrow locking in ovl_whiteout() ovl: change ovl_cleanup_and_whiteout() to take rename lock as needed ovl: narrow locking on ovl_remove_and_whiteout() ovl: change ovl_workdir_cleanup() to take dir lock as needed. ovl: narrow locking in ovl_workdir_cleanup_recurse() ovl: narrow locking in ovl_indexdir_cleanup() ovl: narrow locking in ovl_workdir_create() ovl: narrow locking in ovl_cleanup_index() ovl: narrow locking in ovl_cleanup_whiteouts() ovl: narrow locking in ovl_rename() ovl: simplify gotos in ovl_rename() ovl: narrow locking in ovl_create_over_whiteout() ovl: narrow locking in ovl_clear_empty() ovl: narrow locking in ovl_create_upper() ovl: narrow the locked region in ovl_copy_up_workdir() ovl: Call ovl_create_temp() without lock held. ...
2025-07-28Merge tag 'vfs-6.17-rc1.coredump' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull coredump updates from Christian Brauner: "This contains an extension to the coredump socket and a proper rework of the coredump code. - This extends the coredump socket to allow the coredump server to tell the kernel how to process individual coredumps. This allows for fine-grained coredump management. Userspace can decide to just let the kernel write out the coredump, or generate the coredump itself, or just reject it. * COREDUMP_KERNEL The kernel will write the coredump data to the socket. * COREDUMP_USERSPACE The kernel will not write coredump data but will indicate to the parent that a coredump has been generated. This is used when userspace generates its own coredumps. * COREDUMP_REJECT The kernel will skip generating a coredump for this task. * COREDUMP_WAIT The kernel will prevent the task from exiting until the coredump server has shutdown the socket connection. The flexible coredump socket can be enabled by using the "@@" prefix instead of the single "@" prefix for the regular coredump socket: @@/run/systemd/coredump.socket - Cleanup the coredump code properly while we have to touch it anyway. Split out each coredump mode in a separate helper so it's easy to grasp what is going on and make the code easier to follow. The core coredump function should now be very trivial to follow" * tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits) cleanup: add a scoped version of CLASS() coredump: add coredump_skip() helper coredump: avoid pointless variable coredump: order auto cleanup variables at the top coredump: add coredump_cleanup() coredump: auto cleanup prepare_creds() cred: add auto cleanup method coredump: directly return coredump: auto cleanup argv coredump: add coredump_write() coredump: use a single helper for the socket coredump: move pipe specific file check into coredump_pipe() coredump: split pipe coredumping into coredump_pipe() coredump: move core_pipe_count to global variable coredump: prepare to simplify exit paths coredump: split file coredumping into coredump_file() coredump: rename do_coredump() to vfs_coredump() selftests/coredump: make sure invalid paths are rejected coredump: validate socket path in coredump_parse() coredump: don't allow ".." in coredump socket path ...
2025-07-28Merge tag 'vfs-6.17-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
2025-07-28ASoC: SOF: amd: acp-loader: Use GFP_KERNEL for DMA allocations in resume contextMuhammad Usama Anjum
Replace GFP_ATOMIC with GFP_KERNEL for dma_alloc_coherent() calls. This change improves memory allocation reliability during firmware loading, particularly during system resume when memory pressure is high. Because of using GFP_KERNEL, reclaim can happen which can reduce the probability of failure. Fixes memory allocation failures observed during system resume with fragmented memory conditions. snd_sof_amd_vangogh 0000:04:00.5: error: failed to load DSP firmware after resume -12 Fixes: 145d7e5ae8f4e ("ASoC: SOF: amd: add option to use sram for data bin loading") Fixes: 7e51a9e38ab20 ("ASoC: SOF: amd: Add fw loader and renoir dsp ops to load firmware") Cc: stable@vger.kernel.org Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://patch.msgid.link/20250725190254.1081184-1-usama.anjum@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-07-28Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs mount updates from Al Viro: - mount hash conflicts rudiments are gone now - we do not allow multiple mounts with the same parent/mountpoint to be hashed at the same time. - 'struct mount' changes: - mnt_umounting is gone - mnt_slave_list/mnt_slave is an hlist now - overmounts are kept track of by explicit pointer in mount - a bunch of flags moved out of mnt_flags to a new field, with only namespace_sem for protection - mnt_expiry is protected by mount_lock now (instead of namespace_sem) - MNT_LOCKED is used only for mounts that need to remain attached to their parents to prevent mountpoint exposure - no more overloading it for absolute roots - all mnt_list uses are transient now - it's used only to represent temporary sets during umount_tree() - mount refcounting change: children no longer pin parents for any mounts, whether they'd passed through umount_tree() or not - 'struct mountpoint' changes: - refcount is no more; what matters is ->m_list emptiness - instead of temporary bumping the refcount, we insert a new object (pinned_mountpoint) into ->m_list - new calling conventions for lock_mount() and friends - do_move_mount()/attach_recursive_mnt() seriously cleaned up - globals in fs/pnode.c are gone - propagate_mnt(), change_mnt_propagation() and propagate_umount() cleaned up (in the last case - pretty much completely rewritten). - freeing of emptied mnt_namespace is done in namespace_unlock(). For one thing, there are subtle ordering requirements there; for another it simplifies cleanups. - assorted cleanups - restore the machinery for long-term mounts from accumulated bitrot. This is going to get a followup come next cycle, when the change of vfs_fs_parse_string() calling conventions goes into -next * tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (48 commits) statmount_mnt_basic(): simplify the logics for group id invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED() get rid of CL_SHARE_TO_SLAVE take freeing of emptied mnt_namespace to namespace_unlock() copy_tree(): don't link the mounts via mnt_list change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node turn do_make_slave() into transfer_propagation() do_make_slave(): choose new master sanely change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED() change_mnt_propagation() cleanups, step 1 propagate_mnt(): fix comment and convert to kernel-doc, while we are at it propagate_mnt(): get rid of last_dest fs/pnode.c: get rid of globals propagate_one(): fold into the sole caller propagate_one(): separate the "what should be the master for this copy" part propagate_one(): separate the "do we need secondary here?" logics propagate_mnt(): handle all peer groups in the same loop propagate_one(): get rid of dest_master mount: separate the flags accessed only under namespace_sem ...
2025-07-28Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull CLASS(fd) update from Al Viro: "A missing bit of commit 66635b077624 ("assorted variants of irqfd setup: convert to CLASS(fd)") from a year ago. mshv_eventfd would've been covered by that, but it had forked slightly before that series and got merged into mainline later" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mshv_eventfd: convert to CLASS(fd)
2025-07-28Merge tag 'pull-ceph-d_name-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ceph dentry->d_name fixes from Al Viro: "Stuff that had fallen through the cracks back in February; ceph folks tested that pile and said they prefer to have it go through my tree..." * tag 'pull-ceph-d_name-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ceph: fix a race with rename() in ceph_mdsc_build_path() prep for ceph_encode_encrypted_fname() fixes [ceph] parse_longname(): strrchr() expects NUL-terminated string
2025-07-28Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc VFS updates from Al Viro: "VFS-related cleanups in various places (mostly of the "that really can't happen" or "there's a better way to do it" variety)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gpib: use file_inode() binder_ioctl_write_read(): simplify control flow a bit secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file apparmor: file never has NULL f_path.mnt landlock: opened file never has a negative dentry
2025-07-28Merge tag 'pull-securityfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull securityfs updates from Al Viro: "Securityfs cleanups and fixes: - one extra reference is enough to pin a dentry down; no need for two. Switch to regular scheme, similar to shmem, debugfs, etc. This fixes a securityfs_recursive_remove() dentry leak, among other things. - we need to have the filesystem pinned to prevent the contents disappearing; what we do not need is pinning it for each file. Doing that only for files and directories in the root is enough. - the previous two changes allow us to get rid of the racy kludges in efi_secret_unlink(), where we can use simple_unlink() instead of securityfs_remove(). Which does not require unlocking and relocking the parent, with all deadlocks that invites. - Make securityfs_remove() take the entire subtree out, turning securityfs_recursive_remove() into its alias. Makes a lot more sense for callers and fixes a mount leak, while we are at it. - Making securityfs_remove() remove the entire subtree allows for much simpler life in most of the users - efi_secret, ima_fs, evm, ipe, tmp get cleaner. I hadn't touched apparmor use of securityfs, but I suspect that it would be useful there as well" * tag 'pull-securityfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: tpm: don't bother with removal of files in directory we'll be removing ipe: don't bother with removal of files in directory we'll be removing evm_secfs: clear securityfs interactions ima_fs: get rid of lookup-by-dentry stuff ima_fs: don't bother with removal of files in directory we'll be removing efi_secret: clean securityfs use up make securityfs_remove() remove the entire subtree fix locking in efi_secret_unlink() securityfs: pin filesystem only for objects directly in root securityfs: don't pin dentries twice, once is enough...
2025-07-28bpf: Fix various typos in verifier.c commentsSuchit Karunakaran
This patch fixes several minor typos in comments within the BPF verifier. No changes in functionality. Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28Merge branch 'bpf-improve-64bits-bounds-refinement'Alexei Starovoitov
Paul Chaignon says: ==================== bpf: Improve 64bits bounds refinement This patchset improves the 64bits bounds refinement when the s64 ranges crosses the sign boundary. The first patch explains the small addition to __reg64_deduce_bounds. The last one explains why we need a third round of __reg_deduce_bounds. The third patch adds a selftest with a more complete example of the impact on verification. The second and fourth patches update the existing selftests to take the new refinement into account. This patchset should reduce the number of kernel warnings hit by syzkaller due to invariant violations [1]. It was also tested with Agni [2] (and Cilium's CI for good measure). Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1] Link: https://github.com/bpfverif/agni [2] Changes in v4: - Fixed outdated test comment, noticed by Eduard. - Rebased. Changes in v3: - Added a 5th patch to call __reg_deduce_bounds a third time in reg_bounds_sync following tests from Eduard. - Fixed broken indentations in the first patch. Changes in v2 (all on Eduard's suggestions): - Added two tests to ensure we cover all cases of u64/s64 overlap. - Improved tests to check deduced ranges with __msg. - Improved code comments. ==================== Link: https://patch.msgid.link/cover.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28bpf: Add third round of bounds deductionPaul Chaignon
Commit d7f008738171 ("bpf: try harder to deduce register bounds from different numeric domains") added a second call to __reg_deduce_bounds in reg_bounds_sync because a single call wasn't enough to converge to a fixed point in terms of register bounds. With patch "bpf: Improve bounds when s64 crosses sign boundary" from this series, Eduard noticed that calling __reg_deduce_bounds twice isn't enough anymore to converge. The first selftest added in "selftests/bpf: Test cross-sign 64bits range refinement" highlights the need for a third call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync performs the following bounds deduction: reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146) __update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146) __reg_deduce_bounds: __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e) __reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e) __reg_deduce_bounds: __reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e) __reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e) __reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff)) __update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff)) In particular, notice how: 1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds learns new u32 bounds. 2. __reg64_deduce_bounds is unable to improve bounds at this point. 3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds. 4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds improves the smax and umin bounds thanks to patch "bpf: Improve bounds when s64 crosses sign boundary" from this series. 5. Subsequent functions are unable to improve the ranges further (only tnums). Yet, a better smin32 bound could be learned from the smin bound. __reg32_deduce_bounds is able to improve smin32 from smin, but for that we need a third call to __reg_deduce_bounds. As discussed in [1], there may be a better way to organize the deduction rules to learn the same information with less calls to the same functions. Such an optimization requires further analysis and is orthogonal to the present patchset. Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Co-developed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28selftests/bpf: Test invariants on JSLT crossing signPaul Chaignon
The improvement of the u64/s64 range refinement fixed the invariant violation that was happening on this test for BPF_JSLT when crossing the sign boundary. After this patch, we have one test remaining with a known invariant violation. It's the same test as fixed here but for 32 bits ranges. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28selftests/bpf: Test cross-sign 64bits range refinementPaul Chaignon
This patch adds coverage for the new cross-sign 64bits range refinement logic. The three tests cover the cases when the u64 and s64 ranges overlap (1) in the negative portion of s64, (2) in the positive portion of s64, and (3) in both portions. The first test is a simplified version of a BPF program generated by syzkaller that caused an invariant violation [1]. It looks like syzkaller could not extract the reproducer itself (and therefore didn't report it to the mailing list), but I was able to extract it from the console logs of a crash. The principle is similar to the invariant violation described in commit 6279846b9b25 ("bpf: Forget ranges when refining tnum after JSET"): the verifier walks a dead branch, uses the condition to refine ranges, and ends up with inconsistent ranges. In this case, the dead branch is when we fallthrough on both jumps. The new refinement logic improves the bounds such that the second jump is properly detected as always-taken and the verifier doesn't end up walking a dead branch. The second and third tests are inspired by the first, but rely on condition jumps to prepare the bounds instead of ALU instructions. An R10 write is used to trigger a verifier error when the bounds can't be refined. Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28selftests/bpf: Update reg_bound range refinement logicPaul Chaignon
This patch updates the range refinement logic in the reg_bound test to match the new logic from the previous commit. Without this change, tests would fail because we end with more precise ranges than the tests expect. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/b7f6b1fbe03373cca4e1bb6a113035a6cd2b3ff7.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28bpf: Improve bounds when s64 crosses sign boundaryPaul Chaignon
__reg64_deduce_bounds currently improves the s64 range using the u64 range and vice versa, but only if it doesn't cross the sign boundary. This patch improves __reg64_deduce_bounds to cover the case where the s64 range crosses the sign boundary but overlaps with the u64 range on only one end. In that case, we can improve both ranges. Consider the following example, with the s64 range crossing the sign boundary: 0 U64_MAX | [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx] | |----------------------------|----------------------------| |xxxxx s64 range xxxxxxxxx] [xxxxxxx| 0 S64_MAX S64_MIN -1 The u64 range overlaps only with positive portion of the s64 range. We can thus derive the following new s64 and u64 ranges. 0 U64_MAX | [xxxxxx u64 range xxxxx] | |----------------------------|----------------------------| | [xxxxxx s64 range xxxxx] | 0 S64_MAX S64_MIN -1 The same logic can probably apply to the s32/u32 ranges, but this patch doesn't implement that change. In addition to the selftests, the __reg64_deduce_bounds change was also tested with Agni, the formal verification tool for the range analysis [1]. Link: https://github.com/bpfverif/agni [1] Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()Quan Zhou
The caller has already passed in the memslot, and there are two instances `{kvm_faultin_pfn/mark_page_dirty}` of retrieving the memslot again in `kvm_riscv_gstage_map`, we can replace them with `{__kvm_faultin_pfn/mark_page_dirty_in_slot}`. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/50989f0a02790f9d7dc804c2ade6387c4e7fbdbc.1749634392.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAsQuan Zhou
There is already a helper function find_vma_intersection() in KVM for searching intersecting VMAs, use it directly. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/230d6c8c8b8dd83081fcfd8d83a4d17c8245fa2f.1731552790.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: perf/kvm: Add reporting of interrupt eventsQuan Zhou
For `perf kvm stat` on the RISC-V, in order to avoid the occurrence of `UNKNOWN` event names, interrupts should be reported in addition to exceptions. testing without patch: Event name Samples Sample% Time(ns) --------------------------- -------- -------- ------------ STORE_GUEST_PAGE_FAULT 1496461 53.00% 889612544 UNKNOWN 887514 31.00% 272857968 LOAD_GUEST_PAGE_FAULT 305164 10.00% 189186331 VIRTUAL_INST_FAULT 70625 2.00% 134114260 SUPERVISOR_SYSCALL 32014 1.00% 58577110 INST_GUEST_PAGE_FAULT 1 0.00% 2545 testing with patch: Event name Samples Sample% Time(ns) --------------------------- -------- -------- ------------ IRQ_S_TIMER 211271 58.00% 738298680600 EXC_STORE_GUEST_PAGE_FAULT 111279 30.00% 130725914800 EXC_LOAD_GUEST_PAGE_FAULT 22039 6.00% 25441480600 EXC_VIRTUAL_INST_FAULT 8913 2.00% 21015381600 IRQ_VS_EXT 4748 1.00% 10155464300 IRQ_S_EXT 2802 0.00% 13288775800 IRQ_S_SOFT 1998 0.00% 4254129300 Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/9693132df4d0f857b8be3a75750c36b40213fcc0.1726211632.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Enable ring-based dirty memory trackingQuan Zhou
Enable ring-based dirty memory tracking on riscv: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL as riscv is weakly ordered. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add a check to kvm_vcpu_kvm_riscv_check_vcpu_requests for checking whether the dirty ring is soft full. To handle vCPU requests that cause exits to userspace, modified the `kvm_riscv_check_vcpu_requests` to return a value (currently only returns 0 or 1). Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20e116efb1f7aff211dd8e3cf8990c5521ed5f34.1749810735.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmapSamuel Holland
The Smnpm extension requires special handling because the guest ISA extension maps to a different extension (Ssnpm) on the host side. commit 1851e7836212 ("RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests") missed that the vcpu->arch.isa bit is based only on the host extension, so currently both KVM_RISCV_ISA_EXT_{SMNPM,SSNPM} map to vcpu->arch.isa[RISCV_ISA_EXT_SSNPM]. This does not cause any problems for the guest, because both extensions are force-enabled anyway when the host supports Ssnpm, but prevents checking for (guest) Smnpm in the SBI FWFT logic. Redefine kvm_isa_ext_arr to look up the guest extension, since only the guest -> host mapping is unambiguous. Factor out the logic for checking for host support of an extension, so this special case only needs to be handled in one place, and be explicit about which variables hold a host vs a guest ISA extension. Fixes: 1851e7836212 ("RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250111004702.2813013-2-samuel.holland@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Delegate illegal instruction fault to VS modeXu Lu
Delegate illegal instruction fault to VS mode by default to avoid such exceptions being trapped to HS and redirected back to VS. The delegation of illegal instruction fault is particularly important to guest applications that use vector instructions frequently. In such cases, an illegal instruction fault will be raised when guest user thread uses vector instruction the first time and then guest kernel will enable user thread to execute following vector instructions. The fw pmu event counter remains undeleted so that guest can still query illegal instruction events via sbi call. Guest will only see zero count on illegal instruction faults and know 'firmware' has delegated it. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Link: https://lore.kernel.org/r/20250714094554.89151-1-luxu.kernel@bytedance.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIsAnup Patel
Currently, all kvm_riscv_hfence_xyz() APIs assume VMID to be the host VMID of the Guest/VM which resticts use of these APIs only for host TLB maintenance. Let's allow passing VMID as a parameter to all kvm_riscv_hfence_xyz() APIs so that they can be re-used for nested virtualization related TLB maintenance. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250618113532.471448-13-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-28RISC-V: KVM: Factor-out g-stage page table managementAnup Patel
The upcoming nested virtualization can share g-stage page table management with the current host g-stage implementation hence factor-out g-stage page table management as separate sources and also use "kvm_riscv_mmu_" prefix for host g-stage functions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Tested-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com> Link: https://lore.kernel.org/r/20250618113532.471448-12-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>