Age  Commit message  Author
2024-05-23  KVM: x86: Disable KVM_INTEL_PROVE_VE by default  Sean Christopherson
Disable KVM's "prove #VE" support by default, as it provides no functional value, and even its sanity checking benefits are relatively limited. I.e. it should be fully opt-in even on debug kernels, especially since EPT Violation #VE suppression appears to be buggy on some CPUs. Opportunistically add a line in the help text to make it abundantly clear that KVM_INTEL_PROVE_VE should never be enabled in a production environment. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: VMX: Enumerate EPT Violation #VE support in /proc/cpuinfo  Sean Christopherson
Don't suppress printing EPT_VIOLATION_VE in /proc/cpuinfo, knowing whether or not KVM_INTEL_PROVE_VE actually does anything is extremely valuable. A privileged user can get at the information by reading the raw MSR, but the whole point of the VMX flags is to avoid needing to glean information from raw MSR reads. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: x86/mmu: Print SPTEs on unexpected #VE  Sean Christopherson
Print the SPTEs that correspond to the faulting GPA on an unexpected EPT Violation #VE to help the user debug failures, e.g. to pinpoint which SPTE didn't have SUPPRESS_VE set. Opportunistically assert that the underlying exit reason was indeed an EPT Violation, as the CPU has *really* gone off the rails if a #VE occurs due to a completely unexpected exit reason. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: VMX: Dump VMCS on unexpected #VE  Sean Christopherson
Dump the VMCS on an unexpected #VE, otherwise it's practically impossible to figure out why the #VE occurred. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: x86/mmu: Add sanity checks that KVM doesn't create EPT #VE SPTEs  Sean Christopherson
Assert that KVM doesn't set a SPTE to a value that could trigger an EPT Violation #VE on a non-MMIO SPTE, e.g. to help detect bugs even without KVM_INTEL_PROVE_VE enabled, and to help debug actual #VE failures. Note, this will run afoul of TDX support, which needs to reflect emulated MMIO accesses into the guest as #VEs (which was the whole point of adding EPT Violation #VE support in KVM). The obvious fix for that is to exempt MMIO SPTEs, but that's annoyingly difficult now that is_mmio_spte() relies on a per-VM value. However, resolving that conundrum is a future problem, whereas getting KVM_INTEL_PROVE_VE healthy is a current problem. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: nVMX: Always handle #VEs in L0 (never forward #VEs from L2 to L1)  Sean Christopherson
Always handle #VEs, e.g. due to prove EPT Violation #VE failures, in L0, as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs are KVM's responsibility. Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support  Sean Christopherson
Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE enabled and a VE info address pointing at pfn 0. Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: VMX: Don't kill the VM on an unexpected #VE  Sean Christopherson
Don't terminate the VM on an unexpected #VE, as it's extremely unlikely the #VE is fatal to the guest, and even less likely that it presents a danger to the host. Simply resume the guest on "failure", as the #VE info page's BUSY field will prevent converting any more EPT Violations to #VEs for the vCPU (at least, that's what the BUSY field is supposed to do). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  KVM: x86/mmu: Use SHADOW_NONPRESENT_VALUE for atomic zap in TDP MMU  Isaku Yamahata
Use SHADOW_NONPRESENT_VALUE when zapping TDP MMU SPTEs with mmu_lock held for read, tdp_mmu_zap_spte_atomic() was simply missed during the initial development. Fixes: 7f01cab84928 ("KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE") Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> [sean: write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20240518000430.1118488-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23  drm/xe: Enable D3Cold on 'low' VRAM utilization  Rodrigo Vivi
Now that we have eliminated all the mem_access get/put calls, with their locking issues, from the inner calls of migration, we can allow D3Cold. Enable it when VRAM utilization is lower than 300Mb. On higher utilization we only allow D3hot, so that the memory restoration does not add too much latency to runtime resume. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
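The policy described above can be sketched in a few lines of C. This is only an illustration, not the driver's code: the struct, the function name, and the reading of "300Mb" as 300 MiB are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the "300Mb" in the commit message is read here as 300 MiB. */
#define DEMO_D3COLD_VRAM_THRESHOLD_MIB 300u

/* Hypothetical snapshot of the state the decision depends on. */
struct demo_pm_state {
	bool d3cold_capable;       /* platform can enter D3Cold at all */
	uint64_t vram_used_bytes;  /* current VRAM utilization */
};

/*
 * Allow D3Cold only while VRAM use is below the threshold. Above it the
 * device would stay in D3hot, so runtime resume does not have to restore
 * a large amount of memory.
 */
static bool demo_d3cold_allowed(const struct demo_pm_state *s)
{
	const uint64_t limit =
		(uint64_t)DEMO_D3COLD_VRAM_THRESHOLD_MIB * 1024 * 1024;

	assert(s != NULL);
	if (!s->d3cold_capable)
		return false;
	return s->vram_used_bytes < limit;
}
```

The comparison is strict, matching "lower than" in the message; a device using exactly the threshold amount would stay in D3hot in this sketch.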
2024-05-23  drm/xe: Stop checking for power_lost on D3Cold  Rodrigo Vivi
GuC reset status is not reliable for this purpose: once in a while we end up in D3Cold with power_reset false, and without the proper memory restoration the GuC reload and Display fail to come back from D3Cold. So, let's do a full restoration of everything if we have a risk of losing power, without further optimizations. v2: also remove the gut_in_reset function (Anshuman) Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23  drm/xe: Prepare display for D3Cold  Rodrigo Vivi
Prepare power-well and DC handling for a full power loss during D3Cold, then sanitize it upon D3->D0. Otherwise we get a bunch of state mismatches. Ideally we could leave DC9 enabled and wouldn't need to move DC9->DC0 on every runtime resume; however, the disable_DC is part of the power-well checks and intrinsic to the dc_off power well. In the future that can be detangled so we can have even bigger power savings. But for now, let's focus on getting a D3Cold, which saves much more power by itself. v2: create new functions to avoid full-suspend-resume path, which would result in a deadlock between xe_gem_fault and the modeset-ioctl. v3: Only avoid the full modeset to avoid the race, for a more robust suspend-resume. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-5-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23  drm/xe: Relax runtime pm protection around VM  Rodrigo Vivi
In the regular use case scenario, user space will create a VM, and keep it alive for the entire duration of its workload. For the regular desktop cases, it means that the VM is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a wasteful drain of power. Limit the VM protection solely to long-running workloads that are not protected by the scheduler references. By design, run_job for long-running workloads returns NULL and the scheduler drops all the references of it, hence protecting the VM for this case is necessary. v2: Update commit message to a more imperative language and to reflect why the VM protection is really needed. Also add a comment in the code to make the reason visible. v3: Remove vma_access case and the mentions to mmap. Mmap cases are already protected by the gem page fault. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23  drm/xe: Relax runtime pm protection during execution  Rodrigo Vivi
Limit the protection only to moments of actual job execution, and introduce protection for guc submit fini, which is currently unprotected due to the absence of exec_queue life protection. In the regular use case scenario, user space will create an exec queue, and keep it alive to reuse that until it is done with that kind of workload. For the regular desktop cases, it means that the exec_queue is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a wasteful drain of power. Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23  drm/xe: Fix xe_pm_runtime_get_if_in_use documentation  Rodrigo Vivi
Let's be clear on what it is actually doing and align with xe_pm_runtime_get_if_active doc style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23  drm/xe: Fix xe_pm_runtime_get_if_active return  Rodrigo Vivi
Current callers of this function already convert the result to a boolean and use it in an if. This can be a problem because the current function may return a negative error code on failure, without increasing the reference counter. In that scenario the caller sees a true value and we could end up with an extra 'put' call, leaving the reference count unbalanced. Let's fix it, while aligning with the current xe_pm_get_if_in_use style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
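To make the failure mode concrete, here is a minimal user-space sketch. It assumes a helper with the three-way return convention the message describes (negative on error, 0 when no reference was taken, positive when one was). Every name in it is a hypothetical stand-in, not the xe driver's or the PM core's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a usage counter, for illustration only. */
struct demo_dev {
	int usage;
	bool pm_disabled;
	bool active;
};

/*
 * Stand-in for a "get if active" helper: negative errno on error,
 * 0 if the device is not active (no reference taken), 1 if a
 * reference was taken.
 */
static int demo_raw_get_if_active(struct demo_dev *d)
{
	assert(d != NULL);
	if (d->pm_disabled)
		return -EINVAL;
	if (!d->active)
		return 0;
	d->usage++;
	return 1;
}

/*
 * Boolean wrapper: true only when a reference was really taken. Using
 * the raw int in an if () would treat -EINVAL as "true" and lead to a
 * put without a matching get.
 */
static bool demo_get_if_active(struct demo_dev *d)
{
	return demo_raw_get_if_active(d) > 0;
}
```

With the wrapper, a caller can write `if (demo_get_if_active(dev)) { ...; put; }` and the put is reached only when the counter was actually incremented.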
2024-05-23  riscv: Fix early ftrace nop patching  Alexandre Ghiti
Commit c97bf629963e ("riscv: Fix text patching when IPI are used") converted ftrace_make_nop() to use patch_insn_write(), which does not emit any icache flush, relying entirely on __ftrace_modify_code() to do that. But we missed that ftrace_make_nop() was called very early directly when converting mcount calls into nops (actually on riscv it converts 2B nops emitted by the compiler into 4B nops). This caused crashes on multiple HW as reported by Conor and Björn since the booting core could have half-patched instructions in its icache which would trigger an illegal instruction trap: fix this by emitting a local flush icache when early patching nops. Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reported-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240523115134.70380-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-23  Merge tag 'drm-misc-fixes-2024-05-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next  Daniel Vetter
Short summary of fixes pull: nouveau: - use tile_mode and pte_kind for VM_BIND bo allocations Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240516072658.GA8395@linux.fritz.box
2024-05-23  tools/latency-collector: Fix -Wformat-security compile warns  Shuah Khan
Fix the following -Wformat-security compile warnings adding missing format arguments:

latency-collector.c: In function ‘show_available’:
latency-collector.c:938:17: warning: format not a string literal and no format arguments [-Wformat-security]
  938 |                 warnx(no_tracer_msg);
      |                 ^~~~~
latency-collector.c:943:17: warning: format not a string literal and no format arguments [-Wformat-security]
  943 |                 warnx(no_latency_tr_msg);
      |                 ^~~~~
latency-collector.c: In function ‘find_default_tracer’:
latency-collector.c:986:25: warning: format not a string literal and no format arguments [-Wformat-security]
  986 |                         errx(EXIT_FAILURE, no_tracer_msg);
      |                         ^~~~
latency-collector.c: In function ‘scan_arguments’:
latency-collector.c:1881:33: warning: format not a string literal and no format arguments [-Wformat-security]
 1881 |                                 errx(EXIT_FAILURE, no_tracer_msg);
      |                                 ^~~~

Link: https://lore.kernel.org/linux-trace-kernel/20240404011009.32945-1-skhan@linuxfoundation.org Cc: stable@vger.kernel.org Fixes: e23db805da2df ("tracing/tools: Add the latency-collector to tools directory") Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
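The usual remedy for this class of warning is to pass the message as an argument to a literal "%s" format instead of using it as the format string itself. The sketch below shows the idea with snprintf() so the result can be inspected; the function name is hypothetical, and the same pattern applies to calls such as warnx("%s", msg).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy msg into out without interpreting it as a format string.
 *
 * The pattern the compiler warns about would be:
 *     snprintf(out, len, msg);
 * where any '%' in msg is parsed as a conversion specification.
 */
static int demo_format_message(char *out, size_t len, const char *msg)
{
	assert(out != NULL && msg != NULL && len > 0);
	return snprintf(out, len, "%s", msg);
}
```

Because the format is a string literal, a message containing `%s` or `%n` is copied verbatim rather than consuming nonexistent arguments.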
2024-05-23  spi: Don't call DMA sync API when not needed  Mark Brown
Merge series from Andy Shevchenko <andriy.shevchenko@linux.intel.com>: A couple of fixes to avoid calling DMA sync API when it's not needed. This doesn't preclude discussing whether the IOMMU code is doing the right thing, i.e. dereferencing the SG list when orig_nents == 0, but that is a separate story.
2024-05-23  r8169: Fix possible ring buffer corruption on fragmented Tx packets.  Ken Milmore
An issue was found on the RTL8125b when transmitting small fragmented packets, whereby invalid entries were inserted into the transmit ring buffer, subsequently leading to calls to dma_unmap_single() with a null address. This was caused by rtl8169_start_xmit() not noticing changes to nr_frags which may occur when small packets are padded (to work around hardware quirks) in rtl8169_tso_csum_v2(). To fix this, postpone inspecting nr_frags until after any padding has been applied. Fixes: 9020845fb5d6 ("r8169: improve rtl8169_start_xmit") Cc: stable@vger.kernel.org Signed-off-by: Ken Milmore <ken.milmore@gmail.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-23  eventfs: Do not use attributes for events directory  Steven Rostedt (Google)
The top "events" directory has a static inode (it's created when it is and removed when the directory is removed). There's no need to use the events ei->attr to determine its permissions. But it is used for saving the permissions of the "events" directory for when it is created, as that is needed for the default permissions for the files and directories underneath it. For example:

 # cd /sys/kernel/tracing
 # mkdir instances/foo
 # chown 1001 instances/foo/events

The files under instances/foo/events should still have the same owner as instances/foo (which the instances/foo/events ei->attr will hold), but the events directory now has owner 1001. Link: https://lore.kernel.org/lkml/20240522165032.104981011@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  eventfs: Cleanup permissions in creation of inodes  Steven Rostedt (Google)
Setting the permissions during the creation of the inodes was updating the eventfs_inode attributes as well. Those attributes should only be touched by the setattr or remount operations, not during the creation of inodes. The eventfs_inode attributes should only be used to set the inodes and should not be modified during the inode creation. Simplify the code and fix the situation by:

 1) Removing the eventfs_find_events() and doing a simple lookup for the events descriptor in eventfs_get_inode()
 2) Remove update_events_attr() as the attributes should only be used to update the inode and should not be modified here.
 3) Add update_inode_attr() that uses the attributes to determine what the inode permissions should be.
 4) As the parent_inode of the eventfs_root_inode structure is no longer needed, remove it.

Now on creation, the inode gets the proper permissions without causing side effects to the ei->attr field. Link: https://lore.kernel.org/lkml/20240522165031.944088388@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  eventfs: Remove getattr and permission callbacks  Steven Rostedt (Google)
Now that inodes have their permissions updated on remount, the only other places to update the inode permissions are when they are created and in the setattr callback. The getattr and permission callbacks are not needed as the inodes should already be set at their proper settings. Remove the callbacks, as it not only simplifies the code, but also allows more flexibility to fix the inconsistencies with various corner cases (like changing the permission of an instance directory). Link: https://lore.kernel.org/lkml/20240522165031.782066021@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()  Steven Rostedt (Google)
To simplify the code, create an eventfs_get_inode() that is used when an eventfs file or directory is created. Have this function update the appropriate flags of the internal tracefs_inode and update the inode's mode as well. Link: https://lore.kernel.org/lkml/20240522165031.624864160@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()  Steven Rostedt (Google)
When the inode is being dropped from the dentry, the TRACEFS_EVENT_INODE flag needs to be cleared to prevent a remount from calling eventfs_remount() on the tracefs_inode private data. There's a race between when the inode is dropped (and the dentry freed) and when the inode is actually freed. If a remount happens between the two, the eventfs_inode could be accessed after it is freed (only the dentry keeps a ref count on it). Currently the TRACEFS_EVENT_INODE flag is cleared from the dentry iput() function. But this is incorrect, as it is possible that the inode has another reference to it. The flag should only be cleared when the inode is really being dropped and has no more references. That happens in the drop_inode callback of the inode, as that gets called when the last reference of the inode is released. Remove the tracefs_d_iput() function and move its logic to the more appropriate tracefs_drop_inode() callback function. Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.908205106@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  eventfs: Update all the eventfs_inodes from the events descriptor  Steven Rostedt (Google)
The change to update the permissions of the eventfs_inode had the misconception that using the tracefs_inode would find all the eventfs_inodes that have been updated and reset them on remount. The problem with this approach is that the eventfs_inodes are freed when they are no longer used (basically the reason the eventfs system exists). When they are freed, the updated eventfs_inodes are not reset on a remount because their tracefs_inodes have been freed. Instead, since the events directory eventfs_inode always has a tracefs_inode pointing to it (it is not freed when finished), and the events directory has a link to all its children, have the eventfs_remount() function only operate on the events eventfs_inode and have it descend into its children updating their uid and gids. Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVwJyVQ@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Reported-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  tracefs: Update inode permissions on remount  Steven Rostedt (Google)
When a remount happens, if a gid or uid is specified update the inodes to have the same gid and uid. This will allow the simplification of the permissions logic for the dynamically created files and directories. Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.592429986@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  eventfs: Keep the directories from having the same inode number as files  Steven Rostedt (Google)
The directories require unique inode numbers but all the eventfs files have the same inode number. Prevent the directories from having the same inode numbers as the files as that can confuse some tooling. Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.428826685@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Fixes: 834bf76add3e6 ("eventfs: Save directory inodes in the eventfs_inode structure") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-23  drm/xe: Add process name to devcoredump  José Roberto de Souza
The process name helps us track which application caused the GPU hang; this is crucial when running several applications at the same time. v2: - handle Xe KMD exec_queues without VM v3: - use get_pid_task() (suggested by Nirmoy) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522201203.145403-1-jose.souza@intel.com
2024-05-23  dma-mapping: benchmark: handle NUMA_NO_NODE correctly  Fedor Pchelkin
cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark() resulting in the following sanitizer report:

UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask [64][1]'
CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:117)
 ubsan_epilogue (lib/ubsan.c:232)
 __ubsan_handle_out_of_bounds (lib/ubsan.c:429)
 cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline]
 do_map_benchmark (kernel/dma/map_benchmark.c:104)
 map_benchmark_ioctl (kernel/dma/map_benchmark.c:246)
 full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
 __x64_sys_ioctl (fs/ioctl.c:890)
 do_syscall_64 (arch/x86/entry/common.c:83)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Use cpumask_of_node() in place when binding a kernel thread to a cpuset of a particular node. Note that the provided node id is checked inside map_benchmark_ioctl(). It's just a NUMA_NO_NODE case which is not handled properly later. Found by Linux Verification Center (linuxtesting.org). Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23  dma-mapping: benchmark: fix node id validation  Fedor Pchelkin
While validating node ids in map_benchmark_ioctl(), node_possible() may be provided with invalid argument outside of [0,MAX_NUMNODES-1] range leading to:

BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:117)
 kasan_report (mm/kasan/report.c:603)
 kasan_check_range (mm/kasan/generic.c:189)
 variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
 arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
 _test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
 node_state (include/linux/nodemask.h:423) [inline]
 map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
 full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
 __x64_sys_ioctl (fs/ioctl.c:890)
 do_syscall_64 (arch/x86/entry/common.c:83)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Compare node ids with sane bounds first. NUMA_NO_NODE is considered a special valid case meaning that benchmarking kthreads won't be bound to a cpuset of a given node. Found by Linux Verification Center (linuxtesting.org). Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
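The ordering of checks described here (special-case NUMA_NO_NODE, then bound the id, and only then consult the per-node state) can be sketched in user-space C as follows. The constants, the small "possible nodes" table, and the function name are stand-ins chosen for the example; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's NUMA_NO_NODE and MAX_NUMNODES. */
#define DEMO_NUMA_NO_NODE (-1)
#define DEMO_MAX_NUMNODES 64

/* Hypothetical map of possible nodes: only nodes 0 and 1 exist here. */
static const bool demo_possible[DEMO_MAX_NUMNODES] = {
	[0] = true,
	[1] = true,
};

/*
 * Validate a user-supplied node id. The range check must come before
 * the table lookup; otherwise an id such as -5 or 1 << 20 would index
 * outside the table, which is the kind of access the report shows.
 */
static bool demo_node_id_valid(int node)
{
	if (node == DEMO_NUMA_NO_NODE)
		return true;            /* "do not bind to any node" */
	if (node < 0 || node >= DEMO_MAX_NUMNODES)
		return false;
	return demo_possible[node];
}
```

Code that later binds threads to a node would still need to treat the DEMO_NUMA_NO_NODE case separately, since it passes validation but is not a usable index.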
2024-05-23  dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails  Fedor Pchelkin
If do_map_benchmark() has failed, there is nothing useful to copy back to userspace. Suggested-by: Barry Song <21cnbao@gmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23  dma-mapping: benchmark: fix up kthread-related error handling  Fedor Pchelkin
kthread creation failure is invalidly handled inside do_map_benchmark(). The put_task_struct() calls on the error path are supposed to balance the get_task_struct() calls which only happen after all the kthreads are successfully created. Rollback using kthread_stop() for already created kthreads in case of such failure. In normal situation call kthread_stop_put() to gracefully stop kthreads and put their task refcounts. This should be done for all started kthreads. Found by Linux Verification Center (linuxtesting.org). Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs") Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-23  null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'  Yu Kuai
Writing 'power' and 'submit_queues' concurrently will trigger kernel panic:

Test script:

modprobe null_blk nr_devices=0
mkdir -p /sys/kernel/config/nullb/nullb0
while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
while true; do echo 1 > power; echo 0 > power; done

Test result:

BUG: kernel NULL pointer dereference, address: 0000000000000148
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:__lock_acquire+0x41d/0x28f0
Call Trace:
 <TASK>
 lock_acquire+0x121/0x450
 down_write+0x5f/0x1d0
 simple_recursive_removal+0x12f/0x5c0
 blk_mq_debugfs_unregister_hctxs+0x7c/0x100
 blk_mq_update_nr_hw_queues+0x4a3/0x720
 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
 nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
 configfs_write_iter+0x119/0x1e0
 vfs_write+0x326/0x730
 ksys_write+0x74/0x150

This is because del_gendisk() can run concurrently with blk_mq_update_nr_hw_queues():

nullb_device_power_store        nullb_apply_submit_queues
 null_del_dev
 del_gendisk
                                 nullb_update_nr_hw_queues
                                  if (!dev->nullb)
                                  // still set while gendisk is deleted
                                   return 0
                                  blk_mq_update_nr_hw_queues
 dev->nullb = NULL

Fix this problem by reusing the global mutex to protect nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs. Fixes: 45919fbfe1c4 ("null_blk: Enable modifying 'submit_queues' after an instance has been configured") Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-23  wifi: ath11k: move power type check to ASSOC stage when connecting to 6 GHz AP  Baochen Qiang
With commit bc8a0fac8677 ("wifi: mac80211: don't set bss_conf in parsing") ath11k fails to connect to 6 GHz AP. This is because currently ath11k checks AP's power type in ath11k_mac_op_assign_vif_chanctx() which would be called in AUTH stage. However with above commit power type is not available until ASSOC stage. As a result power type check fails and therefore connection fails. Fix this by moving power type check to ASSOC stage, also move regulatory rules update there because it depends on power type. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 Fixes: bc8a0fac8677 ("wifi: mac80211: don't set bss_conf in parsing") Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240424064019.4847-1-quic_bqiang@quicinc.com
2024-05-23wifi: ath11k: fix WCN6750 firmware crash caused by 17 num_vdevsCarl Huang
WCN6750 firmware crashes because num_vdevs changed from 4 to 17 in ath11k_init_wmi_config_qca6390(), as ab->hw_params.num_vdevs is 17. This is caused by commit f019f4dff2e4 ("wifi: ath11k: support 2 station interfaces"), which assigns ab->hw_params.num_vdevs directly to config->num_vdevs in ath11k_init_wmi_config_qca6390(); WCN6750 firmware therefore crashes, as it can't support such a large num_vdevs. Fix it by assigning 3 to num_vdevs in hw_params for WCN6750, as 3 is sufficient too. Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01371-QCAMSLSWPLZ-1 Fixes: f019f4dff2e4 ("wifi: ath11k: support 2 station interfaces") Reported-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Closes: https://lore.kernel.org/r/D15TIIDIIESY.D1EKKJLZINMA@fairphone.com/ Signed-off-by: Carl Huang <quic_cjhuang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240520030757.2209395-1-quic_cjhuang@quicinc.com
2024-05-23HID: nvidia-shield: Add missing check for input_ff_create_memlessChen Ni
Add check for the return value of input_ff_create_memless() and return the error if it fails in order to catch the error. Fixes: 09308562d4af ("HID: nvidia-shield: Initial driver implementation with Thunderstrike support") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2024-05-23HID: intel-ish-hid: Fix build error for COMPILE_TESTZhang Lixu
The kernel test robot reported a build error due to a pointer type mismatch:

.../ishtp/loader.c:172:8: error: incompatible pointer types passing '__le64 *' (aka 'unsigned long long *') to parameter of type 'dma_addr_t *' (aka 'unsigned int *')

The issue arises because the driver, which is primarily intended for x86-64, is also built for i386 when COMPILE_TEST is enabled.

Resolve the type mismatch by using a temporary dma_addr_t variable to hold the DMA address. Populate this temporary variable in the dma_alloc_coherent() call, and then convert and store the address in the fragment->fragment_tbl[i].ddr_adrs field in the correct format. Similarly, convert the ddr_adrs field back to dma_addr_t when freeing the DMA buffer with dma_free_coherent().

Fixes: 579a267e4617 ("HID: intel-ish-hid: Implement loading firmware from host feature")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405201313.SAStVPrT-lkp@intel.com/
Signed-off-by: Zhang Lixu <lixu.zhang@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
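Why a temporary variable plus an explicit conversion is the safer shape can be seen in a byte-level model. The sketch below is plain Python, not driver code; the helper names `write_u32`, `store_le64` and `load_le64` are made up for illustration. It assumes a little-endian 8-byte table field and a 4-byte address type, as on i386, and shows what would happen if a 32-bit write landed directly in the 64-bit field (for example, had the pointer simply been cast) compared with storing a converted full-width value.

```python
import struct


def write_u32(field, addr):
    """Model a 32-bit store into the first 4 bytes of an 8-byte field."""
    struct.pack_into("<I", field, 0, addr)


def store_le64(field, addr):
    """Model converting the temporary to 64-bit little-endian and storing it."""
    struct.pack_into("<Q", field, 0, addr)


def load_le64(field):
    """Read the 8-byte little-endian field back as an integer."""
    return struct.unpack_from("<Q", field, 0)[0]


def demo(addr=0x1000):
    # Both fields start with stale data in all 8 bytes.
    direct = bytearray(b"\xff" * 8)
    via_tmp = bytearray(b"\xff" * 8)

    write_u32(direct, addr)   # only the low 4 bytes are overwritten
    tmp = addr                # the temporary of the narrower type
    store_le64(via_tmp, tmp)  # full-width converted store

    return load_le64(direct), load_le64(via_tmp)
```

In this model the direct 32-bit write leaves the upper four bytes of the field untouched, while the converted store overwrites all eight.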
2024-05-23drm/i915: Define SEL_FETCH_PLANE registers via PICK_EVEN_2RANGES()Ville Syrjälä
Instead of that huge _PICK() let's use PICK_EVEN_2RANGES() for the SEL_FETCH_PLANE registers. A bit more tedious to have to define 8 raw register offsets for everything, but perhaps a bit easier to understand since we use a standard mechanism now instead of hand rolling the arithmetic.

Also bloat-o-meter says:

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-326 (-326)
Function                                     old     new   delta
icl_plane_update_arm                         510     446     -64
icl_plane_disable_sel_fetch_arm.isra         158      54    -104
icl_plane_update_noarm                      1898    1740    -158
Total: Before=2574502, After=2574176, chg -0.01%

v2: s/mtl+/tgl+/ comments to reflect actual reality

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516135622.3498-7-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2024-05-23irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflictPalmer Dabbelt
There was a semantic conflict between 21a8f8a0eb35 ("irqchip: Add RISC-V incoming MSI controller early driver") and dc892fb44322 ("riscv: Use IPIs for remote cache/TLB flushes by default") due to an API change. This manifests as a build failure post-merge. Reported-by: Tomasz Jeznach <tjeznach@rivosinc.com> Link: https://lore.kernel.org/all/mhng-10b71228-cf3e-42ca-9abf-5464b15093f1@palmer-ri-x1c9/ Fixes: 0bfbc914d943 ("Merge tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux") Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240522184953.28531-3-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-23i2c: Remove I2C_CLASS_SPDHeiner Kallweit
Remove this class now that all of its users are gone. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-05-23i2c: synquacer: Remove a clk reference from struct synquacer_i2cChristophe JAILLET
'pclk' is only used locally in the probe. Remove it from the 'synquacer_i2c' structure. Also remove a useless debug message. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-05-23spi: stm32: Revert change that enabled controller before asserting CSUwe Kleine-König
On stm32mp157 enabling the controller before asserting CS makes the hardware trigger spurious interrupts in a tight loop and the transfers fail. Revert the commit that swapped the order of enable and CS. This reintroduces the problem that swapping was supposed to fix, which however is less grave. Reported-by: Leonard Göhrs <l.goehrs@pengutronix.de> Link: https://lore.kernel.org/all/39033ed7-3e57-4339-80b4-fc8919e26aa7@pengutronix.de/ Fixes: 52b62e7a5d4f ("spi: stm32: enable controller before asserting CS") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://msgid.link/r/20240523103326.792907-2-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-23spi: Check if transfer is mapped before calling DMA sync APIsAndy Shevchenko
The recent update to remove the orig_nents checks revealed that not all DMA sync backends can cope with an unallocated SG list supplied with orig_nents == 0 (commit 861370f49ce4 ("iommu/dma: force bouncing if the size is not cacheline-aligned"), for example, makes that happen for the IOMMU case). This means we have to check whether the buffers are DMA mapped before trying to sync them. Re-introduce that check in the form of a ->can_dma() call, in the same way as it's done in the DMA mapping loop for the SPI transfers. Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reported-by: Neil Armstrong <neil.armstrong@linaro.org> Closes: https://lore.kernel.org/r/8ae675b5-fcf9-4c9b-b06a-4462f70e1322@linaro.org Closes: https://lore.kernel.org/all/d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano Fixes: 8cc3bad9d9d6 ("spi: Remove unneded check for orig_nents") Suggested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://msgid.link/r/20240522171018.3362521-3-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-23spi: Don't mark message DMA mapped when no transfer in it isAndy Shevchenko
There is no need to set the DMA mapped flag of the message if it has no mapped transfers. Moreover, setting it may let the code take the wrong paths, i.e. exercise DMA related APIs on unmapped data. Make __spi_map_msg() bail out early in the above-mentioned cases. Fixes: 99adef310f68 ("spi: Provide core support for DMA mapping transfers") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://msgid.link/r/20240522171018.3362521-2-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-23Merge tag 'asoc-fix-v6.10-merge-window' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.10 A bunch of fixes that came in during the merge window, all driver specific and none of them especially remarkable.
2024-05-239p: add missing locking around taking dentry fid listDominique Martinet
Fix a use-after-free on dentry's d_fsdata fid list when a thread looks up a fid through dentry while another thread unlinks it:

UAF thread:
refcount_t: addition on 0; use-after-free.
 p9_fid_get linux/./include/net/9p/client.h:262
 v9fs_fid_find+0x236/0x280 linux/fs/9p/fid.c:129
 v9fs_fid_lookup_with_uid linux/fs/9p/fid.c:181
 v9fs_fid_lookup+0xbf/0xc20 linux/fs/9p/fid.c:314
 v9fs_vfs_getattr_dotl+0xf9/0x360 linux/fs/9p/vfs_inode_dotl.c:400
 vfs_statx+0xdd/0x4d0 linux/fs/stat.c:248

Freed by:
 p9_fid_destroy (inlined)
 p9_client_clunk+0xb0/0xe0 linux/net/9p/client.c:1456
 p9_fid_put linux/./include/net/9p/client.h:278
 v9fs_dentry_release+0xb5/0x140 linux/fs/9p/vfs_dentry.c:55
 v9fs_remove+0x38f/0x620 linux/fs/9p/vfs_inode.c:518
 vfs_unlink+0x29a/0x810 linux/fs/namei.c:4335

The problem is that d_fsdata was not accessed under d_lock: d_release() is normally only called once the dentry is otherwise no longer accessible, but since we also call it explicitly in v9fs_remove(), the lock is required. Move the hlist out of the dentry under the lock, then unref its fids once they are no longer accessible.

Fixes: 154372e67d40 ("fs/9p: fix create-unlink-getattr idiom")
Cc: stable@vger.kernel.org
Reported-by: Meysam Firouzi
Reported-by: Amirmohammad Eftekhar
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20240521122947.1080227-1-asmadeus@codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
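The "detach under the lock, release afterwards" pattern described in this message can be modelled in a few lines of userspace Python. This is an illustrative sketch only: `Dentry`, `fid_find` and `dentry_release` are hypothetical stand-ins, fids are plain dicts with a `refs` counter, and a `threading.Lock` plays the role of d_lock. The point is that a lookup can only take a reference while the fid is still on the list, and the list is emptied atomically before any reference is dropped.

```python
import threading


class Dentry:
    """Toy dentry: a lock plus a list standing in for the d_fsdata hlist."""

    def __init__(self, fids):
        self.d_lock = threading.Lock()
        self.fids = list(fids)


def fid_find(dentry, uid):
    """Look up a fid by uid and take a reference, all under d_lock."""
    with dentry.d_lock:
        for fid in dentry.fids:
            if fid["uid"] == uid:
                fid["refs"] += 1
                return fid
    return None


def dentry_release(dentry):
    """Detach the whole list under d_lock, then drop references outside it."""
    with dentry.d_lock:
        detached, dentry.fids = dentry.fids, []
    # From here on no lookup can reach these fids through the dentry.
    for fid in detached:
        fid["refs"] -= 1  # stands in for putting the fid
    return detached
```

A lookup that runs after `dentry_release()` finds an empty list and returns `None` instead of bumping the count on a fid that is about to be freed.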
2024-05-23Merge branch 'intel-interpret-set_channels-input-differently'Paolo Abeni
Jacob Keller says:

====================
intel: Interpret .set_channels() input differently

The ice and idpf drivers can trigger a crash with AF_XDP due to incorrect interpretation of the asymmetric Tx and Rx parameters in their .set_channels() implementations:

1. ethtool -l <IFNAME> -> combined: 40
2. Attach AF_XDP to queue 30
3. ethtool -L <IFNAME> rx 15 tx 15
   The combined number is not specified, so the command becomes {rx_count = 15, tx_count = 15, combined_count = 40}.
4. ethnl_set_channels checks whether any AF_XDP sockets are attached to queues between the new count (combined_count + rx_count) and the old one, so from 55 to 40; the check does not trigger.
5. The driver interprets `rx 15 tx 15` as 15 combined channels and deletes the queue that AF_XDP is attached to.

This is fundamentally a problem with interpreting a request for asymmetric queues as symmetric combined queues.

Fix the ice and idpf drivers to stop interpreting such requests as a request for combined queues. Due to the current driver design for both ice and idpf, it is not possible to support requests of the same count of Tx and Rx queues with independent interrupts (i.e. ethtool -L <IFNAME> rx 15 tx 15), so such requests are now rejected.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
====================

Link: https://lore.kernel.org/r/20240521-iwl-net-2024-05-14-set-channels-fixes-v2-0-7aa39e2e99f1@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
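The arithmetic in the five steps above can be replayed with a short model. This is a sketch in Python under stated assumptions, not the kernel or driver logic: `Channels`, `core_xsk_check_ok`, `legacy_queue_count` and `orphaned_xsk_queues` are hypothetical names, the core check is modelled exactly as the text words it (queues from combined_count + rx_count of the new request up to the old total), and the old driver behaviour is modelled only for the rx == tx case the text describes.

```python
from collections import namedtuple

Channels = namedtuple("Channels", "rx tx combined")


def core_xsk_check_ok(old, new, xsk_queues):
    """Model of the core check: fail if AF_XDP sits on a queue in [new, old)."""
    old_top = old.combined + old.rx
    new_top = new.combined + new.rx
    return not any(q in xsk_queues for q in range(new_top, old_top))


def legacy_queue_count(req):
    """Assumed old driver view: equal rx/tx counts mean that many combined."""
    if req.rx != req.tx:
        raise NotImplementedError("only the rx == tx case from the text is modelled")
    return req.rx


def orphaned_xsk_queues(req, xsk_queues):
    """Queues with AF_XDP attached that the old driver view would delete."""
    kept = legacy_queue_count(req)
    return sorted(q for q in xsk_queues if q >= kept)
```

With `old = Channels(rx=0, tx=0, combined=40)`, `new = Channels(rx=15, tx=15, combined=40)` and AF_XDP on queue 30, the modelled core check scans the empty range from 55 to 40 and passes, while the modelled driver keeps only 15 queues, so queue 30 is removed underneath the socket.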
2024-05-23idpf: Interpret .set_channels() input differentlyLarysa Zaremba
Unlike ice, idpf does not check whether the user has requested at least 1 combined channel. Instead, it relies on a check in the core code. Unfortunately, the check does not trigger for us because of the hacky .set_channels() interpretation logic that is not consistent with the core code. This naturally leads to the user being able to trigger a crash with an invalid input. This is how:

1. ethtool -l <IFNAME> -> combined: 40
2. ethtool -L <IFNAME> rx 0 tx 0
   The combined number is not specified, so the command becomes {rx_count = 0, tx_count = 0, combined_count = 40}.
3. ethnl_set_channels checks whether there is at least 1 RX and 1 TX channel, comparing (combined_count + rx_count) and (combined_count + tx_count) to zero. Obviously, (40 + 0) is greater than zero, so the core code deems the input OK.
4. idpf interprets `rx 0 tx 0` as 0 channels and tries to proceed with such a configuration.

The issue has to be solved fundamentally, as the current logic is also known to cause AF_XDP problems in ice [0].

Interpret the command in a way that is more consistent with the ethtool manual [1] (--show-channels and --set-channels) and the new ice logic.

Considering that in the idpf driver only the difference between RX and TX queues forms dedicated channels, change the correct way to set the number of channels to:

ethtool -L <IFNAME> combined 10 /* For symmetric queues */
ethtool -L <IFNAME> combined 8 tx 2 rx 0 /* For asymmetric queues */

[0] https://lore.kernel.org/netdev/20240418095857.2827-1-larysa.zaremba@intel.com/
[1] https://man7.org/linux/man-pages/man8/ethtool.8.html

Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
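The mismatch between the core check and the old idpf view, and the queue totals implied by the two recommended commands, can be worked through in a small model. This is a Python sketch resting on assumptions: the function names are hypothetical, `legacy_idpf_channels` covers only the `rx 0 tx 0` style case from the text, and `queue_totals` assumes the new interpretation simply adds the dedicated rx or tx count to the combined count, which the message implies but does not spell out.

```python
def core_min_channels_ok(rx, tx, combined):
    """Model of the core check: at least one RX and one TX channel overall."""
    return combined + rx > 0 and combined + tx > 0


def legacy_idpf_channels(rx, tx):
    """Assumed old idpf view for equal rx/tx: that many channels in total."""
    if rx != tx:
        raise NotImplementedError("only the rx == tx case from the text is modelled")
    return rx


def queue_totals(rx, tx, combined):
    """Assumed new view: (RX queues, TX queues) = combined plus dedicated."""
    return combined + rx, combined + tx
```

For `rx 0 tx 0` with 40 combined channels left in place, the modelled core check passes while the old driver view yields 0 channels. Under the assumed new view, `combined 10` gives 10 RX and 10 TX queues, and `combined 8 tx 2 rx 0` gives 8 RX and 10 TX queues.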