summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-10-01Merge tag 'x86-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a kerneldoc build warning fix, add SRSO mitigation for AMD-derived Hygon processors, and fix a SGX kernel crash in the page fault handler that can trigger when ksgxd races to reclaim the SECS special page, by making the SECS page unswappable" * tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race x86/srso: Add SRSO mitigation for Hygon processors x86/kgdb: Fix a kerneldoc warning when build with W=1
2023-10-01Merge tag 'perf-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fixes from Ingo Molnar: "Misc fixes: work around an AMD microcode bug on certain models, and fix kexec kernel PMI handlers on AMD systems that get loaded on older kernels that have an unexpected register state" * tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd: Do not WARN() on every IRQ perf/x86/amd/core: Fix overflow reset on hotplug
2023-09-29mm: abstract moving to the next PFNMatthew Wilcox (Oracle)
In order to fix the L1TF vulnerability, x86 can invert the PTE bits for PROT_NONE VMAs, which means we cannot move from one PTE to the next by adding 1 to the PFN field of the PTE. This results in the BUG reported at [1]. Abstract advancing the PTE to the next PFN through a pte_next_pfn() function/macro. Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1] Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29x86/cpu/amd: Remove redundant 'break' statementBaolin Liu
This break is after the return statement, so it is redundant & confusing, and should be deleted. Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/396ba14d.2726.189d957b74b.Coremail.liubaolin12138@163.com
2023-09-28x86/sgx: Resolves SECS reclaim vs. page fault for EAUG raceHaitao Huang
The SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC page for an enclave and set secs.epc_page to NULL. The SECS page is used for EAUG and ELDU in the SGX page fault handler. However, the NULL check for secs.epc_page is only done for ELDU, not EAUG before being used. Fix this by doing the same NULL check and reloading of the SECS page as needed for both EAUG and ELDU. The SECS page holds global enclave metadata. It can only be reclaimed when there are no other enclave pages remaining. At that point, virtually nothing can be done with the enclave until the SECS page is paged back in. An enclave can not run nor generate page faults without a resident SECS page. But it is still possible for a #PF for a non-SECS page to race with paging out the SECS page: when the last resident non-SECS page A triggers a #PF in a non-resident page B, and then page A and the SECS both are paged out before the #PF on B is handled. Hitting this bug requires that race triggered with a #PF for EAUG. Following is a trace when it happens. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:sgx_encl_eaug_page+0xc7/0x210 Call Trace: ? __kmem_cache_alloc_node+0x16a/0x440 ? xa_load+0x6e/0xa0 sgx_vma_fault+0x119/0x230 __do_fault+0x36/0x140 do_fault+0x12f/0x400 __handle_mm_fault+0x728/0x1110 handle_mm_fault+0x105/0x310 do_user_addr_fault+0x1ee/0x750 ? __this_cpu_preempt_check+0x13/0x20 exc_page_fault+0x76/0x180 asm_exc_page_fault+0x27/0x30 Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20230728051024.33063-1-haitao.huang%40linux.intel.com
2023-09-28x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of ↵Adam Dunlap
a two-phase approach Instead of setting x86_virt_bits to a possibly-correct value and then correcting it later, do all the necessary checks before setting it. At this point, the #VC handler references boot_cpu_data.x86_virt_bits, and in the previous version, it would be triggered by the CPUIDs between the point at which it is set to 48 and when it is set to the correct value. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Adam Dunlap <acdunlap@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jacob Xu <jacobhxu@google.com> Link: https://lore.kernel.org/r/20230912002703.3924521-3-acdunlap@google.com
2023-09-28x86/sev-es: Allow copy_from_kernel_nofault() in earlier bootAdam Dunlap
Previously, if copy_from_kernel_nofault() was called before boot_cpu_data.x86_virt_bits was set up, then it would trigger undefined behavior due to a shift by 64. This ended up causing boot failures in the latest version of ubuntu2204 in the gcp project when using SEV-SNP. Specifically, this function is called during an early #VC handler which is triggered by a CPUID to check if NX is implemented. Fixes: 1aa9aa8ee517 ("x86/sev-es: Setup GHCB-based boot #VC handler") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Adam Dunlap <acdunlap@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jacob Xu <jacobhxu@google.com> Link: https://lore.kernel.org/r/20230912002703.3924521-2-acdunlap@google.com
2023-09-28KVM: x86: Clear bit12 of ICR after APIC-write VM-exitTao Su
When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi() thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay, but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit. Under the x2APIC section, regarding ICR, the SDM says: It remains readable only to aid in debugging; however, software should not assume the value returned by reading the ICR is the last written value. I.e. the guest is allowed to set bit 12. However, the SDM also gives KVM free reign to do whatever it wants with the bit, so long as KVM's behavior doesn't confuse userspace or break KVM's ABI. Clear bit 12 so that it reads back as '0'. This approach is safer than "do nothing" and is consistent with the case where IPI virtualization is disabled or not supported, i.e., handle_fastpath_set_x2apic_icr_irqoff() -> kvm_x2apic_icr_write() Opportunistically replace the TODO with a comment calling out that eating the write is likely faster than a conditional branch around the busy bit. Link: https://lore.kernel.org/all/ZPj6iF0Q7iynn62p@google.com/ Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode") Cc: stable@vger.kernel.org Signed-off-by: Tao Su <tao1.su@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20230914055504.151365-1-tao1.su@linux.intel.com [sean: tweak changelog, replace TODO with comment, drop local "val"] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-28KVM: x86: Fix lapic timer interrupt lost after loading a snapshot.Haitao Shan
When running android emulator (which is based on QEMU 2.12) on certain Intel hosts with kernel version 6.3-rc1 or above, guest will freeze after loading a snapshot. This is almost 100% reproducible. By default, the android emulator will use snapshot to speed up the next launching of the same android guest. So this breaks the android emulator badly. I tested QEMU 8.0.4 from Debian 12 with an Ubuntu 22.04 guest by running command "loadvm" after "savevm". The same issue is observed. At the same time, none of our AMD platforms is impacted. More experiments show that loading the KVM module with "enable_apicv=false" can workaround it. The issue started to show up after commit 8e6ed96cdd50 ("KVM: x86: fire timer when it is migrated and expired, and in oneshot mode"). However, as is pointed out by Sean Christopherson, it is introduced by commit 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC"). commit 8e6ed96cdd50 ("KVM: x86: fire timer when it is migrated and expired, and in oneshot mode") just makes it easier to hit the issue. Having both commits, the oneshot lapic timer gets fired immediately inside the KVM_SET_LAPIC call when loading the snapshot. On Intel platforms with APIC virtualization and posted interrupt processing, this eventually leads to setting the corresponding PIR bit. However, the whole PIR bits get cleared later in the same KVM_SET_LAPIC call by apicv_post_state_restore. This leads to timer interrupt lost. The fix is to move vmx_apicv_post_state_restore to the beginning of the KVM_SET_LAPIC call and rename to vmx_apicv_pre_state_restore. What vmx_apicv_post_state_restore does is actually clearing any former apicv state and this behavior is more suitable to carry out in the beginning. Fixes: 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Haitao Shan <hshan@google.com> Link: https://lore.kernel.org/r/20230913000215.478387-1-hshan@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-28KVM: SVM: Update SEV-ES shutdown intercepts with more metadataPeter Gonda
Currently if an SEV-ES VM shuts down userspace sees KVM_RUN struct with only errno=EINVAL. This is a very limited amount of information to debug the situation. Instead return KVM_EXIT_SHUTDOWN to alert userspace the VM is shutting down and is not usable any further. Signed-off-by: Peter Gonda <pgonda@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20230907162449.1739785-1-pgonda@google.com [sean: tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-28KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUsKyle Meyer
Add a Kconfig entry to set the maximum number of vCPUs per KVM guest and set the default value to 4096 when MAXSMP is enabled, as there are use cases that want to create more than the currently allowed 1024 vCPUs and are more than happy to eat the memory overhead. The Hyper-V TLFS doesn't allow more than 64 sparse banks, i.e. allows a maximum of 4096 virtual CPUs. Cap KVM's maximum number of virtual CPUs to 4096 to avoid exceeding Hyper-V's limit as KVM support for Hyper-V is unconditional, and alternatives like dynamically disabling Hyper-V enlightenments that rely on sparse banks would require non-trivial code changes. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://lore.kernel.org/r/20230824215244.3897419-1-kyle.meyer@hpe.com [sean: massage changelog with --verbose, document #ifdef mess] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-28x86/boot: Compile boot code with -std=gnu11 tooAlexey Dobriyan
Use -std=gnu11 for consistency with main kernel code. It doesn't seem to change anything in vmlinux. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: https://lore.kernel.org/r/2058761e-12a4-4b2f-9690-3c3c1c9902a5@p183
2023-09-28x86/srso: Add SRSO mitigation for Hygon processorsPu Wen
Add mitigation for the speculative return stack overflow vulnerability which exists on Hygon processors too. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/tencent_4A14812842F104E93AA722EC939483CEFF05@qq.com
2023-09-27KVM: x86: Force TLB flush on userspace changes to special registersMichal Luczaj
Userspace can directly modify the content of vCPU's CR0, CR3, and CR4 via KVM_SYNC_X86_SREGS and KVM_SET_SREGS{,2}. Make sure that KVM flushes guest TLB entries and paging-structure caches if a (partial) guest TLB flush is architecturally required based on the CRn changes. To keep things simple, flush whenever KVM resets the MMU context, i.e. if any bits in CR0, CR3, CR4, or EFER are modified. This is extreme overkill, but stuffing state from userspace is not such a hot path that preserving guest TLB state is a priority. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230814222358.707877-3-mhal@rbox.co [sean: call out that the flushing on MMU context resets is for simplicity] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-27KVM: x86: Remove redundant vcpu->arch.cr0 assignmentsMichal Luczaj
Drop the vcpu->arch.cr0 assignment after static_call(kvm_x86_set_cr0). CR0 was already set by {vmx,svm}_set_cr0(). Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230814222358.707877-2-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-27x86/entry: Fix typos in commentsXin Li (Intel)
Fix 2 typos in the comments. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: https://lore.kernel.org/r/20230926061319.1929127-1-xin@zytor.com
2023-09-27x86/entry: Remove unused argument %rsi passed to exc_nmi()Xin Li (Intel)
exc_nmi() only takes one argument of type struct pt_regs *, but asm_exc_nmi() calls it with 2 arguments. The second one passed in %rsi seems to be a leftover, so simply remove it. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: https://lore.kernel.org/r/20230926061319.1929127-1-xin@zytor.com
2023-09-27x86/amd_nb: Add AMD Family MI300 PCI IDsMuralidhara M K
Add new Root, Device 18h Function 3, and Function 4 PCI IDS for AMD F19h Model 90h-9fh (MI300A). Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Suma Hegde <suma.hegde@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230926051932.193239-1-suma.hegde@amd.com
2023-09-25KVM: x86/pmu: Synthesize at most one PMI per VM-exitJim Mattson
When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a VM-exit that also invokes __kvm_perf_overflow() as a result of instruction emulation, kvm_pmu_deliver_pmi() will be called twice before the next VM-entry. Calling kvm_pmu_deliver_pmi() twice is unlikely to be problematic now that KVM sets the LVTPC mask bit when delivering a PMI. But using IRQ work to trigger the PMI is still broken, albeit very theoretically. E.g. if the self-IPI to trigger IRQ work is be delayed long enough for the vCPU to be migrated to a different pCPU, then it's possible for kvm_pmi_trigger_fn() to race with the kvm_pmu_deliver_pmi() from KVM_REQ_PMI and still generate two PMIs. KVM could set the mask bit using an atomic operation, but that'd just be piling on unnecessary code to workaround what is effectively a hack. The *only* reason KVM uses IRQ work is to ensure the PMI is treated as a wake event, e.g. if the vCPU just executed HLT. Remove the irq_work callback for synthesizing a PMI, and all of the logic for invoking it. Instead, to prevent a vcpu from leaving C0 with a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events(). Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Signed-off-by: Jim Mattson <jmattson@google.com> Tested-by: Mingwei Zhang <mizhang@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230925173448.3518223-2-mizhang@google.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-25KVM: x86: Mask LVTPC when handling a PMIJim Mattson
Per the SDM, "When the local APIC handles a performance-monitoring counters interrupt, it automatically sets the mask flag in the LVT performance counter register." Add this behavior to KVM's local APIC emulation. Failure to mask the LVTPC entry results in spurious PMIs, e.g. when running Linux as a guest, PMI handlers that do a "late_ack" spew a large number of "dazed and confused" spurious NMI warnings. Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") Cc: stable@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> Tested-by: Mingwei Zhang <mizhang@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230925173448.3518223-3-mizhang@google.com [sean: massage changelog, correct Fixes] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-25KVM: x86/pmu: Truncate counter value to allowed width on writeRoman Kagan
Performance counters are defined to have width less than 64 bits. The vPMU code maintains the counters in u64 variables but assumes the value to fit within the defined width. However, for Intel non-full-width counters (MSR_IA32_PERFCTRx) the value receieved from the guest is truncated to 32 bits and then sign-extended to full 64 bits. If a negative value is set, it's sign-extended to 64 bits, but then in kvm_pmu_incr_counter() it's incremented, truncated, and compared to the previous value for overflow detection. That previous value is not truncated, so it always evaluates bigger than the truncated new one, and a PMI is injected. If the PMI handler writes a negative counter value itself, the vCPU never quits the PMI loop. Turns out that Linux PMI handler actually does write the counter with the value just read with RDPMC, so when no full-width support is exposed via MSR_IA32_PERF_CAPABILITIES, and the guest initializes the counter to a negative value, it locks up. This has been observed in the field, for example, when the guest configures atop to use perfevents and runs two instances of it simultaneously. To address the problem, maintain the invariant that the counter value always fits in the defined bit width, by truncating the received value in the respective set_msr methods. For better readability, factor the out into a helper function, pmc_write_counter(), shared by vmx and svm parts. Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Cc: stable@vger.kernel.org Signed-off-by: Roman Kagan <rkagan@amazon.de> Link: https://lore.kernel.org/all/20230504120042.785651-1-rkagan@amazon.de Tested-by: Like Xu <likexu@tencent.com> [sean: tweak changelog, s/set/write in the helper] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-25iov_iter, x86: Be consistent about the __user tag on copy_mc_to_user()David Howells
copy_mc_to_user() has the destination marked __user on powerpc, but not on x86; the latter results in a sparse warning in lib/iov_iter.c. Fix this by applying the tag on x86 too. Fixes: ec6347bb4339 ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-3-dhowells@redhat.com cc: Dan Williams <dan.j.williams@intel.com> cc: Thomas Gleixner <tglx@linutronix.de> cc: Ingo Molnar <mingo@redhat.com> cc: Borislav Petkov <bp@alien8.de> cc: Dave Hansen <dave.hansen@linux.intel.com> cc: "H. Peter Anvin" <hpa@zytor.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: David Laight <David.Laight@ACULAB.COM> cc: x86@kernel.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-25perf/x86/amd: Do not WARN() on every IRQBreno Leitao
Zen 4 systems running buggy microcode can hit a WARN_ON() in the PMI handler, as shown below, several times while perf runs. A simple `perf top` run is enough to render the system unusable: WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0 This happens because the Performance Counter Global Status Register (PerfCntGlobalStatus) has one or more bits set which are considered reserved according to the "AMD64 Architecture Programmer’s Manual, Volume 2: System Programming, 24593": https://www.amd.com/system/files/TechDocs/24593.pdf To make this less intrusive, warn just once if any reserved bit is set and prompt the user to update the microcode. Also sanitize the value to what the code is handling, so that the overflow events continue to be handled for the number of counters that are known to be sane. Going forward, the following microcode patch levels are recommended for Zen 4 processors in order to avoid such issues with reserved bits: Family=0x19 Model=0x11 Stepping=0x01: Patch=0x0a10113e Family=0x19 Model=0x11 Stepping=0x02: Patch=0x0a10123e Family=0x19 Model=0xa0 Stepping=0x01: Patch=0x0aa00116 Family=0x19 Model=0xa0 Stepping=0x02: Patch=0x0aa00212 Commit f2eb058afc57 ("linux-firmware: Update AMD cpu microcode") from the linux-firmware tree has binaries that meet the minimum required patch levels. [ sandipan: - add message to prompt users to update microcode - rework commit message and call out required microcode levels ] Fixes: 7685665c390d ("perf/x86/amd/core: Add PerfMonV2 overflow handling") Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/3540f985652f41041e54ee82aa53e7dbd55739ae.1694696888.git.sandipan.das@amd.com/
2023-09-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix EL2 Stage-1 MMIO mappings where a random address was used - Fix SMCCC function number comparison when the SVE hint is set RISC-V: - Fix KVM_GET_REG_LIST API for ISA_EXT registers - Fix reading ISA_EXT register of a missing extension - Fix ISA_EXT register handling in get-reg-list test - Fix filtering of AIA registers in get-reg-list test x86: - Fixes for TSC_AUX virtualization - Stop zapping page tables asynchronously, since we don't zap them as often as before" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX KVM: SVM: Fix TSC_AUX virtualization setup KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe() KVM: x86/mmu: Open code leaf invalidation from mmu_notifier KVM: riscv: selftests: Selectively filter-out AIA registers KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers KVM: selftests: Assert that vasprintf() is successful KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()
2023-09-24x86_64: Show CR4.PSE on auxiliaries like on BSPHugh Dickins
Set CR4.PSE in secondary_startup_64: the Intel SDM is clear that it does not matter whether it's 0 or 1 when 4-level-pts are enabled, but it's distracting to find CR4 different on BSP and auxiliaries - on x86_64, BSP alone got to add the PSE bit, in probe_page_size_mask(). Peter Zijlstra adds: "I think the point is that PSE bit is completely without meaning in long mode. But yes, having the same CR4 bits set across BSP and APs is definitely sane." Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/103ad03a-8c93-c3e2-4226-f79af4d9a074@google.com
2023-09-24x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Found with Coccinelle: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Add __counted_by for struct uv_rtc_timer_head. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922175151.work.118-kees@kernel.org
2023-09-24x86/kgdb: Fix a kerneldoc warning when build with W=1Christophe JAILLET
When compiled with W=1, the following warning is generated: arch/x86/kernel/kgdb.c:698: warning: Cannot understand * on line 698 - I thought it was a doc line Remove the corresponding empty comment line to fix the warning. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/aad659537c1d4ebd86912a6f0be458676c8e69af.1695401178.git.christophe.jaillet@wanadoo.fr
2023-09-23Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into ↵Paolo Bonzini
HEAD KVM/riscv fixes for 6.6, take #1 - Fix KVM_GET_REG_LIST API for ISA_EXT registers - Fix reading ISA_EXT register of a missing extension - Fix ISA_EXT register handling in get-reg-list test - Fix filtering of AIA registers in get-reg-list test
2023-09-23KVM: SVM: Do not use user return MSR support for virtualized TSC_AUXTom Lendacky
When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B" within the VMSA. This means that the guest value is loaded on VMRUN and the host value is restored from the host save area on #VMEXIT. Since the value is restored on #VMEXIT, the KVM user return MSR support for TSC_AUX can be replaced by populating the host save area with the current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux post-boot, the host save area can be set once in svm_hardware_enable(). This eliminates the two WRMSR instructions associated with the user return MSR support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: SVM: Fix TSC_AUX virtualization setupTom Lendacky
The checks for virtualizing TSC_AUX occur during the vCPU reset processing path. However, at the time of initial vCPU reset processing, when the vCPU is first created, not all of the guest CPUID information has been set. In this case the RDTSCP and RDPID feature support for the guest is not in place and so TSC_AUX virtualization is not established. This continues for each vCPU created for the guest. On the first boot of an AP, vCPU reset processing is executed as a result of an APIC INIT event, this time with all of the guest CPUID information set, resulting in TSC_AUX virtualization being enabled, but only for the APs. The BSP always sees a TSC_AUX value of 0 which probably went unnoticed because, at least for Linux, the BSP TSC_AUX value is 0. Move the TSC_AUX virtualization enablement out of the init_vmcb() path and into the vcpu_after_set_cpuid() path to allow for proper initialization of the support after the guest CPUID information has been set. With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid() path, the intercepts must be either cleared or set based on the guest CPUID input. Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: SVM: INTERCEPT_RDTSCP is never intercepted anywayPaolo Bonzini
svm_recalc_instruction_intercepts() is always called at least once before the vCPU is started, so the setting or clearing of the RDTSCP intercept can be dropped from the TSC_AUX virtualization support. Extracted from a patch by Tom Lendacky. Cc: stable@vger.kernel.org Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronouslySean Christopherson
Stop zapping invalidate TDP MMU roots via work queue now that KVM preserves TDP MMU roots until they are explicitly invalidated. Zapping roots asynchronously was effectively a workaround to avoid stalling a vCPU for an extended during if a vCPU unloaded a root, which at the time happened whenever the guest toggled CR0.WP (a frequent operation for some guest kernels). While a clever hack, zapping roots via an unbound worker had subtle, unintended consequences on host scheduling, especially when zapping multiple roots, e.g. as part of a memslot. Because the work of zapping a root is no longer bound to the task that initiated the zap, things like the CPU affinity and priority of the original task get lost. Losing the affinity and priority can be especially problematic if unbound workqueues aren't affined to a small number of CPUs, as zapping multiple roots can cause KVM to heavily utilize the majority of CPUs in the system, *beyond* the CPUs KVM is already using to run vCPUs. When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root zap can result in KVM occupying all logical CPUs for ~8ms, and result in high priority tasks not being scheduled in in a timely manner. In v5.15, which doesn't preserve unloaded roots, the issues were even more noticeable as KVM would zap roots more frequently and could occupy all CPUs for 50ms+. Consuming all CPUs for an extended duration can lead to significant jitter throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots is a semi-frequent operation as memslots are deleted and recreated with different host virtual addresses to react to host GPU drivers allocating and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips during games due to the audio server's tasks not getting scheduled in promptly, despite the tasks having a high realtime priority. Deleting memslots isn't exactly a fast path and should be avoided when possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the memslot shenanigans, but KVM is squarely in the wrong. Not to mention that removing the async zapping eliminates a non-trivial amount of complexity. Note, one of the subtle behaviors hidden behind the async zapping is that KVM would zap invalidated roots only once (ignoring partial zaps from things like mmu_notifier events). Preserve this behavior by adding a flag to identify roots that are scheduled to be zapped versus roots that have already been zapped but not yet freed. Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can encounter invalid roots, as it's not at all obvious why zapping invalidated roots shouldn't simply zap all invalid roots. Reported-by: Pattara Teerapong <pteerapong@google.com> Cc: David Stevens <stevensd@google.com> Cc: Yiwei Zhang<zzyiwei@google.com> Cc: Paul Hsia <paulhsia@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230916003916.2545000-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()Paolo Bonzini
All callers except the MMU notifier want to process all address spaces. Remove the address space ID argument of for_each_tdp_mmu_root_yield_safe() and switch the MMU notifier to use __for_each_tdp_mmu_root_yield_safe(). Extracted out of a patch by Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-22Merge tag 'x86_urgent_for_v6.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 rethunk fixes from Borislav Petkov: "Fix the patching ordering between static calls and return thunks" * tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86,static_call: Fix static-call vs return-thunk x86/alternatives: Remove faulty optimization
2023-09-22Merge tag 'x86-urgent-2023-09-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix a kexec bug - Fix an UML build bug - Fix a handful of SRSO related bugs - Fix a shadow stacks handling bug & robustify related code * tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/shstk: Add warning for shadow stack double unmap x86/shstk: Remove useless clone error handling x86/shstk: Handle vfork clone failure correctly x86/srso: Fix SBPB enablement for spec_rstack_overflow=off x86/srso: Don't probe microcode in a guest x86/srso: Set CPUID feature bits independently of bug or mitigation status x86/srso: Fix srso_show_state() side effect x86/asm: Fix build of UML with KASAN x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
2023-09-22x86/hyperv: Add common print prefix "Hyper-V" in hv_initSaurabh Sengar
Add "#define pr_fmt()" in hv_init.c to use "Hyper-V:" as common print prefix for all pr_*() statements in this file. Remove the "Hyper-V:" already prefixed in couple of prints. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/1695123361-8877-1-git-send-email-ssengar@linux.microsoft.com
2023-09-22x86/hyperv: Remove hv_vtl_early_init initcallSaurabh Sengar
There has been cases reported where HYPERV_VTL_MODE is enabled by mistake, on a non Hyper-V platforms. This causes the hv_vtl_early_init function to be called in an non Hyper-V/VTL platforms which results the memory corruption. Remove the early_initcall for hv_vtl_early_init and call it at the end of hyperv_init to make sure it is never called in a non Hyper-V platform by mistake. Reported-by: Mathias Krause <minipli@grsecurity.net> Closes: https://lore.kernel.org/lkml/40467722-f4ab-19a5-4989-308225b1f9f0@grsecurity.net/ Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/1695358720-27681-1-git-send-email-ssengar@linux.microsoft.com
2023-09-22x86/hyperv: Restrict get_vtl to only VTL platformsSaurabh Sengar
When Linux runs in a non-default VTL (CONFIG_HYPERV_VTL_MODE=y), get_vtl() must never fail as its return value is used in negotiations with the host. In the more generic case, (CONFIG_HYPERV_VTL_MODE=n) the VTL is always zero so there's no need to do the hypercall. Make get_vtl() BUG() in case of failure and put the implementation under "if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)" to avoid the call altogether in the most generic use case. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/1695182675-13405-1-git-send-email-ssengar@linux.microsoft.com
2023-09-22x86,static_call: Fix static-call vs return-thunkPeter Zijlstra
Commit 7825451fa4dc ("static_call: Add call depth tracking support") failed to realize the problem fixed there is not specific to call depth tracking but applies to all return-thunk uses. Move the fix to the appropriate place and condition. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: David Kaplan <David.Kaplan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2023-09-22x86/alternatives: Remove faulty optimizationJosh Poimboeuf
The following commit 095b8303f383 ("x86/alternative: Make custom return thunk unconditional") made '__x86_return_thunk' a placeholder value. All code setting X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'. So the optimization at the beginning of apply_returns() is dead code. Also, before the above-mentioned commit, the optimization actually had a bug It bypassed __static_call_fixup(), causing some raw returns to remain unpatched in static call trampolines. Thus the 'Fixes' tag. Fixes: d2408e043e72 ("x86/alternative: Optimize returns patching") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/16d19d2249d4485d8380fb215ffaae81e6b8119e.1693889988.git.jpoimboe@kernel.org
2023-09-22hardening: Provide Kconfig fragments for basic optionsKees Cook
Inspired by Salvatore Mesoraca's earlier[1] efforts to provide some in-tree guidance for kernel hardening Kconfig options, add a new fragment named "hardening-basic.config" (along with some arch-specific fragments) that enable a basic set of kernel hardening options that have the least (or no) performance impact and remove a reasonable set of legacy APIs. Using this fragment is as simple as running "make hardening.config". More extreme fragments can be added[2] in the future to cover all the recognized hardening options, and more per-architecture files can be added too. For now, document the fragments directly via comments. Perhaps .rst documentation can be generated from them in the future (rather than the other way around). [1] https://lore.kernel.org/kernel-hardening/1536516257-30871-1-git-send-email-s.mesoraca16@gmail.com/ [2] https://github.com/KSPP/linux/issues/14 Cc: Salvatore Mesoraca <s.mesoraca16@gmail.com> Cc: x86@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-09-22perf/x86/amd/core: Fix overflow reset on hotplugSandipan Das
Kernels older than v5.19 do not support PerfMonV2 and the PMI handler does not clear the overflow bits of the PerfCntrGlobalStatus register. Because of this, loading a recent kernel using kexec from an older kernel can result in inconsistent register states on Zen 4 systems. The PMI handler of the new kernel gets confused and shows a warning when an overflow occurs because some of the overflow bits are set even if the corresponding counters are inactive. These are remnants from overflows that were handled by the older kernel. During CPU hotplug, the PerfCntrGlobalCtl and PerfCntrGlobalStatus registers should always be cleared for PerfMonV2-capable processors. However, a condition used for NB event constaints applicable only to older processors currently prevents this from happening. Move the reset sequence to an appropriate place and also clear the LBR Freeze bit. Fixes: 21d59e3e2c40 ("perf/x86/amd/core: Detect PerfMonV2 support") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/882a87511af40792ba69bb0e9026f19a2e71e8a3.1694696888.git.sandipan.das@amd.com
2023-09-22x86/speculation, objtool: Use absolute relocations for annotationsFangrui Song
.discard.retpoline_safe sections do not have the SHF_ALLOC flag. These sections referencing text sections' STT_SECTION symbols with PC-relative relocations like R_386_PC32 [0] is conceptually not suitable. Newer LLD will report warnings for REL relocations even for relocatable links [1]: ld.lld: warning: vmlinux.a(drivers/i2c/busses/i2c-i801.o):(.discard.retpoline_safe+0x120): has non-ABS relocation R_386_PC32 against symbol '' Switch to absolute relocations instead, which indicate link-time addresses. In a relocatable link, these addresses are also output section offsets, used by checks in tools/objtool/check.c. When linking vmlinux, these .discard.* sections will be discarded, therefore it is not a problem that R_X86_64_32 cannot represent a kernel address. Alternatively, we could set the SHF_ALLOC flag for .discard.* sections, but I think non-SHF_ALLOC for sections to be discarded makes more sense. Note: if we decide to never support REL architectures (e.g. arm, i386), we can utilize R_*_NONE relocations (.reloc ., BFD_RELOC_NONE, sym), making .discard.* sections zero-sized. That said, the section content waste is 4 bytes per entry, much smaller than sizeof(Elf{32,64}_Rel). [0] commit 1c0c1faf5692 ("objtool: Use relative pointers for annotations") [1] https://github.com/ClangBuiltLinux/linux/issues/1937 Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20230920001728.1439947-1-maskray@google.com
2023-09-22x86/cpu: Clear SVM feature if disabled by BIOSPaolo Bonzini
When SVM is disabled by BIOS, one cannot use KVM but the SVM feature is still shown in the output of /proc/cpuinfo. On Intel machines, VMX is cleared by init_ia32_feat_ctl(), so do the same on AMD and Hygon processors. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230921114940.957141-1-pbonzini@redhat.com
2023-09-22x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32Ingo Molnar
Header cleanups in the fast-headers tree highlighted that we have an unused assembly implementation for __sw_hweight64(): WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ... __arch_hweight64() on x86-32 is defined in the arch/x86/include/asm/arch_hweight.h header as an inline, using __arch_hweight32(): #ifdef CONFIG_X86_32 static inline unsigned long __arch_hweight64(__u64 w) { return __arch_hweight32((u32)w) + __arch_hweight32((u32)(w >> 32)); } *But* there's also a __sw_hweight64() assembly implementation: arch/x86/lib/hweight.S SYM_FUNC_START(__sw_hweight64) #ifdef CONFIG_X86_64 ... #else /* CONFIG_X86_32 */ /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */ pushl %ecx call __sw_hweight32 movl %eax, %ecx # stash away result movl %edx, %eax # second part of input call __sw_hweight32 addl %ecx, %eax # result popl %ecx ret #endif But this __sw_hweight64 assembly implementation is unused - and it's essentially doing the same thing that the inline wrapper does. Remove the assembly version and add a comment about it. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org
2023-09-22x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions ↵Ingo Molnar
from <asm/processor.h> to <asm/pgtable.h> <linux/mm.h> relies on these definitions being included first, which is true currently due to historic header spaghetti, but in the future <asm/processor.h> will not guaranteed to be included by the MM code. Move these definitions over into a suitable MM header. This is a preparatory patch for x86 header dependency simplifications and reductions. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21futex: Add sys_futex_requeue()peterz@infradead.org
Finish off the 'simple' futex2 syscall group by adding sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are too numerous to fit into a regular syscall. As such, use struct futex_waitv to pass the 'source' and 'destination' futexes to the syscall. This syscall implements what was previously known as FUTEX_CMP_REQUEUE and uses {val, uaddr, flags} for source and {uaddr, flags} for destination. This design explicitly allows requeueing between different types of futex by having a different flags word per uaddr. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105248.511860556@noisy.programming.kicks-ass.net
2023-09-21futex: Add sys_futex_wait()peterz@infradead.org
To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This syscall implements what was previously known as FUTEX_WAIT_BITSET except it uses 'unsigned long' for the value and bitmask arguments, takes timespec and clockid_t arguments for the absolute timeout and uses FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105248.164324363@noisy.programming.kicks-ass.net
2023-09-21futex: Add sys_futex_wake()peterz@infradead.org
To complement sys_futex_waitv() add sys_futex_wake(). This syscall implements what was previously known as FUTEX_WAKE_BITSET except it uses 'unsigned long' for the bitmask and takes FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230921105247.936205525@noisy.programming.kicks-ass.net