summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The important one is a change to the way in which we handle protection keys around signal delivery so that we're more closely aligned with the x86 behaviour, however there is also a revert of the previous fix to disable software tag-based KASAN with GCC, since a workaround materialised shortly afterwards. I'd love to say we're done with 6.12, but we're aware of some longstanding fpsimd register corruption issues that we're almost at the bottom of resolving. Summary: - Fix handling of POR_EL0 during signal delivery so that pushing the signal context doesn't fail based on the pkey configuration of the interrupted context and align our user-visible behaviour with that of x86. - Fix a bogus pointer being passed to the CPU hotplug code from the Arm SDEI driver. - Re-enable software tag-based KASAN with GCC by using an alternative implementation of '__no_sanitize_address'" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: signal: Improve POR_EL0 handling to avoid uaccess failures firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Revert "kasan: Disable Software Tag-Based KASAN with GCC" kasan: Fix Software Tag-Based KASAN with GCC
2024-11-01Merge tag 'vfs-6.12-rc6.iomap' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iomap fixes from Christian Brauner: "Fixes for iomap to prevent data corruption bugs in the fallocate unshare range implementation of fsdax and a small cleanup to turn iomap_want_unshare_iter() into an inline function" * tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iomap: turn iomap_want_unshare_iter into an inline function fsdax: dax_unshare_iter needs to copy entire blocks fsdax: remove zeroing code from dax_unshare_iter iomap: share iomap_unshare_iter predicate code with fsdax xfs: don't allocate COW extents when unsharing a hole
2024-11-01Merge tag 'vfs-6.12-rc6.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull filesystem fixes from Christian Brauner: "VFS: - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set - Add a get_tree_bdev_flags() helper that allows to modify e.g., whether errors are logged into the filesystem context during superblock creation. This is used by erofs to fix a userspace regression where an error is currently logged when its used on a regular file which is an new allowed mode in erofs. netfs: - Fix the sysfs debug path in the documentation. - Fix iov_iter_get_pages*() for folio queues by skipping the page extracation if we're at the end of a folio. afs: - Fix moving subdirectories to different parent directory. autofs: - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in validate_dev_ioctl(). The actual ioctl number, not the ioctl command needs to be checked for autofs" * tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP autofs: fix thinko in validate_dev_ioctl() iov_iter: Fix iov_iter_get_pages*() for folio_queue afs: Fix missing subdir edit when renamed between parent dirs doc: correcting the debug path for cachefiles erofs: use get_tree_bdev_flags() to avoid misleading messages fs/super.c: introduce get_tree_bdev_flags()
2024-11-01Merge tag 'for-6.12-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more stability fixes. There's one patch adding export of MIPS cmpxchg helper, used in the error propagation fix. - fix error propagation from split bios to the original btrfs bio - fix merging of adjacent extents (normal operation, defragmentation) - fix potential use after free after freeing btrfs device structures" * tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix defrag not merging contiguous extents due to merged extent maps btrfs: fix extent map merging not happening for adjacent extents btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() btrfs: fix error propagation of split bios MIPS: export __cmpxchg_small()
2024-11-01Merge tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Various syzbot fixes, and the more notable ones: - Fix for pointers in an extent overflowing the max (16) on a filesystem with many devices: we were creating too many cached copies when moving data around. Now, we only create at most one cached copy if there's a promote target set. Caching will be a bit broken for reflinked data until 6.13: I have larger series queued up which significantly improves the plumbing for data options down into the extent (bch_extent_rebalance) to fix this. - Fix for deadlock on -ENOSPC on tiny filesystems Allocation from the partial open_bucket list wasn't correctly accounting partial open_buckets as free: this fixes the main cause of tests timing out in the automated tests" * tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs: bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peek bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get() bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open buckets bcachefs: Don't filter partial list buckets in open_buckets_to_text() bcachefs: Don't keep tons of cached pointers around bcachefs: init freespace inited bits to 0 in bch2_fs_initialize bcachefs: Fix unhandled transaction restart in fallocate bcachefs: Fix UAF in bch2_reconstruct_alloc() bcachefs: fix null-ptr-deref in have_stripes() bcachefs: fix shift oob in alloc_lru_idx_fragmentation bcachefs: Fix invalid shift in validate_sb_layout()
2024-11-01KVM: selftests: Ensure KVM supports AVX for SEV-ES VMSA FPU testSean Christopherson
Verify that KVM's supported XCR0 includes AVX (and earlier features) when running the SEV-ES VMSA XSAVE test. In practice, the issue will likely never pop up, since KVM support for AVX predates KVM support for SEV-ES, but checking for KVM support makes the requirement more obvious. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Drop manual XCR0 configuration from SEV smoke testSean Christopherson
Now that CR4.OSXSAVE and XCR0 are setup by default, drop the manual enabling from the SEV smoke test that validates FPU state can be transferred into the VMSA. In guest_code_xsave(), explicitly set the Requested-Feature Bitmask (RFBM) to exactly XFEATURE_MASK_X87_AVX instead of relying on the host side of things to enable only X87_AVX features in guest XCR0. I.e. match the RFBM for the host XSAVE. Link: https://lore.kernel.org/r/20241003234337.273364-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Drop manual XCR0 configuration from state testSean Christopherson
Now that CR4.OSXSAVE and XCR0 are setup by default, drop the manual enabling from the state test, which is fully redundant with the default behavior. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Drop manual XCR0 configuration from AMX testSean Christopherson
Now that CR4.OSXSAVE and XCR0 are setup by default, drop the manual enabling of OXSAVE and XTILE from the AMX test. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Drop manual CR4.OSXSAVE enabling from CR4/CPUID sync testSean Christopherson
Now that CR4.OSXSAVE is enabled by default, drop the manual enabling from CR4/CPUID sync test and instead assert that CR4.OSXSAVE is enabled. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Verify XCR0 can be "downgraded" and "upgraded"Sean Christopherson
Now that KVM selftests enable all supported XCR0 features by default, add a testcase to the XCR0 vs. CPUID test to verify that the guest can disable everything except the legacy FPU in XCR0, and then re-enable the full feature set, which is kinda sorta what the test did before XCR0 was setup by default. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Configure XCR0 to max supported value by defaultSean Christopherson
To play nice with compilers generating AVX instructions, set CR4.OSXSAVE and configure XCR0 by default when creating selftests vCPUs. Some distros have switched gcc to '-march=x86-64-v3' by default, and while it's hard to find a CPU which doesn't support AVX today, many KVM selftests fail with ==== Test Assertion Failure ==== lib/x86_64/processor.c:570: Unhandled exception in guest pid=72747 tid=72747 errno=4 - Interrupted system call Unhandled exception '0x6' at guest RIP '0x4104f7' due to selftests not enabling AVX by default for the guest. The failure is easy to reproduce elsewhere with: $ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test E.g. gcc-13 with -march=x86-64-v3 compiles this chunk from selftests' kvm_fixup_exception(): regs->rip = regs->r11; regs->r9 = regs->vector; regs->r10 = regs->error_code; into this monstronsity (which is clever, but oof): 405313: c4 e1 f9 6e c8 vmovq %rax,%xmm1 405318: 48 89 68 08 mov %rbp,0x8(%rax) 40531c: 48 89 e8 mov %rbp,%rax 40531f: c4 c3 f1 22 c4 01 vpinsrq $0x1,%r12,%xmm1,%xmm0 405325: 49 89 6d 38 mov %rbp,0x38(%r13) 405329: c5 fa 7f 45 00 vmovdqu %xmm0,0x0(%rbp) Alternatively, KVM selftests could explicitly restrict the compiler to -march=x86-64-v2, but odds are very good that punting on AVX enabling will simply result in tests that "need" AVX doing their own thing, e.g. there are already three or so additional cleanups that can be done on top. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Closes: https://lore.kernel.org/all/20240920154422.2890096-1-vkuznets@redhat.com Reviewed-and-tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Rework OSXSAVE CR4=>CPUID test to play nice with AVX insnsSean Christopherson
Rework the CR4/CPUID sync test to clear CR4.OSXSAVE, do CPUID, and restore CR4.OSXSAVE in assembly, so that there is zero chance of AVX instructions being executed while CR4.OSXSAVE is disabled. This will allow enabling CR4.OSXSAVE by default for selftests vCPUs as a general means of playing nice with AVX instructions. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Mask off OSPKE and OSXSAVE when comparing CPUID entriesSean Christopherson
Mask off OSPKE and OSXSAVE, which are toggled based on corresponding CR4 enabling bits, when comparing vCPU CPUID against KVM's supported CPUID. This will allow setting OSXSAVE by default when creating vCPUs, without causing test failures (KVM doesn't enumerate OSXSAVE=1). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Precisely mask off dynamic fields in CPUID testSean Christopherson
When comparing vCPU CPUID entries against KVM's supported CPUID, mask off only the dynamic fields/bits instead of skipping the entire entry. Precisely masking bits isn't meaningfully more difficult than skipping entire entries, and will be necessary to maintain test coverage when a future commit enables OSXSAVE by default, i.e. makes one bit in all of CPUID.0x1 dynamic. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241003234337.273364-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Add a testcase for disabling feature MSRs init quirkSean Christopherson
Expand and rename the feature MSRs test to verify KVM's ABI and quirk for initializing feature MSRs. Exempt VM_CR{0,4}_FIXED1 from most tests as KVM intentionally takes full control of the MSRs, e.g. to prevent L1 from running L2 with bogus CR0 and/or CR4 values. Link: https://lore.kernel.org/r/20240802185511.305849-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: selftests: Verify get/set PERF_CAPABILITIES w/o guest PDMC behaviorSean Christopherson
Add another testcase to x86's PMU capabilities test to verify that KVM's handling of userspace accesses to PERF_CAPABILITIES when the vCPU doesn't support the MSR (per the vCPU's CPUID). KVM's (newly established) ABI is that userspace MSR accesses are subject to architectural existence checks, but that if the MSR is advertised as supported _by KVM_, "bad" reads get '0' and writes of '0' are always allowed. Link: https://lore.kernel.org/r/20240802185511.305849-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Remove ordering check b/w MSR_PLATFORM_INFO and MISC_FEATURES_ENABLESSean Christopherson
Drop KVM's odd restriction that disallows clearing CPUID_FAULT in MSR_PLATFORM_INFO if CPL>0 CPUID faulting is enabled in MSR_MISC_FEATURES_ENABLES. KVM generally doesn't require specific ordering when userspace sets MSRs, and the completely arbitrary order of MSRs in emulated_msrs_all means that a userspace that uses KVM's list verbatim could run afoul of the check. Dropping the restriction obviously means that userspace could stuff a nonsensical vCPU model, but that's the case all over KVM. KVM typically restricts userspace MSR writes only when it makes things easier for KVM and/or userspace. Link: https://lore.kernel.org/r/20240802185511.305849-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Reject userspace attempts to access ARCH_CAPABILITIES w/o supportSean Christopherson
Reject userspace accesses to ARCH_CAPABILITIES if the MSR isn't supposed to exist, according to guest CPUID. However, "reject" accesses with KVM_MSR_RET_UNSUPPORTED, so that reads get '0' and writes of '0' are ignored if KVM advertised support ARCH_CAPABILITIES. KVM's ABI is that userspace must set guest CPUID prior to setting MSRs, and that setting MSRs that aren't supposed exist is disallowed (modulo the '0' exemption). Link: https://lore.kernel.org/r/20240802185511.305849-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: VMX: Remove restriction that PMU version > 0 for PERF_CAPABILITIESSean Christopherson
Drop the restriction that the PMU version is non-zero when handling writes to PERF_CAPABILITIES now that KVM unconditionally checks for PDCM support. Link: https://lore.kernel.org/r/20240802185511.305849-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Reject userspace attempts to access PERF_CAPABILITIES w/o PDCMSean Christopherson
Reject userspace accesses to PERF_CAPABILITIES if PDCM isn't set in guest CPUID, i.e. if the vCPU doesn't actually have PERF_CAPABILITIES. But! Do so via KVM_MSR_RET_UNSUPPORTED, so that reads get '0' and writes of '0' are ignored if KVM advertised support PERF_CAPABILITIES. KVM's ABI is that userspace must set guest CPUID prior to setting MSRs, and that setting MSRs that aren't supposed exist is disallowed (modulo the '0' exemption). Link: https://lore.kernel.org/r/20240802185511.305849-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Quirk initialization of feature MSRs to KVM's max configurationSean Christopherson
Add a quirk to control KVM's misguided initialization of select feature MSRs to KVM's max configuration, as enabling features by default violates KVM's approach of letting userspace own the vCPU model, and is actively problematic for MSRs that are conditionally supported, as the vCPU will end up with an MSR value that userspace can't restore. E.g. if the vCPU is configured with PDCM=0, userspace will save and attempt to restore a non-zero PERF_CAPABILITIES, thanks to KVM's meddling. Link: https://lore.kernel.org/r/20240802185511.305849-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Disallow changing MSR_PLATFORM_INFO after vCPU has runSean Christopherson
Tag MSR_PLATFORM_INFO as a feature MSR (because it is), i.e. disallow it from being modified after the vCPU has run. To make KVM's selftest compliant, simply delete the userspace MSR write that restores KVM's original value at the end of the test. Verifying that userspace can write back what it originally read is uninteresting in this particular case, because KVM doesn't enforce _any_ bits in the MSR, i.e. userspace should be able to write any arbitrary value. Link: https://lore.kernel.org/r/20240802185511.305849-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Co-locate initialization of feature MSRs in kvm_arch_vcpu_create()Sean Christopherson
Bunch all of the feature MSR initialization in kvm_arch_vcpu_create() so that it can be easily quirked in a future patch. No functional change intended. Link: https://lore.kernel.org/r/20240802185511.305849-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Document an erratum in KVM_SET_VCPU_EVENTS on Intel CPUsSean Christopherson
Document a flaw in KVM's ABI which lets userspace attempt to inject a "bad" hardware exception event, and thus induce VM-Fail on Intel CPUs. Fixing the flaw is a fool's errand, as AMD doesn't sanity check the validity of the error code, Intel CPUs that support CET relax the check for Protected Mode, userspace can change the mode after queueing an exception, KVM ignores the error code when emulating Real Mode exceptions, and so on and so forth. The VM-Fail itself doesn't harm KVM or the kernel beyond triggering a ratelimited pr_warn(), so just document the oddity. Link: https://lore.kernel.org/r/20240802200420.330769-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: nVMX: fix canonical check of vmcs12 HOST_RIPMaxim Levitsky
HOST_RIP canonical check should check the L1 of CR4.LA57 stored in the vmcs12 rather than the current L1's because it is legal to change the CR4.LA57 value during VM exit from L2 to L1. This is a theoretical bug though, because it is highly unlikely that a VM exit will change the CR4.LA57 from the value it had on VM entry. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-5-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: model canonical checks more preciselyMaxim Levitsky
As a result of a recent investigation, it was determined that x86 CPUs which support 5-level paging, don't always respect CR4.LA57 when doing canonical checks. In particular: 1. MSRs which contain a linear address, allow full 57-bitcanonical address regardless of CR4.LA57 state. For example: MSR_KERNEL_GS_BASE. 2. All hidden segment bases and GDT/IDT bases also behave like MSRs. This means that full 57-bit canonical address can be loaded to them regardless of CR4.LA57, both using MSRS (e.g GS_BASE) and instructions (e.g LGDT). 3. TLB invalidation instructions also allow the user to use full 57-bit address regardless of the CR4.LA57. Finally, it must be noted that the CPU doesn't prevent the user from disabling 5-level paging, even when the full 57-bit canonical address is present in one of the registers mentioned above (e.g GDT base). In fact, this can happen without any userspace help, when the CPU enters SMM mode - some MSRs, for example MSR_KERNEL_GS_BASE are left to contain a non-canonical address in regard to the new mode. Since most of the affected MSRs and all segment bases can be read and written freely by the guest without any KVM intervention, this patch makes the emulator closely follow hardware behavior, which means that the emulator doesn't take in the account the guest CPUID support for 5-level paging, and only takes in the account the host CPU support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Add X86EMUL_F_MSR and X86EMUL_F_DT_LOAD to aid canonical checksMaxim Levitsky
Add emulation flags for MSR accesses and Descriptor Tables loads, and pass the new flags as appropriate to emul_is_noncanonical_address(). The flags will be used to perform the correct canonical check, as the type of access affects whether or not CR4.LA57 is consulted when determining the canonical bit. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-3-mlevitsk@redhat.com [sean: split to separate patch, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Route non-canonical checks in emulator through emulate_opsMaxim Levitsky
Add emulate_ops.is_canonical_addr() to perform (non-)canonical checks in the emulator, which will allow extending is_noncanonical_address() to support different flavors of canonical checks, e.g. for descriptor table bases vs. MSRs, without needing duplicate logic in the emulator. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-3-mlevitsk@redhat.com [sean: separate from additional of flags, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: drop x86.h include from cpuid.hMaxim Levitsky
Drop x86.h include from cpuid.h to allow the x86.h to include the cpuid.h instead. Also fix various places where x86.h was implicitly included via cpuid.h Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-2-mlevitsk@redhat.com [sean: fixup a missed include in mtrr.c] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Use '0' for guest RIP if PMI encounters protected guest stateSean Christopherson
Explicitly return '0' for guest RIP when handling a PMI VM-Exit for a vCPU with protected guest state, i.e. when KVM can't read the real RIP. While there is no "right" value, and profiling a protect guest is rather futile, returning the last known RIP is worse than returning obviously "bad" data. E.g. for SEV-ES+, the last known RIP will often point somewhere in the guest's boot flow. Opportunistically add WARNs to effectively assert that the in_kernel() and get_ip() callbacks are restricted to the common PMI handler, as the return values for the protected guest state case are largely arbitrary, i.e. only make any sense whatsoever for PMIs, where the returned values have no functional impact and thus don't truly matter. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241009175002.1118178-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Add lockdep-guarded asserts on register cache usageSean Christopherson
When lockdep is enabled, assert that KVM accesses the register caches if and only if cache fills are guaranteed to consume fresh data, i.e. when KVM when KVM is in control of the code sequence. Concretely, the caches can only be used from task context (synchronous) or when handling a PMI VM-Exit (asynchronous, but only in specific windows where the caches are in a known, stable state). Generally speaking, there are very few flows where reading register state from an asynchronous context is correct or even necessary. So, rather than trying to figure out a generic solution, simply disallow using the caches outside of task context by default, and deal with any future exceptions on a case-by-case basis _if_ they arise. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241009175002.1118178-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()Sean Christopherson
When querying guest CPL to determine if a vCPU was preempted while in kernel mode, bypass the register cache, i.e. always read SS.AR_BYTES from the VMCS on Intel CPUs. If the kernel is running with full preemption enabled, using the register cache in the preemption path can result in stale and/or uninitialized data being cached in the segment cache. In particular the following scenario is currently possible: - vCPU is just created, and the vCPU thread is preempted before SS.AR_BYTES is written in vmx_vcpu_reset(). - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() => vmx_get_cpl() reads and caches '0' for SS.AR_BYTES. - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't invoke vmx_segment_cache_clear() to invalidate the cache. As a result, KVM retains a stale value in the cache, which can be read, e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX segment cache is reset on each VM-Exit, but if the userspace VMM (e.g KVM selftests) reads and writes system registers just after the vCPU was created, _without_ modifying SS.AR_BYTES, userspace will write back the stale '0' value and ultimately will trigger a VM-Entry failure due to incorrect SS segment type. Note, the VM-Enter failure can also be avoided by moving the call to vmx_segment_cache_clear() until after the vmx_vcpu_reset() initializes all segments. However, while that change is correct and desirable (and will come along shortly), it does not address the underlying problem that accessing KVM's register caches from !task context is generally unsafe. In addition to fixing the immediate bug, bypassing the cache for this particular case will allow hardening KVM register caching log to assert that the caches are accessed only when KVM _knows_ it is safe to do so. Fixes: de63ad4cf497 ("KVM: X86: implement the logic for spinlock optimization") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Closes: https://lore.kernel.org/all/20240716022014.240960-3-mlevitsk@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241009175002.1118178-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: AMD's IBPB is not equivalent to Intel's IBPBJim Mattson
From Intel's documentation [1], "CPUID.(EAX=07H,ECX=0):EDX[26] enumerates support for indirect branch restricted speculation (IBRS) and the indirect branch predictor barrier (IBPB)." Further, from [2], "Software that executed before the IBPB command cannot control the predicted targets of indirect branches (4) executed after the command on the same logical processor," where footnote 4 reads, "Note that indirect branches include near call indirect, near jump indirect and near return instructions. Because it includes near returns, it follows that **RSB entries created before an IBPB command cannot control the predicted targets of returns executed after the command on the same logical processor.**" [emphasis mine] On the other hand, AMD's IBPB "may not prevent return branch predictions from being specified by pre-IBPB branch targets" [3]. However, some AMD processors have an "enhanced IBPB" [terminology mine] which does clear the return address predictor. This feature is enumerated by CPUID.80000008:EDX.IBPB_RET[bit 30] [4]. Adjust the cross-vendor features enumerated by KVM_GET_SUPPORTED_CPUID accordingly. [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/cpuid-enumeration-and-architectural-msrs.html [2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#Footnotes [3] https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1040.html [4] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf Fixes: 0c54914d0c52 ("KVM: x86: use Intel speculation bugs and features as derived in generic x86 code") Suggested-by: Venkatesh Srinivas <venkateshs@chromium.org> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241011214353.1625057-5-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Advertise AMD_IBPB_RET to userspaceJim Mattson
This is an inherent feature of IA32_PRED_CMD[0], so it is trivially virtualizable (as long as IA32_PRED_CMD[0] is virtualized). Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241011214353.1625057-4-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Ensure vcpu->mode is loaded from memory in kvm_vcpu_exit_request()Sean Christopherson
Wrap kvm_vcpu_exit_request()'s load of vcpu->mode with READ_ONCE() to ensure the variable is re-loaded from memory, as there is no guarantee the caller provides the necessary annotations to ensure KVM sees a fresh value, e.g. the VM-Exit fastpath could theoretically reuse the pre-VM-Enter value. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20240828232013.768446-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Fix a comment inside __kvm_set_or_clear_apicv_inhibit()Kai Huang
Change svm_vcpu_run() to vcpu_enter_guest() in the comment of __kvm_set_or_clear_apicv_inhibit() to make it reflect the fact. When one thread updates VM's APICv state due to updating the APICv inhibit reasons, it kicks off all vCPUs and makes them wait until the new reason has been updated and can be seen by all vCPUs. There was one WARN() to make sure VM's APICv state is consistent with vCPU's APICv state in the svm_vcpu_run(). Commit ee49a8932971 ("KVM: x86: Move SVM's APICv sanity check to common x86") moved that WARN() to x86 common code vcpu_enter_guest() due to the logic is not unique to SVM, and added comments to both __kvm_set_or_clear_apicv_inhibit() and vcpu_enter_guest() to explain this. However, although the comment in __kvm_set_or_clear_apicv_inhibit() mentioned the WARN(), it seems forgot to reflect that the WARN() had been moved to x86 common, i.e., it still mentioned the svm_vcpu_run() but not vcpu_enter_guest(). Fix it. Note after the change the first line that contains vcpu_enter_guest() exceeds 80 characters, but leave it as is to make the diff clean. Fixes: ee49a8932971 ("KVM: x86: Move SVM's APICv sanity check to common x86") Signed-off-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/e462e7001b8668649347f879c66597d3327dbac2.1728383775.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01KVM: x86: Fix a comment inside kvm_vcpu_update_apicv()Kai Huang
The sentence "... so that KVM can the AVIC doorbell to ..." doesn't have a verb. Fix it. After adding the verb 'use', that line exceeds 80 characters. Thus wrap the 'to' to the next line. Signed-off-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/666e991edf81e1fccfba9466f3fe65965fcba897.1728383775.git.kai.huang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-31mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizesVlastimil Babka
Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page. However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600% slowdown of the cactusBSSN benchmark on some platforms. The benchmark seems to create many mappings of 4632kB, which would have merged to a large THP-backed area before commit efa7df3e3bb5 and now they are fragmented to multiple areas each aligned to PMD boundary with gaps between. The regression then seems to be caused mainly due to the benchmark's memory access pattern suffering from TLB or cache aliasing due to the aligned boundaries of the individual areas. Another known regression bisected to commit efa7df3e3bb5 is darktable [2] [3] and early testing suggests this patch fixes the regression there as well. To fix the regression but still try to benefit from THP-friendly anonymous mapping alignment, add a condition that the size of the mapping must be a multiple of PMD size instead of at least PMD size. In case of many odd-sized mapping like the cactusBSSN creates, those will stop being aligned and with gaps between, and instead naturally merge again. Link: https://lkml.kernel.org/r/20241024151228.101841-2-vbabka@suse.cz Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Michael Matz <matz@suse.de> Debugged-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1] Reported-by: Matthias Bodenbinder <matthias@bodenbinder.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2] Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.info/ [3] Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Cc: Rik van Riel <riel@surriel.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Petr Tesarik <ptesarik@suse.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31mm: shrinker: avoid memleak in alloc_shrinker_infoChen Ridong
A memleak was found as below: unreferenced object 0xffff8881010d2a80 (size 32): comm "mkdir", pid 1559, jiffies 4294932666 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @............... backtrace (crc 2e7ef6fa): [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470 [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0 [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0 [<ffffffff81198dd9>] online_css+0x29/0xa0 [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360 [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0 [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90 [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220 [<ffffffff813e1c97>] do_mkdirat+0x87/0x130 [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70 [<ffffffff81f8c928>] do_syscall_64+0x68/0x140 [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e alloc_shrinker_info(), when shrinker_unit_alloc() returns an errer, the info won't be freed. Just fix it. Link: https://lkml.kernel.org/r/20241025060942.1049263-1-chenridong@huaweicloud.com Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Wang Weiyang <wangweiyang2@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31.mailmap: update e-mail address for Eugen HristevEugen Hristev
Update e-mail address. Link: https://lkml.kernel.org/r/20241025085848.483149-1-eugen.hristev@linaro.org Signed-off-by: Eugen Hristev <eugen.hristev@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31vmscan,migrate: fix page count imbalance on node stats when demoting pagesGregory Price
When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. migrate_pages will decrement the the node's isolated page count, leading to an imbalanced count when invoked from (MG)LRU code. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The following path produces the decrement: shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry@gourry.net> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31mailmap: update Jarkko's email addressesJarkko Sakkinen
Remove my previous work email, and the new one. The previous was never used in the commit log, so there's no good reason to spare it. Link: https://lkml.kernel.org/r/20241025181530.6151-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Geliang Tang <geliang@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Matt Ranostay <matt@ranostay.sg> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Quentin Monnet <qmo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: - Put the QP netlink dump back in cxgb4, fixes a user visible regression - Don't change the rounding style in mlx5 for user provided rd_atomic values - Resolve a race in bnxt_re around the qp-handle table array * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/bnxt_re: synchronize the qp-handle table array RDMA/bnxt_re: Fix the usage of control path spin locks RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of down RDMA/cxgb4: Dump vendor specific QP details
2024-10-31Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to force a checkpoint when the program's jump history becomes too long (Eduard Zingerman) - Add several fixes to the BPF bits iterator addressing issues like memory leaks and overflow problems (Hou Tao) - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong) - Fix BPF test infra's LIVE_FRAME frame update after a page has been recycled (Toke Høiland-Jørgensen) - Fix BPF verifier and undo the 40-bytes extra stack space for bpf_fastcall patterns due to various bugs (Eduard Zingerman) - Fix a BPF sockmap race condition which could trigger a NULL pointer dereference in sock_map_link_update_prog (Cong Wang) - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under the socket lock (Jiayuan Chen) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled selftests/bpf: Add three test cases for bits_iter bpf: Use __u64 to save the bits in bits iterator bpf: Check the validity of nr_words in bpf_iter_bits_new() bpf: Add bpf_mem_alloc_check_size() helper bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() bpf: disallow 40-bytes extra stack for bpf_fastcall patterns selftests/bpf: Add test for trie_get_next_key() bpf: Fix out-of-bounds write in trie_get_next_key() selftests/bpf: Test with a very short loop bpf: Force checkpoint when jmp history is too long bpf: fix filed access without lock sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
2024-10-31Merge tag 'net-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi, bluetooth and netfilter. No known new regressions outstanding. Current release - regressions: - wifi: mt76: do not increase mcu skb refcount if retry is not supported Current release - new code bugs: - wifi: - rtw88: fix the RX aggregation in USB 3 mode - mac80211: fix memory corruption bug in struct ieee80211_chanctx Previous releases - regressions: - sched: - stop qdisc_tree_reduce_backlog on TC_H_ROOT - sch_api: fix xa_insert() error path in tcf_block_get_ext() - wifi: - revert "wifi: iwlwifi: remove retry loops in start" - cfg80211: clear wdev->cqm_config pointer on free - netfilter: fix potential crash in nf_send_reset6() - ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find() - bluetooth: fix null-ptr-deref in hci_read_supported_codecs - eth: mlxsw: add missing verification before pushing Tx header - eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Previous releases - always broken: - wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower - netfilter: sanitize offset and length before calling skb_checksum() - core: - fix crash when config small gso_max_size/gso_ipv4_max_size - skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension - mptcp: protect sched with rcu_read_lock - eth: ice: fix crash on probe for DPLL enabled E810 LOM - eth: macsec: fix use-after-free while sending the offloading packet - eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data - eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices - eth: mtk_wed: fix path of MT7988 WO firmware" * tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) net: hns3: fix kernel crash when 1588 is sent on HIP08 devices net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue net: hns3: initialize reset_timer before hclgevf_misc_irq_init() net: hns3: don't auto enable misc vector net: hns3: Resolved the issue that the debugfs query result is inconsistent. net: hns3: fix missing features due to dev->features configuration too early net: hns3: fixed reset failure issues caused by the incorrect reset type net: hns3: add sync command to sync io-pgtable net: hns3: default enable tx bounce buffer when smmu enabled netfilter: nft_payload: sanitize offset and length before calling skb_checksum() net: ethernet: mtk_wed: fix path of MT7988 WO firmware selftests: forwarding: Add IPv6 GRE remote change tests mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address mlxsw: pci: Sync Rx buffers for device mlxsw: pci: Sync Rx buffers for CPU mlxsw: spectrum_ptp: Add missing verification before pushing Tx header net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6() netfilter: Fix use-after-free in get_info() ...
2024-11-01Merge tag 'mediatek-drm-fixes-20241028' of ↵Dave Airlie
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes - 20241028 1. Fix degradation problem of alpha blending 2. Fix color format MACROs in OVL 3. Fix get efuse issue for MT8188 DPTX 4. Fix potential NULL dereference in mtk_crtc_destroy() 5. Correct dpi power-domains property 6. Add split subschema property constraints Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241028135846.3570-1-chunkuang.hu@kernel.org
2024-11-01Merge tag 'amd-drm-fixes-6.12-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-10-31: amdgpu: - DCN 3.5 fix - Vangogh SMU KASAN fix - SMU 13 profile reporting fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241031151539.3523633-1-alexander.deucher@amd.com
2024-10-31KVM: arm64: Get rid of userspace_irqchip_in_useRaghavendra Rao Ananta
Improper use of userspace_irqchip_in_use led to syzbot hitting the following WARN_ON() in kvm_timer_update_irq(): WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459 kvm_timer_update_irq+0x21c/0x394 Call trace: kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459 kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968 kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264 kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline] kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline] kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695 kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 The following sequence led to the scenario: - Userspace creates a VM and a vCPU. - The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during KVM_ARM_VCPU_INIT. - Without any other setup, such as vGIC or vPMU, userspace issues KVM_RUN on the vCPU. Since the vPMU is requested, but not setup, kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change(). As a result, KVM_RUN returns after enabling the timer, but before incrementing 'userspace_irqchip_in_use': kvm_arch_vcpu_run_pid_change() ret = kvm_arm_pmu_v3_enable() if (!vcpu->arch.pmu.created) return -EINVAL; if (ret) return ret; [...] if (!irqchip_in_kernel(kvm)) static_branch_inc(&userspace_irqchip_in_use); - Userspace ignores the error and issues KVM_ARM_VCPU_INIT again. Since the timer is already enabled, control moves through the following flow, ultimately hitting the WARN_ON(): kvm_timer_vcpu_reset() if (timer->enabled) kvm_timer_update_irq() if (!userspace_irqchip()) ret = kvm_vgic_inject_irq() ret = vgic_lazy_init() if (unlikely(!vgic_initialized(kvm))) if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2) return -EBUSY; WARN_ON(ret); Theoretically, since userspace_irqchip_in_use's functionality can be simply replaced by '!irqchip_in_kernel()', get rid of the static key to avoid the mismanagement, which also helps with the syzbot issue. Cc: <stable@vger.kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-01Merge tag 'drm-misc-fixes-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: ivpu: - Fix firewall IRQ handling panthor: - Fix firmware initialization wrt page sizes - Fix handling and reporting of dead job groups sched: - Guarantee forward progress via WC_MEM_RECLAIM tests: - Fix memory leak in drm_display_mode_from_cea_vic() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20241031144348.GA7826@linux-2.fritz.box