Age | Commit message (Collapse) | Author |
|
- Drop obsolete references to PPC970 KVM, which was removed 10 years ago.
- Fix incorrect references to non-existing ioctls
- List registers supported by KVM_GET/SET_ONE_REG on s390
- Use rST internal links
- Reorganize the introduction to the API document
|
|
KVM x86 misc changes for 6.13
- Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
- Quirk KVM's misguided behavior of initialized certain feature MSRs to
their maximum supported feature set, which can result in KVM creating
invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero
value results in the vCPU having invalid state if userspace hides PDCM
from the guest, which can lead to save/restore failures.
- Fix KVM's handling of non-canonical checks for vCPUs that support LA57
to better follow the "architecture", in quotes because the actual
behavior is poorly documented. E.g. most MSR writes and descriptor
table loads ignore CR4.LA57 and operate purely on whether the CPU
supports LA57.
- Bypass the register cache when querying CPL from kvm_sched_out(), as
filling the cache from IRQ context is generally unsafe, and harden the
cache accessors to try to prevent similar issues from occuring in the
future.
- Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
- Minor cleanups
|
|
KVM selftests changes for 6.13
- Enable XFAM-based features by default for all selftests VMs, which will
allow removing the "no AVX" restriction.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- second part of the ucontrol selftest
- cpumodel sanity check selftest
- gen17 cpumodel changes
|
|
Check if the PFCR query reported in userspace coincides with the
kernel reported function list. Right now we don't mask the functions
in the kernel so they have to be the same.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Hariharan Mari <hari55@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-5-brueckner@linux.ibm.com
[frankja@linux.ibm.com: Added commit description]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-5-brueckner@linux.ibm.com>
|
|
The length of the interrupt parameters (IP) are:
a: 2 bytes
b: 4 bytes
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107141024.238916-6-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-6-schlameuss@linux.ibm.com>
|
|
Checkpatch thinks that we're doing a multiplication but we're obviously
not. Fix 4 instances where we adhered to wrong checkpatch advice.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107141024.238916-5-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-5-schlameuss@linux.ibm.com>
|
|
Add a test case verifying KVM_SET_USER_MEMORY_REGION and
KVM_SET_USER_MEMORY_REGION2 cannot be executed on ucontrol VMs.
Executing this test case on not patched kernels will cause a null
pointer dereference in the host kernel.
This is fixed with commit:
commit 7816e58967d0 ("kvm: s390: Reject memory region operations for ucontrol VMs")
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107141024.238916-4-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-4-schlameuss@linux.ibm.com>
|
|
Add a test case manipulating s390 storage keys from within the ucontrol
VM.
Storage key instruction (ISKE, SSKE and RRBE) intercepts and
Keyless-subset facility are disabled on first use, where the skeys are
setup by KVM in non ucontrol VMs.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20241108091620.289406-1-schlameuss@linux.ibm.com
Acked-by: Janosch Frank <frankja@linux.ibm.com>
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241108091620.289406-1-schlameuss@linux.ibm.com>
|
|
Add a test case verifying basic running and interaction of ucontrol VMs.
Fill the segment and page tables for allocated memory and map memory on
first access.
* uc_map_unmap
Store and load data to mapped and unmapped memory and use pic segment
translation handling to map memory on access.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link:
https://lore.kernel.org/r/20241107141024.238916-2-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-2-schlameuss@linux.ibm.com>
|
|
In 08a7d2525511 ("tools arch x86: Sync the msr-index.h copy with the
kernel sources"), VMX_BASIC_MEM_TYPE_WB was removed. Use X86_MEMTYPE_WB
instead.
Fixes: 08a7d2525511 ("tools arch x86: Sync the msr-index.h copy with the
kernel sources")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Message-ID: <20241106034031.503291-1-jsperbeck@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM x86 and selftests fixes for 6.12:
- Increase the timeout for the memslot performance selftest to avoid false
failures on arm64 and nested x86 platforms.
- Fix a goof in the guest_memfd selftest where a for-loop initialized a
bit mask to zero instead of BIT(0).
- Disable strict aliasing when building KVM selftests to prevent the
compiler from treating things like "u64 *" to "uint64_t *" cases as
undefined behavior, which can lead to nasty, hard to debug failures.
- Force -march=x86-64-v2 for KVM x86 selftests if and only if the uarch
is supported by the compiler.
- When emulating a guest TLB flush for a nested guest, flush vpid01, not
vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if L2 and
L1 are sharing VPID '0' (from L1's perspective).
- Fix a bug in the SNP initialization flow where KVM would return '0' to
userspace instead of -errno on failure.
|
|
Force -march=x86-64-v2 to avoid SSE/AVX instructions if and only if the
uarch definition is supported by the compiler, e.g. gcc 7.5 only supports
x86-64.
Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions")
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-and-tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241031045333.1209195-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Disable strict aliasing, as has been done in the kernel proper for decades
(literally since before git history) to fix issues where gcc will optimize
away loads in code that looks 100% correct, but is _technically_ undefined
behavior, and thus can be thrown away by the compiler.
E.g. arm64's vPMU counter access test casts a uint64_t (unsigned long)
pointer to a u64 (unsigned long long) pointer when setting PMCR.N via
u64p_replace_bits(), which gcc-13 detects and optimizes away, i.e. ignores
the result and uses the original PMCR.
The issue is most easily observed by making set_pmcr_n() noinline and
wrapping the call with printf(), e.g. sans comments, for this code:
printf("orig = %lx, next = %lx, want = %lu\n", pmcr_orig, pmcr, pmcr_n);
set_pmcr_n(&pmcr, pmcr_n);
printf("orig = %lx, next = %lx, want = %lu\n", pmcr_orig, pmcr, pmcr_n);
gcc-13 generates:
0000000000401c90 <set_pmcr_n>:
401c90: f9400002 ldr x2, [x0]
401c94: b3751022 bfi x2, x1, #11, #5
401c98: f9000002 str x2, [x0]
401c9c: d65f03c0 ret
0000000000402660 <test_create_vpmu_vm_with_pmcr_n>:
402724: aa1403e3 mov x3, x20
402728: aa1503e2 mov x2, x21
40272c: aa1603e0 mov x0, x22
402730: aa1503e1 mov x1, x21
402734: 940060ff bl 41ab30 <_IO_printf>
402738: aa1403e1 mov x1, x20
40273c: 910183e0 add x0, sp, #0x60
402740: 97fffd54 bl 401c90 <set_pmcr_n>
402744: aa1403e3 mov x3, x20
402748: aa1503e2 mov x2, x21
40274c: aa1503e1 mov x1, x21
402750: aa1603e0 mov x0, x22
402754: 940060f7 bl 41ab30 <_IO_printf>
with the value stored in [sp + 0x60] ignored by both printf() above and
in the test proper, resulting in a false failure due to vcpu_set_reg()
simply storing the original value, not the intended value.
$ ./vpmu_counter_access
Random seed: 0x6b8b4567
orig = 3040, next = 3040, want = 0
orig = 3040, next = 3040, want = 0
==== Test Assertion Failure ====
aarch64/vpmu_counter_access.c:505: pmcr_n == get_pmcr_n(pmcr)
pid=71578 tid=71578 errno=9 - Bad file descriptor
1 0x400673: run_access_test at vpmu_counter_access.c:522
2 (inlined by) main at vpmu_counter_access.c:643
3 0x4132d7: __libc_start_call_main at libc-start.o:0
4 0x413653: __libc_start_main at ??:0
5 0x40106f: _start at ??:0
Failed to update PMCR.N to 0 (received: 6)
Somewhat bizarrely, gcc-11 also exhibits the same behavior, but only if
set_pmcr_n() is marked noinline, whereas gcc-13 fails even if set_pmcr_n()
is inlined in its sole caller.
Cc: stable@vger.kernel.org
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116912
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The loop in test_create_guest_memfd_invalid() that is supposed to test
that nothing is accepted as a valid flag to KVM_CREATE_GUEST_MEMFD was
initializing `flag` as 0 instead of BIT(0). This caused the loop to
immediately exit instead of iterating over BIT(0), BIT(1), ... .
Fixes: 8a89efd43423 ("KVM: selftests: Add basic selftest for guest_memfd()")
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Reviewed-by: James Gowans <jgowans@amazon.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20241024095956.3668818-1-roypat@amazon.co.uk
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When memslot_perf_test is run nested, first iteration of test_memslot_rw_loop
testcase, sometimes takes more than 2 seconds due to build of shadow page tables.
Following iterations are fast.
To be on the safe side, bump the timeout to 10 seconds.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://lore.kernel.org/r/20241004220153.287459-1-mlevitsk@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM.
The usual collection of singletons - please see the changelogs"
* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
mm: shrinker: avoid memleak in alloc_shrinker_info
.mailmap: update e-mail address for Eugen Hristev
vmscan,migrate: fix page count imbalance on node stats when demoting pages
mailmap: update Jarkko's email addresses
mm: allow set/clear page_type again
nilfs2: fix potential deadlock with newly created symlinks
Squashfs: fix variable overflow in squashfs_readpage_block
kasan: remove vmalloc_percpu test
tools/mm: -Werror fixes in page-types/slabinfo
mm, swap: avoid over reclaim of full clusters
mm: fix PSWPIN counter for large folios swap-in
mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
mm/codetag: fix null pointer check logic for ref and tag
mm/gup: stop leaking pinned pages in low memory conditions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that
have been sitting in my tree this week. Included in here are the
following:
- thunderbolt driver fixes for reported issues
- USB typec driver fixes
- xhci driver fixes for reported problems
- dwc2 driver revert for a broken change
- usb phy driver fix
- usbip tool fix
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices
usb: phy: Fix API devm_usb_put_phy() can not release the phy
usb: typec: use cleanup facility for 'altmodes_node'
usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes()
usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path
usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes
usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links
Revert "usb: dwc2: Skip clock gating on Broadcom SoCs"
xhci: Fix Link TRB DMA in command ring stopped completion event
xhci: Use pm_runtime_get to prevent RPM on unsupported systems
usbip: tools: Fix detach_port() invalid port error path
thunderbolt: Honor TMU requirements in the domain when setting TMU mode
thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- fix syntax error in frequency calculation arithmetic expression in
intel_pstate run.sh
- add missing cpupower dependency check intel_pstate run.sh
- fix idmap_mount_tree_invalid test failure due to incorrect argument
- fix watchdog-test run leaving the watchdog timer enabled causing
system reboot. With this fix, the test disables the watchdog timer
when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in
addition to SIGINT
* tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/watchdog-test: Fix system accidentally reset after watchdog-test
selftests/intel_pstate: check if cpupower is installed
selftests/intel_pstate: fix operand expected error
selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
"The bulk of these fixes center around an initialization order bug
reported by Gregory Price and some additional fall out from the
debugging effort.
In summary, cxl_acpi and cxl_mem race and previously worked because of
a bus_rescan_devices() while testing without modules built in.
Unfortunately with modules built in the rescan would fail due to the
cxl_port driver being registered late via the build order. Furthermore
it was found bus_rescan_devices() did not guarantee a probe barrier
which CXL was expecting. Additional fixes to cxl-test and decoder
allocation came along as they were found in this debugging effort.
The other fixes are pretty minor but one affects trace point data seen
by user space.
Summary:
- Fix crashes when running with cxl-test code
- Fix Trace DRAM Event Record field decodes
- Fix module/built in initialization order errors
- Fix use after free on decoder shutdowns
- Fix out of order decoder allocations
- Improve cxl-test to better reflect real world systems"
* tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Improve init-order fidelity relative to real-world systems
cxl/port: Prevent out-of-order decoder allocation
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
cxl/port: Fix CXL port initialization order when the subsystem is built-in
cxl/events: Fix Trace DRAM Event Record
cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
|
|
Verify that KVM's supported XCR0 includes AVX (and earlier features) when
running the SEV-ES VMSA XSAVE test. In practice, the issue will likely
never pop up, since KVM support for AVX predates KVM support for SEV-ES,
but checking for KVM support makes the requirement more obvious.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that CR4.OSXSAVE and XCR0 are setup by default, drop the manual
enabling from the SEV smoke test that validates FPU state can be
transferred into the VMSA.
In guest_code_xsave(), explicitly set the Requested-Feature Bitmask (RFBM)
to exactly XFEATURE_MASK_X87_AVX instead of relying on the host side of
things to enable only X87_AVX features in guest XCR0. I.e. match the RFBM
for the host XSAVE.
Link: https://lore.kernel.org/r/20241003234337.273364-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that CR4.OSXSAVE and XCR0 are setup by default, drop the manual
enabling from the state test, which is fully redundant with the default
behavior.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that CR4.OSXSAVE and XCR0 are setup by default, drop the manual
enabling of OXSAVE and XTILE from the AMX test.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that CR4.OSXSAVE is enabled by default, drop the manual enabling from
CR4/CPUID sync test and instead assert that CR4.OSXSAVE is enabled.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that KVM selftests enable all supported XCR0 features by default, add
a testcase to the XCR0 vs. CPUID test to verify that the guest can disable
everything except the legacy FPU in XCR0, and then re-enable the full
feature set, which is kinda sorta what the test did before XCR0 was setup
by default.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
To play nice with compilers generating AVX instructions, set CR4.OSXSAVE
and configure XCR0 by default when creating selftests vCPUs. Some distros
have switched gcc to '-march=x86-64-v3' by default, and while it's hard to
find a CPU which doesn't support AVX today, many KVM selftests fail with
==== Test Assertion Failure ====
lib/x86_64/processor.c:570: Unhandled exception in guest
pid=72747 tid=72747 errno=4 - Interrupted system call
Unhandled exception '0x6' at guest RIP '0x4104f7'
due to selftests not enabling AVX by default for the guest. The failure
is easy to reproduce elsewhere with:
$ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test
E.g. gcc-13 with -march=x86-64-v3 compiles this chunk from selftests'
kvm_fixup_exception():
regs->rip = regs->r11;
regs->r9 = regs->vector;
regs->r10 = regs->error_code;
into this monstronsity (which is clever, but oof):
405313: c4 e1 f9 6e c8 vmovq %rax,%xmm1
405318: 48 89 68 08 mov %rbp,0x8(%rax)
40531c: 48 89 e8 mov %rbp,%rax
40531f: c4 c3 f1 22 c4 01 vpinsrq $0x1,%r12,%xmm1,%xmm0
405325: 49 89 6d 38 mov %rbp,0x38(%r13)
405329: c5 fa 7f 45 00 vmovdqu %xmm0,0x0(%rbp)
Alternatively, KVM selftests could explicitly restrict the compiler to
-march=x86-64-v2, but odds are very good that punting on AVX enabling will
simply result in tests that "need" AVX doing their own thing, e.g. there
are already three or so additional cleanups that can be done on top.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Closes: https://lore.kernel.org/all/20240920154422.2890096-1-vkuznets@redhat.com
Reviewed-and-tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rework the CR4/CPUID sync test to clear CR4.OSXSAVE, do CPUID, and restore
CR4.OSXSAVE in assembly, so that there is zero chance of AVX instructions
being executed while CR4.OSXSAVE is disabled. This will allow enabling
CR4.OSXSAVE by default for selftests vCPUs as a general means of playing
nice with AVX instructions.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Mask off OSPKE and OSXSAVE, which are toggled based on corresponding CR4
enabling bits, when comparing vCPU CPUID against KVM's supported CPUID.
This will allow setting OSXSAVE by default when creating vCPUs, without
causing test failures (KVM doesn't enumerate OSXSAVE=1).
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When comparing vCPU CPUID entries against KVM's supported CPUID, mask off
only the dynamic fields/bits instead of skipping the entire entry.
Precisely masking bits isn't meaningfully more difficult than skipping
entire entries, and will be necessary to maintain test coverage when a
future commit enables OSXSAVE by default, i.e. makes one bit in all of
CPUID.0x1 dynamic.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20241003234337.273364-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Expand and rename the feature MSRs test to verify KVM's ABI and quirk
for initializing feature MSRs.
Exempt VM_CR{0,4}_FIXED1 from most tests as KVM intentionally takes full
control of the MSRs, e.g. to prevent L1 from running L2 with bogus CR0
and/or CR4 values.
Link: https://lore.kernel.org/r/20240802185511.305849-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add another testcase to x86's PMU capabilities test to verify that KVM's
handling of userspace accesses to PERF_CAPABILITIES when the vCPU doesn't
support the MSR (per the vCPU's CPUID). KVM's (newly established) ABI is
that userspace MSR accesses are subject to architectural existence checks,
but that if the MSR is advertised as supported _by KVM_, "bad" reads get
'0' and writes of '0' are always allowed.
Link: https://lore.kernel.org/r/20240802185511.305849-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Tag MSR_PLATFORM_INFO as a feature MSR (because it is), i.e. disallow it
from being modified after the vCPU has run.
To make KVM's selftest compliant, simply delete the userspace MSR write
that restores KVM's original value at the end of the test. Verifying that
userspace can write back what it originally read is uninteresting in this
particular case, because KVM doesn't enforce _any_ bits in the MSR, i.e.
userspace should be able to write any arbitrary value.
Link: https://lore.kernel.org/r/20240802185511.305849-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to force a checkpoint when the program's jump
history becomes too long (Eduard Zingerman)
- Add several fixes to the BPF bits iterator addressing issues like
memory leaks and overflow problems (Hou Tao)
- Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)
- Fix BPF test infra's LIVE_FRAME frame update after a page has been
recycled (Toke Høiland-Jørgensen)
- Fix BPF verifier and undo the 40-bytes extra stack space for
bpf_fastcall patterns due to various bugs (Eduard Zingerman)
- Fix a BPF sockmap race condition which could trigger a NULL pointer
dereference in sock_map_link_update_prog (Cong Wang)
- Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under
the socket lock (Jiayuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
selftests/bpf: Add three test cases for bits_iter
bpf: Use __u64 to save the bits in bits iterator
bpf: Check the validity of nr_words in bpf_iter_bits_new()
bpf: Add bpf_mem_alloc_check_size() helper
bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
selftests/bpf: Add test for trie_get_next_key()
bpf: Fix out-of-bounds write in trie_get_next_key()
selftests/bpf: Test with a very short loop
bpf: Force checkpoint when jmp history is too long
bpf: fix filed access without lock
sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi, bluetooth and netfilter.
No known new regressions outstanding.
Current release - regressions:
- wifi: mt76: do not increase mcu skb refcount if retry is not
supported
Current release - new code bugs:
- wifi:
- rtw88: fix the RX aggregation in USB 3 mode
- mac80211: fix memory corruption bug in struct ieee80211_chanctx
Previous releases - regressions:
- sched:
- stop qdisc_tree_reduce_backlog on TC_H_ROOT
- sch_api: fix xa_insert() error path in tcf_block_get_ext()
- wifi:
- revert "wifi: iwlwifi: remove retry loops in start"
- cfg80211: clear wdev->cqm_config pointer on free
- netfilter: fix potential crash in nf_send_reset6()
- ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()
- bluetooth: fix null-ptr-deref in hci_read_supported_codecs
- eth: mlxsw: add missing verification before pushing Tx header
- eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of
bounds issue
Previous releases - always broken:
- wifi: mac80211: do not pass a stopped vif to the driver in
.get_txpower
- netfilter: sanitize offset and length before calling skb_checksum()
- core:
- fix crash when config small gso_max_size/gso_ipv4_max_size
- skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
- mptcp: protect sched with rcu_read_lock
- eth: ice: fix crash on probe for DPLL enabled E810 LOM
- eth: macsec: fix use-after-free while sending the offloading packet
- eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data
- eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices
- eth: mtk_wed: fix path of MT7988 WO firmware"
* tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
net: hns3: don't auto enable misc vector
net: hns3: Resolved the issue that the debugfs query result is inconsistent.
net: hns3: fix missing features due to dev->features configuration too early
net: hns3: fixed reset failure issues caused by the incorrect reset type
net: hns3: add sync command to sync io-pgtable
net: hns3: default enable tx bounce buffer when smmu enabled
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
net: ethernet: mtk_wed: fix path of MT7988 WO firmware
selftests: forwarding: Add IPv6 GRE remote change tests
mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
mlxsw: pci: Sync Rx buffers for device
mlxsw: pci: Sync Rx buffers for CPU
mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
netfilter: Fix use-after-free in get_info()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Remove unused parameters in conntrack_dump_flush.c used by
selftests, from Liu Jing.
2) Fix possible UaF when removing xtables module via getsockopt()
interface, from Dong Chenchen.
3) Fix potential crash in nf_send_reset6() reported by syzkaller.
From Eric Dumazet
4) Validate offset and length before calling skb_checksum()
in nft_payload, otherwise hitting BUG() is possible.
netfilter pull request 24-10-31
* tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
netfilter: Fix use-after-free in get_info()
selftests: netfilter: remove unused parameter
====================
Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit e6d2c436ff693 ("tools/mm: allow users to provide additional
cflags/ldflags") passes now CFLAGS to Makefile. With this, build systems
with default -Werror enabled found:
slabinfo.c:1300:25: error: ignoring return value of 'chdir'
declared with attribute 'warn_unused_result' [-Werror=unused-result]
chdir("..");
^~~~~~~~~~~
page-types.c:397:35: error: format '%lu' expects argument of type
'long unsigned int', but argument 2 has type 'uint64_t'
{aka 'long long unsigned int'} [-Werror=format=]
printf("%lu\t", mapcnt0);
~~^ ~~~~~~~
..
Fix page-types by using PRIu64 for uint64_t prints and check in slabinfo
for return code on chdir("..").
Link: https://lkml.kernel.org/r/c1ceb507-94bc-461c-934d-c19b77edd825@gmail.com
Fixes: e6d2c436ff69 ("tools/mm: allow users to provide additional cflags/ldflags")
Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Test that after changing the remote address of an ip6gre net device
traffic is forwarded as expected. Test with both flat and hierarchical
topologies and with and without an input / output keys.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/02b05246d2cdada0cf2fccffc0faa8a424d0f51b.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Update more header copies with the kernel sources, including const.h,
msr-index.h, arm64's cputype.h, kvm's, bits.h and unaligned.h
- The return from 'write' isn't a pid, fix cut'n'paste error in 'perf
trace'
- Fix up the python binding build on architectures without
HAVE_KVM_STAT_SUPPORT
- Add some more bounds checks to augmented_raw_syscalls.bpf.c (used to
collect syscall pointer arguments in 'perf trace') to make the
resulting bytecode to pass the kernel BPF verifier, allowing us to go
back accepting clang 12.0.1 as the minimum version required for
compiling BPF sources
- Add __NR_capget for x86 to fix a regression on running perf + intel
PT (hw tracing) as non-root setting up the capabilities as described
in https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
- Fix missing syscalltbl in non-explicitly listed architectures,
noticed on ARM 32-bit, that still needs a .tbl generator for the
syscall id<->name tables, should be added for v6.13
- Handle 'perf test' failure when handling broken DWARF for ASM files
* tag 'perf-tools-fixes-for-v6.12-2-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cap: Add __NR_capget to arch/x86 unistd
tools headers: Update the linux/unaligned.h copy with the kernel sources
tools headers arm64: Sync arm64's cputype.h with the kernel sources
tools headers: Synchronize {uapi/}linux/bits.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORT
perf test: Handle perftool-testsuite_probe failure due to broken DWARF
tools headers UAPI: Sync kvm headers with the kernel sources
perf trace: Fix non-listed archs in the syscalltbl routines
perf build: Change the clang check back to 12.0.1
perf trace augmented_raw_syscalls: Add more checks to pass the verifier
perf trace augmented_raw_syscalls: Add extra array index bounds checking to satisfy some BPF verifiers
perf trace: The return from 'write' isn't a pid
tools headers UAPI: Sync linux/const.h with the kernel headers
|
|
Use of macro ARRAY_SIZE to calculate array size minimizes
the redundant code and improves code reusability.
./tools/testing/selftests/kvm/x86_64/debug_regs.c:169:32-33: WARNING: Use ARRAY_SIZE.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10847
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240913054315.130832-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The macro GUEST_CODE_PIO_PORT is never referenced in the code,
just remove it.
Signed-off-by: Ba Jing <bajing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240903043135.11087-1-bajing@cmss.chinamobile.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add more test cases for bits iterator:
(1) huge word test
Verify the multiplication overflow of nr_bits in bits_iter. Without
the overflow check, when nr_words is 67108865, nr_bits becomes 64,
causing bpf_probe_read_kernel_common() to corrupt the stack.
(2) max word test
Verify correct handling of maximum nr_words value (511).
(3) bad word test
Verify early termination of bits iteration when bits iterator
initialization fails.
Also rename bits_nomem to bits_too_big to better reflect its purpose.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
err is never used, remove it.
Signed-off-by: Liu Jing <liujing@cmss.chinamobile.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Hou Tao reported an issue with bpf_fastcall patterns allowing extra
stack space above MAX_BPF_STACK limit. This extra stack allowance is
not integrated properly with the following verifier parts:
- backtracking logic still assumes that stack can't exceed
MAX_BPF_STACK;
- bpf_verifier_env->scratched_stack_slots assumes only 64 slots are
available.
Here is an example of an issue with precision tracking
(note stack slot -8 tracked as precise instead of -520):
0: (b7) r1 = 42 ; R1_w=42
1: (b7) r2 = 42 ; R2_w=42
2: (7b) *(u64 *)(r10 -512) = r1 ; R1_w=42 R10=fp0 fp-512_w=42
3: (7b) *(u64 *)(r10 -520) = r2 ; R2_w=42 R10=fp0 fp-520_w=42
4: (85) call bpf_get_smp_processor_id#8 ; R0_w=scalar(...)
5: (79) r2 = *(u64 *)(r10 -520) ; R2_w=42 R10=fp0 fp-520_w=42
6: (79) r1 = *(u64 *)(r10 -512) ; R1_w=42 R10=fp0 fp-512_w=42
7: (bf) r3 = r10 ; R3_w=fp0 R10=fp0
8: (0f) r3 += r2
mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx -1
mark_precise: frame0: regs=r2 stack= before 7: (bf) r3 = r10
mark_precise: frame0: regs=r2 stack= before 6: (79) r1 = *(u64 *)(r10 -512)
mark_precise: frame0: regs=r2 stack= before 5: (79) r2 = *(u64 *)(r10 -520)
mark_precise: frame0: regs= stack=-8 before 4: (85) call bpf_get_smp_processor_id#8
mark_precise: frame0: regs= stack=-8 before 3: (7b) *(u64 *)(r10 -520) = r2
mark_precise: frame0: regs=r2 stack= before 2: (7b) *(u64 *)(r10 -512) = r1
mark_precise: frame0: regs=r2 stack= before 1: (b7) r2 = 42
9: R2_w=42 R3_w=fp42
9: (95) exit
This patch disables the additional allowance for the moment.
Also, two test cases are removed:
- bpf_fastcall_max_stack_ok:
it fails w/o additional stack allowance;
- bpf_fastcall_max_stack_fail:
this test is no longer necessary, stack size follows
regular rules, pattern invalidation is checked by other
test cases.
Reported-by: Hou Tao <houtao@huaweicloud.com>
Closes: https://lore.kernel.org/bpf/20241023022752.172005-1-houtao@huaweicloud.com/
Fixes: 5b5f51bff1b6 ("bpf: no_caller_saved_registers attribute for helper calls")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241029193911.1575719-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Instances of scx_ops_bypass() could race each other leading to
misbehavior. Fix by protecting the operation with a spinlock.
- selftest and userspace header fixes
* tag 'sched_ext-for-6.12-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix enq_last_no_enq_fails selftest
sched_ext: Make cast_mask() inline
scx: Fix raciness in scx_ops_bypass()
scx: Fix exit selftest to use custom DSQ
sched_ext: Fix function pointer type mismatches in BPF selftests
selftests/sched_ext: add order-only dependency of runner.o on BPFOBJ
|
|
Add a test for out-of-bounds write in trie_get_next_key() when a full
path from root to leaf exists and bpf_map_get_next_key() is called
with the leaf node. It may crashes the kernel on failure, so please
run in a VM.
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/Zxx4ep78tsbeWPVM@localhost.localdomain
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test added is a simplified reproducer from syzbot report [1].
If verifier does not insert checkpoint somewhere inside the loop,
verification of the program would take a very long time.
This would happen because mark_chain_precision() for register r7 would
constantly trace jump history of the loop back, processing many
iterations for each mark_chain_precision() call.
[1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@google.com/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20241029172641.1042523-2-eddyz87@gmail.com
|
|
The CI occasionaly encounters a failing test run. Example:
# PASS: ipsec tunnel mode for ns1/ns2
# re-run with random mtus: -o 10966 -l 19499 -r 31322
# PASS: flow offloaded for ns1/ns2
[..]
# FAIL: ipsec tunnel ... counter 1157059 exceeds expected value 878489
This script will re-exec itself, on the second run, random MTUs are
chosen for the involved links. This is done so we can cover different
combinations (large mtu on client, small on server, link has lowest
mtu, etc).
Furthermore, file size is random, even for the first run.
Rework this script and always use the same file size on initial run so
that at least the first round can be expected to have reproducible
behavior.
Second round will use random mtu/filesize.
Raise the failure limit to that of the file size, this should avoid all
errneous test errors. Currently, first fin will remove the offload, so if
one peer is already closing remaining data is handled by classic path,
which result in larger-than-expected counter and a test failure.
Given packet path also counts tcp/ip headers, in case offload is
completely broken this test will still fail (as expected).
The test counter limit could be made more strict again in the future
once flowtable can keep a connection in offloaded state until FINs
in both directions were seen.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022152324.13554-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some additional synchronization is needed on Android ARM64; we see a
deadlock with pthread_create when the parent thread races forward before
the child has a chance to start doing work.
Link: https://lkml.kernel.org/r/20241018171734.2315053-4-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit e61ef21e27e8deed8c474e9f47f4aa7bc37e138c.
uffd_poll_thread may be called by other tests that do not initialize the
pthread_barrier, so this approach is not correct. This will revert to
using atomic_bool instead.
Link: https://lkml.kernel.org/r/20241018171734.2315053-3-edliaw@google.com
Fixes: e61ef21e27e8 ("selftests/mm: replace atomic_bool with pthread_barrier_t")
Signed-off-by: Edward Liaw <edliaw@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|