Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- Extend the x86 instruction decoder with APX and
other new instructions
- Misc cleanups
* tag 'perf-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/cstate: Remove unused 'struct perf_cstate_msr'
perf/x86/rapl: Rename 'maxdie' to nr_rapl_pmu and 'dieid' to rapl_pmu_idx
x86/insn: Add support for APX EVEX instructions to the opcode map
x86/insn: Add support for APX EVEX to the instruction decoder logic
x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map
x86/insn: Add support for REX2 prefix to the instruction decoder logic
x86/insn: Add misc new Intel instructions
x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS
x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
x86/insn: Add Key Locker instructions to the opcode map
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
|
Pull rdma updates from Jason Gunthorpe:
"Aside from the usual things this has an arch update for
__iowrite64_copy() used by the RDMA drivers.
This API was intended to generate large 64 byte MemWr TLPs on PCI.
These days most processors had done this by just repeating writel() in
a loop. S390 and some new ARM64 designs require a special helper to
get this to generate.
- Small improvements and fixes for erdma, efa, hfi1, bnxt_re
- Fix a UAF crash after module unload on leaking restrack entry
- Continue adding full RDMA support in mana with support for EQs,
GID's and CQs
- Improvements to the mkey cache in mlx5
- DSCP traffic class support in hns and several bug fixes
- Cap the maximum number of MADs in the receive queue to avoid OOM
- Another batch of rxe bug fixes from large scale testing
- __iowrite64_copy() optimizations for write combining MMIO memory
- Remove NULL checks before dev_put/hold()
- EFA support for receive with immediate
- Fix a recent memleaking regression in a cma error path"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits)
RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw
RDMA/IPoIB: Fix format truncation compilation errors
bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq
RDMA/efa: Support QP with unsolicited write w/ imm. receive
IB/hfi1: Remove generic .ndo_get_stats64
IB/hfi1: Do not use custom stat allocator
RDMA/hfi1: Use RMW accessors for changing LNKCTL2
RDMA/mana_ib: implement uapi for creation of rnic cq
RDMA/mana_ib: boundary check before installing cq callbacks
RDMA/mana_ib: introduce a helper to remove cq callbacks
RDMA/mana_ib: create and destroy RNIC cqs
RDMA/mana_ib: create EQs for RNIC CQs
RDMA/core: Remove NULL check before dev_{put, hold}
RDMA/ipoib: Remove NULL check before dev_{put, hold}
RDMA/mlx5: Remove NULL check before dev_{put, hold}
RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources.
RDMA/core: Add an option to display driver-specific QPs in the rdmatool
RDMA/efa: Add shutdown notifier
RDMA/mana_ib: Fix missing ret value
IB/mlx5: Use __iowrite64_copy() for write combining stores
...
|
|
__raw_cpu_write() with !USE_X86_SEG_SUPPORT config uses read/write
operand constraint modifier "+" for its memory location. This signals
the compiler that the location is both read and written by the asm.
This is not true, because MOV insn only writes to the output.
Correct the modifier to "=" to inform the compiler that the memory
location is only written to. This also prevents the compiler from
value tracking the undefined value from the uninitialized memory.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240430091833.196482-5-ubizjak@gmail.com
|
|
Introduce the __raw_cpu_read_const() macro to further reduce ifdeffery
and differences between configs w/ and w/o USE_X86_SEG_SUPPORT.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240430091833.196482-4-ubizjak@gmail.com
|
|
Redefine percpu_from_op() and percpu_to_op() as __raw_cpu_read()
and __raw_cpu_write(). Unify __raw_cpu_{read,write}() macros
between configs w/ and w/o USE_X86_SEG_SUPPORT in order to
unify {raw,this}_cpu{read_write}_N() accessors between configs.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240430091833.196482-3-ubizjak@gmail.com
|
|
Move some percpu macros around to make a follow-up
patch more readable.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240430091833.196482-2-ubizjak@gmail.com
|
|
Introduce the pcpu_binary_op() macro, a copy of the percpu_to_op() macro.
Update percpu binary operators to use the new macro, since
percpu_to_op() will be re-purposed as a raw percpu write accessor
in a follow-up patch.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240430091833.196482-1-ubizjak@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings
via prctl, notably NPHIE which controls hashst/hashchk for ROP
protection.
- Install powerpc selftests in sub-directories. Note this changes the
way run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory
add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove
events.
- Other small features, cleanups and fixes.
Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
powerpc/fadump: Fix section mismatch warning
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
powerpc/fadump: update documentation about bootargs_append
powerpc/fadump: pass additional parameters when fadump is active
powerpc/fadump: setup additional parameters for dump capture kernel
powerpc/pseries/fadump: add support for multiple boot memory regions
selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
KVM: PPC: Fix documentation for ppc mmu caps
KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
powerpc/code-patching: Use dedicated memory routines for patching
powerpc/code-patching: Test patch_instructions() during boot
powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
powerpc: Fix typos
powerpc/eeh: Fix spelling of the word "auxillary" and update comment
macintosh/ams: Fix unused variable warning
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
...
|
|
Pull KVM updates from Paolo Bonzini:
"ARM:
- Move a lot of state that was previously stored on a per vcpu basis
into a per-CPU area, because it is only pertinent to the host while
the vcpu is loaded. This results in better state tracking, and a
smaller vcpu structure.
- Add full handling of the ERET/ERETAA/ERETAB instructions in nested
virtualisation. The last two instructions also require emulating
part of the pointer authentication extension. As a result, the trap
handling of pointer authentication has been greatly simplified.
- Turn the global (and not very scalable) LPI translation cache into
a per-ITS, scalable cache, making non directly injected LPIs much
cheaper to make visible to the vcpu.
- A batch of pKVM patches, mostly fixes and cleanups, as the
upstreaming process seems to be resuming. Fingers crossed!
- Allocate PPIs and SGIs outside of the vcpu structure, allowing for
smaller EL2 mapping and some flexibility in implementing more or
less than 32 private IRQs.
- Purge stale mpidr_data if a vcpu is created after the MPIDR map has
been created.
- Preserve vcpu-specific ID registers across a vcpu reset.
- Various minor cleanups and improvements.
LoongArch:
- Add ParaVirt IPI support
- Add software breakpoint support
- Add mmio trace events support
RISC-V:
- Support guest breakpoints using ebreak
- Introduce per-VCPU mp_state_lock and reset_cntx_lock
- Virtualize SBI PMU snapshot and counter overflow interrupts
- New selftests for SBI PMU and Guest ebreak
- Some preparatory work for both TDX and SNP page fault handling.
This also cleans up the page fault path, so that the priorities of
various kinds of fauls (private page, no memory, write to read-only
slot, etc.) are easier to follow.
x86:
- Minimize amount of time that shadow PTEs remain in the special
REMOVED_SPTE state.
This is a state where the mmu_lock is held for reading but
concurrent accesses to the PTE have to spin; shortening its use
allows other vCPUs to repopulate the zapped region while the zapper
finishes tearing down the old, defunct page tables.
- Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID
field, which is defined by hardware but left for software use.
This lets KVM communicate its inability to map GPAs that set bits
51:48 on hosts without 5-level nested page tables. Guest firmware
is expected to use the information when mapping BARs; this avoids
that they end up at a legal, but unmappable, GPA.
- Fixed a bug where KVM would not reject accesses to MSR that aren't
supposed to exist given the vCPU model and/or KVM configuration.
- As usual, a bunch of code cleanups.
x86 (AMD):
- Implement a new and improved API to initialize SEV and SEV-ES VMs,
which will also be extendable to SEV-SNP.
The new API specifies the desired encryption in KVM_CREATE_VM and
then separately initializes the VM. The new API also allows
customizing the desired set of VMSA features; the features affect
the measurement of the VM's initial state, and therefore enabling
them cannot be done tout court by the hypervisor.
While at it, the new API includes two bugfixes that couldn't be
applied to the old one without a flag day in userspace or without
affecting the initial measurement. When a SEV-ES VM is created with
the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected
once the VMSA has been encrypted. Also, the FPU and AVX state will
be synchronized and encrypted too.
- Support for GHCB version 2 as applicable to SEV-ES guests.
This, once more, is only accessible when using the new
KVM_SEV_INIT2 flow for initialization of SEV-ES VMs.
x86 (Intel):
- An initial bunch of prerequisite patches for Intel TDX were merged.
They generally don't do anything interesting. The only somewhat
user visible change is a new debugging mode that checks that KVM's
MMU never triggers a #VE virtualization exception in the guest.
- Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig
VM-Exit to L1, as per the SDM.
Generic:
- Use vfree() instead of kvfree() for allocations that always use
vcalloc() or __vcalloc().
- Remove .change_pte() MMU notifier - the changes to non-KVM code are
small and Andrew Morton asked that I also take those through the
KVM tree.
The callback was only ever implemented by KVM (which was also the
original user of MMU notifiers) but it had been nonfunctional ever
since calls to set_pte_at_notify were wrapped with
invalidate_range_start and invalidate_range_end... in 2012.
Selftests:
- Enhance the demand paging test to allow for better reporting and
stressing of UFFD performance.
- Convert the steal time test to generate TAP-friendly output.
- Fix a flaky false positive in the xen_shinfo_test due to comparing
elapsed time across two different clock domains.
- Skip the MONITOR/MWAIT test if the host doesn't actually support
MWAIT.
- Avoid unnecessary use of "sudo" in the NX hugepage test wrapper
shell script, to play nice with running in a minimal userspace
environment.
- Allow skipping the RSEQ test's sanity check that the vCPU was able
to complete a reasonable number of KVM_RUNs, as the assert can fail
on a completely valid setup.
If the test is run on a large-ish system that is otherwise idle,
and the test isn't affined to a low-ish number of CPUs, the vCPU
task can be repeatedly migrated to CPUs that are in deep sleep
states, which results in the vCPU having very little net runtime
before the next migration due to high wakeup latencies.
- Define _GNU_SOURCE for all selftests to fix a warning that was
introduced by a change to kselftest_harness.h late in the 6.9
cycle, and because forcing every test to #define _GNU_SOURCE is
painful.
- Provide a global pseudo-RNG instance for all tests, so that library
code can generate random, but determinstic numbers.
- Use the global pRNG to randomly force emulation of select writes
from guest code on x86, e.g. to help validate KVM's emulation of
locked accesses.
- Allocate and initialize x86's GDT, IDT, TSS, segments, and default
exception handlers at VM creation, instead of forcing tests to
manually trigger the related setup.
Documentation:
- Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits)
selftests/kvm: remove dead file
KVM: selftests: arm64: Test vCPU-scoped feature ID registers
KVM: selftests: arm64: Test that feature ID regs survive a reset
KVM: selftests: arm64: Store expected register value in set_id_regs
KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope
KVM: arm64: Only reset vCPU-scoped feature ID regs once
KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()
KVM: arm64: Rename is_id_reg() to imply VM scope
KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
KVM: arm64: Fix hvhe/nvhe early alias parsing
KVM: SEV: Allow per-guest configuration of GHCB protocol version
KVM: SEV: Add GHCB handling for termination requests
KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
KVM: SEV: Add support to handle AP reset MSR protocol
KVM: x86: Explicitly zero kvm_caps during vendor module load
KVM: x86: Fully re-initialize supported_mce_cap on vendor module load
KVM: x86: Fully re-initialize supported_vm_types on vendor module load
KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns
KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Use no_printk() instead of "if (0) printk()" constructs to avoid
generating printk index for messages disabled at compile time
- Remove deprecated strncpy/strcpy from printk.c
- Remove redundant CONFIG_BASE_FULL in favor of CONFIG_BASE_SMALL
* tag 'printk-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: cleanup deprecated uses of strncpy/strcpy
printk: Remove redundant CONFIG_BASE_FULL
printk: Change type of CONFIG_BASE_SMALL to bool
printk: Fix LOG_CPU_MAX_BUF_SHIFT when BASE_SMALL is enabled
ceph: Use no_printk() helper
dyndbg: Use *no_printk() helpers
dev_printk: Add and use dev_no_printk()
printk: Let no_printk() use _printk()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These are ACPICA updates coming from the 20240322 release upstream, an
ACPI DPTF driver update adding new platform support for it, some new
quirks and some assorted fixes and cleanups.
Specifics:
- Add EINJ CXL error types to actbl1.h (Ben Cheatham)
- Add support for RAS2 table to ACPICA (Shiju Jose)
- Fix various spelling mistakes in text files and code comments in
ACPICA (Colin Ian King)
- Fix spelling and typos in ACPICA (Saket Dumbre)
- Modify ACPI_OBJECT_COMMON_HEADER (lijun)
- Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu)
- Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam)
- Add missin increment of registered GPE count to ACPICA (Daniil
Tatianin)
- Mark new ACPICA release 20240322 (Saket Dumbre)
- Add support for the AEST V2 table to ACPICA (Ruidong Tian)
- Disable -Wstringop-truncation for some ACPICA code in the kernel to
avoid a compiler warning that is not very useful (Arnd Bergmann)
- Make the kernel indicate support for several ACPI features that are
in fact supported to the platform firmware through _OSC and fix the
Generic Initiator Affinity _OSC bit (Armin Wolf)
- Make the ACPI core set the owner value for ACPI drivers, drop the
owner setting from a number of drivers and eliminate the owner
field from struct acpi_driver (Krzysztof Kozlowski)
- Rearrange fields in several structures to effectively eliminate
computations from container_of() in some cases (Andy Shevchenko)
- Do some assorted cleanups of the ACPI device enumeration code (Andy
Shevchenko)
- Make the ACPI device enumeration code skip devices with _STA values
clearly identified by the specification as invalid (Rafael Wysocki)
- Rework the handling of the NHLT table to simplify and clarify it
and drop some obsolete pieces (Cezary Rojewski)
- Add ACPI IRQ override quirks for Asus Vivobook Pro N6506MV,
TongFang GXxHRXx and GMxHGxx, and XMG APEX 17 M23 (Guenter
Schafranek, Tamim Khan, Christoffer Sandberg)
- Add reference to UEFI DSD Guide to the documentation related to the
ACPI handling of device properties (Sakari Ailus)
- Fix SRAT lookup of CFMWS ranges with numa_fill_memblks(), remove
lefover architecture-dependent code from the ACPI NUMA handling
code and simplify it on top of that (Robert Richter)
- Add a num-cs device property to specify the number of chip selects
for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a
nested CONFIG_PM #ifdef from it (Andy Shevchenko)
- Move three x86-specific ACPI files to the x86 directory (Andy
Shevchenko)
- Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a
PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede)
- Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy
Sathyanarayanan)
- Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar)
- Mark the einj_driver driver's remove callback as __exit because it
cannot get unbound via sysfs (Uwe Kleine-König)
- Fix a typo in the ACPI documentation regarding the layout of sysfs
subdirectory representing the ACPI namespace (John Watts)
- Make the ACPI pfrut utility print the update_cap field during
capability query (Chen Yu)
- Add HAS_IOPORT dependencies to PNP (Niklas Schnelle)"
* tag 'acpi-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity()
ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()
ACPI/NUMA: Remove architecture dependent remainings
x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()
ACPI: video: Add backlight=native quirk for Lenovo Slim 7 16ARH7
ACPI: scan: Avoid enumerating devices with clearly invalid _STA values
ACPI: Move acpi_blacklisted() declaration to asm/acpi.h
ACPI: resource: Skip IRQ override on Asus Vivobook Pro N6506MV
ACPICA: AEST: Add support for the AEST V2 table
ACPI: tools: pfrut: Print the update_cap field during capability query
ACPI: property: Add reference to UEFI DSD Guide
Documentation: firmware-guide: ACPI: Fix namespace typo
PNP: add HAS_IOPORT dependencies
ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx
ACPI: resource: Do IRQ override on GMxBGxx (XMG APEX 17 M23)
ACPICA: Update acpixf.h for new ACPICA release 20240322
ACPICA: events/evgpeinit: don't forget to increment registered GPE count
ACPICA: Fix CXL 3.0 structure (RDPAS) in the CEDT table
ACPICA: SRAT: Add dump and compiler support for RINTC affinity structure
ACPICA: SRAT: Add RISC-V RINTC affinity structure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 interrupt handling updates from Thomas Gleixner:
"Add support for posted interrupts on bare metal.
Posted interrupts is a virtualization feature which allows to inject
interrupts directly into a guest without host interaction. The VT-d
interrupt remapping hardware sets the bit which corresponds to the
interrupt vector in a vector bitmap which is either used to inject the
interrupt directly into the guest via a virtualized APIC or in case
that the guest is scheduled out provides a host side notification
interrupt which informs the host that an interrupt has been marked
pending in the bitmap.
This can be utilized on bare metal for scenarios where multiple
devices, e.g. NVME storage, raise interrupts with a high frequency. In
the default mode these interrupts are handles independently and
therefore require a full roundtrip of interrupt entry/exit.
Utilizing posted interrupts this roundtrip overhead can be avoided by
coalescing these interrupt entries to a single entry for the posted
interrupt notification. The notification interrupt then demultiplexes
the pending bits in a memory based bitmap and invokes the
corresponding device specific handlers.
Depending on the usage scenario and device utilization throughput
improvements between 10% and 130% have been measured.
As this is only relevant for high end servers with multiple device
queues per CPU attached and counterproductive for situations where
interrupts are arriving at distinct times, the functionality is opt-in
via a kernel command line parameter"
* tag 'x86-irq-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Use existing helper for pending vector check
iommu/vt-d: Enable posted mode for device MSIs
iommu/vt-d: Make posted MSI an opt-in command line option
x86/irq: Extend checks for pending vectors to posted interrupts
x86/irq: Factor out common code for checking pending interrupts
x86/irq: Install posted MSI notification handler
x86/irq: Factor out handler invocation from common_interrupt()
x86/irq: Set up per host CPU posted interrupt descriptors
x86/irq: Reserve a per CPU IDT vector for posted MSIs
x86/irq: Add a Kconfig option for posted MSI
x86/irq: Remove bitfields in posted interrupt descriptor
x86/irq: Unionize PID.PIR for 64bit access w/o casting
KVM: VMX: Move posted interrupt descriptor out of VMX code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
"Core code:
- Make timekeeping and VDSO time readouts resilent against math
overflow:
In guest context the kernel is prone to math overflow when the host
defers the timer interrupt due to overload, malfunction or malice.
This can be mitigated by checking the clocksource delta for the
maximum deferrement which is readily available. If that value is
exceeded then the code uses a slowpath function which can handle
the multiplication overflow.
This functionality is enabled unconditionally in the kernel, but
made conditional in the VDSO code. The latter is conditional
because it allows architectures to optimize the check so it is not
causing performance regressions.
On X86 this is achieved by reworking the existing check for
negative TSC deltas as a negative delta obviously exceeds the
maximum deferrement when it is evaluated as an unsigned value. That
avoids two conditionals in the hotpath and allows to hide both the
negative delta and the large delta handling in the same slow path.
- Add an initial minimal ktime_t abstraction for Rust
- The usual boring cleanups and enhancements
Drivers:
- Boring updates to device trees and trivial enhancements in various
drivers"
* tag 'timers-core-2024-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
clocksource/drivers/arm_arch_timer: Mark hisi_161010101_oem_info const
clocksource/drivers/timer-ti-dm: Remove an unused field in struct dmtimer
clocksource/drivers/renesas-ostm: Avoid reprobe after successful early probe
clocksource/drivers/renesas-ostm: Allow OSTM driver to reprobe for RZ/V2H(P) SoC
dt-bindings: timer: renesas: ostm: Document Renesas RZ/V2H(P) SoC
rust: time: doc: Add missing C header links
clocksource: Make the int help prompt unit readable in ncurses
hrtimer: Rename __hrtimer_hres_active() to hrtimer_hres_active()
timerqueue: Remove never used function timerqueue_node_expires()
rust: time: Add Ktime
vdso: Fix powerpc build U64_MAX undeclared error
clockevents: Convert s[n]printf() to sysfs_emit()
clocksource: Convert s[n]printf() to sysfs_emit()
clocksource: Make watchdog and suspend-timing multiplication overflow safe
timekeeping: Let timekeeping_cycles_to_ns() handle both under and overflow
timekeeping: Make delta calculation overflow safe
timekeeping: Prepare timekeeping_cycles_to_ns() for overflow safety
timekeeping: Fold in timekeeping_delta_to_ns()
timekeeping: Consolidate timekeeping helpers
timekeeping: Refactor timekeeping helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Small cleanups and improvements
* tag 'x86_sev_for_v6.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Make the VMPL0 checking more straight forward
x86/sev: Rename snp_init() in boot/compressed/sev.c
x86/sev: Shorten struct name snp_secrets_page_layout to snp_secrets_page
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm alternatives updates from Borislav Petkov:
- Switch the in-place instruction patching which lead to at least one
weird bug with 32-bit guests, seeing stale instruction bytes, to one
working on a buffer, like the rest of the alternatives code does
- Add a long overdue check to the X86_FEATURE flag modifying functions
to warn when former get changed in a non-compatible way after
alternatives have been patched because those changes will be already
wrong
- Other cleanups
* tag 'x86_alternatives_for_v6.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Remove alternative_input_2()
x86/alternatives: Sort local vars in apply_alternatives()
x86/alternatives: Optimize optimize_nops()
x86/alternatives: Get rid of __optimize_nops()
x86/alternatives: Use a temporary buffer when optimizing NOPs
x86/alternatives: Catch late X86_FEATURE modifiers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
- Improve the DeviceTree (OF) NUMA enumeration code to address
kernel warnings & mis-mappings on DeviceTree platforms
- Migrate x86 platform drivers to the .remove_new callback API
- Misc cleanups & fixes
* tag 'x86-platform-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/olpc-xo1-sci: Convert to platform remove callback returning void
x86/platform/olpc-x01-pm: Convert to platform remove callback returning void
x86/platform/iris: Convert to platform remove callback returning void
x86/of: Change x86_dtb_parse_smp_config() to static
x86/of: Map NUMA node to CPUs as per DeviceTree
x86/of: Set the parse_smp_cfg for all the DeviceTree platforms by default
x86/hyperv/vtl: Correct x86_init.mpparse.parse_smp_cfg assignment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 percpu updates from Ingo Molnar:
- Expand the named address spaces optimizations down to
GCC 9.1+.
- Re-enable named address spaces with sanitizers for GCC 13.3+
- Generate better this_percpu_xchg_op() code
- Introduce raw_cpu_read_long() to reduce ifdeffery
- Simplify the x86_this_cpu_test_bit() et al macros
- Address Sparse warnings
- Misc cleanups & fixes
* tag 'x86-percpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/percpu: Introduce raw_cpu_read_long() to reduce ifdeffery
x86/percpu: Rewrite x86_this_cpu_test_bit() and friends as macros
x86/percpu: Fix x86_this_cpu_variable_test_bit() asm template
x86/percpu: Re-enable named address spaces with sanitizers for GCC 13.3+
x86/percpu: Use __force to cast from __percpu address space
x86/percpu: Do not use this_cpu_read_stable_8() for 32-bit targets
x86/percpu: Unify arch_raw_cpu_ptr() defines
x86/percpu: Enable named address spaces for GCC 9.1+
x86/percpu: Re-enable named address spaces with KASAN for GCC 13.3+
x86/percpu: Move raw_percpu_xchg_op() to a better place
x86/percpu: Convert this_percpu_xchg_op() from asm() to C code, to generate better code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
- Fix W^X violation check false-positives in the CPA code
when running as a Xen PV guest
- Fix W^X violation warning false-positives in show_fault_oops()
* tag 'x86-mm-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pat: Fix W^X violation false-positives when running as Xen PV guest
x86/pat: Restructure _lookup_address_cpa()
x86/mm: Use lookup_address_in_pgd_attr() in show_fault_oops()
x86/pat: Introduce lookup_address_in_pgd_attr()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
- Fix asm() constraints & modifiers in restore_fpregs_from_fpstate()
- Update comments
- Robustify the free_vm86() definition
* tag 'x86-fpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Update fpu_swap_kvm_fpu() uses in comments as well
x86/vm86: Make sure the free_vm86(task) definition uses its parameter even in the !CONFIG_VM86 case
x86/fpu: Fix AMD X86_BUG_FXSAVE_LEAK fixup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
- Rework the x86 CPU vendor/family/model code: introduce the 'VFM'
value that is an 8+8+8 bit concatenation of the vendor/family/model
value, and add macros that work on VFM values. This simplifies the
addition of new Intel models & families, and simplifies existing
enumeration & quirk code.
- Add support for the AMD 0x80000026 leaf, to better parse topology
information
- Optimize the NUMA allocation layout of more per-CPU data structures
- Improve the workaround for AMD erratum 1386
- Clear TME from /proc/cpuinfo as well, when disabled by the firmware
- Improve x86 self-tests
- Extend the mce_record tracepoint with the ::ppin and ::microcode fields
- Implement recovery for MCE errors in TDX/SEAM non-root mode
- Misc cleanups and fixes
* tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Switch to new Intel CPU model defines
x86/tsc_msr: Switch to new Intel CPU model defines
x86/tsc: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
x86/resctrl: Switch to new Intel CPU model defines
x86/microcode/intel: Switch to new Intel CPU model defines
x86/mce: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
x86/cpu/intel_epb: Switch to new Intel CPU model defines
x86/aperfmperf: Switch to new Intel CPU model defines
x86/apic: Switch to new Intel CPU model defines
perf/x86/msr: Switch to new Intel CPU model defines
perf/x86/intel/uncore: Switch to new Intel CPU model defines
perf/x86/intel/pt: Switch to new Intel CPU model defines
perf/x86/lbr: Switch to new Intel CPU model defines
perf/x86/intel/cstate: Switch to new Intel CPU model defines
x86/bugs: Switch to new Intel CPU model defines
x86/bugs: Switch to new Intel CPU model defines
x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h
x86/cpu/vfm: Add new macros to work with (vendor/family/model) values
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
- Fix function prototypes to address clang function type cast
warnings in the math-emu code
- Reorder definitions in <asm/msr-index.h>
- Remove unused code
- Fix typos
- Simplify #include sections
* tag 'x86-cleanups-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pci/ce4100: Remove unused 'struct sim_reg_op'
x86/msr: Move ARCH_CAP_XAPIC_DISABLE bit definition to its rightful place
x86/math-emu: Fix function cast warnings
x86/extable: Remove unused fixup type EX_TYPE_COPY
x86/rtc: Remove unused intel-mid.h
x86/32: Remove unused IA32_STACK_TOP and two externs
x86/head: Simplify relative include path to xen-head.S
x86/fred: Fix typo in Kconfig description
x86/syscall/compat: Remove ia32_unistd.h
x86/syscall/compat: Remove unused macro __SYSCALL_ia32_NR
x86/virt/tdx: Remove duplicate include
x86/xen: Remove duplicate #include
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
- Use -fpic to build the kexec 'purgatory' (the self-contained
code that runs between two kernels)
- Clean up vmlinux.lds.S generation
- Simplify the X86_EXTENDED_PLATFORM section of the x86 Kconfig
- Misc cleanups & fixes
* tag 'x86-build-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Merge the two CONFIG_X86_EXTENDED_PLATFORM entries
x86/purgatory: Switch to the position-independent small code model
x86/boot: Replace __PHYSICAL_START with LOAD_PHYSICAL_ADDR
x86/vmlinux.lds.S: Take __START_KERNEL out conditional definition
x86/vmlinux.lds.S: Remove conditional definition of LOAD_OFFSET
vmlinux.lds.h: Fix a typo in comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- Move the kernel cmdline setup earlier in the boot process (again),
to address a split_lock_detect= boot parameter bug
- Ignore relocations in .notes sections
- Simplify boot stack setup
- Re-introduce a bootloader quirk wrt CR4 handling
- Miscellaneous cleanups & fixes
* tag 'x86-boot-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57
x86/boot: Move kernel cmdline setup earlier in the boot process (again)
x86/build: Clean up arch/x86/tools/relocs.c a bit
x86/boot: Ignore relocations in .notes sections in walk_relocs() too
x86: Rename __{start,end}_init_task to __{start,end}_init_stack
x86/boot: Simplify boot stack setup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
- Clean up & fix asm() operand modifiers & constraints
- Misc cleanups
* tag 'x86-asm-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Remove a superfluous newline in _static_cpu_has()
x86/asm/64: Clean up memset16(), memset32(), memset64() assembly constraints in <asm/string_64.h>
x86/asm: Use "m" operand constraint in WRUSSQ asm template
x86/asm: Use %a instead of %P operand modifier in asm templates
x86/asm: Use %c/%n instead of %P operand modifier in asm templates
x86/asm: Remove %P operand modifier from altinstr asm templates
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Over a dozen code generation micro-optimizations for the atomic
and spinlock code
- Add more __ro_after_init attributes
- Robustify the lockdevent_*() macros
* tag 'locking-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock/x86: Use _Q_LOCKED_VAL in PV_UNLOCK_ASM macro
locking/qspinlock/x86: Micro-optimize virt_spin_lock()
locking/atomic/x86: Merge __arch{,_try}_cmpxchg64_emu_local() with __arch{,_try}_cmpxchg64_emu()
locking/atomic/x86: Introduce arch_try_cmpxchg64_local()
locking/pvqspinlock/x86: Remove redundant CMP after CMPXCHG in __raw_callee_save___pv_queued_spin_unlock()
locking/pvqspinlock: Use try_cmpxchg() in qspinlock_paravirt.h
locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending()
locking/qspinlock: Use atomic_try_cmpxchg_relaxed() in xchg_tail()
locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions
locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
locking/atomic/x86: Introduce arch_atomic64_read_nonatomic() to x86_32
locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg() to x86_32
locking/atomic/x86: Introduce arch_try_cmpxchg64() for !CONFIG_X86_CMPXCHG64
locking/atomic/x86: Modernize x86_32 arch_{,try_}_cmpxchg64{,_local}()
locking/atomic/x86: Correct the definition of __arch_try_cmpxchg128()
x86/tsc: Make __use_tsc __ro_after_init
x86/kvm: Make kvm_async_pf_enabled __ro_after_init
context_tracking: Make context_tracking_key __ro_after_init
jump_label,module: Don't alloc static_key_mod for __ro_after_init keys
locking/qspinlock: Always evaluate lockevent* non-event parameter once
|
|
Merge x86-specific ACPI updates, an ACPI DPTF driver update adding new
platform support to it, and an ACPI APEI update:
- Add a num-cs device property to specify the number of chip selects
for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a
nested CONFIG_PM #ifdef from it (Andy Shevchenko).
- Move three x86-specific ACPI files to the x86 directory (Andy
Shevchenko).
- Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a
PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede).
- Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy
Sathyanarayanan).
- Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar).
- Mark the einj_driver driver's remove callback as __exit because it
cannot get unbound via sysfs (Uwe Kleine-König).
* acpi-x86:
ACPI: Move acpi_blacklisted() declaration to asm/acpi.h
ACPI: x86: Add PNP_UART1_SKIP quirk for Lenovo Blade2 tablets
ACPI: x86: utils: Mark SMO8810 accel on Dell XPS 15 9550 as always present
ACPI: x86: Move LPSS to x86 folder
ACPI: x86: Move blacklist to x86 folder
ACPI: x86: Move acpi_cmos_rtc to x86 folder
ACPI: x86: Introduce a Makefile
ACPI: LPSS: Remove nested ifdeffery for CONFIG_PM
ACPI: LPSS: Advertise number of chip selects via property
* acpi-dptf:
ACPI: DPTF: Add Lunar Lake support
* acpi-apei:
ACPI: APEI: EINJ: mark remove callback as __exit
|
|
Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own. This allows
the guest a way of simulating INIT-SIPI.
A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.
For CREATE
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.
For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.
For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to vCPU and then the vCPU is kicked.
The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. If a new VMSA is to
be installed, the VMSA guest page is set as the VMSA in the vCPU VMCB
and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new VMSA is not
to be installed, the VMSA is cleared in the vCPU VMCB and the vCPU state
is set to KVM_MP_STATE_HALTED to prevent it from being run.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-13-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When
hardware encounters RMP check failure caused by the guest memory access
it raises the #NPF. The error code contains additional information on
the access type. See the APM volume 2 for additional information.
When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-12-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.
Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
As with the MSR-based page-state changes, use the existing
KVM_HC_MAP_GPA_RANGE hypercall format to deliver these requests to
userspace via KVM_EXIT_HYPERCALL.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-11-michael.roth@amd.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.
When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
backed by private memory or not.
Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.
Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
requests to userspace via KVM_EXIT_HYPERCALL.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-10-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU the first time, a guest must register the
vCPU GHCB GPA. If hypervisor can work with the guest requested GPA then
it must respond back with the same GPA otherwise return -1.
On VMEXIT, verify that the GHCB GPA matches with the registered value.
If a mismatch is detected, then abort the guest.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501085210.2213060-9-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a KVM_SEV_SNP_LAUNCH_FINISH command to finalize the cryptographic
launch digest which stores the measurement of the guest at launch time.
Also extend the existing SNP firmware data structures to support
disabling the use of Versioned Chip Endorsement Keys (VCEK) by guests as
part of this command.
While finalizing the launch flow, the code also issues the LAUNCH_UPDATE
SNP firmware commands to encrypt/measure the initial VMSA pages for each
configured vCPU, which requires setting the RMP entries for those pages
to private, so also add handling to clean up the RMP entries for these
pages whening freeing vCPUs during shutdown.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-8-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A key aspect of a launching an SNP guest is initializing it with a
known/measured payload which is then encrypted into guest memory as
pre-validated private pages and then measured into the cryptographic
launch context created with KVM_SEV_SNP_LAUNCH_START so that the guest
can attest itself after booting.
Since all private pages are provided by guest_memfd, make use of the
kvm_gmem_populate() interface to handle this. The general flow is that
guest_memfd will handle allocating the pages associated with the GPA
ranges being initialized by each particular call of
KVM_SEV_SNP_LAUNCH_UPDATE, copying data from userspace into those pages,
and then the post_populate callback will do the work of setting the
RMP entries for these pages to private and issuing the SNP firmware
calls to encrypt/measure them.
For more information see the SEV-SNP specification.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. Other commands can then at that point be
used to load/encrypt data into the guest's initial launch image.
For more information see the SEV-SNP specification.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Message-ID: <20240501085210.2213060-6-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based security protection. SEV-SNP adds strong memory
encryption and integrity protection to help prevent malicious
hypervisor-based attacks such as data replay, memory re-mapping, and
more, to create an isolated execution environment.
Define a new KVM_X86_SNP_VM type which makes use of these capabilities
and extend the KVM_SEV_INIT2 ioctl to support it. Also add a basic
helper to check whether SNP is enabled and set PFERR_PRIVATE_ACCESS for
private #NPFs so they are handled appropriately by KVM MMU.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240501085210.2213060-5-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Common patches for the target-independent functionality and hooks
that are needed by SEV-SNP and TDX.
|
|
KVM VMX changes for 6.10:
- Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to
L1, as per the SDM.
- Move kvm_vcpu_arch's exit_qualification into x86_exception, as the field is
used only when synthesizing nested EPT violation, i.e. it's not the vCPU's
"real" exit_qualification, which is tracked elsewhere.
- Add a sanity check to assert that EPT Violations are the only sources of
nested PML Full VM-Exits.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.10
1. Add ParaVirt IPI support.
2. Add software breakpoint support.
3. Add mmio trace events support.
|
|
While the main additions from GHCB protocol version 1 to version 2
revolve mostly around SEV-SNP support, there are a number of changes
applicable to SEV-ES guests as well. Pluck a handful patches from the
SNP hypervisor patchset for GHCB-related changes that are also applicable
to SEV-ES. A KVM_SEV_INIT2 field lets userspace can control the maximum
GHCB protocol version advertised to guests and manage compatibility
across kernels/versions.
|
|
A combination of prep work for TDX and SNP, and a clean up of the
page fault path to (hopefully) make it easier to follow the rules for
private memory, noslot faults, writes to read-only slots, etc.
|
|
In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:
- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
- guest later converts that shared page back to private
At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again, but this is actually not allowed
because the RMP table mappings are 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must still be limited to 4K to
match this.
Add a hook to determine the max NPT mapping size in situations like
this.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-ID: <20240501085210.2213060-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the host.
Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
allow for special handling of this sort when freeing memory in response
to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
ahead and define an arch-specific hook for x86 since it will be needed
for handling memory used for SEV-SNP guests.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20231230172351.574091-6-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
guest_memfd pages are generally expected to be in some arch-defined
initial state prior to using them for guest memory. For SEV-SNP this
initial state is 'private', or 'guest-owned', and requires additional
operations to move these pages into a 'private' state by updating the
corresponding entries the RMP table.
Allow for an arch-defined hook to handle updates of this sort, and go
ahead and implement one for x86 so KVM implementations like AMD SVM can
register a kvm_x86_ops callback to handle these updates for SEV-SNP
guests.
The preparation callback is always called when allocating/grabbing
folios via gmem, and it is up to the architecture to keep track of
whether or not the pages are already in the expected state (e.g. the RMP
table in the case of SEV-SNP).
In some cases, it is necessary to defer the preparation of the pages to
handle things like in-place encryption of initial guest memory payloads
before marking these pages as 'private'/'guest-owned'. Add an argument
(always true for now) to kvm_gmem_get_folio() that allows for the
preparation callback to be bypassed. To detect possible issues in
the way userspace initializes memory, it is only possible to add an
unprepared page if it is not already included in the filemap.
Link: https://lore.kernel.org/lkml/ZLqVdvsF11Ddo7Dq@google.com/
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20231230172351.574091-5-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
lapic_vector_set_in_irr() is already available, use it for checking
pending vectors at the local APIC. No functional change.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Imran Khan <imran.f.khan@oracle.com>
Link: https://lore.kernel.org/r/20240506175612.1141095-1-jacob.jun.pan@linux.intel.com
|
|
The GHCB protocol version may be different from one guest to the next.
Add a field to track it for each KVM instance and extend KVM_SEV_INIT2
to allow it to be configured by userspace.
Now that all SEV-ES support for GHCB protocol version 2 is in place, go
ahead and default to it when creating SEV-ES guests through the new
KVM_SEV_INIT2 interface. Keep the older KVM_SEV_ES_INIT interface
restricted to GHCB protocol version 1.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501071048.2208265-5-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.
Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501071048.2208265-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501071048.2208265-2-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Right now the error code is not used when an async page fault is completed.
This is not a problem in the current code, but it is untidy. For protected
VMs, we will also need to check that the page attributes match the current
state of the page, because asynchronous page faults can only occur on
shared pages (private pages go through kvm_faultin_pfn_private() instead of
__gfn_to_pfn_memslot()).
Start by piping the error code from kvm_arch_setup_async_pf() to
kvm_arch_async_page_ready() via the architecture-specific async page
fault data. For now, it can be used to assert that there are no
async page faults on private memory.
Extracted from a patch by Isaku Yamahata.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add and use a synthetic, KVM-defined page fault error code to indicate
whether a fault is to private vs. shared memory. TDX and SNP have
different mechanisms for reporting private vs. shared, and KVM's
software-protected VMs have no mechanism at all. Usurp an error code
flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
and friends.
Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
for TDX and software-protected VMs as appropriate, but that would require
*clearing* the flag for SEV and SEV-ES VMs, which support encrypted
memory at the hardware layer, but don't utilize private memory at the
KVM layer.
Opportunistically add a comment to call out that the logic for software-
protected VMs is (and was before this commit) broken for nested MMUs, i.e.
for nested TDP, as the GPA is an L2 GPA. Punt on trying to play nice with
nested MMUs as there is a _lot_ of functionality that simply doesn't work
for software-protected VMs, e.g. all of the paths where KVM accesses guest
memory need to be updated to be aware of private vs. shared memory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20240228024147.41573-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|