Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "hung_task: extend blocking task stacktrace dump to semaphore" from
Lance Yang enhances the hung task detector.
The detector presently dumps the blocking tasks's stack when it is
blocked on a mutex. Lance's series extends this to semaphores
- "nilfs2: improve sanity checks in dirty state propagation" from
Wentao Liang addresses a couple of minor flaws in nilfs2
- "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
fixes a couple of issues in the gdb scripts
- "Support kdump with LUKS encryption by reusing LUKS volume keys" from
Coiby Xu addresses a usability problem with kdump.
When the dump device is LUKS-encrypted, the kdump kernel may not have
the keys to the encrypted filesystem. A full writeup of this is in
the series [0/N] cover letter
- "sysfs: add counters for lockups and stalls" from Max Kellermann adds
/sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
/sys/kernel/rcu_stall_count
- "fork: Page operation cleanups in the fork code" from Pasha Tatashin
implements a number of code cleanups in fork.c
- "scripts/gdb/symbols: determine KASLR offset on s390 during early
boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
scripts
* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
llist: make llist_add_batch() a static inline
delayacct: remove redundant code and adjust indentation
squashfs: add optional full compressed block caching
crash_dump, nvme: select CONFIGFS_FS as built-in
scripts/gdb/symbols: determine KASLR offset on s390 during early boot
scripts/gdb/symbols: factor out pagination_off()
scripts/gdb/symbols: factor out get_vmlinux()
kernel/panic.c: format kernel-doc comments
mailmap: update and consolidate Casey Connolly's name and email
nilfs2: remove wbc->for_reclaim handling
fork: define a local GFP_VMAP_STACK
fork: check charging success before zeroing stack
fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
fork: clean-up ifdef logic around stack allocation
kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
x86/crash: make the page that stores the dm crypt keys inaccessible
x86/crash: pass dm crypt keys to kdump kernel
Revert "x86/mm: Remove unused __set_memory_prot()"
crash_dump: retrieve dm crypt keys in kdump kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invokation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleaups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscalls arguments. At this
time we can alter only "system call information that are used by
strace system call tampering, namely, syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecesary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramaticelly
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned a single NUMA mode.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Not a lot going on in the EFI tree this cycle. The only thing that
stands out is the new support for SBAT metadata, which was a bit
contentious when it was first proposed, because in the initial
incarnation, it would have required us to maintain a revocation index,
and bump it each time a vulnerability affecting UEFI secure boot got
fixed. This was shot down for obvious reasons.
This time, only the changes needed to emit the SBAT section into the
PE/COFF image are being carried upstream, and it is up to the distros
to decide what to put in there when creating and signing the build.
This only has the EFI zboot bits (which the distros will be using for
arm64); the x86 bzImage changes should be arriving next cycle,
presumably via the -tip tree.
Summary:
- Add support for emitting a .sbat section into the EFI zboot image,
so that downstreams can easily include revocation metadata in the
signed EFI images
- Align PE symbolic constant names with other projects
- Bug fix for the efi_test module
- Log the physical address and size of the EFI memory map when
failing to map it
- A kerneldoc fix for the EFI stub code"
* tag 'efi-next-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
include: pe.h: Fix PE definitions
efi/efi_test: Fix missing pending status update in getwakeuptime
efi: zboot specific mechanism for embedding SBAT section
efi/libstub: Describe missing 'out' parameter in efi_load_initrd
efi: Improve logging around memmap init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These revert an x86 commit that introduced a nasty power regression on
some systems, fix PSCI cpuidle driver and ACPI cpufreq driver
regressions, add Rust abstractions for cpufreq, OPP, clk, and
cpumasks, add a Rust-based cpufreq-dt driver, and do a minor SCMI
cpufreq driver cleanup:
- Revert an x86 commit that went into 6.15 and caused idle power,
including power in suspend-to-idle, to rise rather dramatically on
systems booting with "nosmt" in the kernel command line (Rafael
Wysocki)
- Prevent freeing an uninitialized pointer in error path of
dt_idle_state_present() in the PSCI cpuidle driver (Dan Carpenter)
- Use KHz as the nominal_freq units in get_max_boost_ratio() in the
ACPI cpufreq driver (iGautham Shenoy)
- Add Rust abstractions for CPUFreq framework (Viresh Kumar)
- Add Rust abstractions for OPP framework (Viresh Kumar)
- Add basic Rust abstractions for Clk and Cpumask frameworks (Viresh
Kumar)
- Clean up the SCMI cpufreq driver somewhat (Mike Tipton)"
* tag 'pm-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits)
Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
acpi-cpufreq: Fix nominal_freq units to KHz in get_max_boost_ratio()
rust: opp: Move `cfg(CONFIG_OF)` attribute to the top of doc test
cpuidle: psci: Fix uninitialized variable in dt_idle_state_present()
rust: opp: Make the doctest example depend on CONFIG_OF
cpufreq: scmi: Skip SCMI devices that aren't used by the CPUs
cpufreq: Add Rust-based cpufreq-dt driver
rust: opp: Extend OPP abstractions with cpufreq support
rust: cpufreq: Extend abstractions for driver registration
rust: cpufreq: Extend abstractions for policy and driver ops
rust: cpufreq: Add initial abstractions for cpufreq framework
rust: opp: Add abstractions for the configuration options
rust: opp: Add abstractions for the OPP table
rust: opp: Add initial abstractions for OPP framework
rust: cpu: Add from_cpu()
rust: macros: enable use of hyphens in module names
rust: clk: Add initial abstractions
rust: clk: Add helpers for Rust code
MAINTAINERS: Add entry for Rust cpumask API
rust: cpumask: Add initial abstractions
...
|
|
Fix an issue in the PSCI cpuidle driver introduced recently and a nasty
x86 power regression introduced in 6.15:
- Prevent freeing an uninitialized pointer in error path of
dt_idle_state_present() in the PSCI cpuidle driver (Dan Carpenter).
- Revert an x86 commit that went into 6.15 and caused idle power,
including power in suspend-to-idle, to rise rather dramatically on
systems booting with "nosmt" in the kernel command line (Rafael Wysocki).
* pm-cpuidle:
Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
cpuidle: psci: Fix uninitialized variable in dt_idle_state_present()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm
Pull trusted security manager (TSM) updates from Dan Williams:
- Add a general sysfs scheme for publishing "Measurement" values
provided by the architecture's TEE Security Manager. Use it to
publish TDX "Runtime Measurement Registers" ("RTMRs") that either
maintain a hash of stored values (similar to a TPM PCR) or provide
statically provisioned data. These measurements are validated by a
relying party.
- Reorganize the drivers/virt/coco/ directory for "host" and "guest"
shared infrastructure.
- Fix a configfs-tsm-report unregister bug
- With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in
anticipation of more shared "TSM" infrastructure arriving, rename the
maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE".
* tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm:
tsm-mr: Fix init breakage after bin_attrs constification by scoping non-const pointers to init phase
sample/tsm-mr: Fix missing static for sample_report
virt: tdx-guest: Transition to scoped_cond_guard for mutex operations
virt: tdx-guest: Refactor and streamline TDREPORT generation
virt: tdx-guest: Expose TDX MRs as sysfs attributes
x86/tdx: tdx_mcall_get_report0: Return -EBUSY on TDCALL_OPERAND_BUSY error
x86/tdx: Add tdx_mcall_extend_rtmr() interface
tsm-mr: Add tsm-mr sample code
tsm-mr: Add TVM Measurement Register support
configfs-tsm-report: Fix NULL dereference of tsm_ops
coco/guest: Move shared guest CC infrastructure to drivers/virt/coco/guest/
configfs-tsm: Namespace TSM report symbols
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel software guard extension (SGX) updates from Dave Hansen:
"A couple of x86/sgx changes.
The first one is a no-brainer to use the (simple) SHA-256 library.
For the second one, some folks doing testing noticed that SGX systems
under memory pressure were inducing fatal machine checks at pretty
unnerving rates, despite the SGX code having _some_ awareness of
memory poison.
It turns out that the SGX reclaim path was not checking for poison
_and_ it always accesses memory to copy it around. Make sure that
poisoned pages are not reclaimed"
* tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Prevent attempts to reclaim poisoned pages
x86/sgx: Use SHA-256 library API instead of crypto_shash API
|
|
Revert commit 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
because it introduced a significant power regression on systems that
start with "nosmt" in the kernel command line.
Namely, on such systems, SMT siblings permanently go offline early,
when cpuidle has not been initialized yet, so after the above commit,
hlt_play_dead() is called for them. Later on, when the processor
attempts to enter a deep package C-state, including PC10 which is
requisite for reaching minimum power in suspend-to-idle, it is not
able to do that because of the SMT siblings staying in C1 (which
they have been put into by HLT).
As a result, the idle power (including power in suspend-to-idle)
rises quite dramatically on those systems with all of the possible
consequences, which (needless to say) may not be expected by their
users.
This issue is hard to debug and potentially dangerous, so it needs to
be addressed as soon as possible in a way that will work for 6.15.y,
hence the revert.
Of course, after this revert, the issue that commit 96040f7273e2
attempted to address will be back and it will need to be fixed again
later.
Fixes: 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: 6.15+ <stable@vger.kernel.org> # 6.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/12674167.O9o76ZdvQC@rjwysocki.net
|
|
Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.
Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.
ARM:
- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.
- Enable nested virtualisation support on systems that support it,
though it is disabled by default.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
LoongArch:
- Don't flush tlb if the host supports hardware page table walks.
- Add KVM selftests support.
RISC-V:
- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
x86:
- Initial support for TDX in KVM.
This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.
This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"Once again, the changes are dominated by cpufreq updates, but this
time the majority of them are cpufreq core changes, mostly related to
the introduction of policy locking guards and __free() usage, and
fixes related to boost handling.
Still, there is also a significant update of the intel_pstate driver
making it register an energy model when running on a hybrid platform
which is used for enabling energy-aware scheduling (EAS) if the driver
operates in the passive mode (and schedutil is used as the cpufreq
governor for all CPUs which is the passive mode default).
There are some amd-pstate driver updates too, for a good measure,
including the "Requested CPU Min frequency" BIOS option support and
new online/offline callbacks.
In the cpuidle space, the most significant change is the addition of a
C1 demotion on/off sysfs knob to intel_idle which should help some
users to configure their systems more precisely. There is also the
conversion of the PSCI cpuidle driver to a faux device one and there
are two small updates of cpuidle governors.
Device power management is also modified quite a bit, especially the
handling of devices with asynchronous suspend and resume enabled
during system transitions. They are now going to be handled more
asynchronously during suspend transitions and somewhat less
aggressively during resume transitions.
Apart from the above, the operating performance points (OPP) library
is now going to use mutex locking guards and scope-based cleanup
helpers and there is the usual bunch of assorted fixes and code
cleanups.
Specifics:
- Fix potential division-by-zero error in em_compute_costs() (Yaxiong
Tian)
- Fix typos in energy model documentation and example driver code
(Moon Hee Lee, Atul Kumar Pant)
- Rearrange the energy model management code and add a new function
for adjusting a CPU energy model after adjusting the capacity of
the given CPU to it (Rafael Wysocki)
- Refactor cpufreq_online(), add and use cpufreq policy locking
guards, use __free() in policy reference counting, and clean up
core cpufreq code on top of that (Rafael Wysocki)
- Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
Kumar)
- Fix des_perf clamping with max_perf in amd_pstate_update()
(Dhananjay Ugwekar)
- Add offline, online and suspend callbacks to the amd-pstate driver,
rename and use the existing amd_pstate_epp callbacks in it
(Dhananjay Ugwekar)
- Add support for the "Requested CPU Min frequency" BIOS option to
the amd-pstate driver (Dhananjay Ugwekar)
- Reset amd-pstate driver mode after running selftests (Swapnil
Sapkal)
- Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
Chancellor)
- Add helper for governor checks to the schedutil cpufreq governor
and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki)
- Populate the cpu_capacity sysfs entries from the intel_pstate
driver after registering asym capacity support (Ricardo Neri)
- Add support for enabling Energy-aware scheduling (EAS) to the
intel_pstate driver when operating in the passive mode on a hybrid
platform (Rafael Wysocki)
- Drop redundant cpus_read_lock() from store_local_boost() in the
cpufreq core (Seyediman Seyedarab)
- Replace sscanf() with kstrtouint() in the cpufreq code and use a
symbol instead of a raw number in it (Bowen Yu)
- Add support for autonomous CPU performance state selection to the
CPPC cpufreq driver (Lifeng Zheng)
- OPP: Add dev_pm_opp_set_level() (Praveen Talari)
- Introduce scope-based cleanup headers and mutex locking guards in
OPP core (Viresh Kumar)
- Switch OPP to use kmemdup_array() (Zhang Enpei)
- Optimize bucket assignment when next_timer_ns equals KTIME_MAX in
the menu cpuidle governor (Zhongqiu Han)
- Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla)
- Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
Bityutskiy)
- Fix typos in two comments in the teo cpuidle governor (Atul Kumar
Pant)
- Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
Kalla)
- Move debug runtime PM attributes to runtime_attrs[] (Rafael
Wysocki)
- Add new devm_ functions for enabling runtime PM and runtime PM
reference counting (Bence Csókás)
- Remove size arguments from strscpy() calls in the hibernation core
code (Thorsten Blum)
- Adjust the handling of devices with asynchronous suspend enabled
during system suspend and resume to start resuming them immediately
after resuming their parents and to start suspending such a device
immediately after suspending its first child (Rafael Wysocki)
- Adjust messages printed during tasks freezing to avoid using
pr_cont() (Andrew Sayers, Paul Menzel)
- Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
Zhang)
- Add missing wakeup source attribute relax_count to sysfs and remove
the space character at the end ofi the string produced by
pm_show_wakelocks() (Zijun Hu)
- Add configurable pm_test delay for hibernation (Zihuan Zhang)
- Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
cypd4226 device on Tegra boards from suspending prematurely (Jon
Hunter)
- Unbreak printing PM debug messages during hibernation and clean up
some related code (Rafael Wysocki)
- Add a systemd service to run cpupower and change cpupower binding's
Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)"
* tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: CPPC: Add support for autonomous selection
cpufreq: Update sscanf() to kstrtouint()
cpufreq: Replace magic number
OPP: switch to use kmemdup_array()
PM: freezer: Rewrite restarting tasks log to remove stray *done.*
PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
cpufreq: drop redundant cpus_read_lock() from store_local_boost()
cpupower: do not install files to /etc/default/
cpupower: do not call systemctl at install time
cpupower: do not write DESTDIR to cpupower.service
PM: sleep: Introduce pm_sleep_transition_in_progress()
cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
cpufreq: intel_pstate: Document hybrid processor support
cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
cpufreq: intel_pstate: EAS support for hybrid platforms
PM: EM: Introduce em_adjust_cpu_capacity()
PM: EM: Move CPU capacity check to em_adjust_new_capacity()
PM: EM: Documentation: Fix typos in example driver code
cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
PM: sleep: Introduce pm_suspend_in_progress()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The most significant part of these changes is an ACPICA update
covering two upstream ACPICA releases, 20241212 and 20250404, that
have not been included into the kernel code base yet.
Among other things, it adds definitions needed to address GCC 15's
-Wunterminated-string-initialization warnings, adds support for three
new tables (MRRM, ERDT, RIMT), extends support for two tables (RAS2,
DMAR), and fixes some issues.
On top of the above, there is a new parser for the MRRM table, more
changes related to GCC 15's -Wunterminated-string-initialization
warnings, a CPPC library update including functions related to
autonomous CPU performance state selection, a couple of new quirks,
some assorted fixes and some code cleanups.
Specifics:
- Fix two ACPICA SLAB cache leaks (Seunghun Han)
- Add EINJv2 get error type action and define Error Injection Actions
in hex values to avoid inconsistencies between the specification
and the code (Zaid Alali)
- Fix typo in comments for SRAT structures (Adam Lackorzynski)
- Prevent possible loss of data in ACPICA because of u32 to u8
conversions (Saket Dumbre)
- Fix reading FFixedHW operation regions in ACPICA (Daniil Tatianin)
- Add support for printing AML arguments when the ACPICA debug level
is ACPI_LV_TRACE_POINT (Mario Limonciello)
- Drop a stale comment about the file content from actbl2.h (Sudeep
Holla)
- Apply pack(1) to union aml_resource (Tamir Duberstein)
- Fix overflow check in the ACPICA version of vsnprintf() (gldrk)
- Interpret SIDP structures in DMAR added revision 3.4 of the VT-d
specification (Alexey Neyman)
- Add typedef and other definitions related to MRRM to ACPICA (Tony
Luck)
- Add definitions for RIMT to ACPICA (Sunil V L)
- Fix spelling mistake "Incremement" -> "Increment" in the ACPICA
utilities code (Colin Ian King)
- Add typedef and other definitions for ERDT to ACPICA (Tony Luck)
- Introduce ACPI_NONSTRING and use it (Kees Cook, Ahmed Salem)
- Rename structure and field names of the RAS2 table in actbl2.h
(Shiju Jose)
- Fix up whitespace in acpica/utcache.c (Zhe Qiao)
- Avoid sequence overread in a call to strncmp() in
ap_get_table_length() and replace strncpy() with memcpy() in ACPICA
in some places (Ahmed Salem)
- Update copyright year in all ACPICA files (Saket Dumbre)
- Add __nonstring annotations for unterminated strings in the static
ACPI tables parsing code (Kees Cook)
- Add support for parsing the MRRM ACPI table and sysfs files to
describe memory regions listed in it (Tony Luck, Anil
Keshavamurthy)
- Remove an (explicitly) unused header file include from the VIOT
ACPI table parser file (Andy Shevchenko)
- Improve logging around acpi_initialize_tables() (Bartosz
Szczepanek)
- Clean up the initialization of CPU data structures in the ACPI
processor driver (Zhang Rui)
- Remove an obsolete comment regarding the C-states handling in the
ACPI processor driver (Giovanni Gherdovich)
- Simplify PCC shared memory region handling (Sudeep Holla)
- Rework and extend functions for reading CPPC register values and
for updating CPPC registers (Lifeng Zheng)
- Add three functions related to autonomous CPU performance state
selection to the CPPC library (Lifeng Zheng)
- Turn the acpi_pci_root_remap_iospace() fwnode_handle parameter into
a const pointer (Pei Xiao)
- Round battery capacity percengate in the ACPI battery driver to the
closest integer to avoid user confusion (shitao)
- Make the ACPI battery driver report the current as a negative
number to the power supply framework when the battery is
discharging as documented (Peter Marheine)
- Add TUXEDO InfinityBook Pro AMD Gen9 to the acpi_ec_no_wakeup[]
list to prevent spurious wakeups from suspend-to-idle (Werner
Sembach)
- Convert the APEI EINJ driver to a faux device one (Sudeep Holla,
Jon Hunter)
- Remove redundant calls to einj_get_available_error_type() from the
APEI EINJ driver (Zaid Alali)
- Fix a typo for MECHREVO in irq1_edge_low_force_override[] (Mingcong
Bai)
- Add an LPS0 check() callback to the AMD pinctrl driver and fix up
config symbol dependencies in it (Mario Limonciello, Rafael
Wysocki)
- Avoid initializing the ACPI platform profile driver on non-ACPI
platforms (Alexandre Ghiti)
- Document that references to ACPI data (non-device) nodes should use
string-only references in hierarchical data node packages (Sakari
Ailus)
- Fail the ACPI bus registration if acpi_kobj registration fails
(Armin Wolf)"
* tag 'acpi-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
ACPI: MRRM: Fix default max memory region
ACPI: bus: Bail out if acpi_kobj registration fails
ACPI: platform_profile: Avoid initializing on non-ACPI platforms
pinctrl: amd: Fix hibernation support with CONFIG_SUSPEND unset
ACPI: tables: Improve logging around acpi_initialize_tables()
ACPI: VIOT: Remove (explicitly) unused header
ACPI: Add documentation for exposing MRRM data
ACPI: MRRM: Add /sys files to describe memory ranges
ACPI: MRRM: Minimal parse of ACPI MRRM table
ACPICA: Update copyright year
ACPICA: Logfile: Changes for version 20250404
ACPICA: Replace strncpy() with memcpy()
ACPICA: Apply ACPI_NONSTRING in more places
ACPICA: Avoid sequence overread in call to strncmp()
ACPICA: Adjust the position of code lines
ACPICA: actbl2.h: ACPI 6.5: RAS2: Rename structure and field names of the RAS2 table
ACPICA: Apply ACPI_NONSTRING
ACPICA: Introduce ACPI_NONSTRING
ACPICA: actbl2.h: ERDT: Add typedef and other definitions
ACPICA: infrastructure: Add new DMT_BUF types and shorten a long name
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull AMD SEV update from Borislav Petkov:
"Add a virtual TPM driver glue which allows a guest kernel to talk to a
TPM device emulated by a Secure VM Service Module (SVSM) - a helper
module of sorts which runs at a different privilege level in the
SEV-SNP VM stack.
The intent being that a TPM device is emulated by a trusted entity and
not by the untrusted host which is the default assumption in the
confidential computing scenarios"
* tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Register tpm-svsm platform device
tpm: Add SNP SVSM vTPM driver
svsm: Add header with SVSM_VTPM_CMD helpers
x86/sev: Add SVSM vTPM probe/send_command functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull mtrr update from Borislav Petkov:
"A single change to verify the presence of fixed MTRR ranges before
accessing the respective MSRs"
* tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
"Carve out the resctrl filesystem-related code into fs/resctrl/ so that
multiple architectures can share the fs API for manipulating their
respective hw resource control implementation.
This is the second step in the work towards sharing the resctrl
filesystem interface, the next one being plugging ARM's MPAM into the
aforementioned fs API"
* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
MAINTAINERS: Add reviewers for fs/resctrl
x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
x86/resctrl: Always initialise rid field in rdt_resources_all[]
x86/resctrl: Relax some asm #includes
x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
x86/resctrl: Squelch whitespace anomalies in resctrl core code
x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
x86/resctrl: Move enum resctrl_event_id to resctrl.h
x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
fs/resctrl: Add boiler plate for external resctrl code
x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
x86/resctrl: Split trace.h
x86/resctrl: Expand the width of domid by replacing mon_data_bits
x86/resctrl: Add end-marker to the resctrl_event_id enum
x86/resctrl: Move is_mba_sc() out of core.c
x86/resctrl: Drop __init/__exit on assorted symbols
x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
x86/resctrl: Check all domains are offline in resctrl_exit()
x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"Another set of timer API cleanups:
- Convert init_timer*(), try_to_del_timer_sync() and
destroy_timer_on_stack() over to the canonical timer_*()
namespace convention.
There is another large conversion pending, which has not been included
because it would have caused a gazillion of merge conflicts in next.
The conversion scripts will be run towards the end of the merge window
and a pull request sent once all conflict dependencies have been
merged"
* tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack()
treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()
timers: Rename init_timers() as timers_init()
timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA
timers: Rename __init_timer_on_stack() as __timer_init_on_stack()
timers: Rename __init_timer() as __timer_init()
timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack()
timers: Rename init_timer_key() as timer_init_key()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq cleanups from Thomas Gleixner:
"A set of cleanups for the generic interrupt subsystem:
- Consolidate on one set of functions for the interrupt domain code
to get rid of pointlessly duplicated code with only marginal
different semantics.
- Update the documentation accordingly and consolidate the coding
style of the irqdomain header"
* tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
irqdomain: Consolidate coding style
irqdomain: Fix kernel-doc and add it to Documentation
Documentation: irqdomain: Update it
Documentation: irq-domain.rst: Simple improvements
Documentation: irq/concepts: Minor improvements
Documentation: irq/concepts: Add commas and reflow
irqdomain: Improve kernel-docs of functions
irqdomain: Make struct irq_domain_info variables const
irqdomain: Use irq_domain_instantiate()'s return value as initializers
irqdomain: Drop irq_linear_revmap()
pinctrl: keembay: Switch to irq_find_mapping()
irqchip/armada-370-xp: Switch to irq_find_mapping()
gpu: ipu-v3: Switch to irq_find_mapping()
gpio: idt3243x: Switch to irq_find_mapping()
sh: Switch to irq_find_mapping()
powerpc: Switch to irq_find_mapping()
irqdomain: Drop irq_domain_add_*() functions
powerpc: Switch irq_domain_add_nomap() to use fwnode
thermal: Switch to irq_domain_create_linear()
soc: Switch to irq_domain_create_*()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
"Add the CONFIG_X86_NATIVE_CPU=y build option to locally optimize the
kernel build with '-march=native'"
* tag 'x86-build-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kbuild/64: Restrict clang versions that can use '-march=native'
x86/kbuild/64: Test for the availability of the -mtune=native compiler flag
x86/kbuild/64: Add the CONFIG_X86_NATIVE_CPU option to locally optimize the kernel with '-march=native'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Ingo Molnar:
"Two changes to simplify the x86 vDSO code a bit"
* tag 'x86-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Remove redundant #ifdeffery around in_ia32_syscall()
x86/vdso: Remove #ifdeffery around page setup variants
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug updates from Ingo Molnar:
"Move the x86 page fault tracepoints to generic code, because other
architectures would like to make use of them as well"
* tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tracing, x86/mm: Move page fault tracepoints to generic
x86/tracing, x86/mm: Remove redundant trace_pagefault_key
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc x86 cleanups: kernel-doc updates and a string API transition
patch"
* tag 'x86-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/power: hibernate: Fix W=1 build kernel-doc warnings
x86/mm/pat: Fix W=1 build kernel-doc warning
x86/CPU/AMD: Replace strcpy() with strscpy()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"Boot code changes:
- A large series of changes to reorganize the x86 boot code into a
better isolated and easier to maintain base of PIC early startup
code in arch/x86/boot/startup/, by Ard Biesheuvel.
Motivation & background:
| Since commit
|
| c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.
This tree includes the first phase of this work-in-progress x86
boot code reorganization.
Scalability enhancements and micro-optimizations:
- Improve code-patching scalability (Eric Dumazet)
- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)
CPU features enumeration updates:
- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
Darwish)
- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
Thomas Gleixner)
- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)
Memory management changes:
- Allow temporary MMs when IRQs are on (Andy Lutomirski)
- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)
- Simplify choose_new_asid() and generate better code (Borislav
Petkov)
- Simplify 32-bit PAE page table handling (Dave Hansen)
- Always use dynamic memory layout (Kirill A. Shutemov)
- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)
- Make 5-level paging support unconditional (Kirill A. Shutemov)
- Stop prefetching current->mm->mmap_lock on page faults (Mateusz
Guzik)
- Predict valid_user_address() returning true (Mateusz Guzik)
- Consolidate initmem_init() (Mike Rapoport)
FPU support and vector computing:
- Enable Intel APX support (Chang S. Bae)
- Reorgnize and clean up the xstate code (Chang S. Bae)
- Make task_struct::thread constant size (Ingo Molnar)
- Restore fpu_thread_struct_whitelist() to fix
CONFIG_HARDENED_USERCOPY=y (Kees Cook)
- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
Nesterov)
- Always preserve non-user xfeatures/flags in __state_perm (Sean
Christopherson)
Microcode loader changes:
- Help users notice when running old Intel microcode (Dave Hansen)
- AMD: Do not return error when microcode update is not necessary
(Annie Li)
- AMD: Clean the cache if update did not load microcode (Boris
Ostrovsky)
Code patching (alternatives) changes:
- Simplify, reorganize and clean up the x86 text-patching code (Ingo
Molnar)
- Make smp_text_poke_batch_process() subsume
smp_text_poke_batch_finish() (Nikolay Borisov)
- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)
Debugging support:
- Add early IDT and GDT loading to debug relocate_kernel() bugs
(David Woodhouse)
- Print the reason for the last reset on modern AMD CPUs (Yazen
Ghannam)
- Add AMD Zen debugging document (Mario Limonciello)
- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)
- Stop decoding i64 instructions in x86-64 mode at opcode (Masami
Hiramatsu)
CPU bugs and bug mitigations:
- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)
- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)
- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)
- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)
MSR API:
- Large MSR code and API cleanup (Xin Li)
- In-kernel MSR API type cleanups and renames (Ingo Molnar)
PKEYS:
- Simplify PKRU update in signal frame (Chang S. Bae)
NMI handling code:
- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)
- Improve NMI duration console printouts (Sohil Mehta)
Paravirt guests interface:
- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)
SEV support:
- Share the sev_secrets_pa value again (Tom Lendacky)
x86 platform changes:
- Introduce the <asm/amd/> header namespace (Ingo Molnar)
- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
<asm/amd/fch.h> (Mario Limonciello)
Fixes and cleanups:
- x86 assembly code cleanups and fixes (Uros Bizjak)
- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"
* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
x86/bugs: Fix spectre_v2 mitigation default on Intel
x86/bugs: Restructure ITS mitigation
x86/xen/msr: Fix uninitialized variable 'err'
x86/msr: Remove a superfluous inclusion of <asm/asm.h>
x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
x86/mm/64: Make 5-level paging support unconditional
x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
x86/mm/64: Always use dynamic memory layout
x86/bugs: Fix indentation due to ITS merge
x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header
x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>
x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
x86/mm: Fix kernel-doc descriptions of various pgtable methods
x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
x86/boot: Defer initialization of VM space related global variables
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Core & generic-arch updates:
- Add support for dynamic constraints and propagate it to the Intel
driver (Kan Liang)
- Fix & enhance driver-specific throttling support (Kan Liang)
- Record sample last_period before updating on the x86 and PowerPC
platforms (Mark Barnett)
- Make perf_pmu_unregister() usable (Peter Zijlstra)
- Unify perf_event_free_task() / perf_event_exit_task_context()
(Peter Zijlstra)
- Simplify perf_event_release_kernel() and perf_event_free_task()
(Peter Zijlstra)
- Allocate non-contiguous AUX pages by default (Yabin Cui)
Uprobes updates:
- Add support to emulate NOP instructions (Jiri Olsa)
- selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
x86 Intel PMU enhancements:
- Support Intel Auto Counter Reload [ACR] (Kan Liang)
- Add PMU support for Clearwater Forest (Dapeng Mi)
- Arch-PEBS preparatory changes: (Dapeng Mi)
- Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
- Decouple BTS initialization from PEBS initialization
- Introduce pairs of PEBS static calls
x86 AMD PMU enhancements:
- Use hrtimer for handling overflows in the AMD uncore driver
(Sandipan Das)
- Prevent UMC counters from saturating (Sandipan Das)
Fixes and cleanups:
- Fix put_ctx() ordering (Frederic Weisbecker)
- Fix irq work dereferencing garbage (Frederic Weisbecker)
- Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
Das, Thorsten Blum)"
* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
perf/headers: Clean up <linux/perf_event.h> a bit
perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
mips/perf: Remove driver-specific throttle support
xtensa/perf: Remove driver-specific throttle support
sparc/perf: Remove driver-specific throttle support
loongarch/perf: Remove driver-specific throttle support
csky/perf: Remove driver-specific throttle support
arc/perf: Remove driver-specific throttle support
alpha/perf: Remove driver-specific throttle support
perf/apple_m1: Remove driver-specific throttle support
perf/arm: Remove driver-specific throttle support
s390/perf: Remove driver-specific throttle support
powerpc/perf: Remove driver-specific throttle support
perf/x86/zhaoxin: Remove driver-specific throttle support
perf/x86/amd: Remove driver-specific throttle support
perf/x86/intel: Remove driver-specific throttle support
perf: Only dump the throttle log for the leader
perf: Fix the throttle logic for a group
perf/core: Add the is_event_in_freq_mode() helper to simplify the code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Fix memcpy_sglist to handle partially overlapping SG lists
- Use memcpy_sglist to replace null skcipher
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
- Hide CRYPTO_MANAGER
- Add delayed freeing of driver crypto_alg structures
Compression:
- Allocate large buffers on first use instead of initialisation in scomp
- Drop destination linearisation buffer in scomp
- Move scomp stream allocation into acomp
- Add acomp scatter-gather walker
- Remove request chaining
- Add optional async request allocation
Hashing:
- Remove request chaining
- Add optional async request allocation
- Move partial block handling into API
- Add ahash support to hmac
- Fix shash documentation to disallow usage in hard IRQs
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64
- Drop avx10_256 xts(aes)/ctr(aes) on x86
- Improve avx-512 optimisations for xts(aes)
- Move chacha arch implementations into lib/crypto
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm
- Disable powerpc/poly1305 as it has no SIMD fallback
- Move sha256 arch implementations into lib/crypto
- Convert deflate to acomp
- Set block size correctly in cbcmac
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss
- Fix warm-reboot failure by making shutdown do more work in qat
- Add locking in zynqmp-sha
- Remove cavium/zip
- Add support for PCI device 0x17D8 to ccp
- Add qat_6xxx support in qat
- Add support for RK3576 in rockchip-rng
- Add support for i.MX8QM in caam
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
- Add new SEV/SNP platform shutdown API in ccp"
* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
crypto: qat - add missing header inclusion
crypto: api - Redo lookup on EEXIST
Revert "crypto: testmgr - Add hash export format testing"
crypto: marvell/cesa - Do not chain submitted requests
crypto: powerpc/poly1305 - add depends on BROKEN for now
Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
crypto: ccp - Add missing tee info reg for teev2
crypto: ccp - Add missing bootloader info reg for pspv5
crypto: sun8i-ce - move fallback ahash_request to the end of the struct
crypto: octeontx2 - Use dynamic allocated memory region for lmtst
crypto: octeontx2 - Initialize cptlfs device info once
crypto: xts - Only add ecb if it is not already there
crypto: lrw - Only add ecb if it is not already there
crypto: testmgr - Add hash export format testing
crypto: testmgr - Use ahash for generic tfm
crypto: hmac - Add ahash support
crypto: testmgr - Ignore EEXIST on shash allocation
crypto: algapi - Add driver template support to crypto_inst_setname
crypto: shash - Set reqsize in shash_alg
...
|
|
In some cases tdx_tdvpr_pa() is not fully inlined into tdh_vp_enter(), which
causes the following warning:
vmlinux.o: warning: objtool: tdh_vp_enter+0x8: call to tdx_tdvpr_pa() leaves .noinstr.text section
This happens if the compiler considers tdx_tdvpr_pa() to be "large", for example
because CONFIG_SPARSEMEM adds two function calls to page_to_section() and
__section_mem_map_addr():
({ const struct page *__pg = (pg); \
int __sec = page_to_section(__pg); \
(unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec)));
\
})
Because exiting the noinstr section is a no-no, just mark tdh_vp_enter() for
full inlining.
Reported-by: kernel test robot <lkp@intel.com>
Analyzed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505240530.5KktQ5mX-lkp@intel.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Cleanups for the kernel's CRC (cyclic redundancy check) code:
- Use __ro_after_init where appropriate
- Remove unnecessary static_key on s390
- Rename some source code files
- Rename the crc32 and crc32c crypto API modules
- Use subsys_initcall instead of arch_initcall
- Restore maintainers for crc_kunit.c
- Fold crc16_byte() into crc16.c
- Add some SPDX license identifiers"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crc32: add SPDX license identifier
lib/crc16: unexport crc16_table and crc16_byte()
w1: ds2406: use crc16() instead of crc16_byte() loop
MAINTAINERS: add crc_kunit.c back to CRC LIBRARY
lib/crc: make arch-optimized code use subsys_initcall
crypto: crc32 - remove "generic" from file and module names
x86/crc: drop "glue" from filenames
sparc/crc: drop "glue" from filenames
s390/crc: drop "glue" from filenames
powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S
powerpc/crc: drop "glue" from filenames
arm64/crc: drop "glue" from filenames
arm/crc: drop "glue" from filenames
s390/crc32: Remove no-op module init and exit functions
s390/crc32: Remove have_vxrs static key
lib/crc: make the CPU feature static keys __ro_after_init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.16
* New features:
- Add large stage-2 mapping support for non-protected pKVM guests,
clawing back some performance.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
protected modes.
- Enable nested virtualisation support on systems that support it
(yes, it has been a long time coming), though it is disabled by
default.
* Improvements, fixes and cleanups:
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. This ensures correctness of
emulation (the data is automatically extracted from the published
JSON files), and helps dealing with the evolution of the
architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.16
1. Don't flush tlb if HW PTW supported.
2. Add LoongArch KVM selftests support.
|
|
Merge updates related to system sleep handling and runtime PM for 6.16-rc1:
- Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
Kalla).
- Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki).
- Add new devm_ functions for enabling runtime PM and runtime PM
reference counting (Bence Csókás).
- Remove size arguments from strscpy() calls in the hibernation core
code (Thorsten Blum).
- Adjust the handling of devices with asynchronous suspend enabled
during system suspend and resume to start resuming them immediately
after resuming their parents and to start suspending such a device
immediately after suspending its first child (Rafael Wysocki).
- Adjust messages printed during tasks freezing to avoid using
pr_cont() (Andrew Sayers, Paul Menzel).
- Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
Zhang).
- Add missing wakeup source attribute relax_count to sysfs and
remove the space character at the end ofi the string produced by
pm_show_wakelocks() (Zijun Hu).
- Add configurable pm_test delay for hibernation (Zihuan Zhang).
- Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
cypd4226 device on Tegra boards from suspending prematurely (Jon
Hunter).
- Unbreak printing PM debug messages during hibernation and clean up
some related code (Rafael Wysocki).
* pm-runtime:
PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
PM: sysfs: Move debug runtime PM attributes to runtime_attrs[]
PM: runtime: Add new devm functions
* pm-sleep:
PM: freezer: Rewrite restarting tasks log to remove stray *done.*
PM: sleep: Introduce pm_sleep_transition_in_progress()
PM: sleep: Introduce pm_suspend_in_progress()
PM: sleep: Print PM debug messages during hibernation
ucsi_ccg: Disable async suspend in ucsi_ccg_probe()
PM: hibernate: add configurable delay for pm_test
PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks()
PM: wakeup: Add missing wakeup source attribute relax_count
PM: sleep: Remove unnecessary !!
PM: sleep: Use two lines for "Restarting..." / "done" messages
PM: sleep: Make suspend of devices more asynchronous
PM: sleep: Suspend async parents after suspending children
PM: sleep: Resume children after resuming the parent
PM: hibernate: Remove size arguments when calling strscpy()
|
|
Merge updates related to the handling of static (data-only) ACPI tables
for 6.16-rc1:
- Add __nonstring annotations for unterminated strings in the static
ACPI tables parsing code (Kees Cook).
- Add support for parsing the MRRM ACPI table and sysfs files to
describe memory regions listed in it (Tony Luck, Anil Keshavamurthy).
- Remove an (explicitly) unused header file include from the VIOT ACPI
table parser file (Andy Shevchenko).
- Improve logging around acpi_initialize_tables() (Bartosz Szczepanek).
* acpi-tables:
ACPI: MRRM: Fix default max memory region
ACPI: tables: Improve logging around acpi_initialize_tables()
ACPI: VIOT: Remove (explicitly) unused header
ACPI: Add documentation for exposing MRRM data
ACPI: MRRM: Add /sys files to describe memory ranges
ACPI: MRRM: Minimal parse of ACPI MRRM table
ACPI: tables: Add __nonstring annotations for unterminated strings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Use folios for symlinks in the page cache
FUSE already uses folios for its symlinks. Mirror that conversion
in the generic code and the NFS code. That lets us get rid of a few
folio->page->folio conversions in this path, and some of the few
remaining users of read_cache_page() / read_mapping_page()
- Try and make a few filesystem operations killable on the VFS
inode->i_mutex level
- Add sysctl vfs_cache_pressure_denom for bulk file operations
Some workloads need to preserve more dentries than we currently
allow through out sysctl interface
A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
involves scanning all files and caching their metadata (including
dentries and inodes) in memory. Each HDD contains approximately 2
million files, resulting in a total of ~20 million cached dentries
after initialization
To minimize dentry reclamation, they set vfs_cache_pressure to 1.
Despite this configuration, memory pressure conditions can still
trigger reclamation of up to 50% of cached dentries, reducing the
cache from 20 million to approximately 10 million entries. During
the subsequent cache rebuild period, any HDFS datanode restart
operation incurs substantial latency penalties until full cache
recovery completes
To maintain service stability, more dentries need to be preserved
during memory reclamation. The current minimum reclaim ratio (1/100
of total dentries) remains too aggressive for such workload. This
patch introduces vfs_cache_pressure_denom for more granular cache
pressure control
The configuration [vfs_cache_pressure=1,
vfs_cache_pressure_denom=10000] effectively maintains the full 20
million dentry cache under memory pressure, preventing datanode
restart performance degradation
- Avoid some jumps in inode_permission() using likely()/unlikely()
- Avid a memory access which is most likely a cache miss when
descending into devcgroup_inode_permission()
- Add fastpath predicts for stat() and fdput()
- Anonymous inodes currently don't come with a proper mode causing
issues in the kernel when we want to add useful VFS debug assert.
Fix that by giving them a proper mode and masking it off when we
report it to userspace which relies on them not having any mode
- Anonymous inodes currently allow to change inode attributes because
the VFS falls back to simple_setattr() if i_op->setattr isn't
implemented. This means the ownership and mode for every single
user of anon_inode_inode can be changed. Block that as it's either
useless or actively harmful. If specific ownership is needed the
respective subsystem should allocate anonymous inodes from their
own private superblock
- Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock
- Add proper tests for anonymous inode behavior
- Make it easy to detect proper anonymous inodes and to ensure that
we can detect them in codepaths such as readahead()
Cleanups:
- Port pidfs to the new anon_inode_{g,s}etattr() helpers
- Try to remove the uselib() system call
- Add unlikely branch hint return path for poll
- Add unlikely branch hint on return path for core_sys_select
- Don't allow signals to interrupt getdents copying for fuse
- Provide a size hint to dir_context for during readdir()
- Use writeback_iter directly in mpage_writepages
- Update compression and mtime descriptions in initramfs
documentation
- Update main netfs API document
- Remove useless plus one in super_cache_scan()
- Remove unnecessary NULL-check guards during setns()
- Add separate separate {get,put}_cgroup_ns no-op cases
Fixes:
- Fix typo in root= kernel parameter description
- Use KERN_INFO for infof()|info_plog()|infofc()
- Correct comments of fs_validate_description()
- Mark an unlikely if condition with unlikely() in
vfs_parse_monolithic_sep()
- Delete macro fsparam_u32hex()
- Remove unused and problematic validate_constant_table()
- Fix potential unsigned integer underflow in fs_name()
- Make file-nr output the total allocated file handles"
* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
fs: Pass a folio to page_put_link()
nfs: Use a folio in nfs_get_link()
fs: Convert __page_get_link() to use a folio
fs/read_write: make default_llseek() killable
fs/open: make do_truncate() killable
fs/open: make chmod_common() and chown_common() killable
include/linux/fs.h: add inode_lock_killable()
readdir: supply dir_context.count as readdir buffer size hint
vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
fuse: don't allow signals to interrupt getdents copying
Documentation: fix typo in root= kernel parameter description
include/cgroup: separate {get,put}_cgroup_ns no-op case
kernel/nsproxy: remove unnecessary guards
fs: use writeback_iter directly in mpage_writepages
fs: remove useless plus one in super_cache_scan()
fs: add S_ANON_INODE
fs: remove uselib() system call
device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
fs/fs_parse: Remove unused and problematic validate_constant_table()
fs: touch up predicts in inode_permission()
...
|
|
irq_fpu_usable() incorrectly returned true before the FPU is
initialized. The x86 CPU onlining code can call sha256() to checksum
AMD microcode images, before the FPU is initialized. Since sha256()
recently gained a kernel-mode FPU optimized code path, a crash occurred
in kernel_fpu_begin_mask() during hotplug CPU onlining.
(The crash did not occur during boot-time CPU onlining, since the
optimized sha256() code is not enabled until subsys_initcalls run.)
Fix this by making irq_fpu_usable() return false before fpu__init_cpu()
has run. To do this without adding any additional overhead to
irq_fpu_usable(), replace the existing per-CPU bool in_kernel_fpu with
kernel_fpu_allowed which tracks both initialization and usage rather
than just usage. The initial state is false; FPU initialization sets it
to true; kernel-mode FPU sections toggle it to false and then back to
true; and CPU offlining restores it to the initial state of false.
Fixes: 11d7956d526f ("crypto: x86/sha256 - implement library instead of shash")
Reported-by: Ayush Jain <Ayush.Jain3@amd.com>
Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ayush Jain <Ayush.Jain3@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_IBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Let's just have it in a single function. The resulting function is
certainly small enough and readable.
Link: https://lkml.kernel.org/r/20250512123424.637989-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The "memramp() shrinking" scenario no longer applies, so let's remove that
now-unnecessary handling.
Link: https://lkml.kernel.org/r/20250512123424.637989-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Always set to 0, so let's remove it.
Link: https://lkml.kernel.org/r/20250512123424.637989-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We can now get rid of the old interface along with get_pat_info() and
follow_phys().
Link: https://lkml.kernel.org/r/20250512123424.637989-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Let's provide variants of track_pfn_remap() and untrack_pfn() that won't
mess with VMAs, and replace the usage in mm/memremap.c.
Add some documentation.
Link: https://lkml.kernel.org/r/20250512123424.637989-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
... by factoring it out from track_pfn_remap() into
pfnmap_setup_cachemode() and provide pfnmap_setup_cachemode_pfn() as a
replacement for track_pfn_insert().
For PMDs/PUDs, we keep checking a single pfn only. Add some
documentation, and also document why it is valid to not check the whole
pfn range.
We'll reuse pfnmap_setup_cachemode() from core MM next.
Link: https://lkml.kernel.org/r/20250512123424.637989-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits]
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
VM_PAT annoyed me too much and wasted too much of my time, let's clean PAT
handling up and remove VM_PAT.
This should sort out various issues with VM_PAT we discovered recently,
and will hopefully make the whole code more stable and easier to maintain.
In essence: we stop letting PAT mode mess with VMAs and instead lift what
to track/untrack to the MM core. We remember per VMA which pfn range we
tracked in a new struct we attach to a VMA (we have space without
exceeding 192 bytes), use a kref to share it among VMAs during
split/mremap/fork, and automatically untrack once the kref drops to 0.
This implies that we'll keep tracking a full pfn range even after
partially unmapping it, until fully unmapping it; but as that case was
mostly broken before, this at least makes it work in a way that is least
intrusive to VMA handling.
Shrinking with mremap() used to work in a hacky way, now we'll similarly
keep the original pfn range tacked even after this form of partial unmap.
Does anybody care about that? Unlikely. If we run into issues, we could
likely handled that (adjust the tracking) when our kref drops to 1 while
freeing a VMA. But it adds more complexity, so avoid that for now.
Briefly tested with the new pfnmap selftests [1].
This patch (of 11):
Let's factor it out to make the code easier to grasp. Drop one comment
where it is now rather obvious what is happening.
Use it also in pgprot_writecombine()/pgprot_writethrough() where clearing
the old cachemode might not be required, but given that we are already
doing a function call, no need to care about this micro-optimization.
Link: https://lkml.kernel.org/r/20250512123424.637989-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250512123424.637989-2-david@redhat.com
Link: https://lkml.kernel.org/r/20250509153033.952746-1-david@redhat.com [1]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits]
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This adds an addition layer of protection for the saved copy of dm crypt
key. Trying to access the saved copy will cause page fault.
Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
1st kernel will build up the kernel command parameter dmcryptkeys as
similar to elfcorehdr to pass the memory address of the stored info of dm
crypt key to kdump kernel.
Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 693bbf2a50447353c6a47961e6a7240a823ace02 as kdump LUKS
support (CONFIG_CRASH_DM_CRYPT) depends on __set_memory_prot.
[akpm@linux-foundation.org: x86 set_memory.h needs pgtable_types.h]
Link: https://lkml.kernel.org/r/20250502011246.99238-7-coxu@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jan Pazdziora <jpazdziora@redhat.com>
Cc: Liu Pingfan <kernelfans@gmail.com>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Ondrej Kozina <okozina@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
* Rename constants to their standard PE names:
- MZ_MAGIC -> IMAGE_DOS_SIGNATURE
- PE_MAGIC -> IMAGE_NT_SIGNATURE
- PE_OPT_MAGIC_PE32_ROM -> IMAGE_ROM_OPTIONAL_HDR_MAGIC
- PE_OPT_MAGIC_PE32 -> IMAGE_NT_OPTIONAL_HDR32_MAGIC
- PE_OPT_MAGIC_PE32PLUS -> IMAGE_NT_OPTIONAL_HDR64_MAGIC
- IMAGE_DLL_CHARACTERISTICS_NX_COMPAT -> IMAGE_DLLCHARACTERISTICS_NX_COMPAT
* Import constants and their description from readpe and file projects
which contains current up-to-date information:
- IMAGE_FILE_MACHINE_*
- IMAGE_FILE_*
- IMAGE_SUBSYSTEM_*
- IMAGE_DLLCHARACTERISTICS_*
- IMAGE_DLLCHARACTERISTICS_EX_*
- IMAGE_DEBUG_TYPE_*
* Add missing IMAGE_SCN_* constants and update their incorrect description
* Fix incorrect value of IMAGE_SCN_MEM_PURGEABLE constant
* Add description for win32_version and loader_flags PE fields
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The throttle support has been added in the generic code. Remove
the driver-specific throttle support.
Besides the throttle, perf_event_overflow may return true because of
event_limit. It already does an inatomic event disable. The pmu->stop
is not required either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250520181644.2673067-6-kan.liang@linux.intel.com
|
|
The throttle support has been added in the generic code. Remove
the driver-specific throttle support.
Besides the throttle, perf_event_overflow may return true because of
event_limit. It already does an inatomic event disable. The pmu->stop
is not required either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250520181644.2673067-5-kan.liang@linux.intel.com
|
|
The throttle support has been added in the generic code. Remove
the driver-specific throttle support.
Besides the throttle, perf_event_overflow may return true because of
event_limit. It already does an inatomic event disable. The pmu->stop
is not required either.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250520181644.2673067-4-kan.liang@linux.intel.com
|
|
Commit
480e803dacf8 ("x86/bugs: Restructure spectre_v2 mitigation")
inadvertently changed the spectre-v2 mitigation default from eIBRS to IBRS on
Intel. While splitting the spectre_v2 mitigation in select/update/apply
functions, eIBRS and IBRS selection logic was separated in select and update.
This caused IBRS selection to not consider that eIBRS mitigation is already
selected, fix it.
Fixes: 480e803dacf8 ("x86/bugs: Restructure spectre_v2 mitigation")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250520-eibrs-fix-v1-1-91bacd35ed09@linux.intel.com
|
|
Restructure the ITS mitigation to use select/update/apply functions like
the other mitigations.
There is a particularly complex interaction between ITS and Retbleed as CDT
(Call Depth Tracking) is a mitigation for both, and either its=stuff or
retbleed=stuff will attempt to enable CDT.
retbleed_update_mitigation() runs first and will check the necessary
pre-conditions for CDT if either ITS or Retbleed stuffing is selected. If
checks pass and ITS stuffing is selected, it will select stuffing for
Retbleed as well.
its_update_mitigation() runs after and will either select stuffing if
retbleed stuffing was enabled, or fall back to the default (aligned thunks)
if stuffing could not be enabled.
Enablement of CDT is done exclusively in retbleed_apply_mitigation().
its_apply_mitigation() is only used to enable aligned thunks.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/20250516193212.128782-1-david.kaplan@amd.com
|
|
Pick up build fixes from upstream to make this tree more testable.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
xen_read_msr_safe() currently passes an uninitialized argument 'err' to
xen_do_read_msr(). But as xen_do_read_msr() may not set the argument,
xen_read_msr_safe() could return err with an unpredictable value.
To ensure correctness, initialize err to 0 (representing success)
in xen_read_msr_safe().
Do the same in xen_read_msr(), even err is not used after being passed
to xen_do_read_msr().
Closes: https://lore.kernel.org/xen-devel/aBxNI_Q0-MhtBSZG@stanley.mountain/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250517165713.935384-1-xin@zytor.com
|