summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2025-05-29Merge tag 'x86_sgx_for_6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel software guard extension (SGX) updates from Dave Hansen: "A couple of x86/sgx changes. The first one is a no-brainer to use the (simple) SHA-256 library. For the second one, some folks doing testing noticed that SGX systems under memory pressure were inducing fatal machine checks at pretty unnerving rates, despite the SGX code having _some_ awareness of memory poison. It turns out that the SGX reclaim path was not checking for poison _and_ it always accesses memory to copy it around. Make sure that poisoned pages are not reclaimed" * tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Prevent attempts to reclaim poisoned pages x86/sgx: Use SHA-256 library API instead of crypto_shash API
2025-05-29Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"Rafael J. Wysocki
Revert commit 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()") because it introduced a significant power regression on systems that start with "nosmt" in the kernel command line. Namely, on such systems, SMT siblings permanently go offline early, when cpuidle has not been initialized yet, so after the above commit, hlt_play_dead() is called for them. Later on, when the processor attempts to enter a deep package C-state, including PC10 which is requisite for reaching minimum power in suspend-to-idle, it is not able to do that because of the SMT siblings staying in C1 (which they have been put into by HLT). As a result, the idle power (including power in suspend-to-idle) rises quite dramatically on those systems with all of the possible consequences, which (needless to say) may not be expected by their users. This issue is hard to debug and potentially dangerous, so it needs to be addressed as soon as possible in a way that will work for 6.15.y, hence the revert. Of course, after this revert, the issue that commit 96040f7273e2 attempted to address will be back and it will need to be fixed again later. Fixes: 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()") Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com> Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 6.15+ <stable@vger.kernel.org> # 6.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/12674167.O9o76ZdvQC@rjwysocki.net
2025-05-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch. ARM: - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. - Enable nested virtualisation support on systems that support it, though it is disabled by default. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. LoongArch: - Don't flush tlb if the host supports hardware page table walks. - Add KVM selftests support. RISC-V: - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl x86: - Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit 7bcf7246c42a ('Merge branch 'kvm-tdx-finish-initial' into HEAD')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits) x86/tdx: mark tdh_vp_enter() as __flatten Documentation: virt/kvm: remove unreferenced footnote RISC-V: KVM: lock the correct mp_state during reset KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET RISC-V: KVM: Remove scounteren initialization KVM: RISC-V: remove unnecessary SBI reset state ...
2025-05-28Merge branch 'kvm-lockdep-common' into HEADPaolo Bonzini
Introduce new mutex locking functions mutex_trylock_nest_lock() and mutex_lock_killable_nest_lock() and use them to clean up locking of all vCPUs for a VM. For x86, this removes some complex code that was used instead of lockdep's "nest_lock" feature. For ARM and RISC-V, this removes a lockdep warning when the VM is configured to have more than MAX_LOCK_DEPTH vCPUs, and removes a fair amount of duplicate code by sharing the logic across all architectures. Signed-off-by: Paolo BOnzini <pbonzini@redhat.com>
2025-05-27Merge tag 'pm-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Once again, the changes are dominated by cpufreq updates, but this time the majority of them are cpufreq core changes, mostly related to the introduction of policy locking guards and __free() usage, and fixes related to boost handling. Still, there is also a significant update of the intel_pstate driver making it register an energy model when running on a hybrid platform which is used for enabling energy-aware scheduling (EAS) if the driver operates in the passive mode (and schedutil is used as the cpufreq governor for all CPUs which is the passive mode default). There are some amd-pstate driver updates too, for a good measure, including the "Requested CPU Min frequency" BIOS option support and new online/offline callbacks. In the cpuidle space, the most significant change is the addition of a C1 demotion on/off sysfs knob to intel_idle which should help some users to configure their systems more precisely. There is also the conversion of the PSCI cpuidle driver to a faux device one and there are two small updates of cpuidle governors. Device power management is also modified quite a bit, especially the handling of devices with asynchronous suspend and resume enabled during system transitions. They are now going to be handled more asynchronously during suspend transitions and somewhat less aggressively during resume transitions. Apart from the above, the operating performance points (OPP) library is now going to use mutex locking guards and scope-based cleanup helpers and there is the usual bunch of assorted fixes and code cleanups. Specifics: - Fix potential division-by-zero error in em_compute_costs() (Yaxiong Tian) - Fix typos in energy model documentation and example driver code (Moon Hee Lee, Atul Kumar Pant) - Rearrange the energy model management code and add a new function for adjusting a CPU energy model after adjusting the capacity of the given CPU to it (Rafael Wysocki) - Refactor cpufreq_online(), add and use cpufreq policy locking guards, use __free() in policy reference counting, and clean up core cpufreq code on top of that (Rafael Wysocki) - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh Kumar) - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay Ugwekar) - Add offline, online and suspend callbacks to the amd-pstate driver, rename and use the existing amd_pstate_epp callbacks in it (Dhananjay Ugwekar) - Add support for the "Requested CPU Min frequency" BIOS option to the amd-pstate driver (Dhananjay Ugwekar) - Reset amd-pstate driver mode after running selftests (Swapnil Sapkal) - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan Chancellor) - Add helper for governor checks to the schedutil cpufreq governor and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki) - Populate the cpu_capacity sysfs entries from the intel_pstate driver after registering asym capacity support (Ricardo Neri) - Add support for enabling Energy-aware scheduling (EAS) to the intel_pstate driver when operating in the passive mode on a hybrid platform (Rafael Wysocki) - Drop redundant cpus_read_lock() from store_local_boost() in the cpufreq core (Seyediman Seyedarab) - Replace sscanf() with kstrtouint() in the cpufreq code and use a symbol instead of a raw number in it (Bowen Yu) - Add support for autonomous CPU performance state selection to the CPPC cpufreq driver (Lifeng Zheng) - OPP: Add dev_pm_opp_set_level() (Praveen Talari) - Introduce scope-based cleanup headers and mutex locking guards in OPP core (Viresh Kumar) - Switch OPP to use kmemdup_array() (Zhang Enpei) - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the menu cpuidle governor (Zhongqiu Han) - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla) - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem Bityutskiy) - Fix typos in two comments in the teo cpuidle governor (Atul Kumar Pant) - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja Kalla) - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki) - Add new devm_ functions for enabling runtime PM and runtime PM reference counting (Bence Csókás) - Remove size arguments from strscpy() calls in the hibernation core code (Thorsten Blum) - Adjust the handling of devices with asynchronous suspend enabled during system suspend and resume to start resuming them immediately after resuming their parents and to start suspending such a device immediately after suspending its first child (Rafael Wysocki) - Adjust messages printed during tasks freezing to avoid using pr_cont() (Andrew Sayers, Paul Menzel) - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan Zhang) - Add missing wakeup source attribute relax_count to sysfs and remove the space character at the end ofi the string produced by pm_show_wakelocks() (Zijun Hu) - Add configurable pm_test delay for hibernation (Zihuan Zhang) - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the cypd4226 device on Tegra boards from suspending prematurely (Jon Hunter) - Unbreak printing PM debug messages during hibernation and clean up some related code (Rafael Wysocki) - Add a systemd service to run cpupower and change cpupower binding's Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)" * tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits) cpufreq: CPPC: Add support for autonomous selection cpufreq: Update sscanf() to kstrtouint() cpufreq: Replace magic number OPP: switch to use kmemdup_array() PM: freezer: Rewrite restarting tasks log to remove stray *done.* PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() cpufreq: drop redundant cpus_read_lock() from store_local_boost() cpupower: do not install files to /etc/default/ cpupower: do not call systemctl at install time cpupower: do not write DESTDIR to cpupower.service PM: sleep: Introduce pm_sleep_transition_in_progress() cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver() cpufreq: intel_pstate: Document hybrid processor support cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache cpufreq: intel_pstate: EAS support for hybrid platforms PM: EM: Introduce em_adjust_cpu_capacity() PM: EM: Move CPU capacity check to em_adjust_new_capacity() PM: EM: Documentation: Fix typos in example driver code cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas() PM: sleep: Introduce pm_suspend_in_progress() ...
2025-05-27Merge tag 'acpi-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "The most significant part of these changes is an ACPICA update covering two upstream ACPICA releases, 20241212 and 20250404, that have not been included into the kernel code base yet. Among other things, it adds definitions needed to address GCC 15's -Wunterminated-string-initialization warnings, adds support for three new tables (MRRM, ERDT, RIMT), extends support for two tables (RAS2, DMAR), and fixes some issues. On top of the above, there is a new parser for the MRRM table, more changes related to GCC 15's -Wunterminated-string-initialization warnings, a CPPC library update including functions related to autonomous CPU performance state selection, a couple of new quirks, some assorted fixes and some code cleanups. Specifics: - Fix two ACPICA SLAB cache leaks (Seunghun Han) - Add EINJv2 get error type action and define Error Injection Actions in hex values to avoid inconsistencies between the specification and the code (Zaid Alali) - Fix typo in comments for SRAT structures (Adam Lackorzynski) - Prevent possible loss of data in ACPICA because of u32 to u8 conversions (Saket Dumbre) - Fix reading FFixedHW operation regions in ACPICA (Daniil Tatianin) - Add support for printing AML arguments when the ACPICA debug level is ACPI_LV_TRACE_POINT (Mario Limonciello) - Drop a stale comment about the file content from actbl2.h (Sudeep Holla) - Apply pack(1) to union aml_resource (Tamir Duberstein) - Fix overflow check in the ACPICA version of vsnprintf() (gldrk) - Interpret SIDP structures in DMAR added revision 3.4 of the VT-d specification (Alexey Neyman) - Add typedef and other definitions related to MRRM to ACPICA (Tony Luck) - Add definitions for RIMT to ACPICA (Sunil V L) - Fix spelling mistake "Incremement" -> "Increment" in the ACPICA utilities code (Colin Ian King) - Add typedef and other definitions for ERDT to ACPICA (Tony Luck) - Introduce ACPI_NONSTRING and use it (Kees Cook, Ahmed Salem) - Rename structure and field names of the RAS2 table in actbl2.h (Shiju Jose) - Fix up whitespace in acpica/utcache.c (Zhe Qiao) - Avoid sequence overread in a call to strncmp() in ap_get_table_length() and replace strncpy() with memcpy() in ACPICA in some places (Ahmed Salem) - Update copyright year in all ACPICA files (Saket Dumbre) - Add __nonstring annotations for unterminated strings in the static ACPI tables parsing code (Kees Cook) - Add support for parsing the MRRM ACPI table and sysfs files to describe memory regions listed in it (Tony Luck, Anil Keshavamurthy) - Remove an (explicitly) unused header file include from the VIOT ACPI table parser file (Andy Shevchenko) - Improve logging around acpi_initialize_tables() (Bartosz Szczepanek) - Clean up the initialization of CPU data structures in the ACPI processor driver (Zhang Rui) - Remove an obsolete comment regarding the C-states handling in the ACPI processor driver (Giovanni Gherdovich) - Simplify PCC shared memory region handling (Sudeep Holla) - Rework and extend functions for reading CPPC register values and for updating CPPC registers (Lifeng Zheng) - Add three functions related to autonomous CPU performance state selection to the CPPC library (Lifeng Zheng) - Turn the acpi_pci_root_remap_iospace() fwnode_handle parameter into a const pointer (Pei Xiao) - Round battery capacity percengate in the ACPI battery driver to the closest integer to avoid user confusion (shitao) - Make the ACPI battery driver report the current as a negative number to the power supply framework when the battery is discharging as documented (Peter Marheine) - Add TUXEDO InfinityBook Pro AMD Gen9 to the acpi_ec_no_wakeup[] list to prevent spurious wakeups from suspend-to-idle (Werner Sembach) - Convert the APEI EINJ driver to a faux device one (Sudeep Holla, Jon Hunter) - Remove redundant calls to einj_get_available_error_type() from the APEI EINJ driver (Zaid Alali) - Fix a typo for MECHREVO in irq1_edge_low_force_override[] (Mingcong Bai) - Add an LPS0 check() callback to the AMD pinctrl driver and fix up config symbol dependencies in it (Mario Limonciello, Rafael Wysocki) - Avoid initializing the ACPI platform profile driver on non-ACPI platforms (Alexandre Ghiti) - Document that references to ACPI data (non-device) nodes should use string-only references in hierarchical data node packages (Sakari Ailus) - Fail the ACPI bus registration if acpi_kobj registration fails (Armin Wolf)" * tag 'acpi-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) ACPI: MRRM: Fix default max memory region ACPI: bus: Bail out if acpi_kobj registration fails ACPI: platform_profile: Avoid initializing on non-ACPI platforms pinctrl: amd: Fix hibernation support with CONFIG_SUSPEND unset ACPI: tables: Improve logging around acpi_initialize_tables() ACPI: VIOT: Remove (explicitly) unused header ACPI: Add documentation for exposing MRRM data ACPI: MRRM: Add /sys files to describe memory ranges ACPI: MRRM: Minimal parse of ACPI MRRM table ACPICA: Update copyright year ACPICA: Logfile: Changes for version 20250404 ACPICA: Replace strncpy() with memcpy() ACPICA: Apply ACPI_NONSTRING in more places ACPICA: Avoid sequence overread in call to strncmp() ACPICA: Adjust the position of code lines ACPICA: actbl2.h: ACPI 6.5: RAS2: Rename structure and field names of the RAS2 table ACPICA: Apply ACPI_NONSTRING ACPICA: Introduce ACPI_NONSTRING ACPICA: actbl2.h: ERDT: Add typedef and other definitions ACPICA: infrastructure: Add new DMT_BUF types and shorten a long name ...
2025-05-27Merge tag 'x86_sev_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV update from Borislav Petkov: "Add a virtual TPM driver glue which allows a guest kernel to talk to a TPM device emulated by a Secure VM Service Module (SVSM) - a helper module of sorts which runs at a different privilege level in the SEV-SNP VM stack. The intent being that a TPM device is emulated by a trusted entity and not by the untrusted host which is the default assumption in the confidential computing scenarios" * tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Register tpm-svsm platform device tpm: Add SNP SVSM vTPM driver svsm: Add header with SVSM_VTPM_CMD helpers x86/sev: Add SVSM vTPM probe/send_command functions
2025-05-27Merge tag 'x86_mtrr_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mtrr update from Borislav Petkov: "A single change to verify the presence of fixed MTRR ranges before accessing the respective MSRs" * tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
2025-05-27Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-27x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementationMaxim Levitsky
Use kvm_lock_all_vcpus instead of sev's own implementation. Because kvm_lock_all_vcpus uses the _nest_lock feature of lockdep, which ignores subclasses, there is no longer a need to use separate subclasses for source and target VMs. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.16: - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. - Add support for ALLOWED_SEV_FEATURES. - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. - Don't account temporary allocations in sev_send_update_data(). - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
2025-05-27Merge tag 'kvm-x86-vmx-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM VMX changes for 6.16: - Explicitly check MSR load/store list counts to fix a potential overflow on 32-bit kernels. - Flush shadow VMCSes on emergency reboot. - Revert mem_enc_ioctl() back to an optional hook, as it's nullified when SEV or TDX is disabled via Kconfig. - Macrofy the handling of vt_x86_ops to eliminate a pile of boilerplate code needed for TDX, and to optimize CONFIG_KVM_INTEL_TDX=n builds.
2025-05-27Merge tag 'kvm-x86-selftests-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM selftests changes for 6.16: - Add support for SNP to the various SEV selftests. - Add a selftest to verify fastops instructions via forced emulation. - Add MGLRU support to the access tracking perf test.
2025-05-27Merge tag 'kvm-x86-pir-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 posted interrupt changes for 6.16: Refine and optimize KVM's software processing of the PIR, and ultimately share PIR harvesting code between KVM and the kernel's Posted MSI handler
2025-05-27Merge tag 'kvm-x86-mmu-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 MMU changes for 6.16: - Refine and harden handling of spurious faults. - Use kvm_x86_call() instead of open coding static_call().
2025-05-27Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.16: - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. - Advertise support to userspace for WRMSRNS and PREFETCHI. - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. - Add a module param to control and enumerate support for device posted interrupts. - Misc cleanups.
2025-05-27KVM: VMX: use __always_inline for is_td_vcpu and is_tdEdward Adam Davis
is_td() and is_td_vcpu() are used in no-instrumentation sections; use __always_inline instead of inline. vmlinux.o: error: objtool: vmx_handle_nmi+0x47: call to is_td_vcpu.isra.0() leaves .noinstr.text section Fixes: 7172c753c26a ("KVM: VMX: Move common fields of struct vcpu_{vmx,tdx} to a struct") Signed-off-by: Edward Adam Davis <eadavis@qq.com> Message-ID: <tencent_1A767567C83C1137829622362E4A72756F09@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27Merge tag 'timers-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer*(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_*() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...
2025-05-26Merge tag 'x86-build-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Add the CONFIG_X86_NATIVE_CPU=y build option to locally optimize the kernel build with '-march=native'" * tag 'x86-build-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kbuild/64: Restrict clang versions that can use '-march=native' x86/kbuild/64: Test for the availability of the -mtune=native compiler flag x86/kbuild/64: Add the CONFIG_X86_NATIVE_CPU option to locally optimize the kernel with '-march=native'
2025-05-26Merge tag 'x86-entry-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Two changes to simplify the x86 vDSO code a bit" * tag 'x86-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Remove redundant #ifdeffery around in_ia32_syscall() x86/vdso: Remove #ifdeffery around page setup variants
2025-05-26Merge tag 'x86-debug-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug updates from Ingo Molnar: "Move the x86 page fault tracepoints to generic code, because other architectures would like to make use of them as well" * tag 'x86-debug-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tracing, x86/mm: Move page fault tracepoints to generic x86/tracing, x86/mm: Remove redundant trace_pagefault_key
2025-05-26Merge tag 'x86-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc x86 cleanups: kernel-doc updates and a string API transition patch" * tag 'x86-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/power: hibernate: Fix W=1 build kernel-doc warnings x86/mm/pat: Fix W=1 build kernel-doc warning x86/CPU/AMD: Replace strcpy() with strscpy()
2025-05-26Merge tag 'x86-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") | | dated Jun 6 2017, we have been using C code on the boot path in a way | that is not supported by the toolchain, i.e., to execute non-PIC C | code from a mapping of memory that is different from the one provided | to the linker. It should have been obvious at the time that this was a | bad idea, given the need to sprinkle fixup_pointer() calls left and | right to manipulate global variables (including non-pointer variables) | without crashing. | | This C startup code has been expanding, and in particular, the SEV-SNP | startup code has been expanding over the past couple of years, and | grown many of these warts, where the C code needs to use special | annotations or helpers to access global objects. This tree includes the first phase of this work-in-progress x86 boot code reorganization. Scalability enhancements and micro-optimizations: - Improve code-patching scalability (Eric Dumazet) - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper) CPU features enumeration updates: - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish) - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner) - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish) Memory management changes: - Allow temporary MMs when IRQs are on (Andy Lutomirski) - Opt-in to IRQs-off activate_mm() (Andy Lutomirski) - Simplify choose_new_asid() and generate better code (Borislav Petkov) - Simplify 32-bit PAE page table handling (Dave Hansen) - Always use dynamic memory layout (Kirill A. Shutemov) - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov) - Make 5-level paging support unconditional (Kirill A. Shutemov) - Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik) - Predict valid_user_address() returning true (Mateusz Guzik) - Consolidate initmem_init() (Mike Rapoport) FPU support and vector computing: - Enable Intel APX support (Chang S. Bae) - Reorgnize and clean up the xstate code (Chang S. Bae) - Make task_struct::thread constant size (Ingo Molnar) - Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y (Kees Cook) - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov) - Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson) Microcode loader changes: - Help users notice when running old Intel microcode (Dave Hansen) - AMD: Do not return error when microcode update is not necessary (Annie Li) - AMD: Clean the cache if update did not load microcode (Boris Ostrovsky) Code patching (alternatives) changes: - Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar) - Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() (Nikolay Borisov) - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra) Debugging support: - Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse) - Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam) - Add AMD Zen debugging document (Mario Limonciello) - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu) - Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu) CPU bugs and bug mitigations: - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov) - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov) - Restructure and harmonize the various CPU bug mitigation methods (David Kaplan) - Fix spectre_v2 mitigation default on Intel (Pawan Gupta) MSR API: - Large MSR code and API cleanup (Xin Li) - In-kernel MSR API type cleanups and renames (Ingo Molnar) PKEYS: - Simplify PKRU update in signal frame (Chang S. Bae) NMI handling code: - Clean up, refactor and simplify the NMI handling code (Sohil Mehta) - Improve NMI duration console printouts (Sohil Mehta) Paravirt guests interface: - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov) SEV support: - Share the sev_secrets_pa value again (Tom Lendacky) x86 platform changes: - Introduce the <asm/amd/> header namespace (Ingo Molnar) - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h> (Mario Limonciello) Fixes and cleanups: - x86 assembly code cleanups and fixes (Uros Bizjak) - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)" * tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits) x86/bugs: Fix spectre_v2 mitigation default on Intel x86/bugs: Restructure ITS mitigation x86/xen/msr: Fix uninitialized variable 'err' x86/msr: Remove a superfluous inclusion of <asm/asm.h> x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only x86/mm/64: Make 5-level paging support unconditional x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model x86/mm/64: Always use dynamic memory layout x86/bugs: Fix indentation due to ITS merge x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2() x86/cpuid: Rename have_cpuid_p() to cpuid_feature() x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h> x86/msr: Add rdmsrl_on_cpu() compatibility wrapper x86/mm: Fix kernel-doc descriptions of various pgtable methods x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too x86/boot: Defer initialization of VM space related global variables ...
2025-05-26Merge tag 'perf-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "Core & generic-arch updates: - Add support for dynamic constraints and propagate it to the Intel driver (Kan Liang) - Fix & enhance driver-specific throttling support (Kan Liang) - Record sample last_period before updating on the x86 and PowerPC platforms (Mark Barnett) - Make perf_pmu_unregister() usable (Peter Zijlstra) - Unify perf_event_free_task() / perf_event_exit_task_context() (Peter Zijlstra) - Simplify perf_event_release_kernel() and perf_event_free_task() (Peter Zijlstra) - Allocate non-contiguous AUX pages by default (Yabin Cui) Uprobes updates: - Add support to emulate NOP instructions (Jiri Olsa) - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa) x86 Intel PMU enhancements: - Support Intel Auto Counter Reload [ACR] (Kan Liang) - Add PMU support for Clearwater Forest (Dapeng Mi) - Arch-PEBS preparatory changes: (Dapeng Mi) - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs - Decouple BTS initialization from PEBS initialization - Introduce pairs of PEBS static calls x86 AMD PMU enhancements: - Use hrtimer for handling overflows in the AMD uncore driver (Sandipan Das) - Prevent UMC counters from saturating (Sandipan Das) Fixes and cleanups: - Fix put_ctx() ordering (Frederic Weisbecker) - Fix irq work dereferencing garbage (Frederic Weisbecker) - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan Das, Thorsten Blum)" * tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) perf/headers: Clean up <linux/perf_event.h> a bit perf/uapi: Clean up <uapi/linux/perf_event.h> a bit perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h> mips/perf: Remove driver-specific throttle support xtensa/perf: Remove driver-specific throttle support sparc/perf: Remove driver-specific throttle support loongarch/perf: Remove driver-specific throttle support csky/perf: Remove driver-specific throttle support arc/perf: Remove driver-specific throttle support alpha/perf: Remove driver-specific throttle support perf/apple_m1: Remove driver-specific throttle support perf/arm: Remove driver-specific throttle support s390/perf: Remove driver-specific throttle support powerpc/perf: Remove driver-specific throttle support perf/x86/zhaoxin: Remove driver-specific throttle support perf/x86/amd: Remove driver-specific throttle support perf/x86/intel: Remove driver-specific throttle support perf: Only dump the throttle log for the leader perf: Fix the throttle logic for a group perf/core: Add the is_event_in_freq_mode() helper to simplify the code ...
2025-05-26Merge tag 'v6.16-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix memcpy_sglist to handle partially overlapping SG lists - Use memcpy_sglist to replace null skcipher - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS - Hide CRYPTO_MANAGER - Add delayed freeing of driver crypto_alg structures Compression: - Allocate large buffers on first use instead of initialisation in scomp - Drop destination linearisation buffer in scomp - Move scomp stream allocation into acomp - Add acomp scatter-gather walker - Remove request chaining - Add optional async request allocation Hashing: - Remove request chaining - Add optional async request allocation - Move partial block handling into API - Add ahash support to hmac - Fix shash documentation to disallow usage in hard IRQs Algorithms: - Remove unnecessary SIMD fallback code on x86 and arm/arm64 - Drop avx10_256 xts(aes)/ctr(aes) on x86 - Improve avx-512 optimisations for xts(aes) - Move chacha arch implementations into lib/crypto - Move poly1305 into lib/crypto and drop unused Crypto API algorithm - Disable powerpc/poly1305 as it has no SIMD fallback - Move sha256 arch implementations into lib/crypto - Convert deflate to acomp - Set block size correctly in cbcmac Drivers: - Do not use sg_dma_len before mapping in sun8i-ss - Fix warm-reboot failure by making shutdown do more work in qat - Add locking in zynqmp-sha - Remove cavium/zip - Add support for PCI device 0x17D8 to ccp - Add qat_6xxx support in qat - Add support for RK3576 in rockchip-rng - Add support for i.MX8QM in caam Others: - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up - Add new SEV/SNP platform shutdown API in ccp" * tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits) x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining crypto: qat - add missing header inclusion crypto: api - Redo lookup on EEXIST Revert "crypto: testmgr - Add hash export format testing" crypto: marvell/cesa - Do not chain submitted requests crypto: powerpc/poly1305 - add depends on BROKEN for now Revert "crypto: powerpc/poly1305 - Add SIMD fallback" crypto: ccp - Add missing tee info reg for teev2 crypto: ccp - Add missing bootloader info reg for pspv5 crypto: sun8i-ce - move fallback ahash_request to the end of the struct crypto: octeontx2 - Use dynamic allocated memory region for lmtst crypto: octeontx2 - Initialize cptlfs device info once crypto: xts - Only add ecb if it is not already there crypto: lrw - Only add ecb if it is not already there crypto: testmgr - Add hash export format testing crypto: testmgr - Use ahash for generic tfm crypto: hmac - Add ahash support crypto: testmgr - Ignore EEXIST on shash allocation crypto: algapi - Add driver template support to crypto_inst_setname crypto: shash - Set reqsize in shash_alg ...
2025-05-26x86/tdx: mark tdh_vp_enter() as __flattenPaolo Bonzini
In some cases tdx_tdvpr_pa() is not fully inlined into tdh_vp_enter(), which causes the following warning: vmlinux.o: warning: objtool: tdh_vp_enter+0x8: call to tdx_tdvpr_pa() leaves .noinstr.text section This happens if the compiler considers tdx_tdvpr_pa() to be "large", for example because CONFIG_SPARSEMEM adds two function calls to page_to_section() and __section_mem_map_addr(): ({ const struct page *__pg = (pg); \ int __sec = page_to_section(__pg); \ (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \ }) Because exiting the noinstr section is a no-no, just mark tdh_vp_enter() for full inlining. Reported-by: kernel test robot <lkp@intel.com> Analyzed-by: Xiaoyao Li <xiaoyao.li@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505240530.5KktQ5mX-lkp@intel.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-26Merge tag 'crc-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Cleanups for the kernel's CRC (cyclic redundancy check) code: - Use __ro_after_init where appropriate - Remove unnecessary static_key on s390 - Rename some source code files - Rename the crc32 and crc32c crypto API modules - Use subsys_initcall instead of arch_initcall - Restore maintainers for crc_kunit.c - Fold crc16_byte() into crc16.c - Add some SPDX license identifiers" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crc32: add SPDX license identifier lib/crc16: unexport crc16_table and crc16_byte() w1: ds2406: use crc16() instead of crc16_byte() loop MAINTAINERS: add crc_kunit.c back to CRC LIBRARY lib/crc: make arch-optimized code use subsys_initcall crypto: crc32 - remove "generic" from file and module names x86/crc: drop "glue" from filenames sparc/crc: drop "glue" from filenames s390/crc: drop "glue" from filenames powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S powerpc/crc: drop "glue" from filenames arm64/crc: drop "glue" from filenames arm/crc: drop "glue" from filenames s390/crc32: Remove no-op module init and exit functions s390/crc32: Remove have_vxrs static key lib/crc: make the CPU feature static keys __ro_after_init
2025-05-26Merge tag 'kvmarm-6.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups.
2025-05-26Merge tag 'loongarch-kvm-6.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.16 1. Don't flush tlb if HW PTW supported. 2. Add LoongArch KVM selftests support.
2025-05-26Merge branches 'pm-runtime' and 'pm-sleep'Rafael J. Wysocki
Merge updates related to system sleep handling and runtime PM for 6.16-rc1: - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja Kalla). - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki). - Add new devm_ functions for enabling runtime PM and runtime PM reference counting (Bence Csókás). - Remove size arguments from strscpy() calls in the hibernation core code (Thorsten Blum). - Adjust the handling of devices with asynchronous suspend enabled during system suspend and resume to start resuming them immediately after resuming their parents and to start suspending such a device immediately after suspending its first child (Rafael Wysocki). - Adjust messages printed during tasks freezing to avoid using pr_cont() (Andrew Sayers, Paul Menzel). - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan Zhang). - Add missing wakeup source attribute relax_count to sysfs and remove the space character at the end ofi the string produced by pm_show_wakelocks() (Zijun Hu). - Add configurable pm_test delay for hibernation (Zihuan Zhang). - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the cypd4226 device on Tegra boards from suspending prematurely (Jon Hunter). - Unbreak printing PM debug messages during hibernation and clean up some related code (Rafael Wysocki). * pm-runtime: PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() PM: sysfs: Move debug runtime PM attributes to runtime_attrs[] PM: runtime: Add new devm functions * pm-sleep: PM: freezer: Rewrite restarting tasks log to remove stray *done.* PM: sleep: Introduce pm_sleep_transition_in_progress() PM: sleep: Introduce pm_suspend_in_progress() PM: sleep: Print PM debug messages during hibernation ucsi_ccg: Disable async suspend in ucsi_ccg_probe() PM: hibernate: add configurable delay for pm_test PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks() PM: wakeup: Add missing wakeup source attribute relax_count PM: sleep: Remove unnecessary !! PM: sleep: Use two lines for "Restarting..." / "done" messages PM: sleep: Make suspend of devices more asynchronous PM: sleep: Suspend async parents after suspending children PM: sleep: Resume children after resuming the parent PM: hibernate: Remove size arguments when calling strscpy()
2025-05-26Merge branch 'acpi-tables'Rafael J. Wysocki
Merge updates related to the handling of static (data-only) ACPI tables for 6.16-rc1: - Add __nonstring annotations for unterminated strings in the static ACPI tables parsing code (Kees Cook). - Add support for parsing the MRRM ACPI table and sysfs files to describe memory regions listed in it (Tony Luck, Anil Keshavamurthy). - Remove an (explicitly) unused header file include from the VIOT ACPI table parser file (Andy Shevchenko). - Improve logging around acpi_initialize_tables() (Bartosz Szczepanek). * acpi-tables: ACPI: MRRM: Fix default max memory region ACPI: tables: Improve logging around acpi_initialize_tables() ACPI: VIOT: Remove (explicitly) unused header ACPI: Add documentation for exposing MRRM data ACPI: MRRM: Add /sys files to describe memory ranges ACPI: MRRM: Minimal parse of ACPI MRRM table ACPI: tables: Add __nonstring annotations for unterminated strings
2025-05-26Merge tag 'vfs-6.16-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Use folios for symlinks in the page cache FUSE already uses folios for its symlinks. Mirror that conversion in the generic code and the NFS code. That lets us get rid of a few folio->page->folio conversions in this path, and some of the few remaining users of read_cache_page() / read_mapping_page() - Try and make a few filesystem operations killable on the VFS inode->i_mutex level - Add sysctl vfs_cache_pressure_denom for bulk file operations Some workloads need to preserve more dentries than we currently allow through out sysctl interface A HDFS servers with 12 HDDs per server, on a HDFS datanode startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization To minimize dentry reclamation, they set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes To maintain service stability, more dentries need to be preserved during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for such workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation - Avoid some jumps in inode_permission() using likely()/unlikely() - Avid a memory access which is most likely a cache miss when descending into devcgroup_inode_permission() - Add fastpath predicts for stat() and fdput() - Anonymous inodes currently don't come with a proper mode causing issues in the kernel when we want to add useful VFS debug assert. Fix that by giving them a proper mode and masking it off when we report it to userspace which relies on them not having any mode - Anonymous inodes currently allow to change inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock - Add proper tests for anonymous inode behavior - Make it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead() Cleanups: - Port pidfs to the new anon_inode_{g,s}etattr() helpers - Try to remove the uselib() system call - Add unlikely branch hint return path for poll - Add unlikely branch hint on return path for core_sys_select - Don't allow signals to interrupt getdents copying for fuse - Provide a size hint to dir_context for during readdir() - Use writeback_iter directly in mpage_writepages - Update compression and mtime descriptions in initramfs documentation - Update main netfs API document - Remove useless plus one in super_cache_scan() - Remove unnecessary NULL-check guards during setns() - Add separate separate {get,put}_cgroup_ns no-op cases Fixes: - Fix typo in root= kernel parameter description - Use KERN_INFO for infof()|info_plog()|infofc() - Correct comments of fs_validate_description() - Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() - Delete macro fsparam_u32hex() - Remove unused and problematic validate_constant_table() - Fix potential unsigned integer underflow in fs_name() - Make file-nr output the total allocated file handles" * tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits) fs: Pass a folio to page_put_link() nfs: Use a folio in nfs_get_link() fs: Convert __page_get_link() to use a folio fs/read_write: make default_llseek() killable fs/open: make do_truncate() killable fs/open: make chmod_common() and chown_common() killable include/linux/fs.h: add inode_lock_killable() readdir: supply dir_context.count as readdir buffer size hint vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations fuse: don't allow signals to interrupt getdents copying Documentation: fix typo in root= kernel parameter description include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards fs: use writeback_iter directly in mpage_writepages fs: remove useless plus one in super_cache_scan() fs: add S_ANON_INODE fs: remove uselib() system call device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() fs/fs_parse: Remove unused and problematic validate_constant_table() fs: touch up predicts in inode_permission() ...
2025-05-26x86/fpu: Fix irq_fpu_usable() to return false during CPU onliningEric Biggers
irq_fpu_usable() incorrectly returned true before the FPU is initialized. The x86 CPU onlining code can call sha256() to checksum AMD microcode images, before the FPU is initialized. Since sha256() recently gained a kernel-mode FPU optimized code path, a crash occurred in kernel_fpu_begin_mask() during hotplug CPU onlining. (The crash did not occur during boot-time CPU onlining, since the optimized sha256() code is not enabled until subsys_initcalls run.) Fix this by making irq_fpu_usable() return false before fpu__init_cpu() has run. To do this without adding any additional overhead to irq_fpu_usable(), replace the existing per-CPU bool in_kernel_fpu with kernel_fpu_allowed which tracks both initialization and usage rather than just usage. The initial state is false; FPU initialization sets it to true; kernel-mode FPU sections toggle it to false and then back to true; and CPU offlining restores it to the initial state of false. Fixes: 11d7956d526f ("crypto: x86/sha256 - implement library instead of shash") Reported-by: Ayush Jain <Ayush.Jain3@amd.com> Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ayush Jain <Ayush.Jain3@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-23arch/x86: Provide the CPU number in the wakeup AP callbackRoman Kisel
When starting APs, confidential guests and paravisor guests need to know the CPU number, and the pattern of using the linear search has emerged in several places. With N processors that leads to the O(N^2) time complexity. Provide the CPU number in the AP wake up callback so that one can get the CPU number in constant time. Suggested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250507182227.7421-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250507182227.7421-3-romank@linux.microsoft.com>
2025-05-23x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap()Roman Kisel
To start an application processor in SNP-isolated guest, a hypercall is used that takes a virtual processor index. The hv_snp_boot_ap() function uses that START_VP hypercall but passes as VP index to it what it receives as a wakeup_secondary_cpu_64 callback: the APIC ID. As those two aren't generally interchangeable, that may lead to hung APs if the VP index and the APIC ID don't match up. Update the parameter names to avoid confusion as to what the parameter is. Use the APIC ID to the VP index conversion to provide the correct input to the hypercall. Cc: stable@vger.kernel.org Fixes: 44676bb9d566 ("x86/hyperv: Add smp support for SEV-SNP guest") Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250507182227.7421-2-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250507182227.7421-2-romank@linux.microsoft.com>
2025-05-23arm64, x86: hyperv: Report the VTL the system boots inRoman Kisel
The hyperv guest code might run in various Virtual Trust Levels. Report the level when the kernel boots in the non-default (0) one. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250428210742.435282-7-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250428210742.435282-7-romank@linux.microsoft.com>
2025-05-23Drivers: hv: Provide arch-neutral implementation of get_vtl()Roman Kisel
To run in the VTL mode, Hyper-V drivers have to know what VTL the system boots in, and the arm64/hyperv code does not have the means to compute that. Refactor the code to hoist the function that detects VTL, make it arch-neutral to be able to employ it to get the VTL on arm64. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/r/20250428210742.435282-5-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250428210742.435282-5-romank@linux.microsoft.com>
2025-05-23Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-22x86/mm/pat: inline memtype_match() into memtype_erase()David Hildenbrand
Let's just have it in a single function. The resulting function is certainly small enough and readable. Link: https://lkml.kernel.org/r/20250512123424.637989-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22x86/mm/pat: remove MEMTYPE_*_MATCHDavid Hildenbrand
The "memramp() shrinking" scenario no longer applies, so let's remove that now-unnecessary handling. Link: https://lkml.kernel.org/r/20250512123424.637989-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22x86/mm/pat: remove strict_prot parameter from reserve_pfn_range()David Hildenbrand
Always set to 0, so let's remove it. Link: https://lkml.kernel.org/r/20250512123424.637989-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22x86/mm/pat: remove old pfnmap tracking interfaceDavid Hildenbrand
We can now get rid of the old interface along with get_pat_info() and follow_phys(). Link: https://lkml.kernel.org/r/20250512123424.637989-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22mm: introduce pfnmap_track() and pfnmap_untrack() and use them for memremapDavid Hildenbrand
Let's provide variants of track_pfn_remap() and untrack_pfn() that won't mess with VMAs, and replace the usage in mm/memremap.c. Add some documentation. Link: https://lkml.kernel.org/r/20250512123424.637989-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22mm: convert track_pfn_insert() to pfnmap_setup_cachemode*()David Hildenbrand
... by factoring it out from track_pfn_remap() into pfnmap_setup_cachemode() and provide pfnmap_setup_cachemode_pfn() as a replacement for track_pfn_insert(). For PMDs/PUDs, we keep checking a single pfn only. Add some documentation, and also document why it is valid to not check the whole pfn range. We'll reuse pfnmap_setup_cachemode() from core MM next. Link: https://lkml.kernel.org/r/20250512123424.637989-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22x86/mm/pat: factor out setting cachemode into pgprot_set_cachemode()David Hildenbrand
VM_PAT annoyed me too much and wasted too much of my time, let's clean PAT handling up and remove VM_PAT. This should sort out various issues with VM_PAT we discovered recently, and will hopefully make the whole code more stable and easier to maintain. In essence: we stop letting PAT mode mess with VMAs and instead lift what to track/untrack to the MM core. We remember per VMA which pfn range we tracked in a new struct we attach to a VMA (we have space without exceeding 192 bytes), use a kref to share it among VMAs during split/mremap/fork, and automatically untrack once the kref drops to 0. This implies that we'll keep tracking a full pfn range even after partially unmapping it, until fully unmapping it; but as that case was mostly broken before, this at least makes it work in a way that is least intrusive to VMA handling. Shrinking with mremap() used to work in a hacky way, now we'll similarly keep the original pfn range tacked even after this form of partial unmap. Does anybody care about that? Unlikely. If we run into issues, we could likely handled that (adjust the tracking) when our kref drops to 1 while freeing a VMA. But it adds more complexity, so avoid that for now. Briefly tested with the new pfnmap selftests [1]. This patch (of 11): Let's factor it out to make the code easier to grasp. Drop one comment where it is now rather obvious what is happening. Use it also in pgprot_writecombine()/pgprot_writethrough() where clearing the old cachemode might not be required, but given that we are already doing a function call, no need to care about this micro-optimization. Link: https://lkml.kernel.org/r/20250512123424.637989-1-david@redhat.com Link: https://lkml.kernel.org/r/20250512123424.637989-2-david@redhat.com Link: https://lkml.kernel.org/r/20250509153033.952746-1-david@redhat.com [1] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> [x86 bits] Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21x86/crash: make the page that stores the dm crypt keys inaccessibleCoiby Xu
This adds an addition layer of protection for the saved copy of dm crypt key. Trying to access the saved copy will cause page fault. Link: https://lkml.kernel.org/r/20250502011246.99238-9-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Suggested-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21x86/crash: pass dm crypt keys to kdump kernelCoiby Xu
1st kernel will build up the kernel command parameter dmcryptkeys as similar to elfcorehdr to pass the memory address of the stored info of dm crypt key to kdump kernel. Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21Revert "x86/mm: Remove unused __set_memory_prot()"Coiby Xu
This reverts commit 693bbf2a50447353c6a47961e6a7240a823ace02 as kdump LUKS support (CONFIG_CRASH_DM_CRYPT) depends on __set_memory_prot. [akpm@linux-foundation.org: x86 set_memory.h needs pgtable_types.h] Link: https://lkml.kernel.org/r/20250502011246.99238-7-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21include: pe.h: Fix PE definitionsPali Rohár
* Rename constants to their standard PE names: - MZ_MAGIC -> IMAGE_DOS_SIGNATURE - PE_MAGIC -> IMAGE_NT_SIGNATURE - PE_OPT_MAGIC_PE32_ROM -> IMAGE_ROM_OPTIONAL_HDR_MAGIC - PE_OPT_MAGIC_PE32 -> IMAGE_NT_OPTIONAL_HDR32_MAGIC - PE_OPT_MAGIC_PE32PLUS -> IMAGE_NT_OPTIONAL_HDR64_MAGIC - IMAGE_DLL_CHARACTERISTICS_NX_COMPAT -> IMAGE_DLLCHARACTERISTICS_NX_COMPAT * Import constants and their description from readpe and file projects which contains current up-to-date information: - IMAGE_FILE_MACHINE_* - IMAGE_FILE_* - IMAGE_SUBSYSTEM_* - IMAGE_DLLCHARACTERISTICS_* - IMAGE_DLLCHARACTERISTICS_EX_* - IMAGE_DEBUG_TYPE_* * Add missing IMAGE_SCN_* constants and update their incorrect description * Fix incorrect value of IMAGE_SCN_MEM_PURGEABLE constant * Add description for win32_version and loader_flags PE fields Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>