summaryrefslogtreecommitdiff
path: root/arch/x86/entry
AgeCommit message (Collapse)Author
2025-05-26Merge tag 'x86-entry-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Two changes to simplify the x86 vDSO code a bit" * tag 'x86-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Remove redundant #ifdeffery around in_ia32_syscall() x86/vdso: Remove #ifdeffery around page setup variants
2025-05-17x86/mm/64: Make 5-level paging support unconditionalKirill A. Shutemov
Both Intel and AMD CPUs support 5-level paging, which is expected to become more widely adopted in the future. All major x86 Linux distributions have the feature enabled. Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
2025-05-09x86/its: Align RETs in BHB clear sequence to avoid thunkingPawan Gupta
The software mitigation for BHI is to execute BHB clear sequence at syscall entry, and possibly after a cBPF program. ITS mitigation thunks RETs in the lower half of the cacheline. This causes the RETs in the BHB clear sequence to be thunked as well, adding unnecessary branches to the BHB clear sequence. Since the sequence is in hot path, align the RET instructions in the sequence to avoid thunking. This is how disassembly clear_bhb_loop() looks like after this change: 0x44 <+4>: mov $0x5,%ecx 0x49 <+9>: call 0xffffffff81001d9b <clear_bhb_loop+91> 0x4e <+14>: jmp 0xffffffff81001de5 <clear_bhb_loop+165> 0x53 <+19>: int3 ... 0x9b <+91>: call 0xffffffff81001dce <clear_bhb_loop+142> 0xa0 <+96>: ret 0xa1 <+97>: int3 ... 0xce <+142>: mov $0x5,%eax 0xd3 <+147>: jmp 0xffffffff81001dd6 <clear_bhb_loop+150> 0xd5 <+149>: nop 0xd6 <+150>: sub $0x1,%eax 0xd9 <+153>: jne 0xffffffff81001dd3 <clear_bhb_loop+147> 0xdb <+155>: sub $0x1,%ecx 0xde <+158>: jne 0xffffffff81001d9b <clear_bhb_loop+91> 0xe0 <+160>: ret 0xe1 <+161>: int3 0xe2 <+162>: int3 0xe3 <+163>: int3 0xe4 <+164>: int3 0xe5 <+165>: lfence 0xe8 <+168>: pop %rbp 0xe9 <+169>: ret Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-04-22x86/vdso: Remove redundant #ifdeffery around in_ia32_syscall()Thomas Weißschuh
The #ifdefs only guard code that is also guarded by in_ia32_syscall(), which already contains the same #ifdef itself. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20240910-x86-vdso-ifdef-v1-2-877c9df9b081@linutronix.de
2025-04-22x86/vdso: Remove #ifdeffery around page setup variantsThomas Weißschuh
Replace the open-coded ifdefs in C sources files with IS_ENABLED(). This makes the code easier to read and enables the compiler to typecheck also the disabled parts, before optimizing them away. To make this work, also remove the ifdefs from declarations of used variables. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20240910-x86-vdso-ifdef-v1-1-877c9df9b081@linutronix.de
2025-04-09x86/bugs: Use SBPB in write_ibpb() if applicableJosh Poimboeuf
write_ibpb() does IBPB, which (among other things) flushes branch type predictions on AMD. If the CPU has SRSO_NO, or if the SRSO mitigation has been disabled, branch type flushing isn't needed, in which case the lighter-weight SBPB can be used. The 'x86_pred_cmd' variable already keeps track of whether IBPB or SBPB should be used. Use that instead of hardcoding IBPB. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/17c5dcd14b29199b75199d67ff7758de9d9a4928.1744148254.git.jpoimboe@kernel.org
2025-04-09x86/bugs: Rename entry_ibpb() to write_ibpb()Josh Poimboeuf
There's nothing entry-specific about entry_ibpb(). In preparation for calling it from elsewhere, rename it to write_ibpb(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/1e54ace131e79b760de3fe828264e26d0896e3ac.1744148254.git.jpoimboe@kernel.org
2025-04-01mseal sysmap: enable x86-64Jeff Xu
Provide support for CONFIG_MSEAL_SYSTEM_MAPPINGS on x86-64, covering the vdso, vvar, vvar_vclock. Production release testing passes on Android and Chrome OS. Link: https://lkml.kernel.org/r/20250305021711.3867874-4-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Berg <benjamin@sipsolutions.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Elliot Hughes <enh@google.com> Cc: Florian Faineli <f.fainelli@gmail.com> Cc: Greg Ungerer <gerg@kernel.org> Cc: Guenter Roeck <groeck@chromium.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Linus Waleij <linus.walleij@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <mike.rapoport@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Röttger <sroettger@google.com> Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-30Merge tag 'x86-urgent-2025-03-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes and updates from Ingo Molnar: - Fix a large number of x86 Kconfig dependency and help text accuracy bugs/problems, by Mateusz Jończyk and David Heideberg - Fix a VM_PAT interaction with fork() crash. This also touches core kernel code - Fix an ORC unwinder bug for interrupt entries - Fixes and cleanups - Fix an AMD microcode loader bug that can promote verification failures into success - Add early-printk support for MMIO based UARTs on an x86 board that had no other serial debugging facility and also experienced early boot crashes * tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Fix __apply_microcode_amd()'s return value x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range() x86/fpu: Update the outdated comment above fpstate_init_user() x86/early_printk: Add support for MMIO-based UARTs x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1 x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text x86/Kconfig: Correct X86_X2APIC help text x86/speculation: Remove the extra #ifdef around CALL_NOSPEC x86/Kconfig: Document release year of glibc 2.3.3 x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32 x86/Kconfig: Document CONFIG_PCI_MMCONFIG x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-26Merge tag 'sysctl-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move vm_table members out of kernel/sysctl.c All vm_table array members have moved to their respective subsystems leading to the removal of vm_table from kernel/sysctl.c. This increases modularity by placing the ctl_tables closer to where they are actually used and at the same time reducing the chances of merge conflicts in kernel/sysctl.c. - ctl_table range fixes Replace the proc_handler function that checks variable ranges in coredump_sysctls and vdso_table with the one that actually uses the extra{1,2} pointers as min/max values. This tightens the range of the values that users can pass into the kernel effectively preventing {under,over}flows. - Misc fixes Correct grammar errors and typos in test messages. Update sysctl files in MAINTAINERS. Constified and removed array size in declaration for alignment_tbl * tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits) selftests/sysctl: fix wording of help messages selftests: fix spelling/grammar errors in sysctl/sysctl.sh MAINTAINERS: Update sysctl file list in MAINTAINERS sysctl: Fix underflow value setting risk in vm_table coredump: Fixes core_pipe_limit sysctl proc_handler sysctl: remove unneeded include sysctl: remove the vm_table sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c fs: dcache: move the sysctl to fs/dcache.c sunrpc: simplify rpcauth_cache_shrink_count() fs: drop_caches: move sysctl to fs/drop_caches.c fs: fs-writeback: move sysctl to fs/fs-writeback.c mm: nommu: move sysctl to mm/nommu.c security: min_addr: move sysctl to security/min_addr.c mm: mmap: move sysctl to mm/mmap.c mm: util: move sysctls to mm/util.c mm: vmscan: move vmscan sysctls to mm/vmscan.c mm: swap: move sysctl to mm/swap.c mm: filemap: move sysctl to mm/filemap.c ...
2025-03-25Merge tag 'timers-vdso-2025-03-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO infrastructure updates from Thomas Gleixner: - Consolidate the VDSO storage The VDSO data storage and data layout has been largely architecture specific for historical reasons. That increases the maintenance effort and causes inconsistencies over and over. There is no real technical reason for architecture specific layouts and implementations. The architecture specific details can easily be integrated into a generic layout, which also reduces the amount of duplicated code for managing the mappings. Convert all architectures over to a unified layout and common mapping infrastructure. This splits the VDSO data layout into subsystem specific blocks, timekeeping, random and architecture parts, which provides a better structure and allows to improve and update the functionalities without conflict and interaction. - Rework the timekeeping data storage The current implementation is designed for exposing system timekeeping accessors, which was good enough at the time when it was designed. PTP and Time Sensitive Networking (TSN) change that as there are requirements to expose independent PTP clocks, which are not related to system timekeeping. Replace the monolithic data storage by a structured layout, which allows to add support for independent PTP clocks on top while reusing both the data structures and the time accessor implementations. * tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) sparc/vdso: Always reject undefined references during linking x86/vdso: Always reject undefined references during linking vdso: Rework struct vdso_time_data and introduce struct vdso_clock vdso: Move architecture related data before basetime data powerpc/vdso: Prepare introduction of struct vdso_clock arm64/vdso: Prepare introduction of struct vdso_clock x86/vdso: Prepare introduction of struct vdso_clock time/namespace: Prepare introduction of struct vdso_clock vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct vdso/vsyscall: Prepare introduction of struct vdso_clock vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock vdso/gettimeofday: Prepare introduction of struct vdso_clock vdso/helpers: Prepare introduction of struct vdso_clock vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks vdso: Make vdso_time_data cacheline aligned arm64: Make asm/cache.h compatible with vDSO ...
2025-03-25x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1Jann Horn
PUSH_REGS with save_ret=1 is used by interrupt entry helper functions that initially start with a UNWIND_HINT_FUNC ORC state. However, save_ret=1 means that we clobber the helper function's return address (and then later restore the return address further down on the stack); after that point, the only thing on the stack we can unwind through is the IRET frame, so use UNWIND_HINT_IRET_REGS until we have a full pt_regs frame. ( An alternate approach would be to move the pt_regs->di overwrite down such that it is the final step of pt_regs setup; but I don't want to rearrange entry code just to make unwinding a tiny bit more elegant. ) Fixes: 9e809d15d6b6 ("x86/entry: Reduce the code footprint of the 'idtentry' macro") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250325-2025-03-unwind-fixes-v1-1-acd774364768@google.com
2025-03-24Merge tag 'x86-core-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "x86 CPU features support: - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li) - x86 CPUID parsing updates and fixes (Ahmed S. Darwish) - Introduce the 'setcpuid=' boot parameter (Brendan Jackman) - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan Jackman) - Utilize CPU-type for CPU matching (Pawan Gupta) - Warn about unmet CPU feature dependencies (Sohil Mehta) - Prepare for new Intel Family numbers (Sohil Mehta) Percpu code: - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst) - Convert the stackprotector canary to a regular percpu variable (Brian Gerst) - Add a percpu subsection for cache hot data (Brian Gerst) - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak) - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak) MM: - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel) - Rework ROX cache to avoid writable copy (Mike Rapoport) - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport) - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov) - Robustify page table initialization (Kirill A. Shutemov) - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn) - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox) KASLR: - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh) CPU bugs: - Implement FineIBT-BHI mitigation (Peter Zijlstra) - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta) - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta) - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta) System calls: - Break up entry/common.c (Brian Gerst) - Move sysctls into arch/x86 (Joel Granados) Intel LAM support updates: (Maciej Wieczor-Retman) - selftests/lam: Move cpu_has_la57() to use cpuinfo flag - selftests/lam: Skip test if LAM is disabled - selftests/lam: Test get_user() LAM pointer handling AMD SMN access updates: - Add SMN offsets to exclusive region access (Mario Limonciello) - Add support for debugfs access to SMN registers (Mario Limonciello) - Have HSMP use SMN through AMD_NODE (Yazen Ghannam) Power management updates: (Patryk Wlazlyn) - Allow calling mwait_play_dead with an arbitrary hint - ACPI/processor_idle: Add FFH state handling - intel_idle: Provide the default enter_dead() handler - Eliminate mwait_play_dead_cpuid_hint() Build system: - Raise the minimum GCC version to 8.1 (Brian Gerst) - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor) Kconfig: (Arnd Bergmann) - Add cmpxchg8b support back to Geode CPUs - Drop 32-bit "bigsmp" machine support - Rework CONFIG_GENERIC_CPU compiler flags - Drop configuration options for early 64-bit CPUs - Remove CONFIG_HIGHMEM64G support - Drop CONFIG_SWIOTLB for PAE - Drop support for CONFIG_HIGHPTE - Document CONFIG_X86_INTEL_MID as 64-bit-only - Remove old STA2x11 support - Only allow CONFIG_EISA for 32-bit Headers: - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth) Assembly code & machine code patching: - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf) - x86/alternatives: Simplify callthunk patching (Peter Zijlstra) - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf) - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf) - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra) - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak) - Use named operands in inline asm (Uros Bizjak) - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak) Earlyprintk: - Harden early_serial (Peter Zijlstra) NMI handler: - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long) Miscellaneous fixes and cleanups: - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye" * tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits) zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault x86/asm: Make asm export of __ref_stack_chk_guard unconditional x86/mm: Only do broadcast flush from reclaim if pages were unmapped perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones perf/x86/intel, x86/cpu: Simplify Intel PMU initialization x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions x86/asm: Use asm_inline() instead of asm() in clwb() x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h> x86/hweight: Use asm_inline() instead of asm() x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm() x86/hweight: Use named operands in inline asm() x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP x86/head/64: Avoid Clang < 17 stack protector in startup code x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro x86/cpu/intel: Limit the non-architectural constant_tsc model checks x86/mm/pat: Replace Intel x86_model checks with VFM ones x86/cpu/intel: Fix fast string initialization for extended Families ...
2025-03-24Merge tag 'vfs-6.15-rc1.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Mount notifications The day has come where we finally provide a new api to listen for mount topology changes outside of /proc/<pid>/mountinfo. A mount namespace file descriptor can be supplied and registered with fanotify to listen for mount topology changes. Currently notifications for mount, umount and moving mounts are generated. The generated notification record contains the unique mount id of the mount. The listmount() and statmount() api can be used to query detailed information about the mount using the received unique mount id. This allows userspace to figure out exactly how the mount topology changed without having to generating diffs of /proc/<pid>/mountinfo in userspace. - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount api - Support detached mounts in overlayfs Since last cycle we support specifying overlayfs layers via file descriptors. However, we don't allow detached mounts which means userspace cannot user file descriptors received via open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to attach them to a mount namespace via move_mount() first. This is cumbersome and means they have to undo mounts via umount(). Allow them to directly use detached mounts. - Allow to retrieve idmappings with statmount Currently it isn't possible to figure out what idmapping has been attached to an idmapped mount. Add an extension to statmount() which allows to read the idmapping from the mount. - Allow creating idmapped mounts from mounts that are already idmapped So far it isn't possible to allow the creation of idmapped mounts from already idmapped mounts as this has significant lifetime implications. Make the creation of idmapped mounts atomic by allow to pass struct mount_attr together with the open_tree_attr() system call allowing to solve these issues without complicating VFS lookup in any way. The system call has in general the benefit that creating a detached mount and applying mount attributes to it becomes an atomic operation for userspace. - Add a way to query statmount() for supported options Allow userspace to query which mount information can be retrieved through statmount(). - Allow superblock owners to force unmount * tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits) umount: Allow superblock owners to force umount selftests: add tests for mount notification selinux: add FILE__WATCH_MOUNTNS samples/vfs: fix printf format string for size_t fs: allow changing idmappings fs: add kflags member to struct mount_kattr fs: add open_tree_attr() fs: add copy_mount_setattr() helper fs: add vfs_open_tree() helper statmount: add a new supported_mask field samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP selftests: add tests for using detached mount with overlayfs samples/vfs: check whether flag was raised statmount: allow to retrieve idmappings uidgid: add map_id_range_up() fs: allow detached mounts in clone_private_mount() selftests/overlayfs: test specifying layers as O_PATH file descriptors fs: support O_PATH fds with FSCONFIG_SET_FD vfs: add notifications for mount attach and detach fanotify: notify on mount attach and detach ...
2025-03-19x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headersThomas Huth
While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with UAPI headers that rather should use __ASSEMBLER__ instead. So let's standardize on the __ASSEMBLER__ macro that is provided by the compilers now. This is mostly a mechanical patch (done with a simple "sed -i" statement), with some manual tweaks in <asm/frame.h>, <asm/hw_irq.h> and <asm/setup.h> that mentioned this macro in comments with some missing underscores. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250314071013.1575167-38-thuth@redhat.com
2025-03-19x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMPIngo Molnar
The __ref_stack_chk_guard symbol doesn't exist on UP: <stdin>:4:15: error: ‘__ref_stack_chk_guard’ undeclared here (not in a function) Fix the #ifdef around the entry.S export. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250123190747.745588-8-brgerst@gmail.com
2025-03-19x86/syscall/32: Add comment to conditionalBrian Gerst
Add a CONFIG_X86_FRED comment, since this conditional is nested. Suggested-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-8-brgerst@gmail.com
2025-03-19x86/syscall: Remove stray semicolonsBrian Gerst
No functional change. Suggested-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-7-brgerst@gmail.com
2025-03-19x86/syscall: Move sys_ni_syscall()Brian Gerst
Move sys_ni_syscall() to kernel/process.c, and remove the now empty entry/common.c No functional changes. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-6-brgerst@gmail.com
2025-03-19x86/syscall/x32: Move x32 syscall tableBrian Gerst
Since commit: 2e958a8a510d ("x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*") the ABI prefix for x32 syscalls is the same as native 64-bit syscalls. Move the x32 syscall table to syscall_64.c No functional changes. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-5-brgerst@gmail.com
2025-03-19x86/syscall/64: Move 64-bit syscall dispatch codeBrian Gerst
Move the 64-bit syscall dispatch code to syscall_64.c. No functional changes. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-4-brgerst@gmail.com
2025-03-19x86/syscall/32: Move 32-bit syscall dispatch codeBrian Gerst
Move the 32-bit syscall dispatch code to syscall_32.c. No functional changes. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-3-brgerst@gmail.com
2025-03-19x86/xen: Move Xen upcall handlerBrian Gerst
Move the upcall handler to Xen-specific files. No functional changes. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20250314151220.862768-2-brgerst@gmail.com
2025-03-19x86/entry: Add __init to ia32_emulation_override_cmdline()Vitaly Kuznetsov
ia32_emulation_override_cmdline() is an early_param() arg and these are only needed at boot time. In fact, all other early_param() functions in arch/x86 seem to have '__init' annotation and ia32_emulation_override_cmdline() is the only exception. Fixes: a11e097504ac ("x86: Make IA32_EMULATION boot time configurable") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/all/20241210151650.1746022-1-vkuznets%40redhat.com
2025-03-11x86/vdso: Always reject undefined references during linkingThomas Weißschuh
Instead of using a custom script to detect and fail on undefined references, use --no-undefined for all VDSO linker invocations. Drop the now unused checkundef.sh script. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250306-vdso-checkundef-v2-1-a26cc315fd73@linutronix.de
2025-03-04x86/percpu: Move top_of_stack to percpu hot sectionBrian Gerst
No functional change. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250303165246.2175811-9-brgerst@gmail.com
2025-03-04Merge branch 'x86/asm' into x86/core, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04Merge branch 'x86/cpu' into x86/asm, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04Merge branch 'x86/urgent' into x86/cpu, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-03Merge tag 'v6.14-rc5' into x86/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-25x86/entry: Fix kernel-doc warningDaniel Sneddon
The do_int80_emulation() function is missing a kernel-doc formatted description of its argument. This is causing a warning when building with W=1. Add a brief description of the argument to satisfy kernel-doc. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20241219155227.685692-1-daniel.sneddon@linux.intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202412131236.a5HhOqXo-lkp@intel.com/
2025-02-21x86/arch_prctl: Simplify sys_arch_prctl()Brian Gerst
Use in_ia32_syscall() instead of a compat syscall entry. No change in functionality intended. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250202202323.422113-2-brgerst@gmail.com
2025-02-21x86/vdso/vdso2c: Remove page handlingThomas Weißschuh
The values are not used anymore. Also the sanity checks performed by vdso2c can never trigger as they only validate invariants already enforced by the linker script. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-16-13a4669dfc8c@linutronix.de
2025-02-21x86/vdso: Switch to generic storage implementationThomas Weißschuh
The generic storage implementation provides the same features as the custom one. However it can be shared between architectures, making maintenance easier. This switch also moves the random state data out of the time data page. The currently used hardcoded __VDSO_RND_DATA_OFFSET does not take into account changes to the time data page layout. Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-15-13a4669dfc8c@linutronix.de
2025-02-21vdso: Rename included MakefileThomas Weißschuh
As the Makefile is included into other Makefiles it can not be used to define objects to be built from the current source directory. However the generic datastore will introduce such a local source file. Rename the included Makefile so it is clear how it is to be used and to make room for a regular Makefile in lib/vdso/. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-4-13a4669dfc8c@linutronix.de
2025-02-21x86/vdso: Fix latent bug in vclock_pages calculationThomas Weißschuh
The vclock pages are *after* the non-vclock pages. Currently there are both two vclock and two non-vclock pages so the existing logic works by accident. As soon as the number of pages changes it will break however. This will be the case with the introduction of the generic vDSO data storage. Use a macro to keep the calculation understandable and in sync between the linker script and mapping code. Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-1-13a4669dfc8c@linutronix.de
2025-02-18x86/stackprotector/64: Convert to normal per-CPU variableBrian Gerst
Older versions of GCC fixed the location of the stack protector canary at %gs:40. This constraint forced the percpu section to be linked at absolute address 0 so that the canary could be the first data object in the percpu section. Supporting the zero-based percpu section requires additional code to handle relocations for RIP-relative references to percpu data, extra complexity to kallsyms, and workarounds for linker bugs due to the use of absolute symbols. GCC 8.1 supports redefining where the canary is located, allowing it to become a normal percpu variable instead of at a fixed location. This removes the constraint that the percpu section must be zero-based. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-8-brgerst@gmail.com
2025-02-14x86/cfi: Clean up linkagePeter Zijlstra
With the introduction of kCFI the addition of ENDBR to SYM_FUNC_START* no longer suffices to make the function indirectly callable. This now requires the use of SYM_TYPED_FUNC_START. As such, remove the implicit ENDBR from SYM_FUNC_START* and add some explicit annotations to fix things up again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20250207122546.409116003@infradead.org
2025-02-12fs: add open_tree_attr()Christian Brauner
Add open_tree_attr() which allow to atomically create a detached mount tree and set mount options on it. If OPEN_TREE_CLONE is used this will allow the creation of a detached mount with a new set of mount options without it ever being exposed to userspace without that set of mount options applied. Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-3-c25feb0d2eb3@kernel.org Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.cKaixiong Yu
When CONFIG_X86_32 is defined and CONFIG_UML is not defined, vdso_enabled belongs to arch/x86/entry/vdso/vdso32-setup.c. So, move it into its own file. Before this patch, vdso_enabled was allowed to be set to a value exceeding 1 on x86_32 architecture. After this patch is applied, vdso_enabled is not permitted to set the value more than 1. It does not matter, because according to the function load_vdso32(), only vdso_enabled is set to 1, VDSO would be enabled. Other values all mean "disabled". The same limitation could be seen in the function vdso32_setup(). Signed-off-by: Kaixiong Yu <yukaixiong@huawei.com> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-02-03Revert "x86/module: prepare module loading for ROX allocations of text"Mike Rapoport (Microsoft)
The module code does not create a writable copy of the executable memory anymore so there is no need to handle it in module relocation and alternatives patching. This reverts commit 9bfc4824fd4836c16bb44f922bfaffba5da3e4f3. Signed-off-by: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250126074733.1384926-8-rppt@kernel.org
2025-01-28treewide: const qualify ctl_tables where applicableJoel Granados
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2024-12-02objtool: Fix ANNOTATE_REACHABLE to be a normal annotationPeter Zijlstra
Currently REACHABLE is weird for being on the instruction after the instruction it modifies. Since all REACHABLE annotations have an explicit instruction, flip them around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20241128094312.494176035@infradead.org
2024-12-02objtool: Convert {.UN}REACHABLE to ANNOTATEPeter Zijlstra
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20241128094312.353431347@infradead.org
2024-11-23Merge tag 'mm-stable-2024-11-18-19-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...
2024-11-19Merge tag 'timers-vdso-2024-11-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vdso data page handling updates from Thomas Gleixner: "First steps of consolidating the VDSO data page handling. The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: - consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. - removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. - seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately" * tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/vdso: Add missing brackets in switch case vdso: Rename struct arch_vdso_data to arch_vdso_time_data powerpc: Split systemcfg struct definitions out from vdso powerpc: Split systemcfg data out of vdso data page powerpc: Add kconfig option for the systemcfg page powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors powerpc/pseries/lparcfg: Fix printing of system_active_processors powerpc/procfs: Propagate error of remap_pfn_range() powerpc/vdso: Remove offset comment from 32bit vdso_arch_data x86/vdso: Split virtual clock pages into dedicated mapping x86/vdso: Delete vvar.h x86/vdso: Access vdso data without vvar.h x86/vdso: Move the rng offset to vsyscall.h x86/vdso: Access rng vdso data without vvar.h x86/vdso: Access timens vdso data without vvar.h x86/vdso: Allocate vvar page from C code x86/vdso: Access rng data from kernel without vvar x86/vdso: Place vdso_data at beginning of vvar page x86/vdso: Use __arch_get_vdso_data() to access vdso data x86/mm/mmap: Remove arch_vma_name() ...
2024-11-18Merge tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull xattr updates from Al Viro: "Sanitize xattr and io_uring interactions with it, add *xattrat() syscalls, sanitize struct filename handling in there" * tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr: remove redundant check on variable err fs/xattr: add *at family syscalls new helpers: file_removexattr(), filename_removexattr() new helpers: file_listxattr(), filename_listxattr() replace do_getxattr() with saner helpers. replace do_setxattr() with saner helpers. new helper: import_xattr_name() fs: rename struct xattr_ctx to kernel_xattr_ctx xattr: switch to CLASS(fd) io_[gs]etxattr_prep(): just use getname() io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE getname_maybe_null() - the third variant of pathname copy-in teach filename_lookup() to treat NULL filename as ""
2024-11-08x86/stackprotector: Work around strict Clang TLS symbol requirementsArd Biesheuvel
GCC and Clang both implement stack protector support based on Thread Local Storage (TLS) variables, and this is used in the kernel to implement per-task stack cookies, by copying a task's stack cookie into a per-CPU variable every time it is scheduled in. Both now also implement -mstack-protector-guard-symbol=, which permits the TLS variable to be specified directly. This is useful because it will allow to move away from using a fixed offset of 40 bytes into the per-CPU area on x86_64, which requires a lot of special handling in the per-CPU code and the runtime relocation code. However, while GCC is rather lax in its implementation of this command line option, Clang actually requires that the provided symbol name refers to a TLS variable (i.e., one declared with __thread), although it also permits the variable to be undeclared entirely, in which case it will use an implicit declaration of the right type. The upshot of this is that Clang will emit the correct references to the stack cookie variable in most cases, e.g., 10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax 10f: R_386_32 __stack_chk_guard However, if a non-TLS definition of the symbol in question is visible in the same compilation unit (which amounts to the whole of vmlinux if LTO is enabled), it will drop the per-CPU prefix and emit a load from a bogus address. Work around this by using a symbol name that never occurs in C code, and emit it as an alias in the linker script. Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1854 Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
2024-11-07x86/module: prepare module loading for ROX allocations of textMike Rapoport (Microsoft)
When module text memory will be allocated with ROX permissions, the memory at the actual address where the module will live will contain invalid instructions and there will be a writable copy that contains the actual module code. Update relocations and alternatives patching to deal with it. [rppt@kernel.org: fix writable address in cfi_rewrite_endbr()] Link: https://lkml.kernel.org/r/ZysRwR29Ji8CcbXc@kernel.org Link: https://lkml.kernel.org/r/20241023162711.2579610-7-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06fs/xattr: add *at family syscallsChristian Göttsche
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and removexattrat(). Those can be used to operate on extended attributes, especially security related ones, either relative to a pinned directory or on a file descriptor without read access, avoiding a /proc/<pid>/fd/<fd> detour, requiring a mounted procfs. One use case will be setfiles(8) setting SELinux file contexts ("security.selinux") without race conditions and without a file descriptor opened with read access requiring SELinux read permission. Use the do_{name}at() pattern from fs/open.c. Pass the value of the extended attribute, its length, and for setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added struct xattr_args to not exceed six syscall arguments and not merging the AT_* and XATTR_* flags. [AV: fixes by Christian Brauner folded in, the entire thing rebased on top of {filename,file}_...xattr() primitives, treatment of empty pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling is cheap, so f...(2) can use it] Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-alpha@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-ia64@vger.kernel.org CC: linux-m68k@lists.linux-m68k.org CC: linux-mips@vger.kernel.org CC: linux-parisc@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org CC: linux-sh@vger.kernel.org CC: sparclinux@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: audit@vger.kernel.org CC: linux-arch@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-security-module@vger.kernel.org CC: selinux@vger.kernel.org [brauner: slight tweaks] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>