summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2023-04-25Merge tag 'x86_acpi_for_v6.4_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ACPI update from Borislav Petkov: - Improve code generation in ACPI's global lock's acquisition function * tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ACPI/boot: Improve __acpi_acquire_global_lock
2023-04-25Merge tag 'ras_core_for_v6.4_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Just cleanups and fixes this time around: make threshold_ktype const, an objtool fix and use proper size for a bitmap * tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE/AMD: Use an u64 for bank_map x86/mce: Always inline old MCA stubs x86/MCE/AMD: Make kobj_type structure constant
2023-04-24Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fget updates from Al Viro: "fget() to fdget() conversions" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse_dev_ioctl(): switch to fdget() cgroup_get_from_fd(): switch to fdget_raw() bpf: switch to fdget_raw() build_mount_idmapped(): switch to fdget() kill the last remaining user of proc_ns_fget() SVM-SEV: convert the rest of fget() uses to fdget() in there convert sgx_set_attribute() to fdget()/fdput() convert setns(2) to fdget()/fdput()
2023-04-24Merge tag 'docs-6.4' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Commit volume in documentation is relatively low this time, but there is still a fair amount going on, including: - Reorganize the architecture-specific documentation under Documentation/arch This makes the structure match the source directory and helps to clean up the mess that is the top-level Documentation directory a bit. This work creates the new directory and moves x86 and most of the less-active architectures there. The current plan is to move the rest of the architectures in 6.5, with the patches going through the appropriate subsystem trees. - Some more Spanish translations and maintenance of the Italian translation - A new "Kernel contribution maturity model" document from Ted - A new tutorial on quickly building a trimmed kernel from Thorsten Plus the usual set of updates and fixes" * tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits) media: Adjust column width for pdfdocs media: Fix building pdfdocs docs: clk: add documentation to log which clocks have been disabled docs: trace: Fix typo in ftrace.rst Documentation/process: always CC responsible lists docs: kmemleak: adjust to config renaming ELF: document some de-facto PT_* ABI quirks Documentation: arm: remove stih415/stih416 related entries docs: turn off "smart quotes" in the HTML build Documentation: firmware: Clarify firmware path usage docs/mm: Physical Memory: Fix grammar Documentation: Add document for false sharing dma-api-howto: typo fix docs: move m68k architecture documentation under Documentation/arch/ docs: move parisc documentation under Documentation/arch/ docs: move ia64 architecture docs under Documentation/arch/ docs: Move arc architecture docs under Documentation/arch/ docs: move nios2 documentation under Documentation/arch/ docs: move openrisc documentation under Documentation/arch/ docs: move superh documentation under Documentation/arch/ ...
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. - Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-04-24Merge branch 'x86-rep-insns': x86 user copy clarificationsLinus Torvalds
Merge my x86 user copy updates branch. This cleans up a lot of our x86 memory copy code, particularly for user accesses. I've been pushing for microarchitectural support for good memory copying and clearing for a long while, and it's been visible in how the kernel has aggressively used 'rep movs' and 'rep stos' whenever possible. And that micro-architectural support has been improving over the years, to the point where on modern CPU's the best option for a memory copy that would become a function call (as opposed to being something that can just be turned into individual 'mov' instructions) is now to inline the string instruction sequence instead. However, that only makes sense when we have the modern markers for this: the x86 FSRM and FSRS capabilities ("Fast Short REP MOVS/STOS"). So this cleans up a lot of our historical code, gets rid of the legacy marker use ("REP_GOOD" and "ERMS") from the memcpy/memset cases, and replaces it with that modern reality. Note that REP_GOOD and ERMS end up still being used by the known large cases (ie page copyin gand clearing). The reason much of this ends up being about user memory accesses is that the normal in-kernel cases are done by the compiler (__builtin_memcpy() and __builtin_memset()) and getting to the point where we can use our instruction rewriting to inline those to be string instructions will need some compiler support. In contrast, the user accessor functions are all entirely controlled by the kernel code, so we can change those arbitrarily. Thanks to Borislav Petkov for feedback on the series, and Jens testing some of this on micro-architectures I didn't personally have access to. * x86-rep-insns: x86: rewrite '__copy_user_nocache' function x86: remove 'zerorest' argument from __copy_user_nocache() x86: set FSRS automatically on AMD CPUs that have FSRM x86: improve on the non-rep 'copy_user' function x86: improve on the non-rep 'clear_user' function x86: inline the 'rep movs' in user copies for the FSRM case x86: move stac/clac from user copy routines into callers x86: don't use REP_GOOD or ERMS for user memory clearing x86: don't use REP_GOOD or ERMS for user memory copies x86: don't use REP_GOOD or ERMS for small memory clearing x86: don't use REP_GOOD or ERMS for small memory copies
2023-04-21perf/x86/intel/uncore: Add events for Intel SPR IMC PMUStephane Eranian
Add missing clockticks and cas_count_* events for Intel SapphireRapids IMC PMU. These events are useful to measure memory bandwidth. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230419214241.2310385-1-eranian@google.com
2023-04-21Merge branch kvm-arm64/smccc-filtering into kvmarm-master/nextMarc Zyngier
* kvm-arm64/smccc-filtering: : . : SMCCC call filtering and forwarding to userspace, courtesy of : Oliver Upton. From the cover letter: : : "The Arm SMCCC is rather prescriptive in regards to the allocation of : SMCCC function ID ranges. Many of the hypercall ranges have an : associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some : room for vendor-specific implementations. : : The ever-expanding SMCCC surface leaves a lot of work within KVM for : providing new features. Furthermore, KVM implements its own : vendor-specific ABI, with little room for other implementations (like : Hyper-V, for example). Rather than cramming it all into the kernel we : should provide a way for userspace to handle hypercalls." : . KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC" KVM: arm64: Test that SMC64 arch calls are reserved KVM: arm64: Prevent userspace from handling SMC64 arch range KVM: arm64: Expose SMC/HVC width to userspace KVM: selftests: Add test for SMCCC filter KVM: selftests: Add a helper for SMCCC calls with SMC instruction KVM: arm64: Let errors from SMCCC emulation to reach userspace KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version KVM: arm64: Introduce support for userspace SMCCC filtering KVM: arm64: Add support for KVM_EXIT_HYPERCALL KVM: arm64: Use a maple tree to represent the SMCCC filter KVM: arm64: Refactor hvc filtering to support different actions KVM: arm64: Start handling SMCs from EL1 KVM: arm64: Rename SMC/HVC call handler to reflect reality KVM: arm64: Add vm fd device attribute accessors KVM: arm64: Add a helper to check if a VM has ran once KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-20SVM-SEV: convert the rest of fget() uses to fdget() in thereAl Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-04-20convert sgx_set_attribute() to fdget()/fdput()Al Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-04-20x86: rewrite '__copy_user_nocache' functionLinus Torvalds
I didn't really want to do this, but as part of all the other changes to the user copy loops, I've been looking at this horror. I tried to clean it up multiple times, but every time I just found more problems, and the way it's written, it's just too hard to fix them. For example, the code is written to do quad-word alignment, and will use regular byte accesses to get to that point. That's fairly simple, but it means that any initial 8-byte alignment will be done with cached copies. However, the code then is very careful to do any 4-byte _tail_ accesses using an uncached 4-byte write, and that was claimed to be relevant in commit a82eee742452 ("x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()"). So if you do a 4-byte copy using that function, it carefully uses a 4-byte 'movnti' for the destination. But if you were to do a 12-byte copy that is 4-byte aligned, it would _not_ do a 4-byte 'movnti' followed by a 8-byte 'movnti' to keep it all uncached. Instead, it would align the destination to 8 bytes using a byte-at-a-time loop, and then do a 8-byte 'movnti' for the final 8 bytes. The main caller that cares is __copy_user_flushcache(), which knows about this insanity, and has odd cases for it all. But I just can't deal with looking at this kind of "it does one case right, and another related case entirely wrong". And the code really wasn't fixable without hard drugs, which I try to avoid. So instead, rewrite it in a form that hopefully not only gets this right, but is a bit more maintainable. Knock wood. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-20um: make stub data pages size tweakableJohannes Berg
There's a lot of code here that hard-codes that the data is a single page, and right now that seems to be sufficient, but to make it easier to change this in the future, add a new STUB_DATA_PAGES constant and use it throughout the code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2023-04-20crypto: x86/sha - Use local .L symbols for codeArd Biesheuvel
Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/crc32 - Use local .L symbols for codeArd Biesheuvel
Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/aesni - Use local .L symbols for codeArd Biesheuvel
Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/sha256 - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/ghash - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/des3 - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/crc32c - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/cast6 - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/cast5 - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/camellia - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/aria - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/aesni - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. In the GCM case, we can get rid of the oversized permutation array entirely while at it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20crypto: x86/aegis128 - Use RIP-relative addressingArd Biesheuvel
Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-19x86: remove 'zerorest' argument from __copy_user_nocache()Linus Torvalds
Every caller passes in zero, meaning they don't want any partial copy to zero the remainder of the destination buffer. Which is just as well, because the implementation of that function didn't actually even look at that argument, and wasn't even aware it existed, although some misleading comments did mention it still. The 'zerorest' thing is a historical artifact of how "copy_from_user()" worked, in that it would zero the rest of the kernel buffer that it copied into. That zeroing still exists, but it's long since been moved to generic code, and the raw architecture-specific code doesn't do it. See _copy_from_user() in lib/usercopy.c for this all. However, while __copy_user_nocache() shares some history and superficial other similarities with copy_from_user(), it is in many ways also very different. In particular, while the code makes it *look* similar to the generic user copy functions that can copy both to and from user space, and take faults on both reads and writes as a result, __copy_user_nocache() does no such thing at all. __copy_user_nocache() always copies to kernel space, and will never take a page fault on the destination. What *can* happen, though, is that the non-temporal stores take a machine check because one of the use cases is for writing to stable memory, and any memory errors would then take synchronous faults. So __copy_user_nocache() does look a lot like copy_from_user(), but has faulting behavior that is more akin to our old copy_in_user() (which no longer exists, but copied from user space to user space and could fault on both source and destination). And it very much does not have the "zero the end of the destination buffer", since a problem with the destination buffer is very possibly the very source of the partial copy. So this whole thing was just a confusing historical artifact from having shared some code with a completely different function with completely different use cases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: set FSRS automatically on AMD CPUs that have FSRMLinus Torvalds
So Intel introduced the FSRS ("Fast Short REP STOS") CPU capability bit, because they seem to have done the (much simpler) REP STOS optimizations separately and later than the REP MOVS one. In contrast, when AMD introduced support for FSRM ("Fast Short REP MOVS"), in the Zen 3 core, it appears to have improved the REP STOS case at the same time, and since the FSRS bit was added by Intel later, it doesn't show up on those AMD Zen 3 cores. And now that we made use of FSRS for the "rep stos" conditional, that made those AMD machines unnecessarily slower. The Intel situation where "rep movs" is fast, but "rep stos" isn't, is just odd. The 'stos' case is a lot simpler with no aliasing, no mutual alignment issues, no complicated cases. So this just sets FSRS automatically when FSRM is available on AMD machines, to get back all the nice REP STOS goodness in Zen 3. Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: improve on the non-rep 'copy_user' functionLinus Torvalds
The old 'copy_user_generic_unrolled' function was oddly implemented for largely historical reasons: it had been largely based on the uncached copy case, which has some other concerns. For example, the __copy_user_nocache() function uses 'movnti' for the destination stores, and those want the destination to be aligned. In contrast, the regular copy function doesn't really care, and trying to align things only complicates matters. Also, like the clear_user function, the copy function had some odd handling of the repeat counts, complicating the exception handling for no really good reason. So as with clear_user, just write it to keep all the byte counts in the %rcx register, exactly like the 'rep movs' functionality that this replaces. Unlike a real 'rep movs', we do allow for this to trash a few temporary registers to not have to unnecessarily save/restore registers on the stack. And like the clearing case, rename this to what it now clearly is: 'rep_movs_alternative', and make it one coherent function, so that it shows up as such in profiles (instead of the odd split between "copy_user_generic_unrolled" and "copy_user_short_string", the latter of which was not about strings at all, and which was shared with the uncached case). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: improve on the non-rep 'clear_user' functionLinus Torvalds
The old version was oddly written to have the repeat count in multiple registers. So instead of taking advantage of %rax being zero, it had some sub-counts in it. All just for a "single word clearing" loop, which isn't even efficient to begin with. So get rid of those games, and just keep all the state in the same registers we got it in (and that we should return things in). That not only makes this act much more like 'rep stos' (which this function is replacing), but makes it much easier to actually do the obvious loop unrolling. Also rename the function from the now nonsensical 'clear_user_original' to what it now clearly is: 'rep_stos_alternative'. End result: if we don't have a fast 'rep stosb', at least we can have a fast fallback for it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: inline the 'rep movs' in user copies for the FSRM caseLinus Torvalds
This does the same thing for the user copies as commit 0db7058e8e23 ("x86/clear_user: Make it faster") did for clear_user(). In other words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set, avoiding the function call entirely. In order to do that, it makes the calling convention for the out-of-line case ("copy_user_generic_unrolled") match the 'rep movs' calling convention, although it does also end up clobbering a number of additional registers. Also, to simplify code sharing in the low-level assembly with the __copy_user_nocache() function (that uses the normal C calling convention), we end up with a kind of mixed return value for the low-level asm code: it will return the result in both %rcx (to work as an alternative for the 'rep movs' case), _and_ in %rax (for the nocache case). We could avoid this by wrapping __copy_user_nocache() callers in an inline asm, but since the cost is just an extra register copy, it's probably not worth it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: move stac/clac from user copy routines into callersLinus Torvalds
This is preparatory work for inlining the 'rep movs' case, but also a cleanup. The __copy_user_nocache() function was mis-used by the rdma code to do uncached kernel copies that don't actually want user copies at all, and as a result doesn't want the stac/clac either. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: don't use REP_GOOD or ERMS for user memory clearingLinus Torvalds
The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Note! This changes the conditional for the inlining from FSRM ("fast short rep movs") to FSRS ("fast short rep stos"). We'll have a separate fixup for AMD microarchitectures that have a good 'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM was there first). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: don't use REP_GOOD or ERMS for user memory copiesLinus Torvalds
The modern target to use is FSRM (Fast Short REP MOVS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: don't use REP_GOOD or ERMS for small memory clearingLinus Torvalds
The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: don't use REP_GOOD or ERMS for small memory copiesLinus Torvalds
The modern target to use is FSRM (Fast Short REP MOVS), and the other cases should only be used for bigger areas (ie mainly things like page copying and clearing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAPAneesh Kumar K.V
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18Change DEFINE_SEMAPHORE() to take a number argumentPeter Zijlstra
Fundamentally semaphores are a counted primitive, but DEFINE_SEMAPHORE() does not expose this and explicitly creates a binary semaphore. Change DEFINE_SEMAPHORE() to take a number argument and use that in the few places that open-coded it using __SEMAPHORE_INITIALIZER(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [mcgrof: add some tribal knowledge about why some folks prefer binary sempahores over mutexes] Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-18x86/hyperv: VTL support for Hyper-VSaurabh Sengar
Virtual Trust Levels (VTL) helps enable Hyper-V Virtual Secure Mode (VSM) feature. VSM is a set of hypervisor capabilities and enlightenments offered to host and guest partitions which enable the creation and management of new security boundaries within operating system software. VSM achieves and maintains isolation through VTLs. Add early initialization for Virtual Trust Levels (VTL). This includes initializing the x86 platform for VTL and enabling boot support for secondary CPUs to start in targeted VTL context. For now, only enable the code for targeted VTL level as 2. When starting an AP at a VTL other than VTL0, the AP must start directly in 64-bit mode, bypassing the usual 16-bit -> 32-bit -> 64-bit mode transition sequence that occurs after waking up an AP with SIPI whose vector points to the 16-bit AP startup trampoline code. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Link: https://lore.kernel.org/r/1681192532-15460-6-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18x86/hyperv: Make hv_get_nmi_reason publicSaurabh Sengar
Move hv_get_nmi_reason to .h file so it can be used in other modules as well. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1681192532-15460-4-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18x86/hyperv: Add VTL specific structs and hypercallsSaurabh Sengar
Add structs and hypercalls required to enable VTL support on x86. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Link: https://lore.kernel.org/r/1681192532-15460-3-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18x86/init: Make get/set_rtc_noop() publicSaurabh Sengar
Make get/set_rtc_noop() to be public so that they can be used in other modules as well. Co-developed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1681192532-15460-2-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-18x86/alternatives: Do not use integer constant suffixes in inline asmWilly Tarreau
The usage of the BIT() macro in inline asm code was introduced in 6.3 by the commit in the Fixes tag. However, this macro uses "1UL" for integer constant suffixes in its shift operation, while gas before 2.28 does not support the "L" suffix after a number, and gas before 2.27 does not support the "U" suffix, resulting in build errors such as the following with such versions: ./arch/x86/include/asm/uaccess_64.h:124: Error: found 'L', expected: ')' ./arch/x86/include/asm/uaccess_64.h:124: Error: junk at end of line, first unrecognized character is `L' However, the currently minimal binutils version the kernel supports is 2.25. There's a single use of this macro here, revert to (1 << 0) that works with such older binutils. As an additional info, the binutils PRs which add support for those suffixes are: https://sourceware.org/bugzilla/show_bug.cgi?id=19910 https://sourceware.org/bugzilla/show_bug.cgi?id=20732 [ bp: Massage and extend commit message. ] Fixes: 5d1dd961e743 ("x86/alternatives: Add alt_instr.flags") Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/lkml/a9aae568-3046-306c-bd71-92c1fc8eeddc@linux.alibaba.com/
2023-04-17x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushesMichael Kelley
In the case where page tables are not freed, native_flush_tlb_multi() does not do a remote TLB flush on CPUs in lazy TLB mode because the CPU will flush itself at the next context switch. By comparison, the Hyper-V enlightened TLB flush does not exclude CPUs in lazy TLB mode and so performs unnecessary flushes. If we're not freeing page tables, add logic to test for lazy TLB mode when adding CPUs to the input argument to the Hyper-V TLB flush hypercall. Exclude lazy TLB mode CPUs so the behavior matches native_flush_tlb_multi() and the unnecessary flushes are avoided. Handle both the <=64 vCPU case and the _ex case for >64 vCPUs. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1679922967-26582-3-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17x86/hyperv: Add callback filter to cpumask_to_vpset()Michael Kelley
When copying CPUs from a Linux cpumask to a Hyper-V VPset, cpumask_to_vpset() currently has a "_noself" variant that doesn't copy the current CPU to the VPset. Generalize this variant by replacing it with a "_skip" variant having a callback function that is invoked for each CPU to decide if that CPU should be copied. Update the one caller of cpumask_to_vpset_noself() to use the new "_skip" variant instead. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1679922967-26582-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17PCI: hv: Enable PCI pass-thru devices in Confidential VMsMichael Kelley
For PCI pass-thru devices in a Confidential VM, Hyper-V requires that PCI config space be accessed via hypercalls. In normal VMs, config space accesses are trapped to the Hyper-V host and emulated. But in a confidential VM, the host can't access guest memory to decode the instruction for emulation, so an explicit hypercall must be used. Add functions to make the new MMIO read and MMIO write hypercalls. Update the PCI config space access functions to use the hypercalls when such use is indicated by Hyper-V flags. Also, set the flag to allow the Hyper-V PCI driver to be loaded and used in a Confidential VM (a.k.a., "Isolation VM"). The driver has previously been hardened against a malicious Hyper-V host[1]. [1] https://lore.kernel.org/all/20220511223207.3386-2-parri.andrea@gmail.com/ Co-developed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-13-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17Drivers: hv: Don't remap addresses that are above shared_gpa_boundaryMichael Kelley
With the vTOM bit now treated as a protection flag and not part of the physical address, avoid remapping physical addresses with vTOM set since technically such addresses aren't valid. Use ioremap_cache() instead of memremap() to ensure that the mapping provides decrypted access, which will correctly set the vTOM bit as a protection flag. While this change is not required for correctness with the current implementation of memremap(), for general code hygiene it's better to not depend on the mapping functions doing something reasonable with a physical address that is out-of-range. While here, fix typos in two error messages. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-12-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17hv_netvsc: Remove second mapping of send and recv buffersMichael Kelley
With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove the code to create and manage the second mapping for the pre-allocated send and recv buffers. This mapping is the last user of hv_map_memory()/hv_unmap_memory(), so delete these functions as well. Finally, hv_map_memory() is the last user of vmap_pfn() in Hyper-V guest code, so remove the Kconfig selection of VMAP_PFN. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-11-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17swiotlb: Remove bounce buffer remapping for Hyper-VMichael Kelley
With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove swiotlb_unencrypted_base and the associated code. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1679838727-87310-8-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17Merge remote-tracking branch 'tip/x86/sev' into hyperv-nextWei Liu
Merge the following 6 patches from tip/x86/sev, which are taken from Michael Kelley's series [0]. The rest of Michael's series depend on them. x86/hyperv: Change vTOM handling to use standard coco mechanisms init: Call mem_encrypt_init() after Hyper-V hypercall init is done x86/mm: Handle decryption/re-encryption of bss_decrypted consistently Drivers: hv: Explicitly request decrypted in vmap_pfn() calls x86/hyperv: Reorder code to facilitate future work x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM 0: https://lore.kernel.org/linux-hyperv/1679838727-87310-1-git-send-email-mikelley@microsoft.com/
2023-04-16Merge tag 'x86_urgent_for_v6.3_rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Drop __init annotation from two rtc functions which get called after boot is done, in order to prevent a crash * tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/rtc: Remove __init for runtime functions