summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2020-12-15mm: speedup mremap on 1GB or larger regionsKalesh Singh
Android needs to move large memory regions for garbage collection. The GC requires moving physical pages of multi-gigabyte heap using mremap. During this move, the application threads have to be paused for correctness. It is critical to keep this pause as short as possible to avoid jitters during user interaction. Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if the source and destination addresses are PUD-aligned. For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD entries, since the PUD entry is “folded back” onto the PGD entry. Add HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't supported/tested can turn this off by not selecting the config. Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: memcontrol: account pagetables per nodeShakeel Butt
For many workloads, pagetable consumption is significant and it makes sense to expose it in the memory.stat for the memory cgroups. However at the moment, the pagetables are accounted per-zone. Converting them to per-node and using the right interface will correctly account for the memory cgroups as well. [akpm@linux-foundation.org: export __mod_lruvec_page_state to modules for arch/mips/kvm/] Link: https://lkml.kernel.org/r/20201130212541.2781790-3-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/gup: prevent gup_fast from racing with COW during forkJason Gunthorpe
Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes") pages under a FOLL_PIN will not be write protected during COW for fork. This means that pages returned from pin_user_pages(FOLL_WRITE) should not become write protected while the pin is active. However, there is a small race where get_user_pages_fast(FOLL_PIN) can establish a FOLL_PIN at the same time copy_present_page() is write protecting it: CPU 0 CPU 1 get_user_pages_fast() internal_get_user_pages_fast() copy_page_range() pte_alloc_map_lock() copy_present_page() atomic_read(has_pinned) == 0 page_maybe_dma_pinned() == false atomic_set(has_pinned, 1); gup_pgd_range() gup_pte_range() pte_t pte = gup_get_pte(ptep) pte_access_permitted(pte) try_grab_compound_head() pte = pte_wrprotect(pte) set_pte_at(); pte_unmap_unlock() // GUP now returns with a write protected page The first attempt to resolve this by using the write protect caused problems (and was missing a barrrier), see commit f3c64eda3e50 ("mm: avoid early COW write protect games during fork()") Instead wrap copy_p4d_range() with the write side of a seqcount and check the read side around gup_pgd_range(). If there is a collision then get_user_pages_fast() fails and falls back to slow GUP. Slow GUP is safe against this race because copy_page_range() is only called while holding the exclusive side of the mmap_lock on the src mm_struct. [akpm@linux-foundation.org: coding style fixes] Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com Link: https://lkml.kernel.org/r/2-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: "Ahmed S. Darwish" <a.darwish@linutronix.de> [seqcount_t parts] Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/gup_benchmark: rename to mm/gup_testJohn Hubbard
Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3. Summary: This series provides two main things, and a number of smaller supporting goodies. The two main points are: 1) Add a new sub-test to gup_test, which in turn is a renamed version of gup_benchmark. This sub-test allows nicer testing of dump_pages(), at least on user-space pages. For quite a while, I was doing a quick hack to gup_test.c whenever I wanted to try out changes to dump_page(). Then Matthew Wilcox asked me what I meant when I said "I used my dump_page() unit test", and I realized that it might be nice to check in a polished up version of that. Details about how it works and how to use it are in the commit description for patch #6 ("selftests/vm: gup_test: introduce the dump_pages() sub-test"). 2) Fixes a limitation of hmm-tests: these tests are incredibly useful, but only if people actually build and run them. And it turns out that libhugetlbfs is a little too effective at throwing a wrench in the works, there. So I've added a little configuration check that removes just two of the 21 hmm-tests, if libhugetlbfs is not available. Further details in the commit description of patch #8 ("selftests/vm: hmm-tests: remove the libhugetlbfs dependency"). Other smaller things that this series does: a) Remove code duplication by creating gup_test.h. b) Clear up the sub-test organization, and their invocation within run_vmtests.sh. c) Other minor assorted improvements. [1] v2 is here: https://lore.kernel.org/linux-doc/20200929212747.251804-1-jhubbard@nvidia.com/ [2] https://lore.kernel.org/r/CAHk-=wgh-TMPHLY3jueHX7Y2fWh3D+nMBqVS__AZm6-oorquWA@mail.gmail.com This patch (of 9): Rename nearly every "gup_benchmark" reference and file name to "gup_test". The one exception is for the actual gup benchmark test itself. The current code already does a *little* bit more than benchmarking, and definitely covers more than get_user_pages_fast(). More importantly, however, subsequent patches are about to add some functionality that is non-benchmark related. Closely related changes: * Kconfig: in addition to renaming the options from GUP_BENCHMARK to GUP_TEST, update the help text to reflect that it's no longer a benchmark-only test. Link: https://lkml.kernel.org/r/20201026064021.3545418-1-jhubbard@nvidia.com Link: https://lkml.kernel.org/r/20201026064021.3545418-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15arch/Kconfig: fix spelling mistakesColin Ian King
There are a few spelling mistakes in the Kconfig comments and help text. Fix these. Link: https://lkml.kernel.org/r/20201207155004.171962-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15Merge tag 'kvmarm-5.11' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.11 - PSCI relay at EL2 when "protected KVM" is enabled - New exception injection code - Simplification of AArch32 system register handling - Fix PMU accesses when no PMU is enabled - Expose CSV3 on non-Meltdown hosts - Cache hierarchy discovery fixes - PV steal-time cleanups - Allow function pointers at EL2 - Various host EL2 entry cleanups - Simplification of the EL2 vector allocation
2020-12-15KVM: SVM: Add AP_JUMP_TABLE support in prep for AP bootingTom Lendacky
The GHCB specification requires the hypervisor to save the address of an AP Jump Table so that, for example, vCPUs that have been parked by UEFI can be started by the OS. Provide support for the AP Jump Table set/get exit code. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15arm64: Warn the user when a small VA_BITS value wastes memoryMarc Zyngier
The memblock code ignores any memory that doesn't fit in the linear mapping. In order to preserve the distance between two physical memory locations and their mappings in the linear map, any hole between two memory regions occupies the same space in the linear map. On most systems, this is hardly a problem (the memory banks are close together, and VA_BITS represents a large space compared to the available memory *and* the potential gaps). On NUMA systems, things are quite different: the gaps between the memory nodes can be pretty large compared to the memory size itself, and the range from memblock_start_of_DRAM() to memblock_end_of_DRAM() can exceed the space described by VA_BITS. Unfortunately, we're not very good at making this obvious to the user, and on a D05 system (two sockets and 4 nodes with 64GB each) accidentally configured with 39bit VA, we display something like this: [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe100-0x1ffbffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x2febfc1100-0x2febfc2fff] [ 0.000000] NUMA: Initmem setup node 2 [<memory-less node>] [ 0.000000] NUMA: NODE_DATA [mem 0x2febfbf200-0x2febfc10ff] [ 0.000000] NUMA: NODE_DATA(2) on node 1 [ 0.000000] NUMA: Initmem setup node 3 [<memory-less node>] [ 0.000000] NUMA: NODE_DATA [mem 0x2febfbd300-0x2febfbf1ff] [ 0.000000] NUMA: NODE_DATA(3) on node 1 which isn't very explicit, and doesn't tell the user why 128GB have suddently disappeared. Let's add a warning message telling the user that memory has been truncated, and offer a potential solution (bumping VA_BITS up). Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20201215152918.1511108-1-maz@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-15s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt()Thomas Gleixner
The irq descriptor is already there, no need to look it up again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20201210194043.769108348@linutronix.de
2020-12-15parisc/irq: Use irq_desc_kstat_cpu() in show_interrupts()Thomas Gleixner
The irq descriptor is already there, no need to look it up again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-parisc@vger.kernel.org Link: https://lore.kernel.org/r/20201210194043.659522455@linutronix.de
2020-12-15arm64/smp: Use irq_desc_kstat_cpu() in arch_show_interrupts()Thomas Gleixner
The irq descriptor is already there, no need to look it up again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201210194043.546326568@linutronix.de
2020-12-15ARM: smp: Use irq_desc_kstat_cpu() in show_ipi_list()Thomas Gleixner
The irq descriptor is already there, no need to look it up again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201210194043.454288890@linutronix.de
2020-12-15parisc/irq: Simplify irq count output for /proc/interruptsThomas Gleixner
The SMP variant works perfectly fine on UP as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-parisc@vger.kernel.org Link: https://lore.kernel.org/r/20201210194043.172893840@linutronix.de
2020-12-15genirq: Move irq_has_action() into core codeThomas Gleixner
This function uses irq_to_desc() and is going to be used by modules to replace the open coded irq_to_desc() (ab)usage. The final goal is to remove the export of irq_to_desc() so driver cannot fiddle with it anymore. Move it into the core code and fixup the usage sites to include the proper header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201210194042.548936472@linutronix.de
2020-12-15Merge branches 'pm-sleep', 'pm-acpi', 'pm-domains' and 'powercap'Rafael J. Wysocki
* pm-sleep: PM: sleep: Add dev_wakeup_path() helper PM / suspend: fix kernel-doc markup PM: sleep: Print driver flags for all devices during suspend/resume * pm-acpi: PM: ACPI: Refresh wakeup device power configuration every time PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup() PM: ACPI: reboot: Use S5 for reboot * pm-domains: PM: domains: create debugfs nodes when adding power domains PM: domains: replace -ENOTSUPP with -EOPNOTSUPP * powercap: powercap: Adjust printing the constraint name with new line powercap: RAPL: Add AMD Fam19h RAPL support powercap: Add AMD Fam17h RAPL support powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer x86/msr-index: sort AMD RAPL MSRs by address
2020-12-15arm64: entry: suppress W=1 prototype warningsMark Rutland
When building with W=1, GCC complains that we haven't defined prototypes for a number of non-static functions in entry-common.c: | arch/arm64/kernel/entry-common.c:203:25: warning: no previous prototype for 'el1_sync_handler' [-Wmissing-prototypes] | 203 | asmlinkage void noinstr el1_sync_handler(struct pt_regs *regs) | | ^~~~~~~~~~~~~~~~ | arch/arm64/kernel/entry-common.c:377:25: warning: no previous prototype for 'el0_sync_handler' [-Wmissing-prototypes] | 377 | asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs) | | ^~~~~~~~~~~~~~~~ | arch/arm64/kernel/entry-common.c:447:25: warning: no previous prototype for 'el0_sync_compat_handler' [-Wmissing-prototypes] | 447 | asmlinkage void noinstr el0_sync_compat_handler(struct pt_regs *regs) | | ^~~~~~~~~~~~~~~~~~~~~~~ ... and so automated build systems using W=1 end up sending a number of emails, despite this not being a real problem as the only callers are in entry.S where prototypes cannot matter. For similar cases in entry-common.c we added prototypes to asm/exception.h, so let's do the same thing here for consistency. Note that there are a number of other warnings printed with W=1, both under arch/arm64 and in core code, and this patch only addresses the cases in entry-common.c. Automated build systems typically filter these warnings such that they're only reported when changes are made nearby, so we don't need to solve them all at once. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201214113353.44417-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-15arm64: topology: Drop the useless update to per-cpu cyclesViresh Kumar
The previous call to update_freq_counters_refs() has already updated the per-cpu variables, don't overwrite them with the same value again. Fixes: 4b9cf23c179a ("arm64: wrap and generalise counter read functions") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/7a171f710cdc0f808a2bfbd7db839c0d265527e7.1607579234.git.viresh.kumar@linaro.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-15powerpc: Add config fragment for disabling -WerrorMichael Ellerman
This makes it easy to disable building with -Werror: $ make defconfig $ grep WERROR .config # CONFIG_PPC_DISABLE_WERROR is not set CONFIG_PPC_WERROR=y $ make disable-werror.config GEN Makefile Using .config as base Merging arch/powerpc/configs/disable-werror.config Value of CONFIG_PPC_DISABLE_WERROR is redefined by fragment arch/powerpc/configs/disable-werror.config: Previous value: # CONFIG_PPC_DISABLE_WERROR is not set New value: CONFIG_PPC_DISABLE_WERROR=y ... $ grep WERROR .config CONFIG_PPC_DISABLE_WERROR=y Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201023040002.3313371-1-mpe@ellerman.id.au
2020-12-15powerpc/configs: Add ppc64le_allnoconfig targetMichael Ellerman
Add a phony target for ppc64le_allnoconfig, which tests some combinations of CONFIG symbols that aren't covered by any of our defconfigs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201125031551.2112715-1-mpe@ellerman.id.au
2020-12-15powerpc/powernv: Rate limit opal-elog read failure messageAndrew Donnellan
Sometimes we can't read an error log from OPAL, and we print an error message accordingly. But the OPAL userspace tools seem to like retrying a lot, in which case we flood the kernel log with a lot of messages. Change pr_err() to pr_err_ratelimited() to help with this. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201211021140.28402-1-ajd@linux.ibm.com
2020-12-15powerpc/pseries/memhotplug: Quieten some DLPAR operationsLaurent Dufour
When attempting to remove by index a set of LMBs a lot of messages are displayed on the console, even when everything goes fine: pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000002d Offlined Pages 4096 pseries-hotplug-mem: Memory at 2d0000000 was hot-removed The 2 messages prefixed by "pseries-hotplug-mem" are not really helpful for the end user, they should be debug outputs. In case of error, because some of the LMB's pages couldn't be offlined, the following is displayed on the console: pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000003e pseries-hotplug-mem: Failed to hot-remove memory at 3e0000000 dlpar: Could not handle DLPAR request "memory remove index 0x8000003e" Again, the 2 messages prefixed by "pseries-hotplug-mem" are useless, and the generic DLPAR prefixed message should be enough. These 2 first changes are mainly triggered by the changes introduced in drmgr: https://groups.google.com/g/powerpc-utils-devel/c/Y6ef4NB3EzM/m/9cu5JHRxAQAJ Also, when adding a bunch of LMBs, a message is displayed in the console per LMB like these ones: pseries-hotplug-mem: Memory at 7e0000000 (drc index 8000007e) was hot-added pseries-hotplug-mem: Memory at 7f0000000 (drc index 8000007f) was hot-added pseries-hotplug-mem: Memory at 800000000 (drc index 80000080) was hot-added pseries-hotplug-mem: Memory at 810000000 (drc index 80000081) was hot-added When adding 1TB of memory and LMB size is 256MB, this leads to 4096 messages to be displayed on the console. These messages are not really helpful for the end user, so moving them to the DEBUG level. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> [mpe: Tweak change log wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201211145954.90143-1-ldufour@linux.ibm.com
2020-12-15powerpc: force inlining of csum_partial() to avoid multiple csum_partial() ↵Christophe Leroy
with GCC10 ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>" With gcc9 I get: c0017ef8 <__csum_partial>: c00182fc: 4b ff fb fd bl c0017ef8 <__csum_partial> c0018478: 4b ff fa 80 b c0017ef8 <__csum_partial> c03e8458: 4b c2 fa a0 b c0017ef8 <__csum_partial> c03e8518: 4b c2 f9 e1 bl c0017ef8 <__csum_partial> c03ef410: 4b c2 8a e9 bl c0017ef8 <__csum_partial> c03f0b24: 4b c2 73 d5 bl c0017ef8 <__csum_partial> c04279a4: 4b bf 05 55 bl c0017ef8 <__csum_partial> c0429820: 4b be e6 d9 bl c0017ef8 <__csum_partial> c0429944: 4b be e5 b5 bl c0017ef8 <__csum_partial> c042b478: 4b be ca 81 bl c0017ef8 <__csum_partial> c042b554: 4b be c9 a5 bl c0017ef8 <__csum_partial> c045f15c: 4b bb 8d 9d bl c0017ef8 <__csum_partial> c0492190: 4b b8 5d 69 bl c0017ef8 <__csum_partial> c0492310: 4b b8 5b e9 bl c0017ef8 <__csum_partial> c0495594: 4b b8 29 65 bl c0017ef8 <__csum_partial> c049c420: 4b b7 ba d9 bl c0017ef8 <__csum_partial> c049c870: 4b b7 b6 89 bl c0017ef8 <__csum_partial> c049c930: 4b b7 b5 c9 bl c0017ef8 <__csum_partial> c04a9ca0: 4b b6 e2 59 bl c0017ef8 <__csum_partial> c04bdde4: 4b b5 a1 15 bl c0017ef8 <__csum_partial> c04be480: 4b b5 9a 79 bl c0017ef8 <__csum_partial> c04be710: 4b b5 97 e9 bl c0017ef8 <__csum_partial> c04c969c: 4b b4 e8 5d bl c0017ef8 <__csum_partial> c04ca2fc: 4b b4 db fd bl c0017ef8 <__csum_partial> c04cf5bc: 4b b4 89 3d bl c0017ef8 <__csum_partial> c04d0440: 4b b4 7a b9 bl c0017ef8 <__csum_partial> With gcc10 I get: c0018d08 <__csum_partial>: c0019020 <csum_partial>: c0019020: 4b ff fc e8 b c0018d08 <__csum_partial> c001914c: 4b ff fe d4 b c0019020 <csum_partial> c0019250: 4b ff fd d1 bl c0019020 <csum_partial> c03e404c <csum_partial>: c03e404c: 4b c3 4c bc b c0018d08 <__csum_partial> c03e4050: 4b ff ff fc b c03e404c <csum_partial> c03e40fc: 4b ff ff 51 bl c03e404c <csum_partial> c03e6680: 4b ff d9 cd bl c03e404c <csum_partial> c03e68c4: 4b ff d7 89 bl c03e404c <csum_partial> c03e7934: 4b ff c7 19 bl c03e404c <csum_partial> c03e7bf8: 4b ff c4 55 bl c03e404c <csum_partial> c03eb148: 4b ff 8f 05 bl c03e404c <csum_partial> c03ecf68: 4b c2 bd a1 bl c0018d08 <__csum_partial> c04275b8 <csum_partial>: c04275b8: 4b bf 17 50 b c0018d08 <__csum_partial> c0427884: 4b ff fd 35 bl c04275b8 <csum_partial> c0427b18: 4b ff fa a1 bl c04275b8 <csum_partial> c0427bd8: 4b ff f9 e1 bl c04275b8 <csum_partial> c0427cd4: 4b ff f8 e5 bl c04275b8 <csum_partial> c0427e34: 4b ff f7 85 bl c04275b8 <csum_partial> c045a1c0: 4b bb eb 49 bl c0018d08 <__csum_partial> c0489464 <csum_partial>: c0489464: 4b b8 f8 a4 b c0018d08 <__csum_partial> c04896b0: 4b ff fd b5 bl c0489464 <csum_partial> c048982c: 4b ff fc 39 bl c0489464 <csum_partial> c0490158: 4b b8 8b b1 bl c0018d08 <__csum_partial> c0492f0c <csum_partial>: c0492f0c: 4b b8 5d fc b c0018d08 <__csum_partial> c049326c: 4b ff fc a1 bl c0492f0c <csum_partial> c049333c: 4b ff fb d1 bl c0492f0c <csum_partial> c0493b18: 4b ff f3 f5 bl c0492f0c <csum_partial> c0493f50: 4b ff ef bd bl c0492f0c <csum_partial> c0493ffc: 4b ff ef 11 bl c0492f0c <csum_partial> c04a0f78: 4b b7 7d 91 bl c0018d08 <__csum_partial> c04b3e3c: 4b b6 4e cd bl c0018d08 <__csum_partial> c04b40d0 <csum_partial>: c04b40d0: 4b b6 4c 38 b c0018d08 <__csum_partial> c04b4448: 4b ff fc 89 bl c04b40d0 <csum_partial> c04b46f4: 4b ff f9 dd bl c04b40d0 <csum_partial> c04bf448: 4b b5 98 c0 b c0018d08 <__csum_partial> c04c5264: 4b b5 3a a5 bl c0018d08 <__csum_partial> c04c61e4: 4b b5 2b 25 bl c0018d08 <__csum_partial> gcc10 defines multiple versions of csum_partial() which are just an unconditionnal branch to __csum_partial(). To enforce inlining of that branch to __csum_partial(), mark csum_partial() as __always_inline. With this patch with gcc10: c0018d08 <__csum_partial>: c0019148: 4b ff fb c0 b c0018d08 <__csum_partial> c001924c: 4b ff fa bd bl c0018d08 <__csum_partial> c03e40ec: 4b c3 4c 1d bl c0018d08 <__csum_partial> c03e4120: 4b c3 4b e8 b c0018d08 <__csum_partial> c03eb004: 4b c2 dd 05 bl c0018d08 <__csum_partial> c03ecef4: 4b c2 be 15 bl c0018d08 <__csum_partial> c0427558: 4b bf 17 b1 bl c0018d08 <__csum_partial> c04286e4: 4b bf 06 25 bl c0018d08 <__csum_partial> c0428cd8: 4b bf 00 31 bl c0018d08 <__csum_partial> c0428d84: 4b be ff 85 bl c0018d08 <__csum_partial> c045a17c: 4b bb eb 8d bl c0018d08 <__csum_partial> c0489450: 4b b8 f8 b9 bl c0018d08 <__csum_partial> c0491860: 4b b8 74 a9 bl c0018d08 <__csum_partial> c0492eec: 4b b8 5e 1d bl c0018d08 <__csum_partial> c04a0eac: 4b b7 7e 5d bl c0018d08 <__csum_partial> c04b3e34: 4b b6 4e d5 bl c0018d08 <__csum_partial> c04b426c: 4b b6 4a 9d bl c0018d08 <__csum_partial> c04b463c: 4b b6 46 cd bl c0018d08 <__csum_partial> c04c004c: 4b b5 8c bd bl c0018d08 <__csum_partial> c04c0368: 4b b5 89 a1 bl c0018d08 <__csum_partial> c04c5254: 4b b5 3a b5 bl c0018d08 <__csum_partial> c04c60d4: 4b b5 2c 35 bl c0018d08 <__csum_partial> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a1d31f84ddb0926813b17fcd5cc7f3fa7b4deac2.1602759123.git.christophe.leroy@csgroup.eu
2020-12-15powerpc/perf: Fix Threshold Event Counter Multiplier width for P10Madhavan Srinivasan
Threshold Event Counter Multiplier (TECM) is part of Monitor Mode Control Register A (MMCRA). This field along with Threshold Event Counter Exponent (TECE) is used to get threshould counter value. In Power10, this is a 8bit field, so patch fixes the current code to modify the MMCRA[TECM] extraction macro to handle this change. ISA v3.1 says this is a 7 bit field but POWER10 it's actually 8 bits which will hopefully be fixed in ISA v3.1 update. Fixes: 170a315f41c6 ("powerpc/perf: Support to export MMCRA[TEC*] field to userspace") Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1608022578-1532-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-15powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()Christophe Leroy
Commit 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions") inadvertely removed the mask applied to start parameter in those two functions, leading to the following crash on power9. LTP: starting hugemmap05_1 (hugemmap05 -m) ------------[ cut here ]------------ kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV ... CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G O 5.10.0-rc7-next-20201211 #1 NIP: c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000 REGS: c00020000bb6f830 TRAP: 0700 Tainted: G O (5.10.0-rc7-next-20201211) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002284 XER: 20040000 GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0 GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001 GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002 GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000 GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10 GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4 GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000 NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0 pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387 (inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405 LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0 Call Trace: __tlb_remove_table+0x13c/0x1e0 (unreliable) tlb_remove_table_rcu+0x54/0xa0 __tlb_remove_table_free at mm/mmu_gather.c:101 (inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156 rcu_core+0x35c/0xbb0 rcu_do_batch at kernel/rcu/tree.c:2502 (inlined by) rcu_core at kernel/rcu/tree.c:2737 __do_softirq+0x480/0x704 run_ksoftirqd+0x74/0xd0 run_ksoftirqd at kernel/softirq.c:651 (inlined by) run_ksoftirqd at kernel/softirq.c:642 smpboot_thread_fn+0x278/0x320 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x80 Properly apply the masks before calling pmd_free_tlb() and pud_free_tlb() respectively. Fixes: 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions") Reported-by: Qian Cai <qcai@redhat.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/56feccd7b6fcd98e353361a233fa7bb8e67c3164.1607780469.git.christophe.leroy@csgroup.eu
2020-12-15KVM: PPC: Book3S HV: Fix mask size for emulated msgsndpLeonardo Bras
According to ISAv3.1 and ISAv3.0b, the msgsndp is described to split RB in: msgtype <- (RB) 32:36 payload <- (RB) 37:63 t <- (RB) 57:63 The current way of getting 'msgtype', and 't' is missing their MSB: msgtype: ((arg >> 27) & 0xf) : Gets (RB) 33:36, missing bit 32 t: (arg &= 0x3f) : Gets (RB) 58:63, missing bit 57 Fixes this by applying the correct mask. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201208215707.31149-1-leobras.c@gmail.com
2020-12-15KVM: PPC: fix comparison to bool warningKaixu Xia
Fix the following coccicheck warning: ./arch/powerpc/kvm/booke.c:503:6-16: WARNING: Comparison to bool ./arch/powerpc/kvm/booke.c:505:6-17: WARNING: Comparison to bool ./arch/powerpc/kvm/booke.c:507:6-16: WARNING: Comparison to bool Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604764178-8087-1-git-send-email-kaixuxia@tencent.com
2020-12-15KVM: PPC: Book3S: Assign boolean values to a bool variableKaixu Xia
Fix the following coccinelle warnings: ./arch/powerpc/kvm/book3s_xics.c:476:3-15: WARNING: Assignment of 0/1 to bool variable ./arch/powerpc/kvm/book3s_xics.c:504:3-15: WARNING: Assignment of 0/1 to bool variable Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604730382-5810-1-git-send-email-kaixuxia@tencent.com
2020-12-15KVM: SVM: Provide support to launch and run an SEV-ES guestTom Lendacky
An SEV-ES guest is started by invoking a new SEV initialization ioctl, KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is used to drive the appropriate ASID allocation, VMSA encryption, etc. Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted and measured. This is done using the LAUNCH_UPDATE_VMSA command after all calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE has been performed. In order to establish the encrypted VMSA, the current (traditional) VMSA and the GPRs are synced to the page that will hold the encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then marked as having protected guest state. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e9643245adb809caf3a87c09997926d2f3d6ff41.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guestsTom Lendacky
The run sequence is different for an SEV-ES guest compared to a legacy or even an SEV guest. The guest vCPU register state of an SEV-ES guest will be restored on VMRUN and saved on VMEXIT. There is no need to restore the guest registers directly and through VMLOAD before VMRUN and no need to save the guest registers directly and through VMSAVE on VMEXIT. Update the svm_vcpu_run() function to skip register state saving and restoring and provide an alternative function for running an SEV-ES guest in vmenter.S Additionally, certain host state is restored across an SEV-ES VMRUN. As a result certain register states are not required to be restored upon VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is not an SEV-ES guest. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fb1c66d32f2194e171b95fc1a8affd6d326e10c1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Provide support for SEV-ES vCPU loadingTom Lendacky
An SEV-ES vCPU requires additional VMCB vCPU load/put requirements. SEV-ES hardware will restore certain registers on VMEXIT, but not save them on VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the following changes: General vCPU load changes: - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and save the current values of XCR0, XSS and PKRU to the per-CPU SVM save area as these registers will be restored on VMEXIT. General vCPU put changes: - Do not attempt to restore registers that SEV-ES hardware has already restored on VMEXIT. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <019390e9cb5e93cd73014fa5a040c17d42588733.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Provide support for SEV-ES vCPU creation/loadingTom Lendacky
An SEV-ES vCPU requires additional VMCB initialization requirements for vCPU creation and vCPU load/put requirements. This includes: General VMCB initialization changes: - Set a VMCB control bit to enable SEV-ES support on the vCPU. - Set the VMCB encrypted VM save area address. - CRx registers are part of the encrypted register state and cannot be updated. Remove the CRx register read and write intercepts and replace them with CRx register write traps to track the CRx register values. - Certain MSR values are part of the encrypted register state and cannot be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.). - Remove the #GP intercept (no support for "enable_vmware_backdoor"). - Remove the XSETBV intercept since the hypervisor cannot modify XCR0. General vCPU creation changes: - Set the initial GHCB gpa value as per the GHCB specification. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <3a8aef366416eddd5556dfa3fdc212aafa1ad0a2.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Update ASID allocation to support SEV-ES guestsTom Lendacky
SEV and SEV-ES guests each have dedicated ASID ranges. Update the ASID allocation routine to return an ASID in the respective range. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <d7aed505e31e3954268b2015bb60a1486269c780.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Set the encryption mask for the SVM host save areaTom Lendacky
The SVM host save area is used to restore some host state on VMEXIT of an SEV-ES guest. After allocating the save area, clear it and add the encryption mask to the SVM host save area physical address that is programmed into the VM_HSAVE_PA MSR. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <b77aa28af6d7f1a0cb545959e08d6dc75e0c3cba.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add NMI support for an SEV-ES guestTom Lendacky
The GHCB specification defines how NMIs are to be handled for an SEV-ES guest. To detect the completion of an NMI the hypervisor must not intercept the IRET instruction (because a #VC while running the NMI will issue an IRET) and, instead, must receive an NMI Complete exit event from the guest. Update the KVM support for detecting the completion of NMIs in the guest to follow the GHCB specification. When an SEV-ES guest is active, the IRET instruction will no longer be intercepted. Now, when the NMI Complete exit event is received, the iret_interception() function will be called to simulate the completion of the NMI. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5ea3dd69b8d4396cefdc9048ebc1ab7caa70a847.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guestTom Lendacky
The guest FPU state is automatically restored on VMRUN and saved on VMEXIT by the hardware, so there is no reason to do this in KVM. Eliminate the allocation of the guest_fpu save area and key off that to skip operations related to the guest FPU state. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <173e429b4d0d962c6a443c4553ffdaf31b7665a4.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Do not report support for SMM for an SEV-ES guestTom Lendacky
SEV-ES guests do not currently support SMM. Update the has_emulated_msr() kvm_x86_ops function to take a struct kvm parameter so that the capability can be reported at a VM level. Since this op is also called during KVM initialization and before a struct kvm instance is available, comments will be added to each implementation of has_emulated_msr() to indicate the kvm parameter can be null. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <75de5138e33b945d2fb17f81ae507bda381808e3.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ESTom Lendacky
Since many of the registers used by the SEV-ES are encrypted and cannot be read or written, adjust the __get_sregs() / __set_sregs() to take into account whether the VMSA/guest state is encrypted. For __get_sregs(), return the actual value that is in use by the guest for all registers being tracked using the write trap support. For __set_sregs(), skip setting of all guest registers values. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <23051868db76400a9b07a2020525483a1e62dbcf.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR8 write traps for an SEV-ES guestTom Lendacky
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES guests introduce new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR8 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5a01033f4c8b3106ca9374b7cadf8e33da852df1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR4 write traps for an SEV-ES guestTom Lendacky
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES guests introduce new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR4 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c3880bf2db8693aa26f648528fbc6e967ab46e25.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for CR0 write traps for an SEV-ES guestTom Lendacky
For SEV-ES guests, the interception of control register write access is not recommended. Control register interception occurs prior to the control register being modified and the hypervisor is unable to modify the control register itself because the register is located in the encrypted register state. SEV-ES support introduces new control register write traps. These traps provide intercept support of a control register write after the control register has been modified. The new control register value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest control registers. Add support to track the value of the guest CR0 register using the control register write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <182c9baf99df7e40ad9617ff90b84542705ef0d7.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for EFER write traps for an SEV-ES guestTom Lendacky
For SEV-ES guests, the interception of EFER write access is not recommended. EFER interception occurs prior to EFER being modified and the hypervisor is unable to modify EFER itself because the register is located in the encrypted register state. SEV-ES support introduces a new EFER write trap. This trap provides intercept support of an EFER write after it has been modified. The new EFER value is provided in the VMCB EXITINFO1 field, allowing the hypervisor to track the setting of the guest EFER. Add support to track the value of the guest EFER value using the EFER write trap so that the hypervisor understands the guest operating mode. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <8993149352a3a87cd0625b3b61bfd31ab28977e1.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Support string IO operations for an SEV-ES guestTom Lendacky
For an SEV-ES guest, string-based port IO is performed to a shared (un-encrypted) page so that both the hypervisor and guest can read or write to it and each see the contents. For string-based port IO operations, invoke SEV-ES specific routines that can complete the operation using common KVM port IO support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <9d61daf0ffda496703717218f415cdc8fd487100.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Support MMIO for an SEV-ES guestTom Lendacky
For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page so that both the hypervisor and guest can read or write to it and each see the contents. The GHCB specification provides software-defined VMGEXIT exit codes to indicate a request for an MMIO read or an MMIO write. Add support to recognize the MMIO requests and invoke SEV-ES specific routines that can complete the MMIO operation. These routines use common KVM support to complete the MMIO operation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <af8de55127d5bcc3253d9b6084a0144c12307d4d.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Create trace events for VMGEXIT MSR protocol processingTom Lendacky
Add trace events for entry to and exit from VMGEXIT MSR protocol processing. The vCPU will be common for the trace events. The MSR protocol processing is guided by the GHCB GPA in the VMCB, so the GHCB GPA will represent the input and output values for the entry and exit events, respectively. Additionally, the exit event will contain the return code for the event. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c5b3b440c3e0db43ff2fc02813faa94fa54896b0.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Create trace events for VMGEXIT processingTom Lendacky
Add trace events for entry to and exit from VMGEXIT processing. The vCPU id and the exit reason will be common for the trace events. The exit info fields will represent the input and output values for the entry and exit events, respectively. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <25357dca49a38372e8f483753fb0c1c2a70a6898.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100Tom Lendacky
The GHCB specification defines a GHCB MSR protocol using the lower 12-bits of the GHCB MSR (in the hypervisor this corresponds to the GHCB GPA field in the VMCB). Function 0x100 is a request for termination of the guest. The guest has encountered some situation for which it has requested to be terminated. The GHCB MSR value contains the reason for the request. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <f3a1f7850c75b6ea4101e15bbb4a3af1a203f1dc.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004Tom Lendacky
The GHCB specification defines a GHCB MSR protocol using the lower 12-bits of the GHCB MSR (in the hypervisor this corresponds to the GHCB GPA field in the VMCB). Function 0x004 is a request for CPUID information. Only a single CPUID result register can be sent per invocation, so the protocol defines the register that is requested. The GHCB MSR value is set to the CPUID register value as per the specification via the VMCB GHCB GPA field. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fd7ee347d3936e484c06e9001e340bf6387092cd.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002Tom Lendacky
The GHCB specification defines a GHCB MSR protocol using the lower 12-bits of the GHCB MSR (in the hypervisor this corresponds to the GHCB GPA field in the VMCB). Function 0x002 is a request to set the GHCB MSR value to the SEV INFO as per the specification via the VMCB GHCB GPA field. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c23c163a505290a0d1b9efc4659b838c8c902cbc.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Add initial support for a VMGEXIT VMEXITTom Lendacky
SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a VMGEXIT includes mapping the GHCB based on the guest GPA, which is obtained from a new VMCB field, and then validating the required inputs for the VMGEXIT exit reason. Since many of the VMGEXIT exit reasons correspond to existing VMEXIT reasons, the information from the GHCB is copied into the VMCB control exit code areas and KVM register areas. The standard exit handlers are invoked, similar to standard VMEXIT processing. Before restarting the vCPU, the GHCB is updated with any registers that have been updated by the hypervisor. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15KVM: SVM: Prepare for SEV-ES exit handling in the sev.c fileTom Lendacky
This is a pre-patch to consolidate some exit handling code into callable functions. Follow-on patches for SEV-ES exit handling will then be able to use them from the sev.c file. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5b8b0ffca8137f3e1e257f83df9f5c881c8a96a3.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>