summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-06x86/sev: Register GHCB memory when SEV-SNP is activeBrijesh Singh
The SEV-SNP guest is required by the GHCB spec to register the GHCB's Guest Physical Address (GPA). This is because the hypervisor may prefer that a guest uses a consistent and/or specific GPA for the GHCB associated with a vCPU. For more information, see the GHCB specification section "GHCB GPA Registration". [ bp: Cleanup comments. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-18-brijesh.singh@amd.com
2022-04-06x86/compressed: Register GHCB memory when SEV-SNP is activeBrijesh Singh
The SEV-SNP guest is required by the GHCB spec to register the GHCB's Guest Physical Address (GPA). This is because the hypervisor may prefer that a guest use a consistent and/or specific GPA for the GHCB associated with a vCPU. For more information, see the GHCB specification section "GHCB GPA Registration". If hypervisor can not work with the guest provided GPA then terminate the guest boot. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-17-brijesh.singh@amd.com
2022-04-06x86/compressed: Add helper for validating pages in the decompression stageBrijesh Singh
Many of the integrity guarantees of SEV-SNP are enforced through the Reverse Map Table (RMP). Each RMP entry contains the GPA at which a particular page of DRAM should be mapped. The VMs can request the hypervisor to add pages in the RMP table via the Page State Change VMGEXIT defined in the GHCB specification. Inside each RMP entry is a Validated flag; this flag is automatically cleared to 0 by the CPU hardware when a new RMP entry is created for a guest. Each VM page can be either validated or invalidated, as indicated by the Validated flag in the RMP entry. Memory access to a private page that is not validated generates a #VC. A VM must use the PVALIDATE instruction to validate a private page before using it. To maintain the security guarantee of SEV-SNP guests, when transitioning pages from private to shared, the guest must invalidate the pages before asking the hypervisor to change the page state to shared in the RMP table. After the pages are mapped private in the page table, the guest must issue a page state change VMGEXIT to mark the pages private in the RMP table and validate them. Upon boot, BIOS should have validated the entire system memory. During the kernel decompression stage, early_setup_ghcb() uses set_page_decrypted() to make the GHCB page shared (i.e. clear encryption attribute). And while exiting from the decompression, it calls set_page_encrypted() to make the page private. Add snp_set_page_{private,shared}() helpers that are used by set_page_{decrypted,encrypted}() to change the page state in the RMP table. [ bp: Massage commit message and comments. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-16-brijesh.singh@amd.com
2022-04-06x86/sev: Check the VMPL levelBrijesh Singh
The Virtual Machine Privilege Level (VMPL) feature in the SEV-SNP architecture allows a guest VM to divide its address space into four levels. The level can be used to provide hardware isolated abstraction layers within a VM. VMPL0 is the highest privilege level, and VMPL3 is the least privilege level. Certain operations must be done by the VMPL0 software, such as: * Validate or invalidate memory range (PVALIDATE instruction) * Allocate VMSA page (RMPADJUST instruction when VMSA=1) The initial SNP support requires that the guest kernel is running at VMPL0. Add such a check to verify the guest is running at level 0 before continuing the boot. There is no easy method to query the current VMPL level, so use the RMPADJUST instruction to determine whether the guest is running at the VMPL0. [ bp: Massage commit message. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-15-brijesh.singh@amd.com
2022-04-06x86/sev: Add a helper for the PVALIDATE instructionBrijesh Singh
An SNP-active guest uses the PVALIDATE instruction to validate or rescind the validation of a guest page’s RMP entry. Upon completion, a return code is stored in EAX and rFLAGS bits are set based on the return code. If the instruction completed successfully, the carry flag (CF) indicates if the content of the RMP were changed or not. See AMD APM Volume 3 for additional details. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-14-brijesh.singh@amd.com
2022-04-06x86/sev: Check SEV-SNP features supportBrijesh Singh
Version 2 of the GHCB specification added the advertisement of features that are supported by the hypervisor. If the hypervisor supports SEV-SNP then it must set the SEV-SNP features bit to indicate that the base functionality is supported. Check that feature bit while establishing the GHCB; if failed, terminate the guest. Version 2 of the GHCB specification adds several new Non-Automatic Exits (NAEs), most of them are optional except the hypervisor feature. Now that the hypervisor feature NAE is implemented, bump the GHCB maximum supported protocol version. While at it, move the GHCB protocol negotiation check from the #VC exception handler to sev_enable() so that all feature detection happens before the first #VC exception. While at it, document why the GHCB page cannot be setup from load_stage2_idt(). [ bp: Massage commit message. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-13-brijesh.singh@amd.com
2022-04-06x86/sev: Save the negotiated GHCB versionBrijesh Singh
The SEV-ES guest calls sev_es_negotiate_protocol() to negotiate the GHCB protocol version before establishing the GHCB. Cache the negotiated GHCB version so that it can be used later. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-12-brijesh.singh@amd.com
2022-04-06x86/sev: Define the Linux-specific guest termination reasonsBrijesh Singh
The GHCB specification defines the reason code for reason set 0. The reason codes defined in the set 0 do not cover all possible causes for a guest to request termination. The reason sets 1 to 255 are reserved for the vendor-specific codes. Reserve the reason set 1 for the Linux guest. Define the error codes for reason set 1 so that one can have meaningful termination reasons and thus better guest failure diagnosis. While at it, change sev_es_terminate() to accept a reason set parameter. [ bp: Massage commit message. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-11-brijesh.singh@amd.com
2022-04-06x86/mm: Extend cc_attr to include AMD SEV-SNPBrijesh Singh
The CC_ATTR_GUEST_SEV_SNP can be used by the guest to query whether the SNP (Secure Nested Paging) feature is active. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-10-brijesh.singh@amd.com
2022-04-06x86/sev: Detect/setup SEV/SME features earlier in bootMichael Roth
sme_enable() handles feature detection for both SEV and SME. Future patches will also use it for SEV-SNP feature detection/setup, which will need to be done immediately after the first #VC handler is set up. Move it now in preparation. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-9-brijesh.singh@amd.com
2022-04-06x86/compressed/64: Detect/setup SEV/SME features earlier during bootMichael Roth
With upcoming SEV-SNP support, SEV-related features need to be initialized earlier during boot, at the same point the initial #VC handler is set up, so that the SEV-SNP CPUID table can be utilized during the initial feature checks. Also, SEV-SNP feature detection will rely on EFI helper functions to scan the EFI config table for the Confidential Computing blob, and so would need to be implemented at least partially in C. Currently set_sev_encryption_mask() is used to initialize the sev_status and sme_me_mask globals that advertise what SEV/SME features are available in a guest. Rename it to sev_enable() to better reflect that (SME is only enabled in the case of SEV guests in the boot/compressed kernel), and move it to just after the stage1 #VC handler is set up so that it can be used to initialize SEV-SNP as well in future patches. While at it, re-implement it as C code so that all SEV feature detection can be better consolidated with upcoming SEV-SNP feature detection, which will also be in C. The 32-bit entry path remains unchanged, as it never relied on the set_sev_encryption_mask() initialization to begin with. [ bp: Massage commit message. ] Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-8-brijesh.singh@amd.com
2022-04-06x86/boot: Use MSR read/write helpers instead of inline assemblyMichael Roth
Update all C code to use the new boot_rdmsr()/boot_wrmsr() helpers instead of relying on inline assembly. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-7-brijesh.singh@amd.com
2022-04-06x86/boot: Introduce helpers for MSR reads/writesMichael Roth
The current set of helpers used throughout the run-time kernel have dependencies on code/facilities outside of the boot kernel, so there are a number of call-sites throughout the boot kernel where inline assembly is used instead. More will be added with subsequent patches that add support for SEV-SNP, so take the opportunity to provide a basic set of helpers that can be used by the boot kernel to reduce reliance on inline assembly. Use boot_* prefix so that it's clear these are helpers specific to the boot kernel to avoid any confusion with the various other MSR read/write helpers. [ bp: Disambiguate parameter names and trim comment. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-6-brijesh.singh@amd.com
2022-04-06KVM: SVM: Update the SEV-ES save area mappingTom Lendacky
This is the final step in defining the multiple save areas to keep them separate and ensuring proper operation amongst the different types of guests. Update the SEV-ES/SEV-SNP save area to match the APM. This save area will be used for the upcoming SEV-SNP AP Creation NAE event support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-5-brijesh.singh@amd.com
2022-04-06KVM: SVM: Create a separate mapping for the GHCB save areaTom Lendacky
The initial implementation of the GHCB spec was based on trying to keep the register state offsets the same relative to the VM save area. However, the save area for SEV-ES has changed within the hardware causing the relation between the SEV-ES save area to change relative to the GHCB save area. This is the second step in defining the multiple save areas to keep them separate and ensuring proper operation amongst the different types of guests. Create a GHCB save area that matches the GHCB specification. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-4-brijesh.singh@amd.com
2022-04-06KVM: SVM: Create a separate mapping for the SEV-ES save areaTom Lendacky
The save area for SEV-ES/SEV-SNP guests, as used by the hardware, is different from the save area of a non SEV-ES/SEV-SNP guest. This is the first step in defining the multiple save areas to keep them separate and ensuring proper operation amongst the different types of guests. Create an SEV-ES/SEV-SNP save area and adjust usage to the new save area definition where needed. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220405182743.308853-1-brijesh.singh@amd.com
2022-04-05KVM: SVM: Define sev_features and VMPL field in the VMSABrijesh Singh
The hypervisor uses the sev_features field (offset 3B0h) in the Save State Area to control the SEV-SNP guest features such as SNPActive, vTOM, ReflectVC etc. An SEV-SNP guest can read the sev_features field through the SEV_STATUS MSR. While at it, update dump_vmcb() to log the VMPL level. See APM2 Table 15-34 and B-4 for more details. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-2-brijesh.singh@amd.com
2022-04-03Linux 5.18-rc1Linus Torvalds
2022-04-03Merge tag 'trace-v5.18-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Rename the staging files to give them some meaning. Just stage1,stag2,etc, does not show what they are for - Check for NULL from allocation in bootconfig - Hold event mutex for dyn_event call in user events - Mark user events to broken (to work on the API) - Remove eBPF updates from user events - Remove user events from uapi header to keep it from being installed. - Move ftrace_graph_is_dead() into inline as it is called from hot paths and also convert it into a static branch. * tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Move user_events.h temporarily out of include/uapi ftrace: Make ftrace_graph_is_dead() a static branch tracing: Set user_events to BROKEN tracing/user_events: Remove eBPF interfaces tracing/user_events: Hold event_mutex during dyn_event_add proc: bootconfig: Add null pointer check tracing: Rename the staging files for trace_events
2022-04-03Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "A single revert to fix a boot regression seen when clk_put() started dropping rate range requests. It's best to keep various systems booting so we'll kick this out and try again next time" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: Revert "clk: Drop the rate range on clk_put()"
2022-04-03Merge tag 'x86-urgent-2022-04-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes and updates: - Make the prctl() for enabling dynamic XSTATE components correct so it adds the newly requested feature to the permission bitmap instead of overwriting it. Add a selftest which validates that. - Unroll string MMIO for encrypted SEV guests as the hypervisor cannot emulate it. - Handle supervisor states correctly in the FPU/XSTATE code so it takes the feature set of the fpstate buffer into account. The feature sets can differ between host and guest buffers. Guest buffers do not contain supervisor states. So far this was not an issue, but with enabling PASID it needs to be handled in the buffer offset calculation and in the permission bitmaps. - Avoid a gazillion of repeated CPUID invocations in by caching the values early in the FPU/XSTATE code. - Enable CONFIG_WERROR in x86 defconfig. - Make the X86 defconfigs more useful by adapting them to Y2022 reality" * tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Consolidate size calculations x86/fpu/xstate: Handle supervisor states in XSTATE permissions x86/fpu/xsave: Handle compacted offsets correctly with supervisor states x86/fpu: Cache xfeature flags from CPUID x86/fpu/xsave: Initialize offset/size cache early x86/fpu: Remove unused supervisor only offsets x86/fpu: Remove redundant XCOMP_BV initialization x86/sev: Unroll string mmio with CC_ATTR_GUEST_UNROLL_STRING_IO x86/config: Make the x86 defconfigs a bit more usable x86/defconfig: Enable WERROR selftests/x86/amx: Update the ARCH_REQ_XCOMP_PERM test x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementation
2022-04-03Merge tag 'core-urgent-2022-04-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RT signal fix from Thomas Gleixner: "Revert the RT related signal changes. They need to be reworked and generalized" * tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"
2022-04-03Merge tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull more dma-mapping updates from Christoph Hellwig: - fix a regression in dma remap handling vs AMD memory encryption (me) - finally kill off the legacy PCI DMA API (Christophe JAILLET) * tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: move pgprot_decrypted out of dma_pgprot PCI/doc: cleanup references to the legacy PCI DMA API PCI: Remove the deprecated "pci-dma-compat.h" API
2022-04-03Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: - avoid unnecessary rebuilds for library objects - fix return value of __setup handlers - fix invalid input check for "crashkernel=" kernel option - silence KASAN warnings in unwind_frame * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame() ARM: 9190/1: kdump: add invalid input check for 'crashkernel=0' ARM: 9187/1: JIVE: fix return value of __setup handler ARM: 9189/1: decompressor: fix unneeded rebuilds of library objects
2022-04-02Revert "clk: Drop the rate range on clk_put()"Stephen Boyd
This reverts commit 7dabfa2bc4803eed83d6f22bd6f045495f40636b. There are multiple reports that this breaks boot on various systems. The common theme is that orphan clks are having rates set on them when that isn't expected. Let's revert it out for now so that -rc1 boots. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Reported-by: Tony Lindgren <tony@atomide.com> Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/r/366a0232-bb4a-c357-6aa8-636e398e05eb@samsung.com Cc: Maxime Ripard <maxime@cerno.tech> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20220403022818.39572-1-sboyd@kernel.org
2022-04-02Merge tag 'perf-tools-for-v5.18-2022-04-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: - Avoid SEGV if core.cpus isn't set in 'perf stat'. - Stop depending on .git files for building PERF-VERSION-FILE, used in 'perf --version', fixing some perf tools build scenarios. - Convert tracepoint.py example to python3. - Update UAPI header copies from the kernel sources: socket, mman-common, msr-index, KVM, i915 and cpufeatures. - Update copy of libbpf's hashmap.c. - Directly return instead of using local ret variable in evlist__create_syswide_maps(), found by coccinelle. * tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf python: Convert tracepoint.py example to python3 perf evlist: Directly return instead of using local ret variable perf cpumap: More cpu map reuse by merge. perf cpumap: Add is_subset function perf evlist: Rename cpus to user_requested_cpus perf tools: Stop depending on .git files for building PERF-VERSION-FILE tools headers cpufeatures: Sync with the kernel sources tools headers UAPI: Sync drm/i915_drm.h with the kernel sources tools headers UAPI: Sync linux/kvm.h with the kernel sources tools kvm headers arm64: Update KVM headers from the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources tools headers UAPI: Sync asm-generic/mman-common.h with the kernel perf beauty: Update copy of linux/socket.h with the kernel sources perf tools: Update copy of libbpf's hashmap.c perf stat: Avoid SEGV if core.cpus isn't set
2022-04-02Merge tag 'kbuild-fixes-v5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix empty $(PYTHON) expansion. - Fix UML, which got broken by the attempt to suppress Clang warnings. - Fix warning message in modpost. * tag 'kbuild-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: restore the warning message for missing symbol versions Revert "um: clang: Strip out -mno-global-merge from USER_CFLAGS" kbuild: Remove '-mno-global-merge' kbuild: fix empty ${PYTHON} in scripts/link-vmlinux.sh kconfig: remove stale comment about removed kconfig_print_symbol()
2022-04-02Merge tag 'mips_5.18_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - build fix for gpio - fix crc32 build problems - check for failed memory allocations * tag 'mips_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: crypto: Fix CRC32 code MIPS: rb532: move GPIOD definition into C-files MIPS: lantiq: check the return value of kzalloc() mips: sgi-ip22: add a check for the return of kzalloc()
2022-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr - Documentation improvements - Prevent module exit until all VMs are freed - PMU Virtualization fixes - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences - Other miscellaneous bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: fix sending PV IPI KVM: x86/mmu: do compare-and-exchange of gPTE via the user address KVM: x86: Remove redundant vm_entry_controls_clearbit() call KVM: x86: cleanup enter_rmode() KVM: x86: SVM: fix tsc scaling when the host doesn't support it kvm: x86: SVM: remove unused defines KVM: x86: SVM: move tsc ratio definitions to svm.h KVM: x86: SVM: fix avic spec based definitions again KVM: MIPS: remove reference to trap&emulate virtualization KVM: x86: document limitations of MSR filtering KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr KVM: x86/emulator: Emulate RDPID only if it is enabled in guest KVM: x86/pmu: Fix and isolate TSX-specific performance event logic KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86: Trace all APICv inhibit changes and capture overall status KVM: x86: Add wrappers for setting/clearing APICv inhibits KVM: x86: Make APICv inhibit reasons an enum and cleanup naming KVM: X86: Handle implicit supervisor access with SMAP KVM: X86: Rename variable smap to not_smap in permission_fault() ...
2022-04-03modpost: restore the warning message for missing symbol versionsMasahiro Yamada
This log message was accidentally chopped off. I was wondering why this happened, but checking the ML log, Mark precisely followed my suggestion [1]. I just used "..." because I was too lazy to type the sentence fully. Sorry for the confusion. [1]: https://lore.kernel.org/all/CAK7LNAR6bXXk9-ZzZYpTqzFqdYbQsZHmiWspu27rtsFxvfRuVA@mail.gmail.com/ Fixes: 4a6795933a89 ("kbuild: modpost: Explicitly warn about unprototyped symbols") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2022-04-02Merge tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver fix from Jens Axboe: "Got two reports on nbd spewing warnings on load now, which is a regression from a commit that went into your tree yesterday. Revert the problematic change for now" * tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block: Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
2022-04-02Merge tag 'pci-v5.18-changes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fix from Bjorn Helgaas: - Fix Hyper-V "defined but not used" build issue added during merge window (YueHaibing) * tag 'pci-v5.18-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: hv: Remove unused hv_set_msi_entry_from_desc()
2022-04-02Merge tag 'tag-chrome-platform-for-v5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Benson Leung: "cros_ec_typec: - Check for EC device - Fix a crash when using the cros_ec_typec driver on older hardware not capable of typec commands - Make try power role optional - Mux configuration reorganization series from Prashant cros_ec_debugfs: - Fix use after free. Thanks Tzung-bi sensorhub: - cros_ec_sensorhub fixup - Split trace include file misc: - Add new mailing list for chrome-platform development: chrome-platform@lists.linux.dev Now with patchwork!" * tag 'tag-chrome-platform-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_debugfs: detach log reader wq from devm platform: chrome: Split trace include file platform/chrome: cros_ec_typec: Update mux flags during partner removal platform/chrome: cros_ec_typec: Configure muxes at start of port update platform/chrome: cros_ec_typec: Get mux state inside configure_mux platform/chrome: cros_ec_typec: Move mux flag checks platform/chrome: cros_ec_typec: Check for EC device platform/chrome: cros_ec_typec: Make try power role optional MAINTAINERS: platform-chrome: Add new chrome-platform@lists.linux.dev list
2022-04-02Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"Jens Axboe
This reverts commit 6d35d04a9e18990040e87d2bbf72689252669d54. Both Gabriel and Borislav report that this commit casues a regression with nbd: sysfs: cannot create duplicate filename '/dev/block/43:0' Revert it before 5.18-rc1 and we'll investigage this separately in due time. Link: https://lore.kernel.org/all/YkiJTnFOt9bTv6A2@zn.tnic/ Reported-by: Gabriel L. Somlo <somlo@cmu.edu> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-02watch_queue: Free the page array when watch_queue is dismantledEric Dumazet
Commit 7ea1a0124b6d ("watch_queue: Free the alloc bitmap when the watch_queue is torn down") took care of the bitmap, but not the page array. BUG: memory leak unreferenced object 0xffff88810d9bc140 (size 32): comm "syz-executor335", pid 3603, jiffies 4294946994 (age 12.840s) hex dump (first 32 bytes): 40 a7 40 04 00 ea ff ff 00 00 00 00 00 00 00 00 @.@............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: kmalloc_array include/linux/slab.h:621 [inline] kcalloc include/linux/slab.h:652 [inline] watch_queue_set_size+0x12f/0x2e0 kernel/watch_queue.c:251 pipe_ioctl+0x82/0x140 fs/pipe.c:632 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] Reported-by: syzbot+25ea042ae28f3888727a@syzkaller.appspotmail.com Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20220322004654.618274-1-eric.dumazet@gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-02tracing: mark user_events as BROKENSteven Rostedt (Google)
After being merged, user_events become more visible to a wider audience that have concerns with the current API. It is too late to fix this for this release, but instead of a full revert, just mark it as BROKEN (which prevents it from being selected in make config). Then we can work finding a better API. If that fails, then it will need to be completely reverted. To not have the code silently bitrot, still allow building it with COMPILE_TEST. And to prevent the uapi header from being installed, then later changed, and then have an old distro user space see the old version, move the header file out of the uapi directory. Surround the include with CONFIG_COMPILE_TEST to the current location, but when the BROKEN tag is taken off, it will use the uapi directory, and fail to compile. This is a good way to remind us to move the header back. Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-02tracing: Move user_events.h temporarily out of include/uapiSteven Rostedt (Google)
While user_events API is under development and has been marked for broken to not let the API become fixed, move the header file out of the uapi directory. This is to prevent it from being installed, then later changed, and then have an old distro user space update with a new kernel, where applications see the user_events being available, but the old header is in place, and then they get compiled incorrectly. Also, surround the include with CONFIG_COMPILE_TEST to the current location, but when the BROKEN tag is taken off, it will use the uapi directory, and fail to compile. This is a good way to remind us to move the header back. Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com Link: https://lkml.kernel.org/r/20220401143903.188384f3@gandalf.local.home Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02ftrace: Make ftrace_graph_is_dead() a static branchChristophe Leroy
ftrace_graph_is_dead() is used on hot paths, it just reads a variable in memory and is not worth suffering function call constraints. For instance, at entry of prepare_ftrace_return(), inlining it avoids saving prepare_ftrace_return() parameters to stack and restoring them after calling ftrace_graph_is_dead(). While at it using a static branch is even more performant and is rather well adapted considering that the returned value will almost never change. Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool by a static branch. The performance improvement is noticeable. Link: https://lkml.kernel.org/r/e0411a6a0ed3eafff0ad2bc9cd4b0e202b4617df.1648623570.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02tracing: Set user_events to BROKENSteven Rostedt (Google)
After being merged, user_events become more visible to a wider audience that have concerns with the current API. It is too late to fix this for this release, but instead of a full revert, just mark it as BROKEN (which prevents it from being selected in make config). Then we can work finding a better API. If that fails, then it will need to be completely reverted. Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Link: https://lkml.kernel.org/r/20220330155835.5e1f6669@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02tracing/user_events: Remove eBPF interfacesBeau Belgrave
Remove eBPF interfaces within user_events to ensure they are fully reviewed. Link: https://lore.kernel.org/all/20220329165718.GA10381@kbox/ Link: https://lkml.kernel.org/r/20220329173051.10087-1-beaub@linux.microsoft.com Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02tracing/user_events: Hold event_mutex during dyn_event_addBeau Belgrave
Make sure the event_mutex is properly held during dyn_event_add call. This is required when adding dynamic events. Link: https://lkml.kernel.org/r/20220328223225.1992-1-beaub@linux.microsoft.com Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02proc: bootconfig: Add null pointer checkLv Ruyi
kzalloc is a memory allocation function which can return NULL when some internal memory errors happen. It is safer to add null pointer check. Link: https://lkml.kernel.org/r/20220329104004.2376879-1-lv.ruyi@zte.com.cn Cc: stable@vger.kernel.org Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02tracing: Rename the staging files for trace_eventsSteven Rostedt (Google)
When looking for implementation of different phases of the creation of the TRACE_EVENT() macro, it is pretty useless when all helper macro redefinitions are in files labeled "stageX_defines.h". Rename them to state which phase the files are for. For instance, when looking for the defines that are used to create the event fields, seeing "stage4_event_fields.h" gives the developer a good idea that the defines are in that file. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02KVM: x86: fix sending PV IPILi RongQing
If apic_id is less than min, and (max - apic_id) is greater than KVM_IPI_CLUSTER_SIZE, then the third check condition is satisfied but the new apic_id does not fit the bitmask. In this case __send_ipi_mask should send the IPI. This is mostly theoretical, but it can happen if the apic_ids on three iterations of the loop are for example 1, KVM_IPI_CLUSTER_SIZE, 0. Fixes: aaffcfd1e82 ("KVM: X86: Implement PV IPIs in linux guest") Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-Id: <1646814944-51801-1-git-send-email-lirongqing@baidu.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86/mmu: do compare-and-exchange of gPTE via the user addressPaolo Bonzini
FNAME(cmpxchg_gpte) is an inefficient mess. It is at least decent if it can go through get_user_pages_fast(), but if it cannot then it tries to use memremap(); that is not just terribly slow, it is also wrong because it assumes that the VM_PFNMAP VMA is contiguous. The right way to do it would be to do the same thing as hva_to_pfn_remapped() does since commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05), using follow_pte() and fixup_user_fault() to determine the correct address to use for memremap(). To do this, one could for example extract hva_to_pfn() for use outside virt/kvm/kvm_main.c. But really there is no reason to do that either, because there is already a perfectly valid address to do the cmpxchg() on, only it is a userspace address. That means doing user_access_begin()/user_access_end() and writing the code in assembly to handle exceptions correctly. Worse, the guest PTE can be 8-byte even on i686 so there is the extra complication of using cmpxchg8b to account for. But at least it is an efficient mess. (Thanks to Linus for suggesting improvement on the inline assembly). Reported-by: Qiuhao Li <qiuhao@sysec.org> Reported-by: Gaoning Pan <pgn@zju.edu.cn> Reported-by: Yongkang Jia <kangel@zju.edu.cn> Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86: Remove redundant vm_entry_controls_clearbit() callZhenzhong Duan
When emulating exit from long mode, EFER_LMA is cleared with vmx_set_efer(). This will already unset the VM_ENTRY_IA32E_MODE control bit as requested by SDM, so there is no need to unset VM_ENTRY_IA32E_MODE again in exit_lmode() explicitly. In case EFER isn't supported by hardware, long mode isn't supported, so exit_lmode() cannot be reached. Note that, thanks to the shadow controls mechanism, this change doesn't eliminate vmread or vmwrite. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20220311102643.807507-3-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86: cleanup enter_rmode()Zhenzhong Duan
vmx_set_efer() sets uret->data but, in fact if the value of uret->data will be used vmx_setup_uret_msrs() will have rewritten it with the value returned by update_transition_efer(). uret->data is consumed if and only if uret->load_into_hardware is true, and vmx_setup_uret_msrs() takes care of (a) updating uret->data before setting uret->load_into_hardware to true (b) setting uret->load_into_hardware to false if uret->data isn't updated. Opportunistically use "vmx" directly instead of redoing to_vmx(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20220311102643.807507-2-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86: SVM: fix tsc scaling when the host doesn't support itMaxim Levitsky
It was decided that when TSC scaling is not supported, the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0' value. However in this case kvm_max_tsc_scaling_ratio is not set, which breaks various assumptions. Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of host support. For consistency, do the same for VMX. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322172449.235575-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02kvm: x86: SVM: remove unused definesMaxim Levitsky
Remove some unused #defines from svm.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322172449.235575-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86: SVM: move tsc ratio definitions to svm.hMaxim Levitsky
Another piece of SVM spec which should be in the header file Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322172449.235575-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>