path: root/arch
2023-01-24  riscv/kprobe: Fix instruction simulation of JALR  (Liao Chang)

Setting a kprobe at 'jalr 1140(ra)' in vfs_write results in the following crash:

[   32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170
[   32.093115] Oops [#1]
[   32.093251] Modules linked in:
[   32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16
[   32.093985] Hardware name: riscv-virtio,qemu (DT)
[   32.094280] epc : ksys_read+0x88/0xd6
[   32.094855] ra : ksys_read+0xc0/0xd6
[   32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0
[   32.095227] gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80
[   32.095500] t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60
[   32.095716] s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708
[   32.095921] a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300
[   32.096171] a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff
[   32.096411] s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170
[   32.096638] s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030
[   32.096865] s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410
[   32.097092] s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c
[   32.097317] t5 : ffffffff8000c29c t6 : ffffffff800dbc54
[   32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d
[   32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6
[   32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38
[   32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2

Since rs1 and rd might be the same register, as in 'jalr 1140(ra)', the target address must be read from rs1 before rd is updated.

Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com
[Palmer: Pick Guo's cleanup]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
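The essence of the fix, as a hedged kernel-style C sketch (the register accessor helpers are assumed names, not necessarily the exact upstream ones): read the base address out of rs1 before writing rd, since both fields may name the same register.

    /*
     * Hedged sketch of JALR simulation; rv_insn_reg_get_val()/_set_val()
     * are assumed accessors. The point is the ordering: rs1 is read
     * before rd is written.
     */
    static bool simulate_jalr_sketch(u32 opcode, unsigned long addr, struct pt_regs *regs)
    {
            unsigned long base;
            u32 imm = opcode >> 20;                 /* I-type immediate, bits 31:20 */
            u32 rd  = (opcode >> 7)  & 0x1f;
            u32 rs1 = (opcode >> 15) & 0x1f;

            if (!rv_insn_reg_get_val(regs, rs1, &base))     /* read rs1 first... */
                    return false;
            if (!rv_insn_reg_set_val(regs, rd, addr + 4))   /* ...then clobber rd */
                    return false;

            instruction_pointer_set(regs, (base + sign_extend32(imm, 11)) & ~1UL);
            return true;
    }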
2023-01-24  riscv: fix jal offsets in patched alternatives  (Jisheng Zhang)

Alternatives live in a different section, so offsets used by the jal instruction will point to wrong locations after the patch is applied. Similar to arm64, adjust the location to take that offset into account.

Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230113212205.3534622-1-heiko@sntech.de
Fixes: 27c653c06505 ("RISC-V: fix auipc-jalr addresses in patched alternatives")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
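A hedged sketch of the arithmetic involved, with the immediate decode/encode helpers assumed for illustration: an instruction assembled at src that jumps to src + imm must, once copied to dst, carry the immediate target - dst to reach the same place.

    /* Hedged sketch; the *_jtype_imm() helpers are assumed names. */
    static u32 fixup_jal_offset(u32 insn, unsigned long src, unsigned long dst)
    {
            s32 imm = riscv_insn_extract_jtype_imm(insn);   /* assumed decoder */
            unsigned long target = src + imm;               /* original destination */

            /* re-encode the immediate relative to the patched location */
            return riscv_insn_encode_jtype_imm(insn, target - dst);  /* assumed encoder */
    }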
2023-01-25  ARM: dts: freescale: Use new media bus type macros  (Laurent Pinchart)

Now that a header exists with macros for the media interface bus-type values, replace hardcoded numerical constants with the corresponding macros in the DT sources.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2023-01-24  RISC-V: Kconfig: Remove trailing whitespace  (Geert Uytterhoeven)

Remove trailing whitespace that hurts my eyes.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/080aa959266ad842a8e7efca7111f1350c6a065a.1673424858.git.geert+renesas@glider.be
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-24  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)

Pull kvm fixes from Paolo Bonzini:

 "ARM64:

   - Pass the correct address to mte_clear_page_tags() on initialising a tagged page

   - Plug a race against a GICv4.1 doorbell interrupt while saving the vgic-v3 pending state.

  x86:

   - A command line parsing fix and a clang compilation fix for selftests

   - A fix for a longstanding VMX issue, that surprisingly was only found now to affect real world guests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Make reclaim_period_ms input always be positive
  KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
  selftests: kvm: move declaration at the beginning of main()
  KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
  KVM: arm64: Pass the actual page address to mte_clear_page_tags()
2023-01-24  riscv: pgtable: Fixup comment for KERN_VIRT_SIZE  (Guo Ren)

KERN_VIRT_SIZE is 1/4 of the entries of the page global directory, not half.

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230110080419.931185-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-24  ARM: add multi_v7_lpae_defconfig  (Nicolas Saenz Julienne)

The only missing configuration option preventing us from using multi_v7_defconfig with the Raspberry Pi 4 is ARM_LPAE. It's needed as the PCIe controller found on the SoC depends on 64-bit addressing, yet it can't be added to multi_v7_defconfig because not all v7 boards support LPAE.

Introduce multi_v7_lpae_defconfig, built from multi_v7_defconfig, which avoids having to duplicate and maintain multiple similar configurations. Needless to say, the Raspberry Pi 4 is not the only platform that can benefit from this new configuration.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20230124110213.3221264-11-alexander.stein@ew.tq-group.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  kbuild: Add config fragment merge functionality  (Nicolas Saenz Julienne)

So far this function was only used locally in powerpc; other architectures might benefit from it as well. Move it into scripts/Makefile.defconf.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230124110213.3221264-10-alexander.stein@ew.tq-group.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  ARM: multi_v7_defconfig: Add options to support TQMLS102xA series  (Alexander Stein)

Enable drivers used on TQMLS102xA + MBLS1021A.

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Link: https://lore.kernel.org/r/20230124110213.3221264-9-alexander.stein@ew.tq-group.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  ARM: dts: aspeed: Fix pca9849 compatible  (Eddie James)

Missed a digit in the PCA9849 compatible string.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Fixes: 65b697e5dec7 ("ARM: dts: aspeed: Add IBM Bonnell system BMC devicetree")
Link: https://lore.kernel.org/r/20220826194457.164492-1-eajames@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20230118051736.246714-1-joel@jms.id.au
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  arm64: drop redundant "ARMv8" from Kconfig option title  (Krzysztof Kozlowski)

All these platforms are ARMv8 or newer, and choosing the platforms in menuconfig is much easier if the titles start with something specific.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230120125722.270722-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  ARM: ep93xx: Convert to use descriptors for GPIO LEDs  (Linus Walleij)

This converts the EP93xx to use GPIO descriptors for the LEDs.

Cc: Nikita Shubin <nikita.shubin@maquefel.me>
Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Lukasz Majewski <lukma@denx.de>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20230111132210.134478-1-linus.walleij@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
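A hedged sketch of the descriptor-based pattern such conversions use (the chip label "E" and the pin offsets here are illustrative, not the actual ep93xx LED wiring):

    #include <linux/gpio/machine.h>

    static struct gpiod_lookup_table ep93xx_leds_gpiod_table = {
            .dev_id = "leds-gpio",          /* consumed by the leds-gpio driver */
            .table = {
                    /* con_id is NULL; the LEDs are matched by index */
                    GPIO_LOOKUP_IDX("E", 0, NULL, 0, GPIO_ACTIVE_HIGH),
                    GPIO_LOOKUP_IDX("E", 1, NULL, 1, GPIO_ACTIVE_HIGH),
                    { }
            },
    };

    static void __init ep93xx_register_leds(void)
    {
            gpiod_add_lookup_table(&ep93xx_leds_gpiod_table);
            /* the "leds-gpio" platform device itself is registered as before */
    }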
2023-01-24  Merge tag 'omap-for-v6.3/omap1-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/soc  (Arnd Bergmann)

One clean-up for omap1 for v6.3:

A non-urgent change to use platform_device_put() instead of platform_device_unregister().

* tag 'omap-for-v6.3/omap1-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()

Link: https://lore.kernel.org/r/pull-1674566532-427457@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  Merge tag 'omap-for-v6.3/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/soc  (Arnd Bergmann)

Clean-up for omaps for v6.3:

Non-urgent fixes for missing of_node_put() and clk_put(), dropping a few unnecessary includes, and fixing a typo. None of these are urgent and they can be merged along with other clean-up when suitable.

* tag 'omap-for-v6.3/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP2+: Fix spelling typos in comment
  ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/machine.h>
  ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/pinmux.h>
  ARM: OMAP2+: Fix memory leak in realtime_counter_init()
  ARM: OMAP2+: omap4-common: Fix refcount leak bug

Link: https://lore.kernel.org/r/pull-1674566471-434733@atomide.com-2
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24  KVM: x86: Replace IS_ERR() with IS_ERR_VALUE()  (ye xingchen)

Avoid the type casts that are needed for IS_ERR() and use IS_ERR_VALUE() instead.

Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Link: https://lore.kernel.org/r/202211161718436948912@zte.com.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: VMX: Handle NMI VM-Exits in noinstr region  (Sean Christopherson)

Move VMX's handling of NMI VM-Exits into vmx_vcpu_enter_exit() so that the NMI is handled prior to leaving the safety of noinstr. Handling the NMI after leaving noinstr exposes the kernel to potential ordering problems, as an instrumentation-induced fault, e.g. #DB, #BP, #PF, etc., will unblock NMIs when IRETing back to the faulting instruction.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
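A minimal sketch of the ordering the commit describes, assuming the helper names used elsewhere in this series (vmx_get_intr_info(), is_nmi(), and the asm stub vmx_do_nmi_irqoff() introduced below):

    /*
     * Hedged sketch: the NMI must be dispatched while still inside the
     * noinstr region, before any instrumentable code can fault and IRET.
     */
    static noinstr void vmx_handle_nmi_in_noinstr(struct kvm_vcpu *vcpu, u32 exit_reason)
    {
            if ((u16)exit_reason == EXIT_REASON_EXCEPTION_NMI &&
                is_nmi(vmx_get_intr_info(vcpu)))
                    vmx_do_nmi_irqoff();    /* asm entry; no instrumentable C first */
    }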
2023-01-24  KVM: VMX: Provide separate subroutines for invoking NMI vs. IRQ handlers  (Sean Christopherson)

Split the asm subroutines for handling NMIs versus IRQs that occur in the guest so that the NMI handler can be called from a noinstr section. As a bonus, the NMI path doesn't need an indirect branch.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  x86/entry: KVM: Use dedicated VMX NMI entry for 32-bit kernels too  (Sean Christopherson)

Use a dedicated entry for invoking the NMI handler from KVM VMX's VM-Exit path for 32-bit, even though using a dedicated entry for 32-bit isn't strictly necessary. Exposing a single symbol will allow KVM to reference the entry point in assembly code without having to resort to more #ifdefs (or #defines). idtentry.h is intended to be included from asm files only once, and so simply including idtentry.h in KVM assembly isn't an option.

Bypassing the ESP fixup and CR3 switching in the standard NMI entry code is safe, as KVM always handles NMIs that occur in the guest on a kernel stack, with a kernel CR3.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: VMX: Always inline to_vmx() and to_kvm_vmx()  (Sean Christopherson)

Tag to_vmx() and to_kvm_vmx() __always_inline as they both just reflect the passed-in pointer (the embedded struct is the first field in the container), and drop the @vmx param from vmx_vcpu_enter_exit(), which likely existed purely to make noinstr validation happy.

Amusingly, when the compiler decides to not inline the helpers, e.g. for KASAN builds, to_vmx() and to_kvm_vmx() may end up pointing at the same symbol, which generates very confusing objtool warnings. E.g. the use of to_vmx() in a future patch led to objtool complaining about to_kvm_vmx(), and only once all use of to_kvm_vmx() was commented out did to_vmx() pop up in the objtool report.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x160: call to to_kvm_vmx() leaves .noinstr.text section

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
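The helpers in question are thin container_of() wrappers; sketched below with the tag applied (shape per the commit description):

    static __always_inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
    {
            /* the embedded struct kvm_vcpu is the first field of vcpu_vmx */
            return container_of(vcpu, struct vcpu_vmx, vcpu);
    }

    static __always_inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
    {
            return container_of(kvm, struct kvm_vmx, kvm);
    }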
2023-01-24  KVM: VMX: Always inline eVMCS read/write helpers  (Sean Christopherson)

Tag all evmcs_{read,write}() helpers __always_inline so that they can be freely used in noinstr sections, e.g. to get the VM-Exit reason in vcpu_vmx_enter_exit() (in a future patch). For consistency and to avoid more spot fixes in the future, e.g. see commit 010050a86393 ("x86/kvm: Always inline evmcs_write64()"), tag all accessors even though evmcs_read32() is the only anticipated use case in the near future. In practice, non-KASAN builds are all but guaranteed to inline the helpers anyways.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x107: call to evmcs_read32() leaves .noinstr.text section

Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: VMX: Allow VM-Fail path of VMREAD helper to be instrumented  (Sean Christopherson)

Allow instrumentation in the VM-Fail path of __vmcs_readl() so that the helper can be used in noinstr functions, e.g. to get the exit reason in vmx_vcpu_enter_exit() in order to handle NMI VM-Exits in the noinstr section. While allowing instrumentation isn't technically safe, KVM has much bigger problems if VMREAD fails in a noinstr section.

Note, all other VMX instructions also allow instrumentation in their VM-Fail paths for similar reasons; VMREAD was simply omitted by commit 3ebccdf373c2 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") because VMREAD wasn't used in a noinstr section at the time.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86: Make vmx_get_exit_qual() and vmx_get_intr_info() noinstr-friendly  (Sean Christopherson)

Add an extra special noinstr-friendly helper to test+mark a "register" available and use it when caching vmcs.EXIT_QUALIFICATION and vmcs.VM_EXIT_INTR_INFO. Make the caching helpers __always_inline too so that they can be used in noinstr functions.

A future fix will move VMX's handling of NMI exits into the noinstr vmx_vcpu_enter_exit() so that the NMI is processed before any kind of instrumentation can trigger a fault and thus IRET, i.e. so that KVM doesn't invoke the NMI handler with NMIs enabled.

Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221213060912.654668-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
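A hedged sketch of such a test-and-mark helper, using the arch_ (non-instrumented) bitop so it stays noinstr-friendly; the exact name and field layout are assumptions:

    static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu *vcpu,
                                                                     enum kvm_reg reg)
    {
            /* arch___test_and_set_bit() carries no instrumentation hooks */
            return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
    }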
2023-01-24  KVM: VMX: don't use "unsigned long" in vmx_vcpu_enter_exit()  (Alexey Dobriyan)

__vmx_vcpu_run_flags() returns "unsigned int" and uses only 2 bits of it, so using "unsigned long" is very much pointless. Furthermore, __vmx_vcpu_run() and vmx_spec_ctrl_restore_host() take an "unsigned int", i.e. actually relying on an "unsigned long" value won't work.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/Y3e7UW0WNV2AZmsZ@p183
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: VMX: Access @flags as a 32-bit value in __vmx_vcpu_run()  (Sean Christopherson)

Access @flags using 32-bit operands when saving and testing @flags for VMX_RUN_VMRESUME, as using 8-bit operands is unnecessarily fragile due to relying on VMX_RUN_VMRESUME being in bits 0-7. The behavior of treating @flags as a single byte is a holdover from when the param was "bool launched", i.e. it is not deliberate.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20221119003747.2615229-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: SVM: Account scratch allocations used to decrypt SEV guest memory  (Anish Ghulati)

Account the temp/scratch allocation used to decrypt unaligned debug accesses to SEV guest memory; the allocation is very much tied to the target VM.

Reported-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Anish Ghulati <aghulati@google.com>
Link: https://lore.kernel.org/r/20230113220923.2834699-1-aghulati@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: svm/avic: Drop "struct kvm_x86_ops" for avic_hardware_setup()  (Like Xu)

Even in commit 4bdec12aa8d6 ("KVM: SVM: Detect X2APIC virtualization (x2AVIC) support"), where avic_hardware_setup() was first introduced, its only pass-in parameter "struct kvm_x86_ops *ops" was not used at all. Clean it up a bit to avoid compiler ranting from the LLVM toolchain.

Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221109115952.92816-1-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: SVM: remove redundant ret variable  (zhang songyi)

Return the value from svm_nmi_blocked() directly instead of storing it in a redundant local variable.

Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Link: https://lore.kernel.org/r/202211282003389362484@zte.com.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/pmu: Introduce masked events to the pmu event filter  (Aaron Lewis)

When building a list of filter events, it can sometimes be a challenge to fit all the events needed to adequately restrict the guest into the limited space available in the pmu event filter. This stems from the fact that the pmu event filter requires each event (i.e. event select + unit mask) be listed, when the intention might be to restrict the event select altogether, regardless of its unit mask. Instead of increasing the number of filter events in the pmu event filter, add a new encoding that is able to do a more generalized match on the unit mask.

Introduce masked events as another encoding the pmu event filter understands. Masked events have the fields: mask, match, and exclude. When filtering based on these events, the mask is applied to the guest's unit mask to see if it matches the match value (i.e. umask & mask == match). The exclude bit can then be used to exclude events from that match. E.g. for a given event select, if it's easier to say which unit mask values shouldn't be filtered, a masked event can be set up to match all possible unit mask values, then another masked event can be set up to match the unit mask values that shouldn't be filtered.

Userspace can query to see if this feature exists by looking for the capability KVM_CAP_PMU_EVENT_MASKED_EVENTS. This feature is enabled by setting the flags field in the pmu event filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS. Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY(). It is an error to have a bit set outside the valid bits for a masked event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in such cases, including the high bits of the event select (35:32) if called on Intel.

With these updates the filter matching code has been updated to match on a common event. Masked events were flexible enough to handle both event types, so they were used as the common event. This changes how guest events get filtered because, regardless of the type of event used in the uAPI, they will be converted to masked events. Because of this there could be a slight performance hit: instead of matching the filter event with a lookup on event select + unit mask, it does a lookup on event select then walks the unit masks to find the match. This shouldn't be a big problem because I would expect the set of common event selects to be small, and if they aren't the set can likely be reduced by using masked events to generalize the unit mask. Using one type of event when filtering guest events allows for a common code path to be used.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
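A hedged sketch of the match rule described above (the struct layout is illustrative; the real uAPI packs these fields via KVM_PMU_ENCODE_MASKED_ENTRY()):

    struct masked_event {
            u64 select;     /* event select, matched exactly */
            u8  mask;       /* applied to the guest's unit mask */
            u8  match;      /* required result of (umask & mask) */
            bool exclude;   /* a hit on this entry vetoes the filter match */
    };

    static bool masked_event_hits(const struct masked_event *e,
                                  u64 guest_select, u8 guest_umask)
    {
            return e->select == guest_select &&
                   (guest_umask & e->mask) == e->match;
    }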
2023-01-24  KVM: x86/pmu: prepare the pmu event filter for masked events  (Aaron Lewis)

Refactor check_pmu_event_filter() in preparation for masked events. No functional changes intended.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-4-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/pmu: Remove impossible events from the pmu event filter  (Aaron Lewis)

If it's not possible for an event in the pmu event filter to match a pmu event being programmed by the guest, it's pointless to have it in the list. Opt for a shorter list by removing those events.

Because this is established uAPI, the pmu event filter can't outright reject these events as garbage and return an error. Instead, play nice and remove them from the list.

Also, opportunistically rewrite the comment when the filter is set to clarify that it guards against *all* TOCTOU attacks on the verified data.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-3-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/pmu: Correct the mask used in a pmu event filter lookup  (Aaron Lewis)

When checking if a pmu event the guest is attempting to program should be filtered, only consider the event select + unit mask in that decision. Use an architecture-specific mask to mask out all other bits, including bits 35:32 on Intel. Those bits are not part of the event select and should not be considered in that decision.

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/mmu: Use kstrtobool() instead of strtobool()  (Christophe JAILLET)

strtobool() is the same as kstrtobool(). However, the latter is more widely used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/mmu: Cleanup range-based flushing for given page  (Hou Wenlong)

Use the new kvm_flush_remote_tlbs_gfn() helper to clean up the call sites of range-based flushing for a given page, which makes the code clearer.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/593ee1a876ece0e819191c0b23f56b940d6686db.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/mmu: Fix wrong gfn range of tlb flushing in validate_direct_spte()  (Hou Wenlong)

The spte pointing to the child SP is dropped, so the whole gfn range covered by the child SP should be flushed. Although Hyper-V may treat a 1-page flush the same when the address points to a huge page, it would still be better to use the correct size of the huge page.

Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/5f297c566f7d7ff2ea6da3c66d050f69ce1b8ede.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/mmu: Fix wrong start gfn of tlb flushing with range  (Hou Wenlong)

When a spte is dropped, the start gfn of tlb flushing should be the gfn of the spte, not the base gfn of the SP that contains the spte. Also introduce a helper function to do range-based flushing when a spte is dropped, which would help prevent future buggy use of kvm_flush_remote_tlbs_with_address() in such cases.

Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/72ac2169a261976f00c1703e88cda676dfb960f5.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
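A hedged sketch of the helper described, deriving the flush size from the shadow page's level (helper names follow the commit text and nearby MMU code, not necessarily the exact upstream implementation):

    static void kvm_flush_remote_tlbs_sptep(struct kvm *kvm, u64 *sptep)
    {
            struct kvm_mmu_page *sp = sptep_to_sp(sptep);
            gfn_t gfn = kvm_mmu_page_get_gfn(sp, spte_index(sptep));

            /* flush exactly what the dropped spte mapped, at its level */
            kvm_flush_remote_tlbs_gfn(kvm, gfn, sp->role.level);
    }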
2023-01-24  KVM: x86/mmu: Reduce gfn range of tlb flushing in tdp_mmu_map_handle_target_level()  (Hou Wenlong)

Since the child SP is zapped, the gfn range of tlb flushing should be the range covered by the child SP, not the parent SP. Replace sp->gfn, which is the base gfn of the parent SP, with iter->gfn, and use the correct size of the gfn range for the child SP to reduce the tlb flushing range.

Fixes: bb95dfb9e2df ("KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/528ab9c784a486e9ce05f61462ad9260796a8732.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/mmu: Fix wrong gfn range of tlb flushing in kvm_set_pte_rmapp()  (Hou Wenlong)

When the spte of a huge page is dropped in kvm_set_pte_rmapp(), the whole gfn range covered by the spte should be flushed. However, rmap_walk_init_level() doesn't align down the gfn for the new level like the tdp iterator does, so the gfn used in kvm_set_pte_rmapp() is not the base gfn of the huge page, and the size of the gfn range is wrong too for a huge page. Use the base gfn of the huge page and the size of the huge page for flushing tlbs for huge pages. Also introduce a helper function to flush the given page (huge or not) of guest memory, which would help prevent future buggy use of kvm_flush_remote_tlbs_with_address() in such cases.

Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/0ce24d7078fa5f1f8d64b0c59826c50f32f8065e.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/mmu: Move round_gfn_for_level() helper into mmu_internal.h  (Hou Wenlong)

Rounding down the GFN to a huge page size is a common pattern throughout KVM, so move the round_gfn_for_level() helper in tdp_iter.c to mmu_internal.h for common usage. Also rename it to gfn_round_for_level() to use the gfn_* prefix, and clean up the other call sites.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/415c64782f27444898db650e21cf28eeb6441dfa.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
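The helper being moved is a one-liner; sketched from the description:

    static inline gfn_t gfn_round_for_level(gfn_t gfn, int level)
    {
            /* round down to the base gfn of the huge page containing @gfn */
            return gfn & -KVM_PAGES_PER_HPAGE(level);
    }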
2023-01-24  KVM: x86/mmu: fix an incorrect comment in kvm_mmu_new_pgd()  (Wei Liu)

There is no function named kvm_mmu_ensure_valid_pgd(). Fix the comment and remove the pair of braces to conform to Linux kernel coding style.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128214709.224710-1-wei.liu@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  kvm: x86/mmu: Don't clear write flooding for direct SP  (Lai Jiangshan)

Although there is no harm, there is no point in clearing write flooding for a direct SP.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230105100310.6700-1-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  kvm: x86/mmu: Rename SPTE_TDP_AD_ENABLED_MASK to SPTE_TDP_AD_ENABLED  (Lai Jiangshan)

SPTE_TDP_AD_ENABLED_MASK, SPTE_TDP_AD_DISABLED_MASK and SPTE_TDP_AD_WRPROT_ONLY_MASK are actual values, not masks. Remove "MASK" from their names.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230105100204.6521-1-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
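After the rename, the constants read as what they are: discrete values in a two-bit shadow A/D field, selected under a separate mask. A hedged sketch (the bit position is an assumption based on the usual SPTE layout):

    #define SPTE_TDP_AD_SHIFT       52
    #define SPTE_TDP_AD_MASK        (3ULL << SPTE_TDP_AD_SHIFT)
    #define SPTE_TDP_AD_ENABLED     (0ULL << SPTE_TDP_AD_SHIFT)
    #define SPTE_TDP_AD_DISABLED    (1ULL << SPTE_TDP_AD_SHIFT)
    #define SPTE_TDP_AD_WRPROT_ONLY (2ULL << SPTE_TDP_AD_SHIFT)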
2023-01-24  x86/reboot: Disable SVM, not just VMX, when stopping CPUs  (Sean Christopherson)

Disable SVM and more importantly force GIF=1 when halting a CPU or rebooting the machine. Similar to VMX, SVM allows software to block INITs via CLGI, and thus can be problematic for a crash/reboot. The window for failure is smaller with SVM as INIT is only blocked while GIF=0, i.e. between CLGI and STGI, but the window does exist.

Fixes: fba4f472b33a ("x86/reboot: Turn off KVM when halting a CPU")
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  x86/reboot: Disable virtualization in an emergency if SVM is supported  (Sean Christopherson)

Disable SVM on all CPUs via NMI shootdown during an emergency reboot. Like VMX, SVM can block INIT, e.g. if the emergency reboot is triggered between CLGI and STGI, and thus can prevent bringing up other CPUs via INIT-SIPI-SIPI.

Cc: stable@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  x86/virt: Force GIF=1 prior to disabling SVM (for reboot flows)  (Sean Christopherson)

Set GIF=1 prior to disabling SVM to ensure that INIT is recognized if the kernel is disabling SVM in an emergency, e.g. if the kernel is about to jump into a crash kernel or may reboot without doing a full CPU RESET. If GIF is left cleared, the new kernel (or firmware) will be unable to awaken APs.

Eat faults on STGI (due to EFER.SVME=0) as it's possible that SVM could be disabled via NMI shootdown between reading EFER.SVME and executing STGI.

Link: https://lore.kernel.org/all/cbcb6f35-e5d7-c1c9-4db9-fe5cc4de579a@amd.com
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
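One way to "eat" such a fault is an exception-table fixup around the instruction; a hedged sketch of the pattern (the actual upstream implementation may differ):

    static inline void stgi_fault_safe(void)
    {
            /* a #UD from STGI with EFER.SVME=0 resumes at label 2 and is ignored */
            asm volatile("1: stgi\n\t"
                         "2:\n\t"
                         _ASM_EXTABLE(1b, 2b)
                         ::: "memory");
    }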
2023-01-24  x86/crash: Disable virt in core NMI crash handler to avoid double shootdown  (Sean Christopherson)

Disable virtualization in crash_nmi_callback() and rework the emergency_vmx_disable_all() path to do an NMI shootdown if and only if a shootdown has not already occurred. NMI crash shootdown fundamentally can't support multiple invocations as responding CPUs are deliberately put into halt state without unblocking NMIs. But, the emergency reboot path doesn't have any work of its own, it simply cares about disabling virtualization, i.e. so long as a shootdown occurred, emergency reboot doesn't care who initiated the shootdown, or when.

If "crash_kexec_post_notifiers" is specified on the kernel command line, panic() will invoke crash_smp_send_stop() and result in a second call to nmi_shootdown_cpus() during native_machine_emergency_restart().

Invoke the callback _before_ disabling virtualization, as the current VMCS needs to be cleared before doing VMXOFF. Note, this results in a subtle change in ordering between disabling virtualization and stopping Intel PT on the responding CPUs. While VMX and Intel PT do interact, VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one another, which is all that matters when panicking.

Harden nmi_shootdown_cpus() against multiple invocations to try and capture any such kernel bugs via a WARN instead of hanging the system during a crash/dump, e.g. prior to the recent hardening of register_nmi_handler(), re-registering the NMI handler would trigger a double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y.

  list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800.
  WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70
  Call Trace:
   __register_nmi_handler+0xcf/0x130
   nmi_shootdown_cpus+0x39/0x90
   native_machine_emergency_restart+0x1c9/0x1d0
   panic+0x237/0x29b

Extract the disabling logic to a common helper to deduplicate code, and to prepare for doing the shootdown in the emergency reboot path if SVM is supported.

Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected against a second invocation by a cpu_vmx_enabled() check as the kdump handler would disable VMX if it ran first.

Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported")
Cc: stable@vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221130233650.1404148-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/xen: update Xen CPUID Leaf 4 (tsc info) sub-leaves, if present  (Paul Durrant)

The scaling information in sub-leaf 1 should match the values set by KVM in the 'vcpu_info' sub-structure 'time_info' (a.k.a. pvclock_vcpu_time_info), which is shared with the guest but not directly available to the VMM. The offset values are not set since a TSC offset is already applied. The TSC frequency should also be set in sub-leaf 2.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230106103600.528-3-pdurrant@amazon.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/cpuid: generalize kvm_update_kvm_cpuid_base() and also capture limit  (Paul Durrant)

A subsequent patch will need to acquire the CPUID leaf range for emulated Xen, so explicitly pass the signature of the hypervisor we're interested in to the new function. Also introduce a new kvm_hypervisor_cpuid structure so we can neatly store both the base and limit leaf indices.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20230106103600.528-2-pdurrant@amazon.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
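The structure described, sketched from the commit text (field comments are assumptions):

    struct kvm_hypervisor_cpuid {
            u32 base;       /* e.g. 0x40000000 for the KVM signature block */
            u32 limit;      /* maximum leaf advertised for that signature */
    };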
2023-01-24  KVM: x86: Replace cpu_dirty_logging_count with nr_memslots_dirty_logging  (David Matlack)

Drop cpu_dirty_logging_count in favor of nr_memslots_dirty_logging. Both fields count the number of memslots that have dirty-logging enabled, with the only difference being that cpu_dirty_logging_count is only incremented when using PML. So while nr_memslots_dirty_logging is not a direct replacement for cpu_dirty_logging_count, it can be combined with enable_pml to get the same information.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20230105214303.2919415-1-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
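A hedged sketch of recovering the old signal from the surviving field (names per the commit text; enable_pml is VMX's module-scoped knob):

    static bool cpu_dirty_logging_in_use(struct kvm *kvm)
    {
            /* cpu_dirty_logging_count only counted PML-tracked memslots */
            return enable_pml && atomic_read(&kvm->nr_memslots_dirty_logging);
    }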
2023-01-24  KVM: x86: Replace 0-length arrays with flexible arrays  (Kees Cook)

Zero-length arrays are deprecated[1]. Replace struct kvm_nested_state's "data" union 0-length arrays with flexible arrays. (How are the sizes of these arrays verified?) Detected with GCC 13, using -fstrict-flex-arrays=3:

  arch/x86/kvm/svm/nested.c: In function 'svm_get_nested_state':
  arch/x86/kvm/svm/nested.c:1536:17: error: array subscript 0 is outside array bounds of 'struct kvm_svm_nested_state_data[0]' [-Werror=array-bounds=]
   1536 |                 &user_kvm_nested_state->data.svm[0];
        |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from include/uapi/linux/kvm.h:15,
                   from include/linux/kvm_host.h:40,
                   from arch/x86/kvm/svm/nested.c:18:
  arch/x86/include/uapi/asm/kvm.h:511:50: note: while referencing 'svm'
    511 |         struct kvm_svm_nested_state_data svm[0];
        |                                                  ^~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230105190548.never.323-kees@kernel.org
Link: https://lore.kernel.org/r/20230118195905.gonna.693-kees@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
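The shape of the change: a bare flexible array member is not allowed in a union, so the conversion leans on the uapi __DECLARE_FLEX_ARRAY() wrapper. A hedged sketch of the relevant union:

    union {
            __DECLARE_FLEX_ARRAY(struct kvm_vmx_nested_state_data, vmx);
            __DECLARE_FLEX_ARRAY(struct kvm_svm_nested_state_data, svm);
    } data;     /* was: ... vmx[0]; ... svm[0]; */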
2023-01-24  KVM: x86: Advertise fast REP string features inherent to the CPU  (Jim Mattson)

Fast zero-length REP MOVSB, fast short REP STOSB, and fast short REP {CMPSB,SCASB} are inherent features of the processor that cannot be hidden by the hypervisor. When these features are present on the host, enumerate them in KVM_GET_SUPPORTED_CPUID.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220901211811.2883855-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>