path: root/arch/x86
Age  Commit message  Author
2020-05-08  x86/module: Use text_poke() for late relocations  (Peter Zijlstra)
Because of late module patching, a livepatch module needs to be able to apply some of its relocations well after it has been loaded. Instead of playing games with module_{dis,en}able_ro(), use existing text poking mechanisms to apply relocations after module loading. So far only x86, s390 and Power have HAVE_LIVEPATCH but only the first two also have STRICT_MODULE_RWX. This will allow removal of the last module_disable_ro() usage in livepatch. The ultimate goal is to completely disallow making executable mappings writable. [ jpoimboe: Split up patches. Use mod state to determine whether memcpy() can be used. Implement text_poke() for UML. ] Cc: x86@kernel.org Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
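A minimal sketch of the mechanism (illustrative names, not the verbatim patch; the actual change threads a write callback through apply_relocate_add()):

  /* Sketch: pick the text-write primitive from the module's state.
   * During initial load the module text is still writable, so a plain
   * memcpy() is fine; a livepatch relocation applied after the module
   * has gone live must go through text_poke(), which writes via a
   * temporary mapping instead of making the text writable again. */
  static void write_relocation(struct module *me, void *loc,
  			     const void *val, size_t size)
  {
  	if (me->state == MODULE_STATE_UNFORMED)
  		memcpy(loc, val, size);
  	else
  		text_poke(loc, val, size);
  }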
2020-05-08  livepatch: Remove .klp.arch  (Peter Zijlstra)
After the previous patch, vmlinux-specific KLP relocations are now applied early during KLP module load. This means that .klp.arch sections are no longer needed for *vmlinux-specific* KLP relocations.

One might think they're still needed for *module-specific* KLP relocations. If a to-be-patched module is loaded *after* its corresponding KLP module is loaded, any corresponding KLP relocations will be delayed until the to-be-patched module is loaded. If any special sections (.parainstructions, for example) rely on those relocations, their initializations (apply_paravirt) need to be done afterwards. Thus the apparent need for arch_klp_init_object_loaded() and its corresponding .klp.arch sections -- it allows some of the special section initializations to be done at a later time.

But... if you look closer, that dependency between the special sections and the module-specific KLP relocations doesn't actually exist in reality. Looking at the contents of the .altinstructions and .parainstructions sections, there's not a realistic scenario in which a KLP module's .altinstructions or .parainstructions section needs to access a symbol in a to-be-patched module. It might need to access a local symbol or even a vmlinux symbol; but not another module's symbol. When a special section needs to reference a local or vmlinux symbol, a normal rela can be used instead of a KLP rela.

Since the special section initializations don't actually have any real dependency on module-specific KLP relocations, .klp.arch and arch_klp_init_object_loaded() no longer have a reason to exist. So remove them.

As Peter said much more succinctly:

  So the reason for .klp.arch was that .klp.rela.* stuff would overwrite paravirt instructions. If that happens you're doing it wrong. Those RELAs are core kernel, not module, and thus should've happened in .rela.* sections at patch-module loading time.

  Reverting this removes the two apply_{paravirt,alternatives}() calls from the late patching path, and means we don't have to worry about them when removing module_disable_ro().

[ jpoimboe: Rewrote patch description. Tweaked klp_init_object_loaded() error path. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
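For reference, the KLP ELF format (Documentation/livepatch/module-elf-format.rst) makes the distinction above concrete; roughly ("ext4" below is just an example object name):

  A local or vmlinux symbol needs only a normal rela, resolved when the
  patch module itself loads:

      .rela.text            ->  printk

  Only a symbol in a to-be-patched module needs a KLP rela, placed in a
  .klp.rela.<objname>.<secname> section and resolved when that module
  loads:

      .klp.rela.ext4.text   ->  .klp.sym.ext4.ext4_attr_show,0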
2020-05-07  exec: Rename flush_old_exec to begin_new_exec  (Eric W. Biederman)
There is, and has been for a very long time, a lot more going on in flush_old_exec than just flushing the old state. After the movement of code from setup_new_exec there is a whole lot more going on than just flushing the old executable's state. Rename flush_old_exec to begin_new_exec to more accurately reflect what this function does. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-07  exec: Merge install_exec_creds into setup_new_exec  (Eric W. Biederman)
The two functions are now always called one right after the other so merge them together to make future maintenance easier. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-07  binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf  (Eric W. Biederman)
In 2016 Linus moved install_exec_creds immediately after setup_new_exec in binfmt_elf, as a cleanup and as part of closing a potential information leak. Perform the same cleanup for the other binary formats. Different binary formats doing the same things the same way makes exec easier to reason about and easier to maintain. Greg Ungerer reports: > I tested the whole series on non-MMU m68k and non-MMU arm > (exercising binfmt_flat) and it all tested out with no problems, > so for the binfmt_flat changes: Tested-by: Greg Ungerer <gerg@linux-m68k.org> Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-07  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, mostly for ARM and AMD, and more documentation. Slightly bigger than usual because I couldn't send out what was pending for rc4, but there is nothing worrisome going on. I have more fixes pending for guest debugging support (gdbstub) but I will send them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
  KVM: selftests: Fix build for evmcs.h
  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
  docs/virt/kvm: Document configuring and running nested guests
  KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
  KVM: x86: Fixes posted interrupt check for IRQs delivery modes
  KVM: SVM: fill in kvm_run->debug.arch.dr[67]
  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
  KVM: arm64: Fix 32bit PC wrap-around
  KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
  KVM: arm64: Save/restore sp_el0 as part of __guest_enter
  KVM: arm64: Delete duplicated label in invalid_vector
  KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
  KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
  KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
  KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
  KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
  KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
  ...
2020-05-07  x86/cpu/amd: Make erratum #1054 a legacy erratum  (Kim Phillips)
Commit 21b5ee59ef18 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") mistakenly added erratum #1054 as an OS Visible Workaround (OSVW) ID 0. Erratum #1054 is not OSVW ID 0 [1], so make it a legacy erratum. There would never have been a false positive on older hardware that has OSVW bit 0 set, since the IRPERF feature was not available. The change does, however, save a couple of RDMSR executions per thread on modern system configurations that correctly set non-zero values in their OSVW_ID_Length MSRs. [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors. The revision guide is available from the bugzilla link below. Fixes: 21b5ee59ef18 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200417143356.26054-1-kim.phillips@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-05-07  bpf, i386: Remove unneeded conversion to bool  (Jason Yan)
The '==' expression itself is bool, no need to convert it to bool again. This fixes the following coccicheck warning: arch/x86/net/bpf_jit_comp32.c:1478:50-55: WARNING: conversion to bool not needed here arch/x86/net/bpf_jit_comp32.c:1479:50-55: WARNING: conversion to bool not needed here Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200506140352.37154-1-yanaijie@huawei.com
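The flagged pattern in miniature (condition here is hypothetical, not the actual JIT code):

  bool t;

  t = (a == b) ? true : false;	/* before: redundant conversion */
  t = a == b;			/* after: '==' already yields bool */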
2020-05-07  x86/delay: Introduce TPAUSE delay  (Kyung Min Park)
TPAUSE instructs the processor to enter an implementation-dependent optimized state. The instruction execution wakes up when the time-stamp counter reaches or exceeds the implicit EDX:EAX 64-bit input value. The instruction execution also wakes up due to the expiration of the operating system time-limit or by an external interrupt or exceptions such as a debug exception or a machine check exception. TPAUSE offers a choice of two lower power states: 1. Light-weight power/performance optimized state C0.1 2. Improved power/performance optimized state C0.2 This way, it can save power with low wake-up latency in comparison to spinloop based delay. The selection between the two is governed by the input register. TPAUSE is available on processors with X86_FEATURE_WAITPKG. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1587757076-30337-4-git-send-email-kyung.min.park@intel.com
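A user-space illustration of the semantics (assumes WAITPKG hardware and a compiler supporting -mwaitpkg; the kernel's implementation uses its own inline asm rather than these intrinsics):

  #include <immintrin.h>
  #include <stdint.h>

  static void tpause_delay(uint64_t cycles)
  {
  	uint64_t deadline = __rdtsc() + cycles;

  	/* control = 0 requests the deeper C0.2 state, 1 requests C0.1.
  	 * TPAUSE may also wake early (interrupt, OS time limit), hence
  	 * the re-check loop. */
  	while (__rdtsc() < deadline)
  		_tpause(0, deadline);
  }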
2020-05-07  x86/delay: Refactor delay_mwaitx() for TPAUSE support  (Kyung Min Park)
Refactor code to make it easier to add a new model specific function to delay for a number of cycles. No functional change. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1587757076-30337-3-git-send-email-kyung.min.park@intel.com
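The rough shape of the refactor (sketch; details and qualifiers abridged): a generic outer loop that re-checks elapsed TSC cycles, with the model-specific wait primitive behind a function pointer so a TPAUSE variant can slot in next to MWAITX.

  static void (*delay_halt_fn)(u64 start, u64 cycles);

  static void delay_halt(u64 cycles)
  {
  	u64 start, end;

  	start = rdtsc();
  	for (;;) {
  		delay_halt_fn(start, cycles);
  		end = rdtsc();
  		if (cycles <= end - start)
  			break;
  		cycles -= end - start;
  		start = end;
  	}
  }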
2020-05-07  x86/delay: Preparatory code cleanup  (Thomas Gleixner)
The naming conventions in the delay code are confusing at best. All delay variants use a loops argument and/or variable which originates from the original delay_loop() implementation. But all variants except delay_loop() are based on TSC cycles. Rename the argument to cycles and make it type u64 to avoid these weird expansions to u64 in the functions. Rename MWAITX_MAX_LOOPS to MWAITX_MAX_WAIT_CYCLES for the same reason and fixup the comment of delay_mwaitx() as well. Mark the delay_fn function pointer __ro_after_init and fixup the comment for it. No functional change and preparation for the upcoming TPAUSE based delay variant. [ Kyung Min Park: Added __init to use_tsc_delay() ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1587757076-30337-2-git-send-email-kyung.min.park@intel.com
2020-05-07  x86/platform/uv: Remove the unused _uv_cpu_blade_processor_id() macro  (Christoph Hellwig)
No users anywhere in the kernel tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-12-hch@lst.de
2020-05-07  x86/platform/uv: Unexport uv_apicid_hibits  (Christoph Hellwig)
This variable is not used by modular code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200504171527.2845224-11-hch@lst.de
2020-05-07  x86/platform/uv: Remove _uv_hub_info_check()  (Christoph Hellwig)
Neither this function nor the helpers used to implement it are used anywhere in the kernel tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-10-hch@lst.de
2020-05-07  x86/platform/uv: Simplify uv_send_IPI_one()  (Christoph Hellwig)
Merge two helpers only used by uv_send_IPI_one() into the main function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-9-hch@lst.de
2020-05-07  x86/platform/uv: Mark uv_min_hub_revision_id static  (Christoph Hellwig)
This variable is only used inside x2apic_uv_x and not even declared in a header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-8-hch@lst.de
2020-05-07  x86/platform/uv: Mark is_uv_hubless() static  (Christoph Hellwig)
is_uv_hubless() is only used in x2apic_uv_x.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-7-hch@lst.de
2020-05-07  x86/platform/uv: Remove the UV*_HUB_IS_SUPPORTED macros  (Christoph Hellwig)
All of the macros are always defined to one. Remove them and the dead code keyed off them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-6-hch@lst.de
2020-05-07  x86/platform/uv: Unexport symbols only used by x2apic_uv_x.c  (Christoph Hellwig)
uv_bios_set_legacy_vga_target, uv_bios_freq_base, uv_bios_get_sn_info, uv_type, system_serial_number and sn_region_size are only used in x2apic_uv_x.c, which can't be modular. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-5-hch@lst.de
2020-05-07  x86/platform/uv: Unexport sn_coherency_id  (Christoph Hellwig)
sn_coherency_id is only used by x2apic_uv_x.c, and uv_sysfs.c, both of which can't be modular. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-4-hch@lst.de
2020-05-07  x86/platform/uv: Remove the uv_partition_coherence_id() macro  (Christoph Hellwig)
uv_partition_coherence_id() is only used once. Just open code it in the only user. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-3-hch@lst.de
2020-05-07  x86/platform/uv: Mark uv_bios_call() and uv_bios_call_irqsave() static  (Christoph Hellwig)
Both functions are only used inside of bios_uv.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-2-hch@lst.de
2020-05-07  cpu/hotplug: Remove disable_nonboot_cpus()  (Qais Yousef)
The single user could have called freeze_secondary_cpus() directly. Since this function was a source of confusion, remove it as it's just a pointless wrapper. While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to preserve the naming symmetry. Done automatically via:

  git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'

Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
2020-05-07  x86/apic: Convert the TSC deadline timer matching to steppings macro  (Borislav Petkov)
... and get rid of the function pointers which would spit out the microcode revision based on the CPU stepping. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mark Gross <mgross@linux.intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200506071516.25445-4-bp@alien8.de
2020-05-07  x86/cpu: Add an X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro  (Borislav Petkov)
... to match Intel family 6 CPUs with steppings. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mark Gross <mgross@linux.intel.com> Link: https://lkml.kernel.org/r/20200506071516.25445-3-bp@alien8.de
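A hedged illustration of the usage (entries and values abridged; the apic.c conversion in this series carries the real table). x86_match_cpu() then returns the matching entry, whose driver_data holds the minimum good microcode revision:

  static const struct x86_cpu_id deadline_match[] __initconst = {
  	/* Match family 6, one model, a min..max stepping range, and
  	 * carry the microcode revision as driver data. */
  	X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(HASWELL_X,
  					     X86_STEPPINGS(0x2, 0x2), 0x3a),
  	{},
  };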
2020-05-07  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on  (Paolo Bonzini)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07  Merge 'x86/urgent' into x86/cpu  (Borislav Petkov)
... to resolve conflicting changes to arch/x86/kernel/apic/apic.c Signed-off-by: Borislav Petkov <bp@suse.de>
2020-05-07  KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG  (Peter Xu)
When single-step is triggered with KVM_SET_GUEST_DEBUG, we should fill in the pc value with the current linear RIP rather than the cached singlestep address. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505205000.188252-3-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07  KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG  (Peter Xu)
RTM should always be set even with KVM_EXIT_DEBUG on #DB. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505205000.188252-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07  KVM: x86: fix DR6 delivery for various cases of #DB injection  (Paolo Bonzini)
Go through kvm_queue_exception_p so that the payload is correctly delivered through the exit qualification, and add a kvm_update_dr6 call to kvm_deliver_exception_payload that is needed on AMD. Reported-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly  (Peter Xu)
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
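What the message suggests, from the userspace side (minimal illustrative example):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  int main(void)
  {
  	int kvm = open("/dev/kvm", O_RDWR);

  	if (kvm < 0)
  		return 1;
  	/* Runtime check instead of the compile-time #ifdef: */
  	printf("KVM_CAP_SET_GUEST_DEBUG: %s\n",
  	       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SET_GUEST_DEBUG) > 0 ?
  	       "supported" : "not supported");
  	return 0;
  }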
2020-05-06  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (David S. Miller)
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06  Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6  (Linus Torvalds)
Pull crypto fixes from Herbert Xu:
 "This fixes a potential scheduling latency problem for the algorithms used by WireGuard"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arch/nhpoly1305 - process in explicit 4k chunks
  crypto: arch/lib - limit simd usage to 4k chunks
2020-05-06  x86/resctrl: Support wider MBM counters  (Reinette Chatre)
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR while the first-generation MBM implementation uses statically defined 24 bit counters. The MBM CPUID enumeration properties have been expanded to include the MBM counter width, encoded as an offset from 24 bits. While eight bits are available for the counter width offset, the IA32_QM_CTR MSR only supports 62 bit counters. Add a sanity check, with a warning printed when encountered, to ensure counters cannot exceed the 62 bit limit. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/69d52abd5b14794d3a0f05ba7c755ed1f4c0d5ed.1588715690.git.reinette.chatre@intel.com
2020-05-06  x86/resctrl: Support CPUID enumeration of MBM counter width  (Reinette Chatre)
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR while the first-generation MBM implementation uses statically defined 24 bit counters. Expand the MBM CPUID enumeration properties to include the MBM counter width. The previously undefined EAX output register contains, in bits [7:0], the MBM counter width encoded as an offset from 24 bits. Enumerating this property is only specified for Intel CPUs. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
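A quick user-space read of the new enumeration (illustrative; leaf 0xF, sub-leaf 1 is the L3 monitoring sub-leaf, valid only on CPUs that report leaf 0xF):

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
  	unsigned int eax, ebx, ecx, edx;

  	__cpuid_count(0xF, 1, eax, ebx, ecx, edx);
  	/* EAX[7:0] encodes the MBM counter width as an offset from 24. */
  	printf("MBM counter width: %u bits\n", 24 + (eax & 0xff));
  	return 0;
  }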
2020-05-06  x86/resctrl: Maintain MBM counter width per resource  (Reinette Chatre)
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR, and the first-generation MBM implementation uses 24 bit counters. Software is required to poll at 1 second or faster to ensure that data is retrieved before a counter rollover occurs more than once under worst-case conditions. As system bandwidths scale the software requirement is maintained with the introduction of a per-resource enumerable MBM counter width. In preparation for supporting hardware with an enumerable MBM counter width, the current globally static MBM counter width is moved to a per-resource MBM counter width. It is currently always initialized to 24, resulting in no functional change. In essence there is one function, mbm_overflow_count(), that needs to know the counter width to handle rollovers. The static value used within mbm_overflow_count() will be replaced with a value discovered from the hardware. Support for learning the MBM counter width from hardware is added in the change that follows. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/e36743b9800f16ce600f86b89127391f61261f23.1588715690.git.reinette.chatre@intel.com
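The rollover arithmetic in question, sketched:

  /* With a counter 'width' bits wide, shifting both MSR samples up by
   * (64 - width) makes the unsigned subtraction wrap at exactly the
   * hardware counter width, so a single rollover between two polls is
   * handled for free. */
  static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
  {
  	u64 shift = 64 - width, chunks;

  	chunks = (cur_msr << shift) - (prev_msr << shift);
  	return chunks >> shift;
  }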
2020-05-06  x86/resctrl: Query LLC monitoring properties once during boot  (Reinette Chatre)
Cache and memory bandwidth monitoring are x86 CPU resource control features supported by the resctrl subsystem. The monitoring properties are obtained via CPUID from every CPU and only used within the resctrl subsystem, where the properties are only read from boot_cpu_data. Obtain the monitoring properties once, placed in boot_cpu_data, via the ->c_bsp_init() helpers of the vendors that support X86_FEATURE_CQM_LLC. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6d74a6ac3e69f4b7a8b4115835f9455faf0f468d.1588715690.git.reinette.chatre@intel.com
2020-05-06  x86/resctrl: Remove unnecessary RMID checks  (Reinette Chatre)
The cache and memory bandwidth monitoring properties are read using CPUID on every CPU. After the information is read from the system, a sanity check is run to (1) ensure that the RMID data is initialized for the boot CPU in case the information was not available on the boot CPU and (2) set the boot CPU's RMID to the minimum RMID obtained from all CPUs. Every known platform that supports resctrl has the same maximum RMID on all CPUs. Both sanity checks found in x86_init_cache_qos() can thus safely be removed. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c9a3b60d34091840c8b0bd1c6fab15e5ba92cb17.1588715690.git.reinette.chatre@intel.com
2020-05-06  x86/cpu: Move resctrl CPUID code to resctrl/  (Reinette Chatre)
The function determining a platform's support and properties of cache occupancy and memory bandwidth monitoring (properties of X86_FEATURE_CQM_LLC) can be found among the common CPU code. After the feature's properties are populated in the per-CPU data, the resctrl subsystem is the only consumer (via boot_cpu_data). Move the function that obtains the CPU information used by resctrl to the resctrl subsystem and rename it from init_cqm() to resctrl_cpu_detect(). The function continues to be called from the common CPU code. This move is done in preparation for the addition of some vendor-specific code. No functional change. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
2020-05-06  x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h  (Reinette Chatre)
asm/resctrl_sched.h is dedicated to the code used for configuration of the CPU resource control state when a task is scheduled. Rename resctrl_sched.h to resctrl.h in preparation for additions that will no longer make this file dedicated to work done during scheduling. No functional change. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
2020-05-06  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly  (Peter Xu)
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits  (Paolo Bonzini)
Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we no longer allow a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU") Reported-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
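A sketch of the direction (conditions abridged, function name illustrative; names follow KVM's cpu-caps helpers):

  static u64 guest_cr4_reserved_bits(void)
  {
  	u64 reserved = CR4_RESERVED_BITS;

  	/* Reserve a bit only if KVM cannot expose the feature, rather
  	 * than keying off raw host CPUID/CR4 state, so e.g. a guest may
  	 * set CR4.PKE even when the host runs with CR4.PKE clear. */
  	if (!kvm_cpu_cap_has(X86_FEATURE_PKU))
  		reserved |= X86_CR4_PKE;
  	if (!kvm_cpu_cap_has(X86_FEATURE_LA57))
  		reserved |= X86_CR4_LA57;
  	return reserved;
  }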
2020-05-06  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path  (Sean Christopherson)
Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the RSB has always clobbered RFLAGS, its current incarnation just happens to clear CF and ZF in the process. Relying on the macro to clear CF and ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such that the ZF flag is always set. Reported-by: Qian Cai <cai@lca.pw> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-05  signal: refactor copy_siginfo_to_user32  (Christoph Hellwig)
Factor out a copy_siginfo_to_external32 helper from copy_siginfo_to_user32 that fills out the compat_siginfo, but does so on a kernel space data structure. With that we can let architectures override copy_siginfo_to_user32 with their own implementations using copy_siginfo_to_external32. That allows moving the x32 SIGCHLD purely to x86 architecture code. As a nice side effect copy_siginfo_to_external32 also comes in handy for avoiding a set_fs() call in the coredump code later on. Contains improvements from Eric W. Biederman <ebiederm@xmission.com> and Arnd Bergmann <arnd@arndb.de>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
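The shape of the split, as a sketch (close to the final code; error handling and x32 special-casing elided):

  int copy_siginfo_to_user32(struct compat_siginfo __user *to,
  			   const kernel_siginfo_t *from)
  {
  	struct compat_siginfo new;

  	/* All the layout conversion now happens on a kernel-space
  	 * struct... */
  	copy_siginfo_to_external32(&new, from);
  	/* ...and only the final copy-out touches userspace, which is
  	 * what lets the coredump code avoid set_fs() later. */
  	if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
  		return -EFAULT;
  	return 0;
  }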
2020-05-05  efi/libstub: Fix mixed mode boot issue after macro refactor  (Arvind Sankar)
Commit 22090f84bc3f ("efi/libstub: unify EFI call wrappers for non-x86") refactored the macros that are used to provide wrappers for mixed-mode calls on x86, allowing us to boot a 64-bit kernel on 32-bit firmware. Unfortunately, this broke mixed mode boot due to the fact that efi_is_native() is not a macro on x86. All of these macros should go together, so rather than testing each one to see if it is defined, condition the generic macro definitions on a new ARCH_HAS_EFISTUB_WRAPPERS, and remove the wrapper definitions on x86 as well if CONFIG_EFI_MIXED is not enabled. Fixes: 22090f84bc3f ("efi/libstub: unify EFI call wrappers for non-x86") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200504150248.62482-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
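Roughly the conditional structure described (macro bodies illustrative, not verbatim from the patch):

  #ifndef ARCH_HAS_EFISTUB_WRAPPERS
  /* Native-only architectures: firmware calls and table accesses are
   * direct, and efi_is_native() is trivially true. x86 defines its own
   * set (covering mixed mode when CONFIG_EFI_MIXED) and skips these. */
  #define efi_is_native()			(true)
  #define efi_table_attr(inst, attr)	(inst->attr)
  #define efi_call_proto(inst, func, ...)	inst->func(inst, ##__VA_ARGS__)
  #endif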
2020-05-04  x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded  (Joerg Roedel)
When the pre-decompression code loads its first GDT in startup_64(), it is still running on the CS value of the previous GDT. In the case of SEV-ES, this is the EFI GDT but it can be anything depending on what has loaded the kernel (boot loader, container runtime, etc.) To make exception handling work (especially IRET) the CPU needs to switch to a CS value in the current GDT, so jump to __KERNEL_CS after the first GDT is loaded. This is also prudent as a general sanitization of CS to a known good value. [ bp: Massage commit message. ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200428151725.31091-13-joro@8bytes.org
2020-05-04  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts  (Paolo Bonzini)
Commit f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") introduces the following infinite loop:

  BUG: stack guard page was hit at 000000008f595917 (stack is 00000000bdefe5a4..00000000ae2b06f5)
  kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
  RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
  Call Trace:
   irqfd_resampler_ack+0x32/0x90 [kvm]
   kvm_notify_acked_irq+0x62/0xd0 [kvm]
   kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
   ioapic_set_irq+0x20e/0x240 [kvm]
   kvm_ioapic_set_irq+0x5c/0x80 [kvm]
   kvm_set_irq+0xbb/0x160 [kvm]
   ? kvm_hv_set_sint+0x20/0x20 [kvm]
   irqfd_resampler_ack+0x32/0x90 [kvm]
   kvm_notify_acked_irq+0x62/0xd0 [kvm]
   kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
   ioapic_set_irq+0x20e/0x240 [kvm]
   kvm_ioapic_set_irq+0x5c/0x80 [kvm]
   kvm_set_irq+0xbb/0x160 [kvm]
   ? kvm_hv_set_sint+0x20/0x20 [kvm]
   ....

The re-entrancy happens because the irq state is the OR of the interrupt state and the resamplefd state. That is, we don't want to show the state as 0 until we've had a chance to set the resamplefd. But if the interrupt has _not_ gone low then ioapic_set_irq is invoked again, causing an infinite loop.

This can only happen for a level-triggered interrupt, otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high and then low. Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because TMR is set. Thus, fix the bug by restricting the lazy invocation of the ack notifier to edge-triggered interrupts, the only ones that need it.

Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reported-by: borisvk@bstnet.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://www.spinics.net/lists/kvm/msg213512.html Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04  KVM: x86: Fixes posted interrupt check for IRQs delivery modes  (Suravee Suthikulpanit)
The current logic incorrectly uses the enum ioapic_irq_destination_types to check the posted interrupt destination types. However, the value was set using APIC_DM_XXX macros, which are left-shifted by 8 bits. Fix this by using APIC_DM_FIXED and APIC_DM_LOWEST instead. Fixes: fdcf75621375 ("KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes") Cc: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
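An illustration of the mismatch (helper name hypothetical):

  /* enum ioapic_irq_destination_types has dest_Fixed == 0 and
   * dest_LowestPrio == 1, while the APIC_DM_* values sit in ICR bits
   * 10:8 (APIC_DM_FIXED == 0x000, APIC_DM_LOWEST == 0x100), so the
   * comparison must use the shifted macros: */
  static bool irq_is_postable(u32 delivery_mode)
  {
  	return delivery_mode == APIC_DM_FIXED ||
  	       delivery_mode == APIC_DM_LOWEST;
  }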
2020-05-04  KVM: SVM: fill in kvm_run->debug.arch.dr[67]  (Paolo Bonzini)
The corresponding code was added for VMX in commit 42dbaa5a057 ("KVM: x86: Virtualize debug registers", 2008-12-15) but never for AMD. Fix this. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-04  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning  (Sean Christopherson)
Use BUG() in the impossible-to-hit default case when switching on the scope of INVEPT to squash a warning with clang 11 due to clang treating the BUG_ON() as conditional.

  >> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                  BUG_ON(1);

Reported-by: kbuild test robot <lkp@intel.com> Fixes: ce8fe7b77bd8 ("KVM: nVMX: Free only the affected contexts when emulating INVEPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200504153506.28898-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>