summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-03-16KVM: x86: Move gpa_val and gpa_available into the emulator contextSean Christopherson
Move the GPA tracking into the emulator context now that the context is guaranteed to be initialized via __init_emulate_ctxt() prior to dereferencing gpa_{available,val}, i.e. now that seeing a stale gpa_available will also trigger a WARN due to an invalid context. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page faultSean Christopherson
Add a new emulation type flag to explicitly mark emulation related to a page fault. Move the propation of the GPA into the emulator from the page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an indicator that cr2 is valid. Similarly, don't propagate cr2 into the exception.address when it's *not* valid. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: apic: remove unused function apic_lvt_vector()Miaohe Lin
The function apic_lvt_vector() is unused now, remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: VMX: Add 'else' to split mutually exclusive caseMiaohe Lin
Each if branch in handle_external_interrupt_irqoff() is mutually exclusive. Add 'else' to make it clear and also avoid some unnecessary check. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: eliminate some unreachable codeMiaohe Lin
These code are unreachable, remove them. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Fix print format and coding styleMiaohe Lin
Use %u to print u32 var and correct some coding style. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: vmx: rewrite the comment in vmx_get_mt_maskChia-I Wu
Better reflect the structure of the code and metion why we could not always honor the guest. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-15Merge tag 'x86-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Map EFI runtime service data as encrypted when SEV is enabled. Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode. - Remove the warning in the vector management code which triggered when a managed interrupt affinity changed outside of a CPU hotplug operation. The warning was correct until the recent core code change that introduced a CPU isolation feature which needs to migrate managed interrupts away from online CPUs under certain conditions to achieve the isolation" * tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vector: Remove warning on managed interrupt migration x86/ioremap: Map EFI runtime services data as encrypted for SEV
2020-03-15Merge tag 'perf-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf fixes: Kernel side: - AMD uncore driver: Replace the open coded sanity check with the core variant, which provides the correct error code and also leaves a hint in dmesg Tooling: - Fix the stdio input handling with glibc versions >= 2.28 - Unbreak the futex-wake benchmark which was reduced to 0 test threads due to the conversion to cpumaps - Initialize sigaction structs before invoking sys_sigactio() - Plug the mapfile memory leak in perf jevents - Fix off by one relative directory includes - Fix an undefined string comparison in perf diff" * tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag tools: Fix off-by 1 relative directory includes perf jevents: Fix leak of mapfile memory perf bench: Clear struct sigaction before sigaction() syscall perf bench futex-wake: Restore thread count default to online CPU count perf top: Fix stdio interface input handling with glibc 2.28+ perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare perf symbols: Don't try to find a vmlinux file when looking for kernel modules perf bench: Share some global variables to fix build with gcc 10 perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files perf env: Do not return pointers to local variables perf tests bp_account: Make global variable static
2020-03-15Merge tag 'ras-urgent-2020-03-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two RAS related fixes: - Shut down the per CPU thermal throttling poll work properly when a CPU goes offline. The missing shutdown caused the poll work to be migrated to a unbound worker which triggered warnings about the usage of smp_processor_id() in preemptible context - Fix the PPIN feature initialization which missed to enable the functionality when PPIN_CTL was enabled but the MSR locked against updates" * tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Fix logic and comments around MSR_PPIN_CTL x86/mce/therm_throt: Undo thermal polling properly on CPU offline
2020-03-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes for x86 and s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1 KVM: s390: Also reset registers in sync regs for initial cpu reset KVM: fix Kconfig menu text for -Werror KVM: x86: remove stale comment from struct x86_emulate_ctxt KVM: x86: clear stale x86_emulate_ctxt->intercept value KVM: SVM: Fix the svm vmexit code for WRMSR KVM: X86: Fix dereference null cpufreq policy
2020-03-14Merge branch 'kvm-null-pointer-fix' into kvm-masterPaolo Bonzini
2020-03-14KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAsVitaly Kuznetsov
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter with EVMCS GPA = 0 the host crashes because the evmcs_gpa != vmx->nested.hv_evmcs_vmptr condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will happen on vmx->nested.hv_evmcs pointer dereference. Another problematic EVMCS ptr value is '-1' but it only causes host crash after nested_release_evmcs() invocation. The problem is exactly the same as with '0', we mistakenly think that the EVMCS pointer hasn't changed and thus nested.hv_evmcs_vmptr is valid. Resolve the issue by adding an additional !vmx->nested.hv_evmcs check to nested_vmx_handle_enlightened_vmptrld(), this way we will always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL and this is supposed to catch all invalid EVMCS GPAs. Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs() to be consistent with initialization where we don't currently set hv_evmcs_vmptr to '-1'. Cc: stable@vger.kernel.org Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirectNitesh Narayan Lal
Previously all fields of structure kvm_lapic_irq were not initialized before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause an issue when any of those fields are used for processing a request. For example not initializing the msi_redir_hint field before passing to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of kvm_apic_map_get_dest_lapic(). This will specifically happen when the kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage value of msi_redir_hint, which should not happen as the request belongs to APIC fixed delivery mode and we do not want to deliver the interrupt only to the lowest priority candidate. This patch initializes all the fields of kvm_lapic_irq based on the values of ioapic redirect_entry object before passing it on to kvm_bitmap_or_dest_vcpus(). Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Set level to false since the value doesn't really matter. Suggested by Vitaly Kuznetsov. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14acpi/x86: ignore unspecified bit positions in the ACPI global lock fieldJan Engelhardt
The value in "new" is constructed from "old" such that all bits defined as reserved by the ACPI spec[1] are left untouched. But if those bits do not happen to be all zero, "new < 3" will not evaluate to true. The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with garbage inside the "FACS" ACPI table. The starting value is old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off the reserved bits. [1] https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14acpi/x86: add a kernel parameter to disable ACPI BGRTAlex Hung
BGRT is for displaying seamless OEM logo from booting to login screen; however, this mechanism does not always work well on all configurations and the OEM logo can be displayed multiple times. This looks worse than without BGRT enabled. This patch adds a kernel parameter to disable BGRT in boot time. This is easier than re-compiling a kernel with CONFIG_ACPI_BGRT disabled. Signed-off-by: Alex Hung <alex.hung@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-14KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1Sean Christopherson
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1 is not supported[*], i.e. intercepting ENCLS to inject a #UD is unnecessary. Avoiding ENCLS-exiting even when it is reported as supported by the CPU works around a reported issue where SGX is "hard" disabled after an S3 suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control are enumerated as unsupported. While the root cause of the S3 issue is unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a workaround for what is most likely a hardware or firmware issue. As a bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and vmcs02. Note, SGX must be disabled in BIOS to take advantage of this workaround [*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are cleared. Soft disabled meaning disabling SGX without clearing the primary CPUID bit (in leaf 0x7) and without poking into non-SGX CPU paths, e.g. for the VMCS controls. Fixes: 0b665d304028 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest") Reported-by: Toni Spets <toni.spets@iki.fi> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14x86/acpi: make "asmlinkage" part first thing in the function definitionAlexey Dobriyan
g++ insists that function declaration must start with extern "C" (which asmlinkage expands to). gcc doesn't care. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-03-13 The following pull-request contains BPF updates for your *net-next* tree. We've added 86 non-merge commits during the last 12 day(s) which contain a total of 107 files changed, 5771 insertions(+), 1700 deletions(-). The main changes are: 1) Add modify_return attach type which allows to attach to a function via BPF trampoline and is run after the fentry and before the fexit programs and can pass a return code to the original caller, from KP Singh. 2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher objects to be visible in /proc/kallsyms so they can be annotated in stack traces, from Jiri Olsa. 3) Extend BPF sockmap to allow for UDP next to existing TCP support in order in order to enable this for BPF based socket dispatch, from Lorenz Bauer. 4) Introduce a new bpftool 'prog profile' command which attaches to existing BPF programs via fentry and fexit hooks and reads out hardware counters during that period, from Song Liu. Example usage: bpftool prog profile id 337 duration 3 cycles instructions llc_misses 4228 run_cnt 3403698 cycles (84.08%) 3525294 instructions # 1.04 insn per cycle (84.05%) 13 llc_misses # 3.69 LLC misses per million isns (83.50%) 5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition of a new bpf_link abstraction to keep in particular BPF tracing programs attached even when the applicaion owning them exits, from Andrii Nakryiko. 6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering and which returns the PID as seen by the init namespace, from Carlos Neira. 7) Refactor of RISC-V JIT code to move out common pieces and addition of a new RV32G BPF JIT compiler, from Luke Nelson. 8) Add gso_size context member to __sk_buff in order to be able to know whether a given skb is GSO or not, from Willem de Bruijn. 9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output implementation but can be called from tracepoint programs, from Eelco Chaudron. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13x86/mm: Rename is_kernel_text to __is_kernel_textJiri Olsa
The kbuild test robot reported compile issue on x86 in one of the following patches that adds <linux/kallsyms.h> include into <linux/bpf.h>, which is picked up by init_32.c object. The problem is that <linux/kallsyms.h> defines global function is_kernel_text which colides with the static function of the same name defined in init_32.c: $ make ARCH=i386 ... >> arch/x86/mm/init_32.c:241:19: error: redefinition of 'is_kernel_text' static inline int is_kernel_text(unsigned long addr) ^~~~~~~~~~~~~~ In file included from include/linux/bpf.h:21:0, from include/linux/bpf-cgroup.h:5, from include/linux/cgroup-defs.h:22, from include/linux/cgroup.h:28, from include/linux/hugetlb.h:9, from arch/x86/mm/init_32.c:18: include/linux/kallsyms.h:31:19: note: previous definition of 'is_kernel_text' was here static inline int is_kernel_text(unsigned long addr) Renaming the init_32.c is_kernel_text function to __is_kernel_text. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-03-12 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 8 day(s) which contain a total of 12 files changed, 161 insertions(+), 15 deletions(-). The main changes are: 1) Andrii fixed two bugs in cgroup-bpf. 2) John fixed sockmap. 3) Luke fixed x32 jit. 4) Martin fixed two issues in struct_ops. 5) Yonghong fixed bpf_send_signal. 6) Yoshiki fixed BTF enum. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13x86/vector: Remove warning on managed interrupt migrationPeter Xu
The vector management code assumes that managed interrupts cannot be migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE() which triggers when a managed interrupt vector association on a online CPU is cleared. The CPU offline code uses a different mechanism which cannot trigger this. This assumption is not longer correct because the new CPU isolation feature which affects the placement of managed interrupts must be able to move a managed interrupt away from an online CPU. There are two reasons why this can happen: 1) When the interrupt is activated the affinity mask which was established in irq_create_affinity_masks() is handed in to the vector allocation code. This mask contains all CPUs to which the interrupt can be made affine to, but this does not take the CPU isolation 'managed_irq' mask into account. When the interrupt is finally requested by the device driver then the affinity is checked again and the CPU isolation 'managed_irq' mask is taken into account, which moves the interrupt to a non-isolated CPU if possible. 2) The interrupt can be affine to an isolated CPU because the non-isolated CPUs in the calculated affinity mask are not online. Once a non-isolated CPU which is in the mask comes online the interrupt is migrated to this non-isolated CPU In both cases the regular online migration mechanism is used which triggers the WARN_ON_ONCE() in free_moved_vector(). Case #1 could have been addressed by taking the isolation mask into account, but that would require a massive code change in the activation logic and the eventual migration event was accepted as a reasonable tradeoff when the isolation feature was developed. But even if #1 would be addressed, #2 would still trigger it. Of course the warning in free_moved_vector() was overlooked at that time and the above two cases which have been discussed during patch review have obviously never been tested before the final submission. So keep it simple and remove the warning. [ tglx: Rewrote changelog and added a comment to free_moved_vector() ] Fixes: 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed interrupts") Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
2020-03-12Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a build problem with x86/curve25519" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/curve25519 - support assemblers with no adx support
2020-03-12x86/mm/kmmio: Use this_cpu_ptr() instead get_cpu_var() for kmmio_ctxSebastian Andrzej Siewior
Both call sites that access kmmio_ctx, access kmmio_ctx with interrupts disabled. There is no need to use get_cpu_var() which additionally disables preemption. Use this_cpu_ptr() to access the kmmio_ctx variable of the current CPU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200205143426.2592512-1-bigeasy@linutronix.de
2020-03-12perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flagKim Phillips
Enable the sampling check in kernel/events/core.c::perf_event_open(), which returns the more appropriate -EOPNOTSUPP. BEFORE: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses). /bin/dmesg | grep -i perf may provide additional information. With nothing relevant in dmesg. AFTER: $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true Error: l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com
2020-03-12ima: add a new CONFIG for loading arch-specific policiesNayna Jain
Every time a new architecture defines the IMA architecture specific functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the IMA include file needs to be updated. To avoid this "noise", this patch defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option, allowing the different architectures to select it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Philipp Rudo <prudo@linux.ibm.com> (s390) Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2020-03-12x86/cpu/amd: Call init_amd_zn() om Family 19h processors tooKim Phillips
Family 19h CPUs are Zen-based and still share most architectural features with Family 17h CPUs, and therefore still need to call init_amd_zn() e.g., to set the RECLAIM_DISTANCE override. init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used in amd_set_core_ssb_state(), which isn't called on some late model Family 17h CPUs, nor on any Family 19h CPUs: X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those later model CPUs, where the SSBD mitigation is done via the SPEC_CTRL MSR instead of the LS_CFG MSR. Family 19h CPUs also don't have the erratum where the CPB feature bit isn't set, but that code can stay unchanged and run safely on Family 19h. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
2020-03-11x86/tsc_msr: Make MSR derived TSC frequency more accurateHans de Goede
The "Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 4: Model-Specific Registers" has the following table for the values from freq_desc_byt: 000B: 083.3 MHz 001B: 100.0 MHz 010B: 133.3 MHz 011B: 116.7 MHz 100B: 080.0 MHz Notice how for e.g the 83.3 MHz value there are 3 significant digits, which translates to an accuracy of a 1000 ppm, where as a typical crystal oscillator is 20 - 100 ppm, so the accuracy of the frequency format used in the Software Developer’s Manual is not really helpful. As far as we know Bay Trail SoCs use a 25 MHz crystal and Cherry Trail uses a 19.2 MHz crystal, the crystal is the source clock for a root PLL which outputs 1600 and 100 MHz. It is unclear if the root PLL outputs are used directly by the CPU clock PLL or if there is another PLL in between. This does not matter though, we can model the chain of PLLs as a single PLL with a quotient equal to the quotients of all PLLs in the chain multiplied. So we can create a simplified model of the CPU clock setup using a reference clock of 100 MHz plus a quotient which gets us as close to the frequency from the SDM as possible. For the 83.3 MHz example from above this would give 100 MHz * 5 / 6 = 83 and 1/3 MHz, which matches exactly what has been measured on actual hardware. Use a simplified PLL model with a reference clock of 100 MHz for all Bay and Cherry Trail models. This has been tested on the following models: CPU freq before: CPU freq after: Intel N2840 2165.800 MHz 2166.667 MHz Intel Z3736 1332.800 MHz 1333.333 MHz Intel Z3775 1466.300 MHz 1466.667 MHz Intel Z8350 1440.000 MHz 1440.000 MHz Intel Z8750 1600.000 MHz 1600.000 MHz This fixes the time drifting by about 1 second per hour (20 - 30 seconds per day) on (some) devices which rely on the tsc_msr.c code to determine the TSC frequency. Reported-by: Vipul Kumar <vipulk0511@gmail.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200223140610.59612-3-hdegoede@redhat.com
2020-03-11x86/tsc_msr: Fix MSR_FSB_FREQ mask for Cherry Trail devicesHans de Goede
According to the "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 4: Model-Specific Registers" on Cherry Trail (Airmont) devices the 4 lowest bits of the MSR_FSB_FREQ mask indicate the bus freq unlike on e.g. Bay Trail where only the lowest 3 bits are used. This is also the reason why MAX_NUM_FREQS is defined as 9, since Cherry Trail SoCs have 9 possible frequencies, so the lo value from the MSR needs to be masked with 0x0f, not with 0x07 otherwise the 9th frequency will get interpreted as the 1st. Bump MAX_NUM_FREQS to 16 to avoid any possibility of addressing the array out of bounds and makes the mask part of the cpufreq struct so it can be set it per model. While at it also log an error when the index points to an uninitialized part of the freqs lookup-table. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200223140610.59612-2-hdegoede@redhat.com
2020-03-11x86/tsc_msr: Use named struct initializersHans de Goede
Use named struct initializers for the freq_desc struct-s initialization and change the "u8 msr_plat" to a "bool use_msr_plat" to make its meaning more clear instead of relying on a comment to explain it. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200223140610.59612-1-hdegoede@redhat.com
2020-03-11x86: Select HARDIRQS_SW_RESEND on x86Hans de Goede
Modern x86 laptops are starting to use GPIO pins as interrupts more and more, e.g. touchpads and touchscreens have almost all moved away from PS/2 and USB to using I2C with a GPIO pin as interrupt. Modern x86 laptops also have almost all moved to using s2idle instead of using the system S3 ACPI power state to suspend. The Intel and AMD pinctrl drivers do not define irq_retrigger handlers for the irqchips they register, this is causing edge triggered interrupts which happen while suspended using s2idle to get lost. One specific example of this is the lid switch on some devices, lid switches used to be handled by the embedded-controller, but now the lid open/closed sensor is sometimes directly connected to a GPIO pin. On most devices the ACPI code for this looks like this: Method (_E00, ...) { Notify (LID0, 0x80) // Status Change } Where _E00 is an ACPI event handler for changes on both edges of the GPIO connected to the lid sensor, this event handler is then combined with an _LID method which directly reads the pin. When the device is resumed by opening the lid, the GPIO interrupt will wake the system, but because the pinctrl irqchip doesn't have an irq_retrigger handler, the Notify will not happen. This is not a problem in the case the _LID method directly reads the GPIO, because the drivers/acpi/button.c code will call _LID on resume anyways. But some devices have an event handler for the GPIO connected to the lid sensor which looks like this: Method (_E00, ...) { if (LID_GPIO == One) LIDS = One else LIDS = Zero Notify (LID0, 0x80) // Status Change } And the _LID method returns the cached LIDS value, since on open we do not re-run the edge-interrupt handler when we re-enable IRQS on resume (because of the missing irq_retrigger handler), _LID now will keep reporting closed, as LIDS was never changed to reflect the open status, this causes userspace to re-resume the laptop again shortly after opening the lid. The Intel GPIO controllers do not allow implementing irq_retrigger without emulating it in software, at which point we are better of just using the generic HARDIRQS_SW_RESEND mechanism rather then re-implementing software emulation for this separately in aprox. 14 different pinctrl drivers. Select HARDIRQS_SW_RESEND to solve the problem of edge-triggered GPIO interrupts not being re-triggered on resume when they were triggered during suspend (s2idle) and/or when they were the cause of the wakeup. This requires 008f1d60fe25 ("x86/apic/vector: Force interupt handler invocation to irq context") c16816acd086 ("genirq: Add protection against unsafe usage of generic_handle_irq()") to protect the APIC based interrupts from being wreckaged by a software resend. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200123210242.53367-1-hdegoede@redhat.com
2020-03-11x86/ioremap: Map EFI runtime services data as encrypted for SEVTom Lendacky
The dmidecode program fails to properly decode the SMBIOS data supplied by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is encrypted and resides in reserved memory that is marked as EFI runtime services data. As a result, when memremap() is attempted for the SMBIOS data, it can't be mapped as regular RAM (through try_ram_remap()) and, since the address isn't part of the iomem resources list, it isn't mapped encrypted through the fallback ioremap(). Add a new __ioremap_check_other() to deal with memory types like EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges. This allows any runtime services data which has been created encrypted, to be mapped encrypted too. [ bp: Move functionality to a separate function. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Cc: <stable@vger.kernel.org> # 5.3 Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com
2020-03-11bpf: Fix trampoline generation for fmod_ret programsAlexei Starovoitov
fmod_ret progs are emitted as: start = __bpf_prog_enter(); call fmod_ret *(u64 *)(rbp - 8) = rax __bpf_prog_exit(, start); test eax, eax jne do_fexit That 'test eax, eax' is working by accident. The compiler is free to use rax inside __bpf_prog_exit() or inside functions that __bpf_prog_exit() is calling. Which caused "test_progs -t modify_return" to sporadically fail depending on compiler version and kconfig. Fix it by using 'cmp [rbp - 8], 0' instead of 'test eax, eax'. Fixes: ae24082331d9 ("bpf: Introduce BPF_MODIFY_RETURN") Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200311003906.3643037-1-ast@kernel.org
2020-03-10x86/entry/64: Trace irqflags unconditionally as ON when returning to user spaceThomas Gleixner
User space cannot disable interrupts any longer so trace return to user space unconditionally as IRQS_ON. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200308222609.314596327@linutronix.de
2020-03-10x86/entry/32: Remove unused label restore_nocheckThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/20200308222609.219366430@linutronix.de
2020-03-10x86/mce/dev-mcelog: Dynamically allocate space for machine check recordsTony Luck
We have had a hard coded limit of 32 machine check records since the dawn of time. But as numbers of cores increase, it is possible for more than 32 errors to be reported before a user process reads from /dev/mcelog. In this case the additional errors are lost. Keep 32 as the minimum. But tune the maximum value up based on the number of processors. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
2020-03-10x86/Kconfig: Drop vendor dependency for X86_UMIPTony W Wang-oc
Some Centaur family 7 CPUs and Zhaoxin family 7 CPUs support the UMIP feature too. The text size growth which UMIP adds is ~1K and distro kernels enable it anyway so remove the vendor dependency. [ bp: Rewrite commit message. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1583733990-2587-1-git-send-email-TonyWWang-oc@zhaoxin.com
2020-03-09PCI: hv: Introduce hv_msi_entryBoqun Feng
Add a new structure (hv_msi_entry), which is also defined in the TLFS, to describe the msi entry for HVCALL_RETARGET_INTERRUPT. The structure is needed because its layout may be different from architecture to architecture. Also add a new generic interface hv_set_msi_entry_from_desc() to allow different archs to set the msi entry from msi_desc. No functional change, only preparation for the future support of virtual PCI on non-x86 architectures. Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09PCI: hv: Move retarget related structures into tlfs headerBoqun Feng
Currently, retarget_msi_interrupt and other structures it relys on are defined in pci-hyperv.c. However, those structures are actually defined in Hypervisor Top-Level Functional Specification [1] and may be different in sizes of fields or layout from architecture to architecture. Let's move those definitions into x86's tlfs header file to support virtual PCI on non-x86 architectures in the future. Note that "__packed" attribute is added to these structures during the movement for the same reason as we use the attribute for other TLFS structures in the header file: make sure the structures meet the specification and avoid anything unexpected from the compilers. Additionally, rename struct retarget_msi_interrupt to hv_retarget_msi_interrupt for the consistent naming convention, also mirroring the name in TLFS. [1]: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09PCI: hv: Move hypercall related definitions into tlfs headerBoqun Feng
Currently HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF are defined in pci-hyperv.c. However, similar to other hypercall related definitions, it makes more sense to put them in the tlfs header file. Besides, these definitions are arch-dependent, so for the support of virtual PCI on non-x86 archs in the future, move them into arch-specific tlfs header file. Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-08x86/apic/vector: Force interupt handler invocation to irq contextThomas Gleixner
Sathyanarayanan reported that the PCI-E AER error injection mechanism can result in a NULL pointer dereference in apic_ack_edge(): BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 RIP: 0010:apic_ack_edge+0x1e/0x40 Call Trace: handle_edge_irq+0x7d/0x1e0 generic_handle_irq+0x27/0x30 aer_inject_write+0x53a/0x720 It crashes in irq_complete_move() which dereferences get_irq_regs() which is obviously NULL when this is called from non interrupt context. Of course the pointer could be checked, but that just papers over the real issue. Invoking the low level interrupt handling mechanism from random code can wreckage the fragile interrupt affinity mechanism of x86 as interrupts can only be moved in interrupt context or with special care when a CPU goes offline and the move has to be enforced. In the best case this triggers the warning in the MSI affinity setter, but if the call happens on the correct CPU it just corrupts state and might prevent further interrupt delivery for the affected device. Mark the APIC interrupts as unsuitable for being invoked in random contexts. This prevents the AER injection from proliferating the wreckage, but that's less broken than the current state of affairs and more correct than just papering over the problem by sprinkling random checks all over the place and silently corrupting state. Reported-by: sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200306130623.684591280@linutronix.de
2020-03-08efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()Ard Biesheuvel
Commit: 59f2a619a2db8611 ("efi: Add 'runtime' pointer to struct efi") modified the assembler routine called by efi_set_virtual_address_map(), to grab the 'runtime' EFI service pointer while running with paging disabled (which is tricky to do in C code) After the change, register %ebx is not restored correctly, resulting in all kinds of weird behavior, so fix that. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200304133515.15035-1-ardb@kernel.org Link: https://lore.kernel.org/r/20200308080859.21568-22-ardb@kernel.org
2020-03-08efi/x86: Don't relocate the kernel unless necessaryArvind Sankar
Add alignment slack to the PE image size, so that we can realign the decompression buffer within the space allocated for the image. Only relocate the kernel if it has been loaded at an unsuitable address: - Below LOAD_PHYSICAL_ADDR, or - Above 64T for 64-bit and 512MiB for 32-bit For 32-bit, the upper limit is conservative, but the exact limit can be difficult to calculate. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-6-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-20-ardb@kernel.org
2020-03-08efi/x86: Remove extra headroom for setup blockArvind Sankar
The following commit: 223e3ee56f77 ("efi/x86: add headroom to decompressor BSS to account for setup block") added headroom to the PE image to account for the setup block, which wasn't used for the decompression buffer. Now that the decompression buffer is located at the start of the image, and includes the setup block, this is no longer required. Add a check to make sure that the head section of the compressed kernel won't overwrite itself while relocating. This is only for future-proofing as with current limits on the setup and the actual size of the head section, this can never happen. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-5-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-19-ardb@kernel.org
2020-03-08efi/x86: Add kernel preferred address to PE headerArvind Sankar
Store the kernel's link address as ImageBase in the PE header. Note that the PE specification requires the ImageBase to be 64k aligned. The preferred address should almost always satisfy that, except for 32-bit kernel if the configuration has been customized. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-4-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-18-ardb@kernel.org
2020-03-08efi/x86: Decompress at start of PE image load addressArvind Sankar
When booted via PE loader, define image_offset to hold the offset of startup_32() from the start of the PE image, and use it as the start of the decompression buffer. [ mingo: Fixed the grammar in the comments. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-3-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-17-ardb@kernel.org
2020-03-08x86/boot/compressed/32: Save the output address instead of recalculating itArvind Sankar
In preparation for being able to decompress into a buffer starting at a different address than startup_32, save the calculated output address instead of recalculating it later. We now keep track of three addresses: %edx: startup_32 as we were loaded by bootloader %ebx: new location of compressed kernel %ebp: start of decompression buffer Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200303221205.4048668-2-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-16-ardb@kernel.org
2020-03-08x86/boot: Use unsigned comparison for addressesArvind Sankar
The load address is compared with LOAD_PHYSICAL_ADDR using a signed comparison currently (using jge instruction). When loading a 64-bit kernel using the new efi32_pe_entry() point added by: 97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section") using Qemu with -m 3072, the firmware actually loads us above 2Gb, resulting in a very early crash. Use the JAE instruction to perform a unsigned comparison instead, as physical addresses should be considered unsigned. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org
2020-03-08efi/x86: Avoid using code32_startArvind Sankar
code32_start is meant for 16-bit real-mode bootloaders to inform the kernel where the 32-bit protected mode code starts. Nothing in the protected mode kernel except the EFI stub uses it. efi_main() currently returns boot_params, with code32_start set inside it to tell efi_stub_entry() where startup_32 is located. Since it was invoked by efi_stub_entry() in the first place, boot_params is already known. Return the address of startup_32 instead. This will allow a 64-bit kernel to live above 4Gb, for example, and it's cleaner as well. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-5-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-13-ardb@kernel.org
2020-03-08efi/x86: Make efi32_pe_entry() more readableArvind Sankar
Set up a proper frame pointer in efi32_pe_entry() so that it's easier to calculate offsets for arguments. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200301230436.2246909-4-nivedita@alum.mit.edu Link: https://lore.kernel.org/r/20200308080859.21568-12-ardb@kernel.org