summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-09-28KVM: X86: Move handling of INVPCID types to x86Babu Moger
INVPCID instruction handling is mostly same across both VMX and SVM. So, move the code to common x86.c. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.cBabu Moger
Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_interceptBabu Moger
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep tfor all cr intercepts. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Add new intercept word in vmcb_control_areaBabu Moger
The new intercept bits have been added in vmcb control area to support few more interceptions. Here are the some of them. - INTERCEPT_INVLPGB, - INTERCEPT_INVLPGB_ILLEGAL, - INTERCEPT_INVPCID, - INTERCEPT_MCOMMIT, - INTERCEPT_TLBSYNC, Add a new intercept word in vmcb_control_area to support these instructions. Also update kvm_nested_vmrun trace function to support the new addition. AMD documentation for these instructions is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Modify 64 bit intercept field to two 32 bit vectorsBabu Moger
Convert all the intercepts to one array of 32 bit vectors in vmcb_control_area. This makes it easy for future intercept vector additions. Also update trace functions. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Modify intercept_exceptions to generic interceptsBabu Moger
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_exceptions bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Change intercept_dr to generic interceptsBabu Moger
Modify intercept_dr to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_dr bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Change intercept_cr to generic interceptsBabu Moger
Change intercept_cr to generic intercepts in vmcb_control_area. Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept where applicable. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu> [Change constant names. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)Babu Moger
This is in preparation for the future intercept vector additions. Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept using kernel APIs __set_bit, __clear_bit and test_bit espectively. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: Remove unused fieldBabu Moger
host_intercept_exceptions is not used anywhere. Clean it up. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: refactor exit labels in svm_create_vcpuMaxim Levitsky
Kernel coding style suggests not to use labels like error1,error2 Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: use __GFP_ZERO instead of clear_pageMaxim Levitsky
Another small refactoring. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: refactor msr permission bitmap allocationMaxim Levitsky
Replace svm_vcpu_init_msrpm with svm_vcpu_alloc_msrpm, that also allocates the msr bitmap and add svm_vcpu_free_msrpm to free it. This will be used later to move the nested msr permission bitmap allocation to nested.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: rename nested vmcb to vmcb12Maxim Levitsky
This is to be more consistient with VMX, and to support upcoming addition of vmcb02 Hopefully no functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: SVM: rename a variable in the svm_create_vcpuMaxim Levitsky
The 'page' is to hold the vcpu's vmcb so name it as such to avoid confusion. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200827171145.374620-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Reduce world switch latency caused by timer_advance_nsWanpeng Li
All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and these function calls waste cpu cycles when the timer mode is not tscdeadline. We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat vmcall testing on AMD server. This patch reduces the world switch latency caused by timer_advance_ns feature when the timer mode is not tscdeadline by simpling move the check against apic->lapic_timer.expired_tscdeadline much earlier. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Narrow down the kick target vCPUWanpeng Li
The kick after setting KVM_REQ_PENDING_TIMER is used to handle the timer fires on a different pCPU which vCPU is running on. This kick costs about 1000 clock cycles and we don't need this when injecting already-expired timer or when using the VMX preemption timer because kvm_lapic_expired_hv_timer() is called from the target vCPU. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-6-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Guarantee the timer is in tsc-deadline mode when settingWanpeng Li
Check apic_lvtt_tscdeadline() mode directly instead of apic_lvtt_oneshot() and apic_lvtt_period() to guarantee the timer is in tsc-deadline mode when wrmsr MSR_IA32_TSCDEADLINE. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-3-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Return 0 when getting the tscdeadline timer if the lapic is hw ↵Wanpeng Li
disabled Return 0 when getting the tscdeadline timer if the lapic is hw disabled. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-2-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Fix updating DFR missing apic map recalculationWanpeng Li
There is missing apic map recalculation after updating DFR, if it is INIT RESET, in x2apic mode, local apic is software enabled before. This patch fix it by introducing the function kvm_apic_set_dfr() to be called in INIT RESET handling path. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1597827327-25055-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nVMX: Simplify the initialization of nested_vmx_msrsChenyi Qiang
The nested VMX controls MSRs can be initialized by the global capability values stored in vmcs_config. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200828085622.8365-6-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabledChenyi Qiang
KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they are not exposed by the system ioctl KVM_GET_MSR. Add them to the setup of nested VMX controls MSR. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20200828085622.8365-2-chenyi.qiang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state()Vitaly Kuznetsov
The save and ctl pointers are passed uninitialized to kfree() when svm_set_nested_state() follows the 'goto out_set_gif' path. While the issue could've been fixed by initializing these on-stack varialbles to NULL, it seems preferable to eliminate 'out_set_gif' label completely as it is not actually a failure path and duplicating a single svm_set_gif() call doesn't look too bad. [ bp: Drop obscure Addresses-Coverity: tag. ] Fixes: 6ccbd29ade0d ("KVM: SVM: nested: Don't allocate VMCB structures on stack") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Joerg Roedel <jroedel@suse.de> Reported-by: Colin King <colin.king@canonical.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28x86/hyperv: Remove aliases with X64 in their nameJoseph Salisbury
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-09-27x86/apic/msi: Unbreak DMAR and HPET MSIThomas Gleixner
Switching the DMAR and HPET MSI code to use the generic MSI domain ops missed to add the flag which tells the core code to update the domain operations with the defaults. As a consequence the core code crashes when an interrupt in one of those domains is allocated. Add the missing flags. Fixes: 9006c133a422 ("x86/msi: Use generic MSI domain ops") Reported-by: Qian Cai <cai@redhat.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87wo0fli8b.fsf@nanos.tec.linutronix.de
2020-09-27Merge tag 'x86-urgent-2020-09-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for the x86 interrupt code: - Unbreak the magic 'search the timer interrupt' logic in IO/APIC code which got wreckaged when the core interrupt code made the state tracking logic stricter. That caused the interrupt line to stay masked after switching from IO/APIC to PIC delivery mode, which obviously prevents interrupts from being delivered. - Make run_on_irqstack_code() typesafe. The function argument is a void pointer which is then cast to 'void (*fun)(void *). This breaks Control Flow Integrity checking in clang. Use proper helper functions for the three variants reuqired" * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Unbreak check_timer() x86/irq: Make run_on_irqstack_cond() typesafe
2020-09-27x86/hyperv: Remove aliases with X64 in their nameJoseph Salisbury
In the architecture independent version of hyperv-tlfs.h, commit c55a844f46f958b removed the "X64" in the symbol names so they would make sense for both x86 and ARM64. That commit added aliases with the "X64" in the x86 version of hyperv-tlfs.h so that existing x86 code would continue to compile. As a cleanup, update the x86 code to use the symbols without the "X64", then remove the aliases. There's no functional change. Signed-off-by: Joseph Salisbury <joseph.salisbury@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/1601130386-11111-1-git-send-email-jsalisbury@linux.microsoft.com
2020-09-26arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writebackMikulas Patocka
If we copy less than 8 bytes and if the destination crosses a cache line, __copy_user_flushcache would invalidate only the first cache line. This patch makes it invalidate the second cache line as well. Fixes: 0aed55af88345b ("x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dan Williams <dan.j.wiilliams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2009161451140.21915@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more kvm fixes from Paolo Bonzini: "Five small fixes. The nested migration bug will be fixed with a better API in 5.10 or 5.11, for now this is a fix that works with existing userspace but keeps the current ugly API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Add a dedicated INVD intercept routine KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: x86: fix MSR_IA32_TSC read for nested migration selftests: kvm: Fix assert failure in single-step test KVM: x86: VMX: Make smaller physical guest address space support user-configurable
2020-09-25KVM: SVM: Add a dedicated INVD intercept routineTom Lendacky
The INVD instruction intercept performs emulation. Emulation can't be done on an SEV guest because the guest memory is encrypted. Provide a dedicated intercept routine for the INVD intercept. And since the instruction is emulated as a NOP, just skip it instead. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25x86/sev-es: Use GHCB accessor for setting the MMIO scratch bufferTom Lendacky
Use ghcb_set_sw_scratch() to set the GHCB scratch field, which will also set the corresponding bit in the GHCB valid_bitmap field to denote that sw_scratch is actually valid. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/ba84deabdf44a7a880454fb351d189c6ad79d4ba.1601041106.git.thomas.lendacky@amd.com
2020-09-25KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKESean Christopherson
Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled. Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are toggled inadvertantly skipped the MMU context reset due to the mask of bits that triggers PDPTR loads also being used to trigger MMU context resets. Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode") Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode") Cc: Jim Mattson <jmattson@google.com> Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25dma-mapping: add a new dma_alloc_pages APIChristoph Hellwig
This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag. Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> (MIPS part)
2020-09-25Merge branch 'master' of ↵Christoph Hellwig
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into dma-mapping-for-next Pull in the latest 5.9 tree for the commit to revert the V4L2_FLAG_MEMORY_NON_CONSISTENT uapi addition.
2020-09-24KVM: x86: fix MSR_IA32_TSC read for nested migrationMaxim Levitsky
MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and these that it doesn't should be read by the guest as if the host had read it. However IA32_TSC is an exception. Even when not intercepted, guest still reads the value + TSC offset. The write however does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it. (e.g for migration) In this case it reads L2 value but write is interpreted as an L1 value. To fix this make the userspace initiated reads of IA32_TSC return L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-24perf/x86/intel/uncore: Support PCIe3 unit on Snow RidgeKan Liang
The Snow Ridge integrated PCIe3 uncore unit can be used to collect performance data, e.g. utilization, between PCIe devices, plugged into the PCIe port, and the components (in M2IOSF) responsible for translating and managing requests to/from the device. The performance data is very useful for analyzing the performance of PCIe devices. The device with the PCIe3 uncore PMON units is owned by the portdrv_pci driver. Create a PCI sub driver for the PCIe3 uncore PMON units. Here are some difference between PCIe3 uncore unit and other uncore pci units. - There may be several Root Ports on a system. But the uncore counters only exist in the Root Port A. A user can configure the channel mask to collect the data from other Root Ports. - The event format of the PCIe3 uncore unit is the same as IIO unit of SKX. - The Control Register of PCIe3 uncore unit is 64 bits. - The offset of each counters is 8, which is the same as M2M unit of SNR. - New MSR addresses for unit control, counter and counter config. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1600094060-82746-7-git-send-email-kan.liang@linux.intel.com
2020-09-24perf/x86/intel/uncore: Generic support for the PCI sub driverKan Liang
Some uncore counters may be located in the configuration space of a PCI device, which already has a bonded driver. Currently, the uncore driver cannot register a PCI uncore PMU for these counters, because, to register a PCI uncore PMU, the uncore driver must be bond to the device. However, one device can only have one bonded driver. Add an uncore PCI sub driver to support such kind of devices. The sub driver doesn't own the device. In initialization, the sub driver searches the device via pci_get_device(), and register the corresponding PMU for the device. In the meantime, the sub driver registers a PCI bus notifier, which is used to notify the sub driver once the device is removed. The sub driver can unregister the PMU accordingly. The sub driver only searches the devices defined in its id table. The id table varies on different platforms, which will be implemented in the following platform-specific patch. Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1600094060-82746-6-git-send-email-kan.liang@linux.intel.com
2020-09-24perf/x86/intel/uncore: Factor out uncore_pci_pmu_unregister()Kan Liang
The PMU unregistration in the uncore PCI sub driver is similar as the normal PMU unregistration for a PCI device. The codes to unregister a PCI PMU can be shared. Factor out uncore_pci_pmu_unregister(), which will be used later. Use uncore_pci_get_dev_die_info() to replace the codes which retrieve the socket and die informaion. The pci_set_drvdata() is not included in uncore_pci_pmu_unregister() as well, because the uncore PCI sub driver will not touch the private driver data pointer of the device. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1600094060-82746-5-git-send-email-kan.liang@linux.intel.com
2020-09-24perf/x86/intel/uncore: Factor out uncore_pci_pmu_register()Kan Liang
The PMU registration in the uncore PCI sub driver is similar as the normal PMU registration for a PCI device. The codes to register a PCI PMU can be shared. Factor out uncore_pci_pmu_register(), which will be used later. The pci_set_drvdata() is not included in uncore_pci_pmu_register(). The uncore PCI sub driver doesn't own the PCI device. It will not touch the private driver data pointer for the device. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1600094060-82746-4-git-send-email-kan.liang@linux.intel.com
2020-09-24perf/x86/intel/uncore: Factor out uncore_pci_find_dev_pmu()Kan Liang
When an uncore PCI sub driver gets a remove notification, the corresponding PMU has to be retrieved and unregistered. The codes, which find the corresponding PMU by comparing the pci_device_id table, can be shared. Factor out uncore_pci_find_dev_pmu(), which will be used later. There is no functional change. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1600094060-82746-3-git-send-email-kan.liang@linux.intel.com
2020-09-24perf/x86/intel/uncore: Factor out uncore_pci_get_dev_die_info()Kan Liang
The socket and die information is required to register/unregister a PMU in the uncore PCI sub driver. The codes, which get the socket and die information from a BUS number, can be shared. Factor out uncore_pci_get_dev_die_info(), which will be used later. There is no functional change. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1600094060-82746-2-git-send-email-kan.liang@linux.intel.com
2020-09-24perf/amd/uncore: Inform the user how many counters each uncore PMU hasKim Phillips
Previously, the uncore driver would say "NB counters detected" on F17h machines, which don't have NorthBridge (NB) counters. They have Data Fabric (DF) counters. Just use the pmu.name to inform users which pmu to use and its associated counter count. F17h dmesg BEFORE: amd_uncore: AMD NB counters detected amd_uncore: AMD LLC counters detected F17h dmesg AFTER: amd_uncore: 4 amd_df counters detected amd_uncore: 6 amd_l3 counters detected Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200921144330.6331-5-kim.phillips@amd.com
2020-09-24perf/amd/uncore: Allow F19h user coreid, threadmask, and sliceid specificationKim Phillips
On Family 19h, the driver checks for a populated 2-bit threadmask in order to establish that the user wants to measure individual slices, individual cores (only one can be measured at a time), and lets the user also directly specify enallcores and/or enallslices if desired. Example F19h invocation to measure L3 accesses (event 4, umask 0xff) by the first thread (id 0 -> mask 0x1) of the first core (id 0) on the first slice (id 0): perf stat -a -e instructions,amd_l3/umask=0xff,event=0x4,coreid=0,threadmask=1,sliceid=0,enallcores=0,enallslices=0/ <workload> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200921144330.6331-4-kim.phillips@amd.com
2020-09-24perf/amd/uncore: Allow F17h user threadmask and slicemask specificationKim Phillips
Continue to fully populate either one of threadmask or slicemask if the user doesn't. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200921144330.6331-3-kim.phillips@amd.com
2020-09-24perf/amd/uncore: Prepare to scale for more attributes that vary per familyKim Phillips
Replace AMD_FORMAT_ATTR with the more apropos DEFINE_UNCORE_FORMAT_ATTR stolen from arch/x86/events/intel/uncore.h. This way we can clearly see the bit-variants of each of the attributes that want to have the same name across families. Also unroll AMD_ATTRIBUTE because we are going to separately add new attributes that differ between DF and L3. Also clean up the if-Family 17h-else logic in amd_uncore_init. This is basically a rewrite of commit da6adaea2b7e ("perf/x86/amd/uncore: Update sysfs attributes for Family17h processors"). No functional changes. Tested F17h+ /sys/bus/event_source/devices/amd_{l3,df}/format/* content remains unchanged: /sys/bus/event_source/devices/amd_l3/format/event:config:0-7 /sys/bus/event_source/devices/amd_l3/format/umask:config:8-15 /sys/bus/event_source/devices/amd_df/format/event:config:0-7,32-35,59-60 /sys/bus/event_source/devices/amd_df/format/umask:config:8-15 Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200921144330.6331-2-kim.phillips@amd.com
2020-09-23x86/ioapic: Unbreak check_timer()Thomas Gleixner
Several people reported in the kernel bugzilla that between v4.12 and v4.13 the magic which works around broken hardware and BIOSes to find the proper timer interrupt delivery mode stopped working for some older affected platforms which need to fall back to ExtINT delivery mode. The reason is that the core code changed to keep track of the masked and disabled state of an interrupt line more accurately to avoid the expensive hardware operations. That broke an assumption in i8259_make_irq() which invokes disable_irq_nosync(); irq_set_chip_and_handler(); enable_irq(); Up to v4.12 this worked because enable_irq() unconditionally unmasked the interrupt line, but after the state tracking improvements this is not longer the case because the IO/APIC uses lazy disabling. So the line state is unmasked which means that enable_irq() does not call into the new irq chip to unmask it. In principle this is a shortcoming of the core code, but it's more than unclear whether the core code should try to reset state. At least this cannot be done unconditionally as that would break other existing use cases where the chip type is changed, e.g. when changing the trigger type, but the callers expect the state to be preserved. As the way how check_timer() is switching the delivery modes is truly unique, the obvious fix is to simply unmask the i8259 manually after changing the mode to ExtINT delivery and switching the irq chip to the legacy PIC. Note, that the fixes tag is not really precise, but identifies the commit which broke the assumptions in the IO/APIC and i8259 code and that's the kernel version to which this needs to be backported. Fixes: bf22ff45bed6 ("genirq: Avoid unnecessary low level irq function calls") Reported-by: p_c_chan@hotmail.com Reported-by: ecm4@mail.com Reported-by: perdigao1@yahoo.com Reported-by: matzes@users.sourceforge.net Reported-by: rvelascog@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: p_c_chan@hotmail.com Tested-by: matzes@users.sourceforge.net Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=197769
2020-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-09-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 95 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 4211 insertions(+), 2040 deletions(-). The main changes are: 1) Full multi function support in libbpf, from Andrii. 2) Refactoring of function argument checks, from Lorenz. 3) Make bpf_tail_call compatible with functions (subprograms), from Maciej. 4) Program metadata support, from YiFei. 5) bpf iterator optimizations, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23KVM: x86: VMX: Make smaller physical guest address space support ↵Mohammed Gamal
user-configurable This patch exposes allow_smaller_maxphyaddr to the user as a module parameter. Since smaller physical address spaces are only supported on VMX, the parameter is only exposed in the kvm_intel module. For now disable support by default, and let the user decide if they want to enable it. Modifications to VMX page fault and EPT violation handling will depend on whether that parameter is enabled. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Message-Id: <20200903141122.72908-1-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-22fs: remove compat_sys_mountChristoph Hellwig
compat_sys_mount is identical to the regular sys_mount now, so remove it and use the native version everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>