summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2021-04-20KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)Sean Christopherson
Export the gva_to_gpa() helpers for use by SGX virtualization when executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest. To execute ECREATE and EINIT, KVM must obtain the GPA of the target Secure Enclave Control Structure (SECS) in order to get its corresponding HVA. Because the SECS must reside in the Enclave Page Cache (EPC), copying the SECS's data to a host-controlled buffer via existing exported helpers is not a viable option as the EPC is not readable or writable by the kernel. SGX virtualization will also use gva_to_gpa() to obtain HVAs for non-EPC pages in order to pass user pointers directly to ECREATE and EINIT, which avoids having to copy pages worth of data into the kernel. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <02f37708321bcdfaa2f9d41c8478affa6e84b04d.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20KVM: vmx: add mismatched size assertions in vmcs_check32()Haiwei Li
Add compile-time assertions in vmcs_check32() to disallow accesses to 64-bit and 64-bit high fields via vmcs_{read,write}32(). Upper level KVM code should never do partial accesses to VMCS fields. KVM handles the split accesses automatically in vmcs_{read,write}64() when running as a 32-bit kernel. Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20210409022456.23528-1-lihaiwei.kernel@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20KVM: x86: Remove unused function declarationKeqian Zhu
kvm_mmu_slot_largepage_remove_write_access() is decared but not used, just remove it. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210406063504.17552-1-zhukeqian1@huawei.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run()Sean Christopherson
Explicitly document why a vmcb must be marked dirty and assigned a new asid when it will be run on a different cpu. The "what" is relatively obvious, whereas the "why" requires reading the APM and/or KVM code. Opportunistically remove a spurious period and several unnecessary newlines in the comment. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20KVM: SVM: Add a comment to clarify what vcpu_svm.vmcb points atSean Christopherson
Add a comment above the declaration of vcpu_svm.vmcb to call out that it is simply a shorthand for current_vmcb->ptr. The myriad accesses to svm->vmcb are quite confusing without this crucial detail. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20KVM: SVM: Drop vcpu_svm.vmcb_paSean Christopherson
Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in the one path where it is consumed. Unlike svm->vmcb, use of the current vmcb's address is very limited, as evidenced by the fact that its use can be trimmed to a single dereference. Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, at first glance using vmcb01 instead of vmcb_pa looks wrong. No functional change intended. Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20KVM: SVM: Don't set current_vmcb->cpu when switching vmcbSean Christopherson
Do not update the new vmcb's last-run cpu when switching to a different vmcb. If the vCPU is migrated between its last run and a vmcb switch, e.g. for nested VM-Exit, then setting the cpu without marking the vmcb dirty will lead to KVM running the vCPU on a different physical cpu with stale clean bit settings. vcpu->cpu current_vmcb->cpu hardware pre_svm_run() cpu0 cpu0 cpu0,clean kvm_arch_vcpu_load() cpu1 cpu0 cpu0,clean svm_switch_vmcb() cpu1 cpu1 cpu0,clean pre_svm_run() cpu1 cpu1 kaboom Simply delete the offending code; unlike VMX, which needs to update the cpu at switch time due to the need to do VMPTRLD, SVM only cares about which cpu last ran the vCPU. Fixes: af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb") Cc: Cathy Avery <cavery@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20x86/platform/uv: Remove dead !CONFIG_KEXEC_CORE codeIngo Molnar
The !CONFIG_KEXEC_CORE code in arch/x86/platform/uv/uv_nmi.c was unused, untested and didn't even build for 7 years. Since we fixed this by requiring X86_UV to depend on CONFIG_KEXEC_CORE, remove the (now) dead code. Also move the uv_nmi_kexec_failed definition back up to where the other file-scope global variables are defined. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Mike Travis <travis@sgi.com> Cc: linux-kernel@vger.kernel.org
2021-04-20x86/platform/uv: Fix !KEXEC build failureIngo Molnar
When KEXEC is disabled, the UV build fails: arch/x86/platform/uv/uv_nmi.c:875:14: error: ‘uv_nmi_kexec_failed’ undeclared (first use in this function) Since uv_nmi_kexec_failed is only defined in the KEXEC_CORE #ifdef branch, this code cannot ever have been build tested: if (main) pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n"); atomic_set(&uv_nmi_kexec_failed, 1); Nor is this use possible in uv_handle_nmi(): atomic_set(&uv_nmi_kexec_failed, 0); These bugs were introduced in this commit: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails") Which added the uv_nmi_kexec_failed assignments to !KEXEC code, while making the definition KEXEC-only - apparently without testing the !KEXEC case. Instead of complicating the #ifdef maze, simplify the code by requiring X86_UV to depend on KEXEC_CORE. This pattern is present in other architectures as well. ( We'll remove the untested, 7 years old !KEXEC complications from the file in a separate commit. ) Fixes: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails") Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Mike Travis <travis@sgi.com> Cc: linux-kernel@vger.kernel.org
2021-04-20powerpc/kvm: Fix PR KVM with KUAP/MEM_KEYS enabledMichael Ellerman
The changes to add KUAP support with the hash MMU broke booting of KVM PR guests. The symptom is no visible progress of the guest, or possibly just "SLOF" being printed to the qemu console. Host code is still executing, but breaking into xmon might show a stack trace such as: __might_fault+0x84/0xe0 (unreliable) kvm_read_guest+0x1c8/0x2f0 [kvm] kvmppc_ld+0x1b8/0x2d0 [kvm] kvmppc_load_last_inst+0x50/0xa0 [kvm] kvmppc_exit_pr_progint+0x178/0x220 [kvm_pr] kvmppc_handle_exit_pr+0x31c/0xe30 [kvm_pr] after_sprg3_load+0x80/0x90 [kvm_pr] kvmppc_vcpu_run_pr+0x104/0x260 [kvm_pr] kvmppc_vcpu_run+0x34/0x48 [kvm] kvm_arch_vcpu_ioctl_run+0x340/0x450 [kvm] kvm_vcpu_ioctl+0x2ac/0x8c0 [kvm] sys_ioctl+0x320/0x1060 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c Bisect points to commit b2ff33a10c8b ("powerpc/book3s64/hash/kuap: Enable kuap on hash"), but that's just the commit that enabled KUAP with hash and made the bug visible. The root cause seems to be that KVM PR is creating kernel mappings that don't use the correct key, since we switched to using key 3. We have a helper for adding the right key value, however it's designed to take a pteflags variable, which the KVM code doesn't have. But we can make it work by passing 0 for the pteflags, and tell it explicitly that it should use the kernel key. With that changed guests boot successfully. Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210419120139.1455937-1-mpe@ellerman.id.au
2021-04-20powerpc/pseries: Stop calling printk in rtas_stop_self()Michael Ellerman
RCU complains about us calling printk() from an offline CPU: ============================= WARNING: suspicious RCU usage 5.12.0-rc7-02874-g7cf90e481cb8 #1 Not tainted ----------------------------- kernel/locking/lockdep.c:3568 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/0. stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7-02874-g7cf90e481cb8 #1 Call Trace: dump_stack+0xec/0x144 (unreliable) lockdep_rcu_suspicious+0x124/0x144 __lock_acquire+0x1098/0x28b0 lock_acquire+0x128/0x600 _raw_spin_lock_irqsave+0x6c/0xc0 down_trylock+0x2c/0x70 __down_trylock_console_sem+0x60/0x140 vprintk_emit+0x1a8/0x4b0 vprintk_func+0xcc/0x200 printk+0x40/0x54 pseries_cpu_offline_self+0xc0/0x120 arch_cpu_idle_dead+0x54/0x70 do_idle+0x174/0x4a0 cpu_startup_entry+0x38/0x40 rest_init+0x268/0x388 start_kernel+0x748/0x790 start_here_common+0x1c/0x614 Which happens because by the time we get to rtas_stop_self() we are already offline. In addition the message can be spammy, and is not that helpful for users, so remove it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210418135413.1204031-1-mpe@ellerman.id.au
2021-04-20powerpc: Only define _TASK_CPU for 32-bitMichael Ellerman
We have some interesting code in our Makefile to define _TASK_CPU, based on awk'ing the value out of asm-offsets.h. It exists to circumvent some circular header dependencies that prevent us from referring to task_struct in the relevant code. See the comment around _TASK_CPU in smp.h for more detail. Maybe one day we can come up with a better solution, but for now we can at least limit that logic to 32-bit, because it's not needed for 64-bit. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210418131641.1186227-1-mpe@ellerman.id.au
2021-04-20powerpc/pseries: Add shutdown() to vio_driver and vio_busTyrel Datwyler
Currently, neither the vio_bus or vio_driver structures provide support for a shutdown() routine. Add support for shutdown() by allowing drivers to provide a implementation via function pointer in their vio_driver struct and provide a proper implementation in the driver template for the vio_bus that calls a vio drivers shutdown() if defined. In the case that no shutdown() is defined by a vio driver and a kexec is in progress we implement a big hammer that calls remove() to ensure no further DMA for the devices is possible. Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210402001325.939668-1-tyreld@linux.ibm.com
2021-04-20powerpc/perf: Expose processor pipeline stage cycles using ↵Athira Rajeev
PERF_SAMPLE_WEIGHT_STRUCT Performance Monitoring Unit (PMU) registers in powerpc provides information on cycles elapsed between different stages in the pipeline. This can be used for application tuning. On ISA v3.1 platform, this information is exposed by sampling registers. Patch adds kernel support to capture two of the cycle counters as part of perf sample using the sample type: PERF_SAMPLE_WEIGHT_STRUCT. The power PMU function 'get_mem_weight' currently uses 64 bit weight field of perf_sample_data to capture memory latency. But following the introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain 64-bit or 32-bit value depending on the architexture support for PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the pipeline stage cycles info. Hence update the ppmu functions to work for 64-bit and 32-bit weight values. If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field. if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem latency is stored in the low 32bits of perf_sample_weight structure. Also for CPU_FTR_ARCH_31, capture the two cycle counter information in two 16 bit fields of perf_sample_weight structure. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-04-20powerpc/pseries: Set UNISOLATE on dlpar_cpu_remove() failureDaniel Henrique Barboza
The RTAS set-indicator call, when attempting to UNISOLATE a DRC that is already UNISOLATED or CONFIGURED, returns RTAS_OK and does nothing else for both QEMU and phyp. This gives us an opportunity to use this behavior to signal the hypervisor layer when an error during device removal happens, allowing it to do a proper error handling, while not breaking QEMU/phyp implementations that don't have this support. This patch introduces this idea by unisolating all CPU DRCs that failed to be removed by dlpar_cpu_remove_by_index(), when handling the PSERIES_HP_ELOG_ID_DRC_INDEX event. This is being done for this event only because its the only CPU removal event QEMU uses, and there's no need at this moment to add this mechanism for phyp only code. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210416210216.380291-3-danielhb413@gmail.com
2021-04-20powerpc/pseries: Introduce dlpar_unisolate_drc()Daniel Henrique Barboza
Next patch will execute a set-indicator call in hotplug-cpu.c. Create a dlpar_unisolate_drc() helper to avoid spreading more rtas_set_indicator() calls outside of dlpar.c. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210416210216.380291-2-danielhb413@gmail.com
2021-04-20powerpc/pseries/mce: Fix a typo in error type assignmentGanesh Goudar
The error type is ICACHE not DCACHE, for case MCE_ERROR_TYPE_ICACHE. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210416125750.49550-1-ganeshgr@linux.ibm.com
2021-04-20csky: Fixup typosJunlin Yang
fixes three typos found by codespell. Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2021-04-20csky: Remove duplicate include in arch/csky/kernel/entry.SZhang Yunkai
'asm/setup.h' included in 'arch/csky/kernel/entry.S' is duplicated. Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: Guo Ren <guoren@kernel.org>
2021-04-19net: korina: Add support for device treeThomas Bogendoerfer
If there is no mac address passed via platform data try to get it via device tree and fall back to a random mac address, if all fail. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19net: korina: Only pass mac address via platform dataThomas Bogendoerfer
Get rid of access to struct korina_device by just passing the mac address via platform data and use drvdata for passing netdev to remove function. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19arm64: dts: ls1028a: declare the Integrated Endpoint Register Block nodeVladimir Oltean
Add a node describing the address in the SoC memory space for the IERB. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19KVM: SVM: Make sure GHCB is mapped before updatingTom Lendacky
Access to the GHCB is mainly in the VMGEXIT path and it is known that the GHCB will be mapped. But there are two paths where it is possible the GHCB might not be mapped. The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform the caller of the AP Reset Hold NAE event that a SIPI has been delivered. However, if a SIPI is performed without a corresponding AP Reset Hold, then the GHCB might not be mapped (depending on the previous VMEXIT), which will result in a NULL pointer dereference. The svm_complete_emulated_msr() routine will update the GHCB to inform the caller of a RDMSR/WRMSR operation about any errors. While it is likely that the GHCB will be mapped in this situation, add a safe guard in this path to be certain a NULL pointer dereference is not encountered. Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19KVM: X86: Do not yield to selfWanpeng Li
If the target is self we do not need to yield, we can avoid malicious guest to play this. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1617941911-5338-3-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19KVM: X86: Count attempted/successful directed yieldWanpeng Li
To analyze some performance issues with lock contention and scheduling, it is nice to know when directed yield are successful or failing. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1617941911-5338-2-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19x86/kvm: Don't bother __pv_cpu_mask when !CONFIG_SMPWanpeng Li
Enable PV TLB shootdown when !CONFIG_SMP doesn't make sense. Let's move it inside CONFIG_SMP. In addition, we can avoid define and alloc __pv_cpu_mask when !CONFIG_SMP and get rid of 'alloc' variable in kvm_alloc_cpumask. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1617941911-5338-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returnsBen Gardon
To avoid saddling a vCPU thread with the work of tearing down an entire paging structure, take a reference on each root before they become obsolete, so that the thread initiating the fast invalidation can tear down the paging structure and (most likely) release the last reference. As a bonus, this teardown can happen under the MMU lock in read mode so as not to block the progress of vCPU threads. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-14-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19KVM: x86/mmu: Fast invalidation for TDP MMUBen Gardon
Provide a real mechanism for fast invalidation by marking roots as invalid so that their reference count will quickly fall to zero and they will be torn down. One negative side affect of this approach is that a vCPU thread will likely drop the last reference to a root and be saddled with the work of tearing down an entire paging structure. This issue will be resolved in a later commit. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-13-bgardon@google.com> [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19perf/x86/rapl: Add support for Intel Alder LakeZhang Rui
Alder Lake RAPL support is the same as previous Sky Lake. Add Alder Lake model for RAPL. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-26-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/cstate: Add Alder Lake CPU supportKan Liang
Compared with the Rocket Lake, the CORE C1 Residency Counter is added for Alder Lake, but the CORE C3 Residency Counter is removed. Other counters are the same. Create a new adl_cstates for Alder Lake. Update the comments accordingly. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-25-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/msr: Add Alder Lake CPU supportKan Liang
PPERF and SMI_COUNT MSRs are also supported on Alder Lake. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-24-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel/uncore: Add Alder Lake supportKan Liang
The uncore subsystem for Alder Lake is similar to the previous Tiger Lake. The difference includes: - New MSR addresses for global control, fixed counters, CBOX and ARB. Add a new adl_uncore_msr_ops for uncore operations. - Add a new threshold field for CBOX. - New PCIIDs for IMC devices. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-23-git-send-email-kan.liang@linux.intel.com
2021-04-19perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHEKan Liang
Current Hardware events and Hardware cache events have special perf types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't pass the PMU type in the user interface. For a hybrid system, the perf subsystem doesn't know which PMU the events belong to. The first capable PMU will always be assigned to the events. The events never get a chance to run on the other capable PMUs. Extend the two types to become PMU aware types. The PMU type ID is stored at attr.config[63:32]. Add a new PMU capability, PERF_PMU_CAP_EXTENDED_HW_TYPE, to indicate a PMU which supports the extended PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The PMU type is only required when searching a specific PMU. The PMU specific codes will only be interested in the 'real' config value, which is stored in the low 32 bit of the event->attr.config. Update the event->attr.config in the generic code, so the PMU specific codes don't need to calculate it separately. If a user specifies a PMU type, but the PMU doesn't support the extended type, error out. If an event cannot be initialized in a PMU specified by a user, error out immediately. Perf should not try to open it on other PMUs. The new PMU capability is only set for the X86 hybrid PMUs for now. Other architectures, e.g., ARM, may need it as well. The support on ARM may be implemented later separately. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-22-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Add Alder Lake Hybrid supportKan Liang
Alder Lake Hybrid system has two different types of core, Golden Cove core and Gracemont core. The Golden Cove core is registered to "cpu_core" PMU. The Gracemont core is registered to "cpu_atom" PMU. The difference between the two PMUs include: - Number of GP and fixed counters - Events - The "cpu_core" PMU supports Topdown metrics. The "cpu_atom" PMU supports PEBS-via-PT. The "cpu_core" PMU is similar to the Sapphire Rapids PMU, but without PMEM. The "cpu_atom" PMU is similar to Tremont, but with different events, event_constraints, extra_regs and number of counters. The mem-loads AUX event workaround only applies to the Golden Cove core. Users may disable all CPUs of the same CPU type on the command line or in the BIOS. For this case, perf still register a PMU for the CPU type but the CPU mask is 0. Current caps/pmu_name is usually the microarch codename. Assign the "alderlake_hybrid" to the caps/pmu_name of both PMUs to indicate the hybrid Alder Lake microarchitecture. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-21-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Support filter_match callbackKan Liang
Implement filter_match callback for X86, which check whether an event is schedulable on the current CPU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-20-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Add attr_update for Hybrid PMUsKan Liang
The attribute_group for Hybrid PMUs should be different from the previous cpu PMU. For example, cpumask is required for a Hybrid PMU. The PMU type should be included in the event and format attribute. Add hybrid_attr_update for the Hybrid PMU. Check the PMU type in is_visible() function. Only display the event or format for the matched Hybrid PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-19-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Add structures for the attributes of Hybrid PMUsKan Liang
Hybrid PMUs have different events and formats. In theory, Hybrid PMU specific attributes should be maintained in the dedicated struct x86_hybrid_pmu, but it wastes space because the events and formats are similar among Hybrid PMUs. To reduce duplication, all hybrid PMUs will share a group of attributes in the following patch. To distinguish an attribute from different Hybrid PMUs, a PMU aware attribute structure is introduced. A PMU type is required for the attribute structure. The type is internal usage. It is not visible in the sysfs API. Hybrid PMUs may support the same event name, but with different event encoding, e.g., the mem-loads event on an Atom PMU has different event encoding from a Core PMU. It brings issue if two attributes are created for them. Current sysfs_update_group finds an attribute by searching the attr name (aka event name). If two attributes have the same event name, the first attribute will be replaced. To address the issue, only one attribute is created for the event. The event_str is extended and stores event encodings from all Hybrid PMUs. Each event encoding is divided by ";". The order of the event encodings must follow the order of the hybrid PMU index. The event_str is internal usage as well. When a user wants to show the attribute of a Hybrid PMU, only the corresponding part of the string is displayed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-18-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Register hybrid PMUsKan Liang
Different hybrid PMUs have different PMU capabilities and events. Perf should registers a dedicated PMU for each of them. To check the X86 event, perf has to go through all possible hybrid pmus. All the hybrid PMUs are registered at boot time. Before the registration, add intel_pmu_check_hybrid_pmus() to check and update the counters information, the event constraints, the extra registers and the unique capabilities for each hybrid PMUs. Postpone the display of the PMU information and HW check to CPU_STARTING, because the boot CPU is the only online CPU in the init_hw_perf_events(). Perf doesn't know the availability of the other PMUs. Perf should display the PMU information only if the counters of the PMU are available. One type of CPUs may be all offline. For this case, users can still observe the PMU in /sys/devices, but its CPU mask is 0. All hybrid PMUs have capability PERF_PMU_CAP_HETEROGENEOUS_CPUS. The PMU name for hybrid PMUs will be "cpu_XXX", which will be assigned later in a separated patch. The PMU type id for the core PMU is still PERF_TYPE_RAW. For the other hybrid PMUs, the PMU type id is not hard code. The event->cpu must be compatitable with the supported CPUs of the PMU. Add a check in the x86_pmu_event_init(). The events in a group must be from the same type of hybrid PMU. The fake cpuc used in the validation must be from the supported CPU of the event->pmu. Perf may not retrieve a valid core type from get_this_hybrid_cpu_type(). For example, ADL may have an alternative configuration. With that configuration, Perf cannot retrieve the core type from the CPUID leaf 0x1a. Add a platform specific get_hybrid_cpu_type(). If the generic way fails, invoke the platform specific get_hybrid_cpu_type(). Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-17-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Factor out x86_pmu_show_pmu_capKan Liang
The PMU capabilities are different among hybrid PMUs. Perf should dump the PMU capabilities information for each hybrid PMU. Factor out x86_pmu_show_pmu_cap() which shows the PMU capabilities information. The function will be reused later when registering a dedicated hybrid PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-16-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Remove temporary pmu assignment in event_initKan Liang
The temporary pmu assignment in event_init is unnecessary. The assignment was introduced by commit 8113070d6639 ("perf_events: Add fast-path to the rescheduling code"). At that time, event->pmu is not assigned yet when initializing an event. The assignment is required. However, from commit 7e5b2a01d2ca ("perf: provide PMU when initing events"), the event->pmu is provided before event_init is invoked. The temporary pmu assignment in event_init should be removed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-15-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Factor out intel_pmu_check_extra_regsKan Liang
Each Hybrid PMU has to check and update its own extra registers before registration. The intel_pmu_check_extra_regs will be reused later to check the extra registers of each hybrid PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-14-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Factor out intel_pmu_check_event_constraintsKan Liang
Each Hybrid PMU has to check and update its own event constraints before registration. The intel_pmu_check_event_constraints will be reused later to check the event constraints of each hybrid PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-13-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Factor out intel_pmu_check_num_countersKan Liang
Each Hybrid PMU has to check its own number of counters and mask fixed counters before registration. The intel_pmu_check_num_counters will be reused later to check the number of the counters for each hybrid PMU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-12-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Hybrid PMU support for extra_regsKan Liang
Different hybrid PMU may have different extra registers, e.g. Core PMU may have offcore registers, frontend register and ldlat register. Atom core may only have offcore registers and ldlat register. Each hybrid PMU should use its own extra_regs. An Intel Hybrid system should always have extra registers. Unconditionally allocate shared_regs for Intel Hybrid system. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-11-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Hybrid PMU support for event constraintsKan Liang
The events are different among hybrid PMUs. Each hybrid PMU should use its own event constraints. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-10-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Hybrid PMU support for hardware cache eventKan Liang
The hardware cache events are different among hybrid PMUs. Each hybrid PMU should have its own hw cache event table. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-9-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Hybrid PMU support for unconstrainedKan Liang
The unconstrained value depends on the number of GP and fixed counters. Each hybrid PMU should use its own unconstrained. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-8-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Hybrid PMU support for countersKan Liang
The number of GP and fixed counters are different among hybrid PMUs. Each hybrid PMU should use its own counter related information. When handling a certain hybrid PMU, apply the number of counters from the corresponding hybrid PMU. When reserving the counters in the initialization of a new event, reserve all possible counters. The number of counter recored in the global x86_pmu is for the architecture counters which are available for all hybrid PMUs. KVM doesn't support the hybrid PMU yet. Return the number of the architecture counters for now. For the functions only available for the old platforms, e.g., intel_pmu_drain_pebs_nhm(), nothing is changed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-7-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Hybrid PMU support for intel_ctrlKan Liang
The intel_ctrl is the counter mask of a PMU. The PMU counter information may be different among hybrid PMUs, each hybrid PMU should use its own intel_ctrl to check and access the counters. When handling a certain hybrid PMU, apply the intel_ctrl from the corresponding hybrid PMU. When checking the HW existence, apply the PMU and number of counters from the corresponding hybrid PMU as well. Perf will check the HW existence for each Hybrid PMU before registration. Expose the check_hw_exists() for a later patch. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-6-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Hybrid PMU support for perf capabilitiesKan Liang
Some platforms, e.g. Alder Lake, have hybrid architecture. Although most PMU capabilities are the same, there are still some unique PMU capabilities for different hybrid PMUs. Perf should register a dedicated pmu for each hybrid PMU. Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and capabilities for each hybrid PMU. The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the architecture features which are available on all hybrid PMUs. The architecture features are stored in the global x86_pmu.intel_cap. For Alder Lake, the model-specific features are perf metrics and PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap should be 0 for these two features. Perf should not use the global intel_cap to check the features on a hybrid system. Add a dedicated intel_cap in the x86_hybrid_pmu to store the model-specific capabilities. Use the dedicated intel_cap to replace the global intel_cap for thse two features. The dedicated intel_cap will be set in the following "Add Alder Lake Hybrid support" patch. Add is_hybrid() to distinguish a hybrid system. ADL may have an alternative configuration. With that configuration, the X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit. Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system. It will be assigned in the following "Add Alder Lake Hybrid support" patch as well. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.liang@linux.intel.com