summaryrefslogtreecommitdiff
path: root/arch/x86/events/intel/core.c
AgeCommit message (Collapse)Author
2022-09-01perf/x86/core: Completely disable guest PEBS via guest's global_ctrlLike Xu
When a guest PEBS counter is cross-mapped by a host counter, software will remove the corresponding bit in the arr[global_ctrl].guest and expect hardware to perform a change of state "from enable to disable" via the msr_slot[] switch during the vmx transaction. The real world is that if user adjust the counter overflow value small enough, it still opens a tiny race window for the previously PEBS-enabled counter to write cross-mapped PEBS records into the guest's PEBS buffer, when arr[global_ctrl].guest has been prioritised (switch_msr_special stuff) to switch into the enabled state, while the arr[pebs_enable].guest has not. Close this window by clearing invalid bits in the arr[global_ctrl].guest. Fixes: 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220831033524.58561-1-likexu@tencent.com
2022-09-01perf/x86/intel: Fix unchecked MSR access error for Alder Lake NKan Liang
For some Alder Lake N machine, the below unchecked MSR access error may be triggered. [ 0.088017] rcu: Hierarchical SRCU implementation. [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30) [ 0.088017] Call Trace: [ 0.088017] <TASK> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0 The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is not set. The perf cannot retrieve the correct CPU type via get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is hardcode to p-core. The wrong CPU type is given to the PMU of the Alder Lake N. Since Alder Lake N isn't in fact a hybrid CPU, remove ALDERLAKE_N from the rest of {ALDER,RAPTOP}LAKE and create a non-hybrid PMU setup. The differences between Gracemont and the previous Tremont are, - Number of GP counters - Load and store latency Events - PEBS event_constraints - Instruction Latency support - Data source encoding - Memory access latency encoding Fixes: c2a960f7c574 ("perf/x86: Add new Alder Lake and Raptor Lake support") Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220831142702.153110-1-kan.liang@linux.intel.com
2022-08-19perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the BaselinePeter Zijlstra
The SDM explicitly states that PEBS Baseline implies Extended PEBS. For cpu model forward compatibility (e.g. on ICX, SPR, ADL), it's safe to stop doing FMS table thing such as setting pebs_capable and PMU_FL_PEBS_ALL since it's already set in the intel_ds_init(). The Goldmont Plus is the only platform which supports extended PEBS but doesn't have Baseline. Keep the status quo. Reported-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220816114057.51307-1-likexu@tencent.com
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...
2022-07-04perf/x86/intel: Fix PEBS data source encoding for ADLKan Liang
The PEBS data source encoding for the e-core is different from the p-core. Add the pebs_data_source[] in the struct x86_hybrid_pmu to store the data source encoding for each type of the core. Add intel_pmu_pebs_data_source_grt() for the e-core. There is nothing changed for the data source encoding of the p-core, which still reuse the intel_pmu_pebs_data_source_skl(). Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20220629150840.2235741-2-kan.liang@linux.intel.com
2022-07-04perf/x86/intel: Fix PEBS memory access info encoding for ADLKan Liang
The PEBS memory access latency encoding for the e-core is slightly different from the p-core. The bit 4 is Lock, while the bit 5 is TLB access. Add a new flag to indicate the load/store latency event on a hybrid platform. Add a new function pointer to retrieve the latency data for a hybrid platform. Only implement the new flag and function for the e-core on ADL. Still use the existing PERF_X86_EVENT_PEBS_LDLAT/STLAT flag for the p-core on ADL. Factor out pebs_set_tlb_lock() to set the generic memory data source information of the TLB access and lock for both load and store latency. Move the intel_get_event_constraints() to ahead of the :ppp check, otherwise the new flag never gets a chance to be set for the :ppp events. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20220629150840.2235741-1-kan.liang@linux.intel.com
2022-06-08perf/x86/intel: Fix the comment about guest LBR support on KVMLike Xu
Starting from v5.12, KVM reports guest LBR and extra_regs support when the host has relevant support. Just delete this part of the comment and fix a typo incidentally. Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Message-Id: <20220517154100.29983-2-weijiang.yang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Disable guest PEBS temporarily in two rare situationsLike Xu
The guest PEBS will be disabled when some users try to perf KVM and its user-space through the same PEBS facility OR when the host perf doesn't schedule the guest PEBS counter in a one-to-one mapping manner (neither of these are typical scenarios). The PEBS records in the guest DS buffer are still accurate and the above two restrictions will be checked before each vm-entry only if guest PEBS is deemed to be enabled. Suggested-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-15-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBSLike Xu
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the adaptive PEBS is supported. The PEBS_DATA_CFG MSR and adaptive record enable bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL. FCx_Adaptive_Record) are also supported. Adaptive PEBS provides software the capability to configure the PEBS records to capture only the data of interest, keeping the record size compact. An overflow of PMCx results in generation of an adaptive PEBS record with state information based on the selections specified in MSR_PEBS_DATA_CFG.By default, the record only contain the Basic group. When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to the perf_guest_switch_msr() and switched during the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR. According to Intel SDM, software is recommended to PEBS Baseline when the following is true. IA32_PERF_CAPABILITIES.PEBS_BASELINE[14] && IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4. Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-12-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DSLike Xu
When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points to the linear address of the first byte of the DS buffer management area, which is used to manage the PEBS records. When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR will be added to the perf_guest_switch_msr() and switched during the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR. The WRMSR to IA32_DS_AREA MSR brings a #GP(0) if the source register contains a non-canonical address. Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-11-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counterLike Xu
The PEBS-PDIR facility on Ice Lake server is supported on IA31_FIXED0 only. If the guest configures counter 32 and PEBS is enabled, the PEBS-PDIR facility is supposed to be used, in which case KVM adjusts attr.precise_ip to 3 and request host perf to assign the exactly requested counter or fail. The CPU model check is also required since some platforms may place the PEBS-PDIR facility in another counter index. Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-10-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBSLike Xu
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE that enable generation of PEBS records. The general-purpose counter bits start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at bit IA32_PEBS_ENABLE[32]. When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to the perf_guest_switch_msr() and atomically switched during the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR. Based on whether the platform supports x86_pmu.pebs_ept, it has also refactored the way to add more msrs to arr[] in intel_guest_get_msrs() for extensibility. Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-8-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK valuePeter Zijlstra (Intel)
The value of pebs_counter_mask will be accessed frequently for repeated use in the intel_guest_get_msrs(). So it can be optimized instead of endlessly mucking about with branches. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-7-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08perf/x86/core: Pass "struct kvm_pmu *" to determine the guest valuesLike Xu
Splitting the logic for determining the guest values is unnecessarily confusing, and potentially fragile. Perf should have full knowledge and control of what values are loaded for the guest. If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it can generate the full set of guest values by grabbing guest ds_area and pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is vendor agnostic, so we don't see any reason to not just pass the pointer. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-4-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08perf/x86/intel: Handle guest PEBS overflow PMI for KVM guestLike Xu
With PEBS virtualization, the guest PEBS records get delivered to the guest DS, and the host pmi handler uses perf_guest_cbs->is_in_guest() to distinguish whether the PMI comes from the guest code like Intel PT. No matter how many guest PEBS counters are overflowed, only triggering one fake event is enough. The fake event causes the KVM PMI callback to be called, thereby injecting the PEBS overflow PMI into the guest. KVM may inject the PMI with BUFFER_OVF set, even if the guest DS is empty. That should really be harmless. Thus guest PEBS handler would retrieve the correct information from its own PEBS records buffer. Cc: linux-perf-users@vger.kernel.org Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake ServerLike Xu
Add support for EPT-Friendly PEBS, a new CPU feature that enlightens PEBS to translate guest linear address through EPT, and facilitates handling VM-Exits that occur when accessing PEBS records. More information can be found in the December 2021 release of Intel's SDM, Volume 3, 18.9.5 "EPT-Friendly PEBS". This new hardware facility makes sure the guest PEBS records will not be lost, which is available on Intel Ice Lake Server platforms (and later). KVM will check this field through perf_get_x86_pmu_capability() instead of hard coding the CPU models in the KVM code. If it is supported, the guest PEBS capability will be exposed to the guest. Guest PEBS can be enabled when and only when "EPT-Friendly PEBS" is supported and EPT is enabled. Cc: linux-perf-users@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25perf/x86/intel: Fix event constraints for ICLKan Liang
According to the latest event list, the event encoding 0x55 INST_DECODED.DECODERS and 0x56 UOPS_DECODED.DEC0 are only available on the first 4 counters. Add them into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220525133952.1660658-1-kan.liang@linux.intel.com
2022-05-11perf/x86: Add new Alder Lake and Raptor Lake supportKan Liang
From PMU's perspective, there is no difference for the new Alder Lake N and Raptor Lake P. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220504194413.1003071-1-kan.liang@linux.intel.com
2022-04-05perf/x86/intel: Update the FRONTEND MSR mask on Sapphire RapidsKan Liang
On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't support it. Update intel_spr_extra_regs[] to support it. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1648482543-14923-2-git-send-email-kan.liang@linux.intel.com
2022-04-05perf/x86/intel: Don't extend the pseudo-encoding to GP countersKan Liang
The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR. perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 607,246 cpu/event=0xc0,umask=0x0/ 0 cpu/event=0x0,umask=0x1/ The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which doesn't work on the generic counters. However, current perf extends its mask to the generic counters. The pseudo event-code for a fixed counter must be 0x00. Check and avoid extending the mask for the fixed counter event which using the pseudo-encoding, e.g., ref-cycles and PREC_DIST event. With the patch, perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 583,184 cpu/event=0xc0,umask=0x0/ 583,048 cpu/event=0x0,umask=0x1/ Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.intel.com
2022-04-05perf/x86: Add Intel Raptor Lake supportKan Liang
From PMU's perspective, Raptor Lake is the same as the Alder Lake. The only difference is the event list, which will be supported in the perf tool later. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/1647366360-82824-1-git-send-email-kan.liang@linux.intel.com
2022-03-22Merge tag 'perf-core-2022-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event updates from Ingo Molnar: - Fix address filtering for Intel/PT,ARM/CoreSight - Enable Intel/PEBS format 5 - Allow more fixed-function counters for x86 - Intel/PT: Enable not recording Taken-Not-Taken packets - Add a few branch-types * tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT perf: Add irq and exception return branch types perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses perf/x86/intel/pt: Add a capability and config bit for disabling TNTs perf/x86/intel/pt: Add a capability and config bit for event tracing perf/x86/intel: Increase max number of the fixed counters KVM: x86: use the KVM side max supported fixed counter perf/x86/intel: Enable PEBS format 5 perf/core: Allow kernel address filter when not filtering the kernel perf/x86/intel/pt: Fix address filter config for 32-bit kernel perf/core: Fix address filter parser for multiple filters x86: Share definition of __is_canonical_address() perf/x86/intel/pt: Relax address filter validation
2022-02-02perf/x86/intel: Increase max number of the fixed countersKan Liang
The new PEBS format 5 implies that the number of the fixed counters can be up to 16. The current INTEL_PMC_MAX_FIXED is still 4. If the current kernel runs on a future platform which has more than 4 fixed counters, a warning will be triggered. The number of the fixed counters will be clipped to 4. Users have to upgrade the kernel to access the new fixed counters. Add a new default constraint for PerfMon v5 and up, which can support up to 16 fixed counters. The pseudo-encoding is applied for the fixed counters 4 and later. The user can have generic support for the new fixed counters on the future platfroms without updating the kernel. Increase the INTEL_PMC_MAX_FIXED to 16. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1643750603-100733-3-git-send-email-kan.liang@linux.intel.com
2022-02-02x86/perf: Default set FREEZE_ON_SMI for allPeter Zijlstra
Kyle reported that rr[0] has started to malfunction on Comet Lake and later CPUs due to EFI starting to make use of CPL3 [1] and the PMU event filtering not distinguishing between regular CPL3 and SMM CPL3. Since this is a privilege violation, default disable SMM visibility where possible. Administrators wanting to observe SMM cycles can easily change this using the sysfs attribute while regular users don't have access to this file. [0] https://rr-project.org/ [1] See the Intel white paper "Trustworthy SMM on the Intel vPro Platform" at https://bugzilla.kernel.org/attachment.cgi?id=300300, particularly the end of page 5. Reported-by: Kyle Huey <me@kylehuey.com> Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/YfKChjX61OW4CkYm@hirez.programming.kicks-ass.net
2022-01-18perf/x86/intel/lbr: Support LBR format V7Peter Zijlstra (Intel)
The Goldmont plus and Tremont have LBR format V7. The V7 has LBR_INFO, which is the same as LBR format V5. But V7 doesn't support TSX. Without the patch, the associated misprediction and cycles information in the LBR_INFO may be lost on a Goldmont plus platform. For Tremont, the patch only impacts the non-PEBS events. Because of the adaptive PEBS, the LBR_INFO is always processed for a PEBS event. Currently, two different ways are used to check the LBR capabilities, which make the codes complex and confusing. For the LBR format V4 and earlier, the global static lbr_desc array is used to store the flags for the LBR capabilities in each LBR format. For LBR format V5 and V6, the current code checks the version number for the LBR capabilities. There are common LBR capabilities among LBR format versions. Several flags for the LBR capabilities are introduced into the struct x86_pmu. The flags, which can be shared among LBR formats, are used to check the LBR capabilities. Add intel_pmu_lbr_init() to set the flags accordingly at boot time. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/1641315077-96661-1-git-send-email-peterz@infradead.org
2022-01-18perf/x86/intel: Add a quirk for the calculation of the number of counters on ↵Kan Liang
Alder Lake For some Alder Lake machine with all E-cores disabled in a BIOS, the below warning may be triggered. [ 2.010766] hw perf events fixed 5 > max(4), clipping! Current perf code relies on the CPUID leaf 0xA and leaf 7.EDX[15] to calculate the number of the counters and follow the below assumption. For a hybrid configuration, the leaf 7.EDX[15] (X86_FEATURE_HYBRID_CPU) is set. The leaf 0xA only enumerate the common counters. Linux perf has to manually add the extra GP counters and fixed counters for P-cores. For a non-hybrid configuration, the X86_FEATURE_HYBRID_CPU should not be set. The leaf 0xA enumerates all counters. However, that's not the case when all E-cores are disabled in a BIOS. Although there are only P-cores in the system, the leaf 7.EDX[15] (X86_FEATURE_HYBRID_CPU) is still set. But the leaf 0xA is updated to enumerate all counters of P-cores. The inconsistency triggers the warning. Several software ways were considered to handle the inconsistency. - Drop the leaf 0xA and leaf 7.EDX[15] CPUID enumeration support. Hardcode the number of counters. This solution may be a problem for virtualization. A hypervisor cannot control the number of counters in a Linux guest via changing the guest CPUID enumeration anymore. - Find another CPUID bit that is also updated with E-cores disabled. There may be a problem in the virtualization environment too. Because a hypervisor may disable the feature/CPUID bit. - The P-cores have a maximum of 8 GP counters and 4 fixed counters on ADL. The maximum number can be used to detect the case. This solution is implemented in this patch. Fixes: ee72a94ea4a6 ("perf/x86/intel: Fix fixed counter check warning for some Alder Lake") Reported-by: Damjan Marion (damarion) <damarion@cisco.com> Reported-by: Chan Edison <edison_chan_gz@hotmail.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Damjan Marion (damarion) <damarion@cisco.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1641925238-149288-1-git-send-email-kan.liang@linux.intel.com
2022-01-12Merge tag 'perf_core_for_v5.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Borislav Petkov: "Cleanup of the perf/kvm interaction." * tag 'perf_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Drop guest callback (un)register stubs KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y KVM: arm64: Convert to the generic perf callbacks KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c KVM: Move x86's perf guest info callbacks to generic KVM KVM: x86: More precisely identify NMI from guest when handling PMI KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable perf/core: Use static_call to optimize perf_guest_info_callbacks perf: Force architectures to opt-in to guest callbacks perf: Add wrappers for invoking guest callbacks perf/core: Rework guest callbacks to prepare for static_call support perf: Drop dead and useless guest "support" from arm, csky, nds32 and riscv perf: Stop pretending that perf can handle multiple guest callbacks KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest KVM: x86: Register perf callbacks after calling vendor's hardware_setup() perf: Protect perf_guest_cbs with RCU
2021-11-17perf: Add wrappers for invoking guest callbacksSean Christopherson
Add helpers for the guest callbacks to prepare for burying the callbacks behind a Kconfig (it's a lot easier to provide a few stubs than to #ifdef piles of code), and also to prepare for converting the callbacks to static_call(). perf_instruction_pointer() in particular will have subtle semantics with static_call(), as the "no callbacks" case will return 0 if the callbacks are unregistered between querying guest state and getting the IP. Implement the change now to avoid a functional change when adding static_call() support, and because the new helper needs to return _something_ in this case. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20211111020738.2512932-8-seanjc@google.com
2021-11-17perf/core: Rework guest callbacks to prepare for static_call supportLike Xu
To prepare for using static_calls to optimize perf's guest callbacks, replace ->is_in_guest and ->is_user_mode with a new multiplexed hook ->state, tweak ->handle_intel_pt_intr to play nice with being called when there is no active guest, and drop "guest" from ->get_guest_ip. Return '0' from ->state and ->handle_intel_pt_intr to indicate "not in guest" so that DEFINE_STATIC_CALL_RET0 can be used to define the static calls, i.e. no callback == !guest. [sean: extracted from static_call patch, fixed get_ip() bug, wrote changelog] Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20211111020738.2512932-7-seanjc@google.com
2021-11-17perf: Protect perf_guest_cbs with RCUSean Christopherson
Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily, all paths that read perf_guest_cbs already require RCU protection, e.g. to protect the callback chains, so only the direct perf_guest_cbs touchpoints need to be modified. Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure perf_guest_cbs isn't reloaded between a !NULL check and a dereference. Fixed via the READ_ONCE() in rcu_dereference(). Bug #2 is that on weakly-ordered architectures, updates to the callbacks themselves are not guaranteed to be visible before the pointer is made visible to readers. Fixed by the smp_store_release() in rcu_assign_pointer() when the new pointer is non-NULL. Bug #3 is that, because the callbacks are global, it's possible for readers to run in parallel with an unregisters, and thus a module implementing the callbacks can be unloaded while readers are in flight, resulting in a use-after-free. Fixed by a synchronize_rcu() call when unregistering callbacks. Bug #1 escaped notice because it's extremely unlikely a compiler will reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest() guard all but guarantees the consumer will win the race, e.g. to nullify perf_guest_cbs, KVM has to completely exit the guest and teardown down all VMs before KVM start its module unload / unregister sequence. This also makes it all but impossible to encounter bug #3. Bug #2 has not been a problem because all architectures that register callbacks are strongly ordered and/or have a static set of callbacks. But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming kvm_intel module load/unload leads to: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:perf_misc_flags+0x1c/0x70 Call Trace: perf_prepare_sample+0x53/0x6b0 perf_event_output_forward+0x67/0x160 __perf_event_overflow+0x52/0xf0 handle_pmi_common+0x207/0x300 intel_pmu_handle_irq+0xcf/0x410 perf_event_nmi_handler+0x28/0x50 nmi_handle+0xc7/0x260 default_do_nmi+0x6b/0x170 exc_nmi+0x103/0x130 asm_exc_nmi+0x76/0xbf Fixes: 39447b386c84 ("perf: Enhance perf to allow for guest statistic collection from host") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com
2021-11-17x86/perf: Fix snapshot_branch_stack warning in VMSong Liu
When running in VM intel_pmu_snapshot_branch_stack triggers WRMSR warning like: [ ] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0000000000000000) at rIP: 0xffffffff81011a5b (intel_pmu_snapshot_branch_stack+0x3b/0xd0) This can be triggered with BPF selftests: tools/testing/selftests/bpf/test_progs -t get_branch_snapshot This warning is caused by __intel_pmu_pebs_disable_all() in the VM. Since it is not necessary to disable PEBS for LBR, remove it from intel_pmu_snapshot_branch_stack and intel_pmu_snapshot_arch_branch_stack. Fixes: c22ac2a3d4bd ("perf: Enable branch record for software events") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20211112054510.2667030-1-songliubraving@fb.com
2021-11-11perf/x86/vlbr: Add c->flags to vlbr event constraintsLike Xu
Just like what we do in the x86_get_event_constraints(), the PERF_X86_EVENT_LBR_SELECT flag should also be propagated to event->hw.flags so that the host lbr driver can save/restore MSR_LBR_SELECT for the special vlbr event created by KVM or BPF. Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR event without hw counter") Reported-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Wanpeng Li <wanpengli@tencent.com> Link: https://lore.kernel.org/r/20211103091716.59906-1-likexu@tencent.com
2021-11-02Merge tag 'net-next-for-5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ...
2021-11-01Merge tag 'perf-core-2021-10-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Core: - Allow ftrace to instrument parts of the perf core code - Add a new mem_hops field to perf_mem_data_src which allows to represent intra-node/package or inter-node/off-package details to prepare for next generation systems which have more hieararchy within the node/pacakge level. Tools: - Update for the new mem_hops field in perf_mem_data_src Arch: - A set of constraints fixes for the Intel uncore PMU - The usual set of small fixes and improvements for x86 and PPC" * tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses tools/perf: Add mem_hops field in perf_mem_data_src structure perf: Add mem_hops field in perf_mem_data_src structure perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line perf/core: Allow ftrace for functions in kernel/event/core.c perf/x86: Add new event for AUX output counter index perf/x86: Add compiler barrier after updating BTS perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints perf/x86/intel/uncore: Fix Intel SPR IIO event constraints perf/x86/intel/uncore: Fix Intel SPR CHA event constraints perf/x86/intel/uncore: Fix Intel ICX IIO event constraints perf/x86/intel/uncore: Fix invalid unit check perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
2021-10-30perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodingsStephane Eranian
This patch fixes the encoding for INST_RETIRED.PREC_DIST as published by Intel (download.01.org/perfmon/) for Icelake. The official encoding is event code 0x00 umask 0x1, a change from Skylake where it was code 0xc0 umask 0x1. With this patch applied it is possible to run: $ perf record -a -e cpu/event=0x00,umask=0x1/pp ..... Whereas before this would fail. To avoid problems with tools which may use the old code, we maintain the old encoding for Icelake. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211014001214.2680534-1-eranian@google.com
2021-10-15perf/x86: Add new event for AUX output counter indexAdrian Hunter
PEBS-via-PT records contain a mask of applicable counters. To identify which event belongs to which counter, a side-band event is needed. Until now, there has been no side-band event, and consequently users were limited to using a single event. Add such a side-band event. Note the event is optimised to output only when the counter index changes for an event. That works only so long as all PEBS-via-PT events are scheduled together, which they are for a recording session because they are in a single group. Also no attribute bit is used to select the new event, so a new kernel is not compatible with older perf tools. The assumption being that PEBS-via-PT is sufficiently esoteric that users will not be troubled by this. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210907163903.11820-2-adrian.hunter@intel.com
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01perf/x86/intel: Update event constraints for ICXKan Liang
According to the latest event list, the event encoding 0xEF is only available on the first 4 counters. Add it into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
2021-09-13perf: Enable branch record for software eventsSong Liu
The typical way to access branch record (e.g. Intel LBR) is via hardware perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, PMI could capture reliable LBR. On the other hand, LBR could also be useful in non-PMI scenario. For example, in kretprobe or bpf fexit program, LBR could provide a lot of information on what happened with the function. Add API to use branch record for software use. Note that, when the software event triggers, it is necessary to stop the branch record hardware asap. Therefore, static_call is used to remove some branch instructions in this process. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
2021-08-26perf/x86/intel: Replace deprecated CPU-hotplug functionsSebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210803141621.780504-11-bigeasy@linutronix.de
2021-08-06perf/x86/intel: Apply mid ACK for small coreKan Liang
A warning as below may be occasionally triggered in an ADL machine when these conditions occur: - Two perf record commands run one by one. Both record a PEBS event. - Both runs on small cores. - They have different adaptive PEBS configuration (PEBS_DATA_CFG). [ ] WARNING: CPU: 4 PID: 9874 at arch/x86/events/intel/ds.c:1743 setup_pebs_adaptive_sample_data+0x55e/0x5b0 [ ] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0 [ ] Call Trace: [ ] <NMI> [ ] intel_pmu_drain_pebs_icl+0x48b/0x810 [ ] perf_event_nmi_handler+0x41/0x80 [ ] </NMI> [ ] __perf_event_task_sched_in+0x2c2/0x3a0 Different from the big core, the small core requires the ACK right before re-enabling counters in the NMI handler, otherwise a stale PEBS record may be dumped into the later NMI handler, which trigger the warning. Add a new mid_ack flag to track the case. Add all PMI handler bits in the struct x86_hybrid_pmu to track the bits for different types of PMUs. Apply mid ACK for the small cores on an Alder Lake machine. The existing hybrid() macro has a compile error when taking address of a bit-field variable. Add a new macro hybrid_bit() to get the bit-field value of a given PMU. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Tested-by: Ammy Yi <ammy.yi@intel.com> Link: https://lkml.kernel.org/r/1627997128-57891-1-git-send-email-kan.liang@linux.intel.com
2021-06-28Merge tag 'perf-core-2021-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Platform PMU driver updates: - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers - Fix RDPMC support - Fix [extended-]PEBS-via-PT support - Fix Sapphire Rapids event constraints - Fix :ppp support on Sapphire Rapids - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU - Other heterogenous-PMU fixes - Kprobes: - Remove the unused and misguided kprobe::fault_handler callbacks. - Warn about kprobes taking a page fault. - Fix the 'nmissed' stat counter. - Misc cleanups and fixes. * tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix task context PMU for Hetero perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids perf/x86/intel: Fix fixed counter check warning for some Alder Lake perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kprobes: Do not increment probe miss count in the fault handler x86,kprobes: WARN if kprobes tries to handle a fault kprobes: Remove kprobe::fault_handler uprobes: Update uprobe_write_opcode() kernel-doc comment perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint perf/core: Fix DocBook warnings perf/core: Make local function perf_pmu_snapshot_aux() static perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
2021-06-28Merge tag 'x86_cpu_for_v5.14_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - New AMD models support - Allow MONITOR/MWAIT to be used for C1 state entry on Hygon too - Use the special RAPL CPUID bit to detect the functionality on AMD and Hygon instead of doing family matching. - Add support for new Intel microcode deprecating TSX on some models and do not enable kernel workarounds for those CPUs when TSX transactions always abort, as a result of that microcode update. * tag 'x86_cpu_for_v5.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsx: Clear CPUID bits when TSX always force aborts x86/events/intel: Do not deploy TSX force abort workaround when TSX is deprecated x86/msr: Define new bits in TSX_FORCE_ABORT MSR perf/x86/rapl: Use CPUID bit on AMD and Hygon parts x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systems x86/amd_nb: Add AMD family 19h model 50h PCI ids x86/cpu: Fix core name for Sapphire Rapids
2021-06-23perf/x86/intel: Fix instructions:ppp support in Sapphire RapidsKan Liang
Perf errors out when sampling instructions:ppp. $ perf record -e instructions:ppp -- true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (instructions:ppp). The instruction PDIR is only available on the fixed counter 0. The event constraint has been updated to fixed0_constraint in icl_get_event_constraints(). The Sapphire Rapids codes unconditionally error out for the event which is not available on the GP counter 0. Make the instructions:ppp an exception. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Reported-by: Yasin, Ahmad <ahmad.yasin@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1624029174-122219-4-git-send-email-kan.liang@linux.intel.com
2021-06-23perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire RapidsKan Liang
On Sapphire Rapids, there are two more events 0x40ad and 0x04c2 which rely on the FRONTEND MSR. If the FRONTEND MSR is not set correctly, the count value is not correct. Update intel_spr_extra_regs[] to support them. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1624029174-122219-3-git-send-email-kan.liang@linux.intel.com
2021-06-23perf/x86/intel: Fix fixed counter check warning for some Alder LakeKan Liang
For some Alder Lake machine, the below fixed counter check warning may be triggered. [ 2.010766] hw perf events fixed 5 > max(4), clipping! Current perf unconditionally increases the number of the GP counters and the fixed counters for a big core PMU on an Alder Lake system, because the number enumerated in the CPUID only reflects the common counters. The big core may has more counters. However, Alder Lake may have an alternative configuration. With that configuration, the X86_FEATURE_HYBRID_CPU is not set. The number of the GP counters and fixed counters enumerated in the CPUID is accurate. Perf mistakenly increases the number of counters. The warning is triggered. Directly use the enumerated value on the system with the alternative configuration. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1624029174-122219-2-git-send-email-kan.liang@linux.intel.com
2021-06-15x86/events/intel: Do not deploy TSX force abort workaround when TSX is ↵Pawan Gupta
deprecated Earlier workaround added by 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort") for perf counter interactions [1] are not required on some client systems which received a microcode update that deprecates TSX. Bypass the perf workaround when such microcode is enumerated. [1] [ bp: Look for document ID 604224, "Performance Monitoring Impact of Intel Transactional Synchronization Extension Memory". Since there's no way for us to have stable links to documents... ] [ bp: Massage comment. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Link: https://lkml.kernel.org/r/e4d410f786946280ced02dd07c74e0a74f1d10cb.1623704845.git-series.pawan.kumar.gupta@linux.intel.com
2021-05-18perf/x86: Avoid touching LBR_TOS MSR for Arch LBRLike Xu
The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). In a guest that should support Architecture LBR, check_msr() will be a non-related check for the architecture MSR 0x0 (IA32_P5_MC_ADDR) that is also not supported by KVM. The failure will cause x86_pmu.lbr_nr = 0, thereby preventing the initialization of the guest Arch LBR. Fix it by avoiding this extraneous check in intel_pmu_init() for Arch LBR. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simpler still] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210430052247.3079672-1-like.xu@linux.intel.com
2021-04-28Merge tag 'perf-core-2021-04-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event updates from Ingo Molnar: - Improve Intel uncore PMU support: - Parse uncore 'discovery tables' - a new hardware capability enumeration method introduced on the latest Intel platforms. This table is in a well-defined PCI namespace location and is read via MMIO. It is organized in an rbtree. These uncore tables will allow the discovery of standard counter blocks, but fancier counters still need to be enumerated explicitly. - Add Alder Lake support - Improve IIO stacks to PMON mapping support on Skylake servers - Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived) cores. The CPU-side feature set is entirely symmetrical - but on the PMU side there's core type dependent PMU functionality. - Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by fixing the AUX allocation watermark logic. - Improve ring buffer allocation on NUMA systems - Put 'struct perf_event' into their separate kmem_cache pool - Add support for synchronous signals for select perf events. The immediate motivation is to support low-overhead sampling-based race detection for user-space code. The feature consists of the following main changes: - Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits inheritance of events to CLONE_THREAD. - Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec. - Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64 ::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is PERF_TYPE_BREAKPOINT. The siginfo support is adequate for breakpoints right now - but the new field can be used to introduce support for other types of metadata passed over siginfo as well. - Misc fixes, cleanups and smaller updates. * tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) signal, perf: Add missing TRAP_PERF case in siginfo_layout() signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures perf/x86: Allow for 8<num_fixed_counters<16 perf/x86/rapl: Add support for Intel Alder Lake perf/x86/cstate: Add Alder Lake CPU support perf/x86/msr: Add Alder Lake CPU support perf/x86/intel/uncore: Add Alder Lake support perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE perf/x86/intel: Add Alder Lake Hybrid support perf/x86: Support filter_match callback perf/x86/intel: Add attr_update for Hybrid PMUs perf/x86: Add structures for the attributes of Hybrid PMUs perf/x86: Register hybrid PMUs perf/x86: Factor out x86_pmu_show_pmu_cap perf/x86: Remove temporary pmu assignment in event_init perf/x86/intel: Factor out intel_pmu_check_extra_regs perf/x86/intel: Factor out intel_pmu_check_event_constraints perf/x86/intel: Factor out intel_pmu_check_num_counters perf/x86: Hybrid PMU support for extra_regs perf/x86: Hybrid PMU support for event constraints ...
2021-04-26Merge tag 'x86_cleanups_for_v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place" * tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove me from IDE/ATAPI section x86/pat: Do not compile stubbed functions when X86_PAT is off x86/asm: Ensure asm/proto.h can be included stand-alone x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files x86/msr: Make locally used functions static x86/cacheinfo: Remove unneeded dead-store initialization x86/process/64: Move cpu_current_top_of_stack out of TSS tools/turbostat: Unmark non-kernel-doc comment x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL() x86/fpu/math-emu: Fix function cast warning x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes x86: Fix various typos in comments, take #2 x86: Remove unusual Unicode characters from comments x86/kaslr: Return boolean values from a function returning bool x86: Fix various typos in comments x86/setup: Remove unused RESERVE_BRK_ARRAY() stacktrace: Move documentation for arch_stack_walk_reliable() to header x86: Remove duplicate TSC DEADLINE MSR definitions