path: root/arch
Age    Commit message    Author
2024-10-31s390/cpum_sf: Fix and protect memory allocation of SDBs with mutexThomas Richter
Reservation of the PMU hardware is done at first event creation and is protected by a pair of mutex_lock() and mutex_unlock(). After reservation of the PMU hardware the memory required for the PMUs the event is to be installed on is allocated by allocate_buffers() and alloc_sampling_buffer(). This is done outside of the mutex protection. Without mutex protection, two or more concurrent invocations of perf_event_init() may run in parallel. This can lead to allocation of Sample Data Blocks (SDBs) multiple times for the same PMU. Prevent this and protect memory allocation of SDBs by mutex. Fixes: 8a6fe8f21ec4 ("s390/cpum_sf: Use refcount_t instead of atomic_t") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
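A minimal sketch of the serialization described above, with simplified control flow and partly hypothetical helper names (allocate_buffers() is named in the commit; the rest is illustrative, not the actual patch):

    /* Illustrative only: hardware reservation and SDB allocation now sit
     * under the same mutex, so two concurrent perf_event_init() calls can
     * no longer both allocate sampling buffers for the same PMU. */
    static DEFINE_MUTEX(pmc_reserve_mutex);

    static int cpum_sf_setup_event(struct perf_event *event)
    {
            int rc;

            mutex_lock(&pmc_reserve_mutex);
            rc = reserve_pmc_hardware();            /* first event reserves the PMU */
            if (!rc) {
                    rc = allocate_buffers(event);   /* SDB allocation, now serialized too */
                    if (rc)
                            release_pmc_hardware();
            }
            mutex_unlock(&pmc_reserve_mutex);
            return rc;
    }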
2024-10-31x86/mce/apei: Handle variable SMCA BERT record sizeYazen Ghannam
The ACPI Boot Error Record Table (BERT) is being used by the kernel to report errors that occurred in a previous boot. On some modern AMD systems, these very errors within the BERT are reported through the x86 Common Platform Error Record (CPER) format which consists of one or more Processor Context Information Structures. These context structures provide a starting address and represent an x86 MSR range in which the data constitutes a contiguous set of MSRs starting from, and including the starting address. It's common, for AMD systems that implement this behavior, that the MSR range represents the MCAX register space used for the Scalable MCA feature. The apei_smca_report_x86_error() function decodes and passes this information through the MCE notifier chain. However, this function assumes a fixed register size based on the original HW/FW implementation. This assumption breaks with the addition of two new MCAX registers viz. MCA_SYND1 and MCA_SYND2. These registers are added at the end of the MCAX register space, so they won't be included when decoding the CPER data. Rework apei_smca_report_x86_error() to support a variable register array size. This covers any case where the MSR context information starts at the MCAX address for MCA_STATUS and ends at any other register within the MCAX register space. [ Yazen: Add Avadhut as co-developer for wrapper changes.] [ bp: Massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241022194158.110073-5-avadhut.naik@amd.com
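A rough illustration of the variable-size handling (the function and field names below are invented for this sketch, not taken from the patch): the register count is derived from the size of the context structure instead of being hard-coded.

    static void smca_extract_ctx(const u64 *regs, u32 ctx_bytes, u64 start_msr)
    {
            /* Newly appended registers such as MCA_SYND1/MCA_SYND2 are picked
             * up automatically because the loop bound follows the CPER data. */
            u32 i, num_regs = ctx_bytes / sizeof(u64);

            for (i = 0; i < num_regs; i++) {
                    /* start_msr is expected to be the MCA_STATUS address; every
                     * MSR from there to the end of the context is consumed. */
                    handle_mcax_msr(start_msr + i, regs[i]);
            }
    }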
2024-10-31ARM: smp_twd: Remove clockevents shutdown call on offliningFrederic Weisbecker
The clockevents core already detached and unregistered it at this stage. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241029125451.54574-5-frederic@kernel.org
2024-10-31x86/MCE/AMD: Add support for new MCA_SYND{1,2} registersAvadhut Naik
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers: MCA_SYND1 and MCA_SYND2. These registers will include supplemental error information in addition to the existing MCA_SYND register. The data within these registers is considered valid if MCA_STATUS[SyndV] is set. Userspace error decoding tools like rasdaemon gather related hardware error information through the tracepoints. Therefore, export these two registers through the mce_record tracepoint so that tools like rasdaemon can parse them and output the supplemental error information like FRU text contained in them. [ bp: Massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
2024-10-31KVM: arm64: Handle WXN attributeMarc Zyngier
Until now, we didn't really care about WXN as it didn't have an effect on the R/W permissions (only the execution could be dropped), and was therefore not of interest for AT. However, with S1POE, WXN can revoke the Write permission if an overlay is active and execution is allowed. This *is* relevant to AT. Add full handling of WXN so that we correctly handle this case. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-38-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Handle stage-1 permission overlaysMarc Zyngier
We now have the infrastructure in place to emulate S1POE:

- direct permissions are always overlay-capable
- indirect permissions are overlay-capable if the permissions are in the 0b0xxx range
- the overlays are strictly subtractive

Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-37-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
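To make "strictly subtractive" concrete, here is an illustrative combination step (the struct and helper are hypothetical): the overlay can only remove permissions granted by the base stage-1 result, never add any.

    static void s1_apply_overlay(struct s1_perms *pr, const struct s1_perms *ov)
    {
            /* The result can only lose R/W/X bits relative to the base. */
            pr->r &= ov->r;
            pr->w &= ov->w;
            pr->x &= ov->x;
    }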
2024-10-31KVM: arm64: Make PAN conditions part of the S1 walk contextMarc Zyngier
Move the conditions describing PAN into the s1_walk_info structure, in an effort to declutter the permission processing. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-36-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Disable hierarchical permissions when POE is enabledMarc Zyngier
The hierarchical permissions must be disabled when POE is enabled in the translation regime used for a given table walk. We store the two enable bits in the s1_walk_info structure so that they can be retrieved down the line, as they will be useful. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-35-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add POE save/restore for AT emulation fast-pathMarc Zyngier
Just like the other extensions affecting address translation, we must save/restore POE so that an out-of-context translation context can be restored and used with the AT instructions. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-34-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add save/restore support for POR_EL2Marc Zyngier
POR_EL2 needs saving when the guest is VHE, and restoring in any case. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-33-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add basic support for POR_EL2Marc Zyngier
S1POE support implies support for POR_EL2, which we provide by:

- adding it to the vcpu_sysreg enum
- advertising it as mapped to its EL1 counterpart in get_el2_to_el1_mapping
- wiring it in the sys_reg_desc table with the correct visibility
- handling POR_EL1 in __vcpu_{read,write}_sys_reg_from_cpu()

Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-32-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add kvm_has_s1poe() helperMarc Zyngier
Just like we have kvm_has_s1pie(), add its S1POE counterpart, making the code slightly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-31-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Subject S1PIE/S1POE registers to HCR_EL2.{TVM,TRVM}Marc Zyngier
All the EL0/EL1 S1PIE/S1POE system registers are caught by the HCR_EL2 TVM and TRVM bits. Reflect this in the coarse-grained trap table. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-30-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Drop bogus CPTR_EL2.E0POE trap routingMarc Zyngier
It took me some time to realise it, but CPTR_EL2.E0POE does not apply to a guest, only to EL0 when InHost(). And when InHost(), CPTR_EL2 is mapped to CPACR_EL1, meaning that the E0POE bit naturally takes effect without any trap. To sum it up, this trap bit is better left ignored, as we will never have to handle it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-29-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Add encoding for POR_EL2Marc Zyngier
POR_EL2 is the equivalent of POR_EL1 for the EL2&0 translation regime, and it is sorely missing from the sysreg file. Add the sucker. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-28-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Rely on visibility to let PIR*_ELx/TCR2_ELx UNDEFMarc Zyngier
With a visibility defined for these registers, there is no need to check again for S1PIE or TCRX being implemented as perform_access() already handles it. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-27-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Hide S1PIE registers from userspace when disabled for guestsMark Brown
When the guest does not support S1PIE we should not allow any access to the system registers it adds in order to ensure that we do not create spurious issues with guest migration. Add a visibility operation for these registers. Fixes: 86f9de9db178 ("KVM: arm64: Save/restore PIE registers") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-3-376624fa829c@kernel.org [maz: simplify by using __el2_visibility(), kvm_has_s1pie() throughout] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-26-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
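The general shape of such a visibility hook, sketched with simplified logic (the exact return flags and helper signatures used upstream may differ):

    static unsigned int s1pie_visibility(const struct kvm_vcpu *vcpu,
                                         const struct sys_reg_desc *rd)
    {
            /* Hide the S1PIE registers, both from the guest and from
             * userspace, when the VM does not advertise S1PIE. */
            return kvm_has_s1pie(vcpu->kvm) ? 0 : REG_HIDDEN;
    }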
2024-10-31KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guestsMark Brown
When the guest does not support FEAT_TCR2 we should not allow any access to it in order to ensure that we do not create spurious issues with guest migration. Add a visibility operation for it. Fixes: fbff56068232 ("KVM: arm64: Save/restore TCR2_EL1") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-2-376624fa829c@kernel.org [maz: simplify by using __el2_visibility(), kvm_has_tcr2() throughout] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-25-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Define helper for EL2 registers with custom visibilityMark Brown
In preparation for adding more visibility filtering for EL2 registers add a helper macro like EL2_REG() which allows specification of a custom visibility operation. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-1-376624fa829c@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-24-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add a composite EL2 visibility helperMarc Zyngier
We are starting to have a bunch of visibility helpers checking for EL2 + something else, and we are going to add more. Simplify things somewhat by introducing a helper that implements exactly that, taking a visibility helper as a parameter, and convert the existing ones to it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-23-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
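A sketch of the composite helper being described (the body is illustrative): it hides the register unless the guest has EL2 and the wrapped, feature-specific check also passes.

    static unsigned int __el2_visibility(const struct kvm_vcpu *vcpu,
                                         const struct sys_reg_desc *rd,
                                         unsigned int (*fn)(const struct kvm_vcpu *,
                                                            const struct sys_reg_desc *))
    {
            unsigned int vis = el2_visibility(vcpu, rd);

            /* Already hidden because the guest has no EL2? Nothing more to do. */
            if (vis)
                    return vis;

            /* Otherwise apply the additional, feature-specific check. */
            return fn(vcpu, rd);
    }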
2024-10-31KVM: arm64: Implement AT S1PIE supportMarc Zyngier
It doesn't take much effort to implement S1PIE support in AT. It is only a matter of using the AArch64.S1IndirectBasePermissions() encodings for the permission, ignoring GCS (which has no impact on AT), and enforcing FEAT_PAN3 being enabled, as this is a requirement of FEAT_S1PIE. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-22-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Disable hierarchical permissions when S1PIE is enabledMarc Zyngier
S1PIE implicitly disables hierarchical permissions, as specified in R_JHSVW, by making TCR_ELx.HPDn RES1. Add a predicate for S1PIE being enabled for a given translation regime, and emulate this behaviour by forcing the hpd field to true if S1PIE is enabled for that translation regime. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-21-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
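An illustrative rendering of that emulation, assuming simplified field and macro names (TCR_HPD below is a placeholder standing in for the per-regime HPD0/HPD1 bits):

    static void compute_hpd(struct kvm_vcpu *vcpu, struct s1_walk_info *wi, u64 tcr)
    {
            wi->hpd = !!(tcr & TCR_HPD);                    /* hierarchical permissions disabled? */
            wi->hpd |= s1pie_enabled(vcpu, wi->regime);     /* R_JHSVW: HPDn behaves as RES1 */
    }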
2024-10-31KVM: arm64: Split S1 permission evaluation into direct and hierarchical partsMarc Zyngier
The AArch64.S1DirectBasePermissions() pseudocode deals with both direct and hierarchical S1 permission evaluation. While this is probably convenient in the pseudocode, we would like a bit more flexibility to slot in things like indirect permissions. To that effect, split the two permission check parts out of handle_at_slow() and into their own functions. The permissions are passed around as part of the walk_result structure. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-20-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add AT fast-path support for S1PIEMarc Zyngier
Emulating AT using AT instructions requires that the live state matches the translation regime the AT instruction targets. If targeting the EL1&0 translation regime and S1PIE is supported, we also need to restore that state (covering TCR2_EL1, PIR_EL1, and PIRE0_EL1). Add the required system register switcheroo. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-19-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Handle PIR{,E0}_EL2 trapsMarc Zyngier
Add the FEAT_S1PIE EL2 registers to the sysreg descriptor array so that they can be handled as a trap. Access to these registers is conditional based on ID_AA64MMFR3_EL1.S1PIE being advertised. Similarly to other changes, PIRE0_EL2 is guaranteed to trap thanks to the D22677 update to the architecture. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-17-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add save/restore for PIR{,E0}_EL2Marc Zyngier
Like their EL1 equivalents, the EL2-specific FEAT_S1PIE registers are context-switched. This is made conditional on both FEAT_TCRX and FEAT_S1PIE being advertised. Note that this change only makes sense if read together with the issue D22677 contained in 102105_K.a_04_en. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add PIR{,E0}_EL2 to the sysreg arraysMarc Zyngier
Add the FEAT_S1PIE EL2 registers to the per-vcpu sysreg register array. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Extend masking facility to arbitrary registersMarc Zyngier
We currently only use the masking (RES0/RES1) facility for VNCR registers, as they are memory-based and thus easy to sanitise. But we could apply the same thing to other registers if we:

- split the sanitisation from __VNCR_START__
- apply the sanitisation when reading from a HW register

This involves a new "marker" in the vcpu_sysreg enum, which defines the point at which the sanitisation applies (the VNCR registers being of course after this marker). While we are at it, rename kvm_vcpu_sanitise_vncr_reg() to kvm_vcpu_apply_reg_masks(), which is vaguely more explicit, and harden set_sysreg_masks() against setting masks for random registers... Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
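The masking step itself is conceptually simple; an illustrative form, with a hypothetical lookup helper, is:

    static u64 kvm_vcpu_apply_reg_masks(const struct kvm_vcpu *vcpu,
                                        enum vcpu_sysreg reg, u64 val)
    {
            /* Force RES1 bits to 1 and RES0 bits to 0, whether the value came
             * from the in-memory (VNCR) copy or from a HW register. */
            const struct reg_masks *m = lookup_reg_masks(vcpu->kvm, reg);  /* hypothetical */

            if (m)
                    val = (val | m->res1) & ~m->res0;
            return val;
    }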
2024-10-31KVM: arm64: Add save/restore for TCR2_EL2Marc Zyngier
Like its EL1 equivalent, TCR2_EL2 gets context-switched. This is made conditional on FEAT_TCRX being advertised. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Correctly access TCR2_EL1, PIR_EL1, PIRE0_EL1 with VHEMarc Zyngier
For code that accesses any of the guest registers for emulation purposes, it is crucial to know where the most up-to-date data is. While this is pretty clear for nVHE (memory is the sole repository), things are a lot muddier for VHE, as depending on the SYSREGS_ON_CPU flag, registers can either be loaded on the HW or be in memory. Even worse with NV, where the loaded state is by definition partial. For these reasons, KVM offers the vcpu_read_sys_reg() and vcpu_write_sys_reg() primitives that always do the right thing. However, these primitives must know which register to access, and this is the role of the __vcpu_read_sys_reg_from_cpu() and __vcpu_write_sys_reg_to_cpu() helpers. As it turns out, TCR2_EL1, PIR_EL1, and PIRE0_EL1 are not described in the latter helpers, meaning that the AT code cannot use them to emulate S1PIE. Add the three registers to the (long) list. Fixes: 86f9de9db178 ("KVM: arm64: Save/restore PIE registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Sanitise TCR2_EL2Marc Zyngier
TCR2_EL2 is a bag of control bits, all of which are only valid if certain features are present, and RES0 otherwise. Describe these constraints and register them with the masking infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-13-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
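A hedged sketch of what registering such constraints could look like (only two example bits are shown, and the field macros are assumed to follow the generated sysreg naming rather than being taken from the patch):

    static void sanitise_tcr2_el2(struct kvm *kvm)
    {
            u64 res0 = 0;

            /* Every control bit is RES0 unless the matching feature is
             * advertised to the guest; two examples only. */
            if (!kvm_has_feat(kvm, ID_AA64MMFR1_EL1, HAFDBS, HAFT))
                    res0 |= TCR2_EL2_HAFT;
            if (!kvm_has_s1pie(kvm))
                    res0 |= TCR2_EL2_PIE;

            set_sysreg_masks(kvm, TCR2_EL2, res0, 0);
    }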
2024-10-31KVM: arm64: nv: Save/Restore vEL2 sysregsMarc Zyngier
Whenever we need to restore the guest's system registers to the CPU, we now need to take care of the EL2 system registers as well. Most of them are accessed via traps only, but some have an immediate effect, and a guest running in VHE mode would also expect them to be accessible via their EL1 encoding, which we do not trap. For vEL2 we write the virtual EL2 registers with an identical format directly into their EL1 counterpart, and translate the few registers that have a different format for the same effect on the execution when running a non-VHE guest hypervisor. Based on an initial patch from Andre Przywara, rewritten many times since. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add TCR2_EL2 to the sysreg arraysMarc Zyngier
Add the TCR2_EL2 register to the per-vcpu sysreg register array, the sysreg descriptor array, and advertise it as mapped to TCR2_EL1 for NV purposes. Access to this register is conditional based on ID_AA64MMFR3_EL1.TCRX being advertised. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Define ID_AA64MMFR1_EL1.HAFDBS advertising FEAT_HAFTMarc Zyngier
This definition is missing, and we are going to need it to sanitise TCR2_ELx. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Handle CNTHCTL_EL2 speciallyMarc Zyngier
Accessing CNTHCTL_EL2 is fraught with danger if running with HCR_EL2.E2H=1: half of the bits are held in CNTKCTL_EL1, and thus can be changed behind our back, while the rest lives in the CNTHCTL_EL2 shadow copy that is memory-based. Yes, this is a lot of fun! Make sure that we merge the two on read access, while we can write to CNTKCTL_EL1 in a more straightforward manner. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
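A rough sketch of the read-side merge (the mask is a hypothetical placeholder for whichever bits are redirected to CNTKCTL_EL1 when HCR_EL2.E2H == 1):

    static u64 read_cnthctl_el2_merged(struct kvm_vcpu *vcpu)
    {
            /* Bits living in CNTKCTL_EL1 come from the HW register, which the
             * guest may have changed behind our back; the rest comes from the
             * memory-based shadow copy of CNTHCTL_EL2. */
            u64 hw  = read_sysreg(cntkctl_el1) & CNTHCTL_E2H_SHARED_BITS;   /* hypothetical mask */
            u64 mem = __vcpu_sys_reg(vcpu, CNTHCTL_EL2) & ~CNTHCTL_E2H_SHARED_BITS;

            return mem | hw;
    }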
2024-10-31KVM: arm64: nv: Add missing EL2->EL1 mappings in get_el2_to_el1_mapping()Marc Zyngier
As KVM has grown a bunch of new system registers for NV, it appears that we are missing them in the get_el2_to_el1_mapping() list. Most of them are not crucial as they don't tend to be accessed via vcpu_read_sys_reg() and vcpu_write_sys_reg(). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Drop useless struct s2_mmu in __kvm_at_s1e2()Marc Zyngier
__kvm_at_s1e2() contains the definition of an s2_mmu for the current context, but doesn't make any use of it. Drop it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Add encoding for PIRE0_EL2Marc Zyngier
PIRE0_EL2 is the equivalent of PIRE0_EL1 for the EL2&0 translation regime, and it is sorely missing from the sysreg file. Add the sucker. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Remove VNCR definition for PIRE0_EL2Marc Zyngier
As of the ARM ARM Known Issues document 102105_K.a_04_en, D22677 fixes a problem with the PIRE0_EL2 register, resulting in its removal from the VNCR page (it had no purpose being there in the first place). Follow the architecture update by removing this offset. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Drop SKL0/SKL1 from TCR2_EL2Marc Zyngier
Despite what the documentation says, TCR2_EL2.{SKL0,SKL1} do not exist, and the corresponding information is in the respective TTBRx_EL2. This is a leftover from a development version of the architecture. This change makes TCR2_EL2 similar to TCR2_EL1 in that respect. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-30s390/time: Add PtP driverSven Schnelle
Add a small PtP driver which allows user space to get the values of the physical and TOD clock. This allows programs like chrony to use STP as a clock source and steer the kernel clock. The physical clock can be used as a debugging aid to get the clock without any additional offsets like STP steering or LPAR offset. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30s390/time: Add clocksource id to TOD clockSven Schnelle
To allow specifying the clock source in the upcoming PtP driver, add a clocksource ID to the s390 TOD clock. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31powerpc64/bpf: Add support for bpf trampolinesNaveen N Rao
Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline() for 64-bit powerpc. While the code is generic, BPF trampolines are only enabled on 64-bit powerpc. 32-bit powerpc will need testing and some updates. BPF trampolines adhere to the existing ftrace ABI utilizing a two-instruction profiling sequence, as well as the newer ABI utilizing a three-instruction profiling sequence enabling return with a 'blr'. The trampoline code itself closely follows the x86 implementation. BPF prog JIT is extended to mimic the 64-bit powerpc approach for ftrace having a single nop at function entry, followed by the function profiling sequence out-of-line and a separate long branch stub for calls to trampolines that are out of range. A dummy_tramp is provided to simplify synchronization similar to arm64. When attaching a bpf trampoline to a bpf prog, we can patch up to three things:

- the nop at bpf prog entry to go to the out-of-line stub
- the instruction in the out-of-line stub to either call the bpf trampoline directly, or to branch to the long_branch stub
- the trampoline address before the long_branch stub

We do not need any synchronization here since we always have a valid branch target regardless of the order in which the above stores are seen. dummy_tramp ensures that the long_branch stub goes to a valid destination on other cpus, even when the branch to the long_branch stub is seen before the updated trampoline address. However, when detaching a bpf trampoline from a bpf prog, or if changing the bpf trampoline address, we need synchronization to ensure that other cpus can no longer branch into the older trampoline so that it can be safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus make forward progress, but we still need to ensure that other cpus execute isync (or some CSI) so that they don't go back into the trampoline again. While here, update the stale comment that describes the redzone usage in ppc64 BPF JIT. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-18-hbathini@linux.ibm.com
2024-10-31samples/ftrace: Add support for ftrace direct samples on powerpcNaveen N Rao
Add powerpc 32-bit and 64-bit samples for ftrace direct. This serves to show the sample instruction sequence to be used by ftrace direct calls to adhere to the ftrace ABI. On 64-bit powerpc, TOC setup requires some additional work. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-17-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLSNaveen N Rao
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64 implementation. ftrace direct calls allow custom trampolines to be called into directly from function ftrace call sites, bypassing the ftrace trampoline completely. This functionality is currently utilized by BPF trampolines to hook into kernel function entries. Since we have limited relative branch range, we support ftrace direct calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this approach, ftrace trampoline is not entirely bypassed. Rather, it is re-purposed into a stub that reads direct_call field from the associated ftrace_ops structure and branches into that, if it is not NULL. For this, it is sufficient if we can ensure that the ftrace trampoline is reachable from all traceable functions. When multiple ftrace_ops are associated with a call site, we utilize a call back to set pt_regs->orig_gpr3 that can then be tested on the return path from the ftrace trampoline to branch into the direct caller. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-16-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPSNaveen N Rao
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the arm64 implementation. This works by patching-in a pointer to an associated ftrace_ops structure before each traceable function. If multiple ftrace_ops are associated with a call site, then a special ftrace_list_ops is used to enable iterating over all the registered ftrace_ops. If no ftrace_ops are associated with a call site, then a special ftrace_nop_ops structure is used to render the ftrace call as a no-op. ftrace trampoline can then read the associated ftrace_ops for a call site by loading from an offset from the LR, and branch directly to the associated function. The primary advantage with this approach is that we don't have to iterate over all the registered ftrace_ops for call sites that have a single ftrace_ops registered. This is the equivalent of implementing support for dynamic ftrace trampolines, which set up a special ftrace trampoline for each registered ftrace_ops and have individual call sites branch into those directly. A secondary advantage is that this gives us a way to add support for direct ftrace callers without having to resort to using stubs. The address of the direct call trampoline can be loaded from the ftrace_ops structure. To support this, we reserve a nop before each function on 32-bit powerpc. For 64-bit powerpc, two nops are reserved before each out-of-line stub. During ftrace activation, we update this location with the associated ftrace_ops pointer. Then, on ftrace entry, we load from this location and call into ftrace_ops->func(). For 64-bit powerpc, we ensure that the out-of-line stub area is doubleword aligned so that ftrace_ops address can be updated atomically. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-15-hbathini@linux.ibm.com
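Conceptually, the trampoline recovers the ftrace_ops pointer from a slot patched just before the traced function's entry (or before its out-of-line stub); an illustrative, not actual, C rendering of that lookup:

    static struct ftrace_ops *call_site_ops(unsigned long lr)
    {
            /* OPS_SLOT_OFFSET is a hypothetical constant: the distance from the
             * return address left by the profiling call back to the doubleword-
             * aligned slot that ftrace activation patched the ops pointer into. */
            unsigned long ops_slot = lr - OPS_SLOT_OFFSET;

            /* Call sites with no ftrace_ops attached point at ftrace_nop_ops,
             * multi-ops sites point at ftrace_list_ops. */
            return *(struct ftrace_ops **)ops_slot;
    }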
2024-10-31powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubsNaveen N Rao
We are restricted to a .text size of ~32MB when using out-of-line function profile sequence. Allow this to be extended up to the previous limit of ~64MB by reserving space in the middle of .text. A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is introduced to specify the number of function stubs that are reserved in .text. On boot, ftrace utilizes stubs from this area first before using the stub area at the end of .text. A ppc64le defconfig has ~44k functions that can be traced. A more conservative value of 32k functions is chosen as the default value of PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space than necessary by default. If building a kernel that only has 32k trace-able functions, we won't allot any more space at the end of .text during the pass on vmlinux.o. Otherwise, only the remaining functions get space for stubs at the end of .text. This default value should help cover a .text size of ~48MB in total (including space reserved at the end of .text which can cover up to 32MB), which should be sufficient for most common builds. For a very small kernel build, this can be set to 0. Or, this can be bumped up to a larger value to support vmlinux .text size up to ~64MB. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-14-hbathini@linux.ibm.com
2024-10-31powerpc64/ftrace: Move ftrace sequence out of lineNaveen N Rao
Function profile sequence on powerpc includes two instructions at the beginning of each function:

    mflr    r0
    bl      ftrace_caller

The call to ftrace_caller() gets nop'ed out during kernel boot and is patched in when ftrace is enabled. Given the sequence, we cannot return from ftrace_caller with 'blr' as we need to keep LR and r0 intact. This results in link stack (return address predictor) imbalance when ftrace is enabled. To address that, we would like to use a three-instruction sequence:

    mflr    r0
    bl      ftrace_caller
    mtlr    r0

Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to reserve two instruction slots before the function. This results in a total of five instruction slots to be reserved for ftrace use on each function that is traced. Move the function profile sequence out-of-line to minimize its impact. To do this, we reserve a single nop at function entry using -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine the total number of functions that can be traced. This is then used to generate a .S file reserving the appropriate amount of space for use as ftrace stubs, which is built and linked into vmlinux. On bootup, the stub space is split into separate stubs per function and populated with the proper instruction sequence. A pointer to the associated stub is maintained in dyn_arch_ftrace. For modules, space for ftrace stubs is reserved from the generic module stub space. This is restricted to and enabled by default only on 64-bit powerpc, though there are some changes to accommodate 32-bit powerpc. This is done so that 32-bit powerpc could choose to opt into this based on further tests and benchmarks. As an example, after this patch, kernel functions will have a single nop at function entry:

    <kernel_clone>:
            addis   r2,r12,467
            addi    r2,r2,-16028
            nop
            mfocrf  r11,8
            ...

When ftrace is enabled, the nop is converted to an unconditional branch to the stub associated with that function:

    <kernel_clone>:
            addis   r2,r12,467
            addi    r2,r2,-16028
            b       ftrace_ool_stub_text_end+0x11b28
            mfocrf  r11,8
            ...

The associated stub:

    <ftrace_ool_stub_text_end+0x11b28>:
            mflr    r0
            bl      ftrace_caller
            mtlr    r0
            b       kernel_clone+0xc
            ...

This change showed an improvement of ~10% in the null_syscall benchmark on a Power 10 system with ftrace enabled. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-13-hbathini@linux.ibm.com
2024-10-31kbuild: Add generic hook for architectures to use before the final vmlinux linkNaveen N Rao
On powerpc, we would like to be able to make a pass on vmlinux.o and generate a new object file to be linked into vmlinux. Add a generic pass in Makefile.vmlinux that architectures can use for this purpose. Architectures need to select CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX and must provide arch/<arch>/tools/Makefile with .arch.vmlinux.o target, which will be invoked prior to the final vmlinux link step. Acked-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-12-hbathini@linux.ibm.com
2024-10-31powerpc/ftrace: Add a postlink script to validate function tracerNaveen N Rao
Function tracer on powerpc can only work with vmlinux having a .text size of up to ~64MB due to powerpc branch instruction having a limited relative branch range of 32MB. Today, this is only detected on kernel boot when ftrace is init'ed. Add a post-link script to check the size of .text so that we can detect this at build time, and break the build if necessary. We add a dependency on !COMPILE_TEST for CONFIG_HAVE_FUNCTION_TRACER so that allyesconfig and other test builds can continue to work without enabling ftrace. Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-11-hbathini@linux.ibm.com