2020-03-25Merge branches 'for-next/memory-hotremove', 'for-next/arm_sdei', 'for-next/amu', 'for-next/final-cap-helper', 'for-next/cpu_ops-cleanup', 'for-next/misc' and 'for-next/perf' into for-next/coreCatalin Marinas
* for-next/memory-hotremove:
  : Memory hot-remove support for arm64
  arm64/mm: Enable memory hot remove
  arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

* for-next/arm_sdei:
  : SDEI: fix double locking on return from hibernate and clean-up
  firmware: arm_sdei: clean up sdei_event_create()
  firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp
  firmware: arm_sdei: fix possible double-lock on hibernate error path
  firmware: arm_sdei: fix double-lock on hibernate with shared events

* for-next/amu:
  : ARMv8.4 Activity Monitors support
  clocksource/drivers/arm_arch_timer: validate arch_timer_rate
  arm64: use activity monitors for frequency invariance
  cpufreq: add function to get the hardware max frequency
  Documentation: arm64: document support for the AMU extension
  arm64/kvm: disable access to AMU registers from kvm guests
  arm64: trap to EL1 accesses to AMU counters from EL0
  arm64: add support for the AMU extension v1

* for-next/final-cap-helper:
  : Introduce cpus_have_final_cap_helper(), migrate arm64 KVM to it
  arm64: kvm: hyp: use cpus_have_final_cap()
  arm64: cpufeature: add cpus_have_final_cap()

* for-next/cpu_ops-cleanup:
  : cpu_ops[] access code clean-up
  arm64: Introduce get_cpu_ops() helper function
  arm64: Rename cpu_read_ops() to init_cpu_ops()
  arm64: Declare ACPI parking protocol CPU operation if needed

* for-next/misc:
  : Various fixes and clean-ups
  arm64: define __alloc_zeroed_user_highpage
  arm64/kernel: Simplify __cpu_up() by bailing out early
  arm64: remove redundant blank for '=' operator
  arm64: kexec_file: Fixed code style.
  arm64: add blank after 'if'
  arm64: fix spelling mistake "ca not" -> "cannot"
  arm64: entry: unmask IRQ in el0_sp()
  arm64: efi: add efi-entry.o to targets instead of extra-$(CONFIG_EFI)
  arm64: csum: Optimise IPv6 header checksum
  arch/arm64: fix typo in a comment
  arm64: remove gratuitious/stray .ltorg stanzas
  arm64: Update comment for ASID() macro
  arm64: mm: convert cpu_do_switch_mm() to C
  arm64: fix NUMA Kconfig typos

* for-next/perf:
  : arm64 perf updates
  arm64: perf: Add support for ARMv8.5-PMU 64-bit counters
  KVM: arm64: limit PMU version to PMUv3 for ARMv8.1
  arm64: cpufeature: Extract capped perfmon fields
  arm64: perf: Clean up enable/disable calls
  perf: arm-ccn: Use scnprintf() for robustness
  arm64: perf: Support new DT compatibles
  arm64: perf: Refactor PMU init callbacks
  perf: arm_spe: Remove unnecessary zero check on 'nr_pages'
2020-03-24arm64: Introduce get_cpu_ops() helper functionGavin Shan
This introduces get_cpu_ops() to return the CPU operations for a given CPU index. For now, it simply returns @cpu_ops[cpu] as before. Also, a helper function __cpu_try_die() is introduced to be shared by cpu_die() and ipi_cpu_crash_stop(). This shouldn't introduce any functional changes. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
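A minimal sketch of the shape these helpers take (cpu_ops[] and struct cpu_operations already exist in the arm64 code; exact details may differ from the merged patch):

  const struct cpu_operations *get_cpu_ops(int cpu)
  {
          return cpu_ops[cpu];
  }

  #ifdef CONFIG_HOTPLUG_CPU
  /* shared by cpu_die() and ipi_cpu_crash_stop() */
  static bool __cpu_try_die(int cpu)
  {
          const struct cpu_operations *ops = get_cpu_ops(cpu);

          if (ops && ops->cpu_die) {
                  ops->cpu_die(cpu);
                  return true;
          }

          return false;
  }
  #endif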
2020-03-24arm64: Rename cpu_read_ops() to init_cpu_ops()Gavin Shan
This renames cpu_read_ops() to init_cpu_ops() as the function is only called in the initialization phase. Also, we will introduce get_cpu_ops() in subsequent patches to retrieve the CPU operations for a given CPU index; cpu_read_ops() and get_cpu_ops() would be difficult to distinguish by name alone. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-24arm64: Declare ACPI parking protocol CPU operation if neededGavin Shan
There is no need to declare the corresponding CPU operation when CONFIG_ARM64_ACPI_PARKING_PROTOCOL is disabled, even though doing so doesn't cause any compiler warnings. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-17arm64: perf: Add support for ARMv8.5-PMU 64-bit countersAndrew Murray
At present ARMv8 event counters are limited to 32-bits, though by using the CHAIN event it's possible to combine adjacent counters to achieve 64-bits. The perf config1:0 bit can be set to use such a configuration. With the introduction of ARMv8.5-PMU support, all event counters can now be used as 64-bit counters. Let's enable 64-bit event counters where support exists. Unless the user sets config1:0 we will adjust the counter value such that it overflows upon 32-bit overflow. This follows the same behaviour as the cycle counter which has always been (and remains) 64-bits. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> [Mark: fix ID field names, compare with 8.5 value] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
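For illustration, a userspace consumer would request a 64-bit counter by setting bit 0 of config1 when opening the event; a sketch (the raw event number 0x11 / CPU_CYCLES is only an example, not mandated by the patch):

  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int open_64bit_counter(void)
  {
          struct perf_event_attr attr = {
                  .type    = PERF_TYPE_RAW,
                  .size    = sizeof(attr),
                  .config  = 0x11,  /* example raw event: CPU_CYCLES */
                  .config1 = 1,     /* bit 0: ask the PMU driver for a 64-bit counter */
          };

          /* this thread, any CPU, no group, no flags */
          return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
  }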
2020-03-17KVM: arm64: limit PMU version to PMUv3 for ARMv8.1Andrew Murray
We currently expose the PMU version of the host to the guest via emulation of the DFR0_EL1 and AA64DFR0_EL1 debug feature registers. However, many of the features offered beyond PMUv3 for ARMv8.1 are not supported in KVM. Examples of this include support for the PMMIR registers (added in PMUv3 for ARMv8.4) and 64-bit event counters (added in PMUv3 for ARMv8.5). Let's trap the Debug Feature Registers in order to limit PMUVer/PerfMon to PMUv3 for ARMv8.1 and avoid unexpected behaviour. Both ID_AA64DFR0.PMUVer and ID_DFR0.PerfMon follow the "Alternative ID scheme used for the Performance Monitors Extension version" where 0xF means an IMPLEMENTATION DEFINED PMU is implemented, and values 0x0-0xE are treated as an unsigned field (with 0x0 meaning no PMU is present). As we don't expect to expose an IMPLEMENTATION DEFINED PMU, and our cap is below 0xF, we can treat these fields as unsigned when applying the cap. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> [Mark: make field names consistent, use perfmon cap] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
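In sketch form, the capping sits in KVM's ID register emulation path, along these lines (constant names are illustrative and may differ from the merged patch):

  if (id == SYS_ID_AA64DFR0_EL1)
          /* Limit guests to PMUv3 for ARMv8.1 */
          val = cpuid_feature_cap_perfmon_field(val,
                                                ID_AA64DFR0_PMUVER_SHIFT,
                                                ID_AA64DFR0_PMUVER_8_1);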
2020-03-17arm64: cpufeature: Extract capped perfmon fieldsAndrew Murray
When emulating ID registers there is often a need to cap the version bits of a feature such that the guest will not use features that the host is not aware of. For example, when KVM mediates access to the PMU by emulating register accesses. Let's add a helper that extracts a Performance Monitors ID field and caps the version to a given value. Fields that identify the version of the Performance Monitors Extension do not follow the standard ID scheme, and instead follow the scheme described in ARM DDI 0487E.a page D13-2825 "Alternative ID scheme used for the Performance Monitors Extension version". The value 0xF means an IMPLEMENTATION DEFINED PMU is present, and values 0x0-0xE can be treated the same as an unsigned field with 0x0 meaning no PMU is present. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> [Mark: rework to handle perfmon fields] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
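A sketch of such a helper, assuming the existing unsigned field extraction primitive in asm/cpufeature.h (details may differ from the merged version):

  static inline u64 __attribute_const__
  cpuid_feature_cap_perfmon_field(u64 features, int field, u64 cap)
  {
          u64 val = cpuid_feature_extract_unsigned_field(features, field);
          u64 mask = GENMASK_ULL(field + 3, field);

          /* Treat IMPLEMENTATION DEFINED (0xf) as if no PMU were present */
          if (val == 0xf)
                  val = 0;

          if (val > cap) {
                  features &= ~mask;
                  features |= (cap << field) & mask;
          }

          return features;
  }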
2020-03-17arm64: perf: Clean up enable/disable callsRobin Murphy
Reading this code bordered on painful, what with all the repetition and pointless return values. More fundamentally, dribbling the hardware enables and disables in one bit at a time incurs needless system register overhead for chained events and on reset. We already use bitmask values for the KVM hooks, so consolidate all the register accesses to match, and make a reasonable saving in both source and object code. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-17perf: arm-ccn: Use scnprintf() for robustnessTakashi Iwai
snprintf() is a hard-to-use function; it's especially difficult to use for concatenating substrings in a buffer with a limited size. Since snprintf() returns the would-be-output size, not the actual size, subsequent uses of snprintf() can easily end up pointing at the wrong position. Although the current code doesn't actually overflow the buffer, it's incorrect usage. This patch replaces such snprintf() calls with the safer scnprintf(). Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Will Deacon <will@kernel.org>
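A small sketch of the concatenation pattern that makes scnprintf() the safer choice: because it returns the number of characters actually written, the running offset can never step past the end of the buffer. The function and variable names below are illustrative, not the actual arm-ccn code.

  /* kernel code: scnprintf() is declared via <linux/kernel.h> */
  static int format_event_list(char *buf, size_t size,
                               const char *const *names, int n)
  {
          int len = 0;
          int i;

          for (i = 0; i < n; i++)
                  len += scnprintf(buf + len, size - len, "%s%s",
                                   i ? "," : "", names[i]);

          return len;  /* always <= size, even if the output was truncated */
  }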
2020-03-17arm64: define __alloc_zeroed_user_highpageglider@google.com
When running the kernel with init_on_alloc=1, calling the default implementation of __alloc_zeroed_user_highpage() from include/linux/highmem.h leads to double-initialization of the allocated page (first by the page allocator, then by clear_user_page()). Calling alloc_page_vma() with __GFP_ZERO, similarly to e.g. x86, seems to be enough to ensure the user page is zeroed only once. Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
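The arm64 definition ends up along these lines (sketch; see arch/arm64/include/asm/page.h for the authoritative version):

  #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
          alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
  #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE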
2020-03-17arm64/kernel: Simplify __cpu_up() by bailing out earlyGavin Shan
The function __cpu_up() is invoked to bring up the target CPU through the backend, PSCI for example. The nested if statements aren't needed if we bail out early on the following two conditions, where the CPU status doesn't need to be checked; the code is simpler that way. * An error is returned from the backend (e.g. PSCI) * The target CPU has already been marked as online Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
2020-03-17arm64: remove redundant blank for '=' operator韩科才
Remove a redundant blank around the '=' operator; the result is more elegant. Signed-off-by: hankecai <hankecai@vivo.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-17arm64: kexec_file: Fixed code style.Li Tao
Remove unnecessary blank. Signed-off-by: Li Tao <tao.li@vivo.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-17arm64: add blank after 'if'Zheng Wei
Add a blank after 'if' in armv8_deprecated_init() to make it comply with the kernel coding style. Signed-off-by: Zheng Wei <wei.zheng@vivo.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-17arm64: fix spelling mistake "ca not" -> "cannot"韩科才
There is a spelling mistake in a comment; fix it. Signed-off-by: hankecai <hankecai@bbktel.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13arm64: kvm: hyp: use cpus_have_final_cap()Mark Rutland
The KVM hyp code is only run after system capabilities have been finalized, and thus all const cap checks have been patched. This is noted in __cpu_init_hyp_mode(), where we BUG() if called too early: | /* | * Call initialization code, and switch to the full blown HYP code. | * If the cpucaps haven't been finalized yet, something has gone very | * wrong, and hyp will crash and burn when it uses any | * cpus_have_const_cap() wrapper. | */ Given this, the hyp code can use cpus_have_final_cap() and avoid generating code to check the cpu_hwcaps array, which would be unsafe to run in hyp context. This patch migrates the KVM hyp code to cpus_have_final_cap(), avoiding this redundant code generation, and making it possible to detect if we accidentally invoke this code too early. In the latter case, the BUG() in cpus_have_final_cap() will cause a hyp panic. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-13arm64: cpufeature: add cpus_have_final_cap()Mark Rutland
When cpus_have_const_cap() was originally introduced it was intended to be safe in hyp context, where it is not safe to access the cpu_hwcaps array as cpus_have_cap() did. For more details see commit: a4023f682739439b ("arm64: Add hypervisor safe helper for checking constant capabilities") We then made use of cpus_have_const_cap() throughout the kernel. Subsequently, we had to defer updating the static_key associated with each capability in order to avoid lockdep complaints. To avoid breaking kernel-wide usage of cpus_have_const_cap(), this was updated to fall back to the cpu_hwcaps array if called before the static_keys were updated. As the kvm hyp code was only called later than this, the fallback is redundant but not functionally harmful. For more details, see commit: 63a1e1c95e60e798 ("arm64/cpufeature: don't use mutex in bringup path") Today we have more users of cpus_have_const_cap() which are only called once the relevant static keys are initialized, and it would be beneficial to avoid the redundant code. To that end, this patch adds a new cpus_have_final_cap() helper, which is intended to be used in code that is only run once capabilities have been finalized, and which will never check the cpu_hwcaps array. This helps the compiler to generate better code as it no longer needs to generate code to address and test the cpu_hwcaps array. To help catch misuse, cpus_have_final_cap() will BUG() if called before capabilities are finalized. In hyp context, BUG() will result in a hyp panic, but the specific BUG() instance will not be identified in the usual way. Comments are added to the various cpus_have_*_cap() helpers to describe the constraints on when they can be used. For clarity cpus_have_cap() is moved above the other helpers. Similarly the helpers are updated to use system_capabilities_finalized() consistently, and this is made __always_inline as required by its new callers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
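A condensed sketch of the new helper's logic (the real version carries the documentation comments described above):

  static __always_inline bool cpus_have_final_cap(int num)
  {
          if (system_capabilities_finalized())
                  return __cpus_have_const_cap(num);  /* static key only */
          else
                  BUG();  /* caught: used before capabilities were finalized */
  }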
2020-03-11arm64: entry: unmask IRQ in el0_sp()Mark Rutland
Currently, the EL0 SP alignment handler masks IRQs unnecessarily. It does so due to historic code sharing of the EL0 SP and PC alignment handlers, and branch predictor hardening applicable to the EL0 SP handler. We began masking IRQs in the EL0 SP alignment handler in commit: 5dfc6ed27710c42c ("arm64: entry: Apply BP hardening for high-priority synchronous exception") ... as this shared code with the EL0 PC alignment handler, and branch predictor hardening made it necessary to disable IRQs for early parts of the EL0 PC alignment handler. It was not necessary to mask IRQs during EL0 SP alignment exceptions, but it was not considered harmful to do so. This masking was carried forward into C code in commit: 582f95835a8fc812 ("arm64: entry: convert el0_sync to C") ... where the SP/PC cases were split into separate handlers, and the masking duplicated. Subsequently the EL0 PC alignment handler was refactored to perform branch predictor hardening before unmasking IRQs, in commit: bfe298745afc9548 ("arm64: entry-common: don't touch daif before bp-hardening") ... but the redundant masking of IRQs was not removed from the EL0 SP alignment handler. Let's do so now, and make it interruptible as with most other synchronous exception handlers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com>
2020-03-09arm64: efi: add efi-entry.o to targets instead of extra-$(CONFIG_EFI)Masahiro Yamada
efi-entry.o is built on demand for efi-entry.stub.o, so you do not have to repeat $(CONFIG_EFI) here. Adding it to 'targets' is enough. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
2020-03-09arm64: csum: Optimise IPv6 header checksumRobin Murphy
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it about 1.3x-2x faster across a range of microarchitecture/compiler combinations. Not much in absolute terms, but every little helps. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
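The core trick is a 128-bit accumulator so that 64-bit additions collect their carry for free; roughly, and simplified from arch/arm64/lib/csum.c:

  static u64 accumulate(u64 sum, u64 data)
  {
          __uint128_t tmp = (__uint128_t)sum + data;

          /* fold the carry out of the top 64 bits back in */
          return tmp + (tmp >> 64);
  }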
2020-03-09arch/arm64: fix typo in a comment王程刚
Fix typo in a comment in arch/arm64/include/asm/esr.h "Unallocted" -> "Unallocated" Signed-off-by: Chenggang Wang <wangchenggang@vivo.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-06clocksource/drivers/arm_arch_timer: validate arch_timer_rateIonela Voinescu
Using an arch timer with a frequency of less than 1MHz can potentially result in incorrect functionality in systems that assume a reasonable arch timer rate of 1 to 50MHz, described as typical in the architecture specification. Therefore, warn if the arch timer rate is below 1MHz, which is considered atypical and worth emphasizing. Suggested-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
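The added check amounts to something like the following sketch (the message wording and exact placement in the driver are illustrative):

  static void validate_timer_rate(void)
  {
          if (!arch_timer_rate)
                  pr_warn("frequency not available\n");
          else if (arch_timer_rate < 1000000)  /* below 1 MHz is atypical */
                  pr_warn("frequency %u is below the expected 1-50MHz range\n",
                          arch_timer_rate);
  }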
2020-03-06arm64: use activity monitors for frequency invarianceIonela Voinescu
The Frequency Invariance Engine (FIE) provides a frequency scaling correction factor that helps achieve more accurate load-tracking. So far, for arm and arm64 platforms, this scale factor has been obtained based on the ratio between the current frequency and the maximum supported frequency recorded by the cpufreq policy. The setting of this scale factor is triggered from cpufreq drivers by calling arch_set_freq_scale. The current frequency used in computation is the frequency requested by a governor, but it may not be the frequency that was implemented by the platform. This correction factor can also be obtained using a core counter and a constant counter to get information on the performance (frequency based only) obtained in a period of time. This will more accurately reflect the actual current frequency of the CPU, compared with the alternative implementation that reflects the request of a performance level from the OS. Therefore, implement arch_scale_freq_tick to use activity monitors, if present, for the computation of the frequency scale factor. The use of AMU counters depends on: - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present - CONFIG_CPU_FREQ - the current frequency obtained using counter information is divided by the maximum frequency obtained from the cpufreq policy. While it is possible to have a combination of CPUs in the system with and without support for activity monitors, the use of counters for frequency invariance is only enabled for a CPU if all related CPUs (CPUs in the same frequency domain) support and have enabled the core and constant activity monitor counters. In this way, there is a clear separation between the policies for which arch_set_freq_scale (cpufreq based FIE) is used, and the policies for which arch_scale_freq_tick (counter based FIE) is used to set the frequency scale factor. For this purpose, a late_initcall_sync is registered to trigger validation work for policies that will enable or disable the use of AMU counters for frequency invariance. If CONFIG_CPU_FREQ is not defined, the use of counters is enabled on all CPUs only if all possible CPUs correctly support the necessary counters. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
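A simplified sketch of the counter-based scale factor; the in-kernel implementation uses careful fixed-point arithmetic and per-CPU snapshots of the AMU counters to avoid the overflow and division issues this naive version ignores, and the function name is illustrative:

  /*
   * freq_scale ~= (delta_core / delta_const) * (const_rate / max_freq)
   * where delta_core/delta_const are the increments of the AMU core and
   * constant cycle counters since the last tick.
   */
  static unsigned long amu_freq_scale(u64 delta_core, u64 delta_const,
                                      u64 const_rate_hz, u64 max_freq_hz)
  {
          /* current frequency estimate, in Hz */
          u64 cur_freq = div64_u64(delta_core * const_rate_hz, delta_const);

          /* express it as a fraction of max_freq, in SCHED_CAPACITY units */
          return div64_u64(cur_freq << SCHED_CAPACITY_SHIFT, max_freq_hz);
  }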
2020-03-06cpufreq: add function to get the hardware max frequencyIonela Voinescu
Add weak function to return the hardware maximum frequency of a CPU, with the default implementation returning cpuinfo.max_freq, which is the best information we can generically get from the cpufreq framework. The default can be overwritten by a strong function in platforms that want to provide an alternative implementation, with more accurate information, obtained either from hardware or firmware. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
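A sketch of what such a weak default looks like (based on the commit description; the exact body in drivers/cpufreq/cpufreq.c may differ):

  __weak unsigned int cpufreq_get_hw_max_freq(unsigned int cpu)
  {
          struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
          unsigned int max_freq = 0;

          if (policy) {
                  /* best generic answer the cpufreq framework can give */
                  max_freq = policy->cpuinfo.max_freq;
                  cpufreq_cpu_put(policy);
          }

          return max_freq;
  }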
2020-03-06Documentation: arm64: document support for the AMU extensionIonela Voinescu
The activity monitors extension is an optional extension introduced by the ARMv8.4 CPU architecture. Add initial documentation for the AMUv1 extension: - arm64/amu.txt: AMUv1 documentation - arm64/booting.txt: system registers initialisation Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-06arm64/kvm: disable access to AMU registers from kvm guestsIonela Voinescu
Access to the AMU counters should be disabled by default in kvm guests, as information from the counters might reveal activity in other guests or activity on the host. Therefore, disable access to AMU registers from EL0 and EL1 in kvm guests by: - Hiding the presence of the extension in the feature register (SYS_ID_AA64PFR0_EL1) on the VCPU. - Disabling access to the AMU registers before switching to the guest. - Trapping accesses and injecting an undefined instruction into the guest. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-06arm64: trap to EL1 accesses to AMU counters from EL0Ionela Voinescu
The activity monitors extension is an optional extension introduced by the ARMv8.4 CPU architecture. In order to access the activity monitors counters safely, if desired, the kernel should detect the presence of the extension through the feature register, and mediate the access. Therefore, disable direct accesses to activity monitors counters from EL0 (userspace) and trap them to EL1 (kernel). Note that the ARM64_AMU_EXTN kernel config does not have an effect on this code. Given that amuserenr_el0 resets to an UNKNOWN value, setting the trap of EL0 accesses to EL1 is always attempted for safety and security considerations. Therefore firmware should still ensure accesses to AMU registers are not trapped in EL2/EL3 as this code cannot be bypassed if the CPU implements the Activity Monitors Unit. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
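Conceptually the change boils down to clearing AMUSERENR_EL0 during CPU setup; the real change lives in the early assembly setup path, so this C rendering (and the function name) is only illustrative:

  static void cpu_amu_disable_el0_access(void)
  {
          /*
           * AMUSERENR_EL0 resets to an UNKNOWN value; clear it so that any
           * EL0 access to the AMU registers traps to EL1.
           */
          write_sysreg_s(0, SYS_AMUSERENR_EL0);
          isb();
  }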
2020-03-06arm64: add support for the AMU extension v1Ionela Voinescu
The activity monitors extension is an optional extension introduced by the ARMv8.4 CPU architecture. This implements basic support for version 1 of the activity monitors architecture, AMUv1. This support includes: - Extension detection on each CPU (boot, secondary, hotplugged) - Register interface for AMU aarch64 registers Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-04arm64: remove gratuitious/stray .ltorg stanzasRemi Denis-Courmont
There are no applicable literals above them. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Remi Denis-Courmont <remi.denis.courmont@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-04arm64/mm: Enable memory hot removeAnshuman Khandual
The arch code for hot-remove must tear down portions of the linear map and vmemmap corresponding to memory being removed. In both cases the page tables mapping these regions must be freed, and when sparse vmemmap is in use the memory backing the vmemmap must also be freed. This patch adds unmap_hotplug_range() and free_empty_tables() helpers which can be used to tear down either region, and calls them from vmemmap_free() and ___remove_pgd_mapping(). The free_mapped argument determines whether the backing memory will be freed. It makes two distinct passes over the kernel page table. In the first pass with unmap_hotplug_range() it unmaps, invalidates applicable TLB cache and frees backing memory if required (vmemmap) for each mapped leaf entry. In the second pass with free_empty_tables() it looks for empty page table sections whose page table page can be unmapped, TLB invalidated and freed. While freeing intermediate level page table pages, bail out if any of their entries are still valid. This can happen for a partially filled kernel page table, either from a previously attempted failed memory hot add or while removing an address range which does not span the entire page table page range. The vmemmap region may share levels of table with the vmalloc region. There can be conflicts between hot-remove freeing page table pages and a concurrent vmalloc() walking the kernel page table. This conflict cannot simply be solved by taking the init_mm ptl because of the existing locking scheme in vmalloc(). So free_empty_tables() implements a floor and ceiling method, borrowed from the user page table tear down in free_pgd_range(), which skips freeing page table pages if the intermediate address range is not aligned or the floor/ceiling might not own the entire page table page. Boot memory on arm64 cannot be removed. Hence this registers a new memory hotplug notifier which prevents boot memory offlining and its removal. While here, update arch_add_memory() to handle __add_pages() failures by just unmapping the recently added kernel linear mapping. Now enable memory hot remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE. This implementation is overall inspired by the kernel page table tear down procedure on the x86 architecture and the user page table tear down method. [Mike and Catalin added P4D page table level support] Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-04arm64/mm: Hold memory hotplug lock while walking for kernel page table dumpAnshuman Khandual
The arm64 page table dump code can race with concurrent modification of the kernel page tables. When leaf entries are modified concurrently, the dump code may log stale or inconsistent information for a VA range, but this is otherwise not harmful. When intermediate levels of table are freed, the dump code will continue to use memory which has been freed and potentially reallocated for another purpose. In such cases, the dump code may dereference bogus addresses, leading to a number of potential problems. Intermediate levels of table may be freed during memory hot-remove, which will be enabled by a subsequent patch. To avoid racing with this, take the memory hotplug lock when walking the kernel page table. Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
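The fix simply brackets the existing walk with the memory hotplug lock; in sketch form (the surrounding function and argument names are illustrative, not the exact arm64 dump code):

  void kernel_ptdump_walk(struct seq_file *m, struct ptdump_info *info)
  {
          get_online_mems();  /* block hot-remove from freeing tables under us */
          /* ... existing page table walk over info->mm ... */
          put_online_mems();
  }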
2020-03-02arm64: perf: Support new DT compatiblesRobin Murphy
Add support for matching the new PMUs. For now, this just wires them up as generic PMUv3 such that people writing DTs for new SoCs can do the right thing, and at least have architectural and raw events be usable. We can come back and fill in event maps for sysfs and/or perf tools at a later date. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-02arm64: perf: Refactor PMU init callbacksRobin Murphy
The PMU init callbacks are already drowning in boilerplate, so before doubling the number of supported PMU models, give it a sensible refactor to significantly reduce the bloat, both in source and object code. Although nobody uses non-default sysfs attributes today, there's minimal impact to preserving the notion that maybe, some day, somebody might, so we may as well keep up appearances. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-02perf: arm_spe: Remove unnecessary zero check on 'nr_pages'luanshi
We already check that the 'nr_pages' is > 2, so there's no need to check that it's != 0 later on. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-02-28arm64: Update comment for ASID() macroWill Deacon
Commit 25b92693a1b6 ("arm64: mm: convert cpu_do_switch_mm() to C") added a new use of the ASID() macro, so update the comment in asm/mmu.h which reasons about why an atomic reload of 'mm->context.id.counter' is not required. Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-27firmware: arm_sdei: clean up sdei_event_create()Liguang Zhang
Function sdei_event_find() is always called in sdei_event_create(), but it is already called in sdei_event_register(). This code is trying to avoid a double-create of the same event, which can't happen as we still hold the sdei_events_lock. We can remove this needless sdei_event_find() call. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> [expanded commit message] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-27firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhpJames Morse
SDEI has private events that need registering and enabling on each CPU. CPUs can come and go while we are trying to do this. SDEI tries to avoid these problems by setting the reregister flag before the register call, so any CPUs that come online register the event too. Sticking plaster like this doesn't work, as if the register call fails, a CPU that subsequently comes online will register the event before reregister is cleared. Take cpus_read_lock() around the register and enable calls. We don't want surprise CPUs to do the wrong thing if they race with these calls failing. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
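In sketch form, the registration path becomes something like the following (simplified from the commit description; the enable path is handled similarly):

  cpus_read_lock();  /* no CPUs may come or go while we register */

  err = _sdei_event_register(event);
  if (!err) {
          spin_lock(&sdei_list_lock);
          event->reregister = true;  /* only once the call has succeeded */
          spin_unlock(&sdei_list_lock);
  }

  cpus_read_unlock();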
2020-02-27firmware: arm_sdei: fix possible double-lock on hibernate error pathLiguang Zhang
We call sdei_reregister_event() with sdei_list_lock held, if the register fails we call sdei_event_destroy() which also acquires sdei_list_lock thus creating A-A deadlock. Add '_llocked' to sdei_reregister_event(), to indicate the list lock is held, and add a _llocked variant of sdei_event_destroy(). Fixes: da351827240e ("firmware: arm_sdei: Add support for CPU and system power states") Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> [expanded subject, added wrappers instead of duplicating contents] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-27firmware: arm_sdei: fix double-lock on hibernate with shared eventsJames Morse
SDEI has private events that must be registered on each CPU. When CPUs come and go they must re-register and re-enable their private events. Each event has flags to indicate whether this should happen to protect against an event being registered on a CPU coming online, while all the others are unregistering the event. These flags are protected by the sdei_list_lock spinlock, because the cpuhp callbacks can't take the mutex. Hibernate needs to unregister all events, but keep the in-memory re-register and re-enable as they are. sdei_unregister_shared() takes the spinlock to walk the list, then calls _sdei_event_unregister() on each shared event. _sdei_event_unregister() tries to take the same spinlock to update re-register and re-enable. This doesn't go so well. Push the re-register and re-enable updates out to their callers. sdei_unregister_shared() doesn't want these values updated, so doesn't need to do anything. This also fixes shared events getting lost over hibernate as this path made them look unregistered. Fixes: da351827240e ("firmware: arm_sdei: Add support for CPU and system power states") Reported-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-27arm64: mm: convert cpu_do_switch_mm() to CMark Rutland
There's no reason that cpu_do_switch_mm() needs to be written as an assembly function, and having it as a C function would make it easier to maintain. This patch converts cpu_do_switch_mm() to C, removing code that this change makes redundant (e.g. the mmid macro). Since the header comment was stale and the prototype now implies all the necessary information, this comment is removed. The 'pgd_phys' argument is made a phys_addr_t to match the return type of virt_to_phys(). At the same time, post_ttbr_update_workaround() is updated to use IS_ENABLED(), which allows the compiler to figure out it can elide calls for !CONFIG_CAVIUM_ERRATUM_27456 builds. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> [catalin.marinas@arm.com: change comments from asm-style to C-style] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
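A simplified sketch of the resulting C function; the actual version also handles CNP and the SW TTBR0 PAN configuration, so take the details below as approximate:

  void cpu_do_switch_mm(phys_addr_t pgd_phys, struct mm_struct *mm)
  {
          unsigned long ttbr1 = read_sysreg(ttbr1_el1);
          unsigned long asid = ASID(mm);
          unsigned long ttbr0 = phys_to_ttbr(pgd_phys);

          /* the ASID lives in TTBR1 since TCR_EL1.A1 is set */
          ttbr1 &= ~TTBR_ASID_MASK;
          ttbr1 |= FIELD_PREP(TTBR_ASID_MASK, asid);

          write_sysreg(ttbr1, ttbr1_el1);
          isb();
          write_sysreg(ttbr0, ttbr0_el1);
          isb();
          post_ttbr_update_workaround();
  }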
2020-02-27arm64: fix NUMA Kconfig typosRandy Dunlap
Fix typos in arch/arm64/Kconfig: - spell Numa as NUMA - add hyphenation to Non-Uniform Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-23Linux 5.6-rc3Linus Torvalds
2020-02-23Merge tag 'for-5.6-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "These are fixes that were found during testing with the help of error injection, plus some other stable material. There's a fixup to a patch added in rc1 that caused locking-in-wrong-context warnings; tests found one more deadlock scenario. The patches are tagged for stable, two of them are now in the queue, but we'd like all three released at the same time. I'm not happy about fixes to fixes in such fast succession during rcs, but I hope we found all the fallouts of commit 28553fa992cb ('Btrfs: fix race between shrinking truncate and fiemap')" * tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents btrfs: fix bytes_may_use underflow in prealloc error condtition btrfs: handle logged extent failure properly btrfs: do not check delayed items are empty for single transaction cleanup btrfs: reset fs_root to NULL on error in open_ctree btrfs: destroy qgroup extent records on transaction abort
2020-02-23Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "More miscellaneous ext4 bug fixes (all stable fodder)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix mount failure with quota configured as module jbd2: fix ocfs2 corrupt when clearing block group bits ext4: fix race between writepages and enabling EXT4_EXTENTS_FL ext4: rename s_journal_flag_rwsem to s_writepages_rwsem ext4: fix potential race between s_flex_groups online resizing and access ext4: fix potential race between s_group_info online resizing and access ext4: fix potential race between online resizing and write operations ext4: add cond_resched() to __ext4_find_entry() ext4: fix a data race in EXT4_I(inode)->i_disksize
2020-02-23Merge tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linuxLinus Torvalds
Pull csky updates from Guo Ren: "Sorry, I missed the 5.6-rc1 merge window, but in this pull request most of the changes are fixes and the rest are somewhere between fixes and features. The only modification outside the csky tree is the MAINTAINERS file update with our mailing list. - cache flush implementation fixes - ftrace modify panic fix - CONFIG_SMP boot problem fix - fix pt_regs saving for atomic.S - fix fixaddr_init without highmem. - fix stack protector support - fix fake Tightly-Coupled Memory code compile and use - fix some typos and coding convention" * tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits) csky: Replace <linux/clk-provider.h> by <linux/of_clk.h> csky: Implement copy_thread_tls csky: Add PCI support csky: Minimize defconfig to support buildroot config.fragment csky: Add setup_initrd check code csky: Cleanup old Kconfig options arch/csky: fix some Kconfig typos csky: Fixup compile warning for three unimplemented syscalls csky: Remove unused cache implementation csky: Fixup ftrace modify panic csky: Add flush_icache_mm to defer flush icache all csky: Optimize abiv2 copy_to_user_page with VM_EXEC csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860) csky: Remove unnecessary flush_icache_* implementation csky: Support icache flush without specific instructions csky/Kconfig: Add Kconfig.platforms to support some drivers csky/smp: Fixup boot failed when CONFIG_SMP csky: Set regs->usp to kernel sp, when the exception is from kernel csky/mm: Fixup export invalid_pte_table symbol csky: Separate fixaddr_init from highmem ...
2020-02-23csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>Geert Uytterhoeven
The C-Sky platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
2020-02-22Merge tag 'ras-urgent-2020-02-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixes for the AMD MCE driver: - Populate the per CPU MCA bank descriptor pointer only after it has been completely set up, to prevent a use-after-free in case one of the subsequent initialization steps fails - Implement a proper release function for the sysfs entries of MCA threshold controls instead of freeing the memory right in the CPU teardown code, which leads to another use-after-free when the associated sysfs file is opened and accessed" * tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Fix kobject lifetime x86/mce/amd: Publish the bank pointer only after setup has succeeded
2020-02-22Merge tag 'irq-urgent-2020-02-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Two fixes for the irq core code which are follow ups to the recent MSI fixes: - The WARN_ON which was put into the MSI setaffinity callback for paranoia reasons actually triggered via a callchain which escaped when all the possible ways to reach that code were analyzed. The proc/irq/$N/*affinity interfaces have a quirk which came in when ALPHA moved to the generic interface: In case that the written affinity mask does not contain any online CPU it calls into ALPHAs magic auto affinity setting code. A few years later this mechanism was also made available to x86 for no good reasons and in a way which circumvents all sanity checks for interrupts which cannot have their affinity set from process context on X86 due to the way the X86 interrupt delivery works. It would be possible to make this work properly, but there is no point in doing so. If the interrupt is not yet started then the affinity setting has no effect and if it is started already then it is already assigned to an online CPU so there is no point to randomly move it to some other CPU. Just return EINVAL as the code has done before that change forever. - The new MSI quirk bit in the irq domain flags turned out to be already occupied, which escaped the author and the reviewers because the already in use bits were 0,6,2,3,4,5 listed in that order. That bit 6 was simply overlooked because the ordering was straight forward linear otherwise. So the new bit ended up being a duplicate. Fix it up by switching the oddball 6 to the obvious 1" * tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/irqdomain: Make sure all irq domain flags are distinct genirq/proc: Reject invalid affinity masks (again)
2020-02-22Merge tag 'x86-urgent-2020-02-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Remove the __force_order definition from the kaslr boot code as it is already defined in the page table code, which makes GCC 10 builds fail because GCC 10 changed the default to -fno-common. - Address the AMD erratum 1054 concerning the IRPERF capability and enable the Instructions Retired fixed counter on machines which are not affected by the erratum" * tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF x86/boot/compressed: Don't declare __force_order in kaslr_64.c
2020-02-22Merge tag 'zonefs-5.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: "A single patch fixing typos in the documentation file" * tag 'zonefs-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: fix documentation typos etc.