path: root/arch/arm64/include/asm/atomic_ll_sc.h
2019-08-30  arm64: atomics: Use K constraint when toolchain appears to support it  (Will Deacon)
The 'K' constraint is a documented AArch64 machine constraint supported by GCC for matching integer constants that can be used with a 32-bit logical instruction. Unfortunately, some released compilers erroneously accept the immediate '4294967295' for this constraint, which is later refused by GAS at assembly time. This had led us to avoid the use of the 'K' constraint altogether.

Instead, detect whether the compiler is up to the job when building the kernel and pass the 'K' constraint to our 32-bit atomic macros when it appears to be supported.

Signed-off-by: Will Deacon <will@kernel.org>
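For illustration, a minimal probe of the idea (an illustrative userspace test, not the kernel's actual build-system check, which lives in the arm64 build files): 0xffffffff is not encodable as an AArch64 32-bit logical immediate, so a correct compiler must reject it for 'K' at compile time, while an affected compiler accepts it and only fails later when GAS sees the instruction.

/* k_probe.c - illustrative only.  Build with:  $CC -S -o /dev/null k_probe.c
 * Compile succeeds            => the compiler wrongly accepted 4294967295 for
 *                                'K', so the constraint must be avoided.
 * "impossible constraint" err => the compiler rejects the bogus immediate and
 *                                can be trusted with 'K'.
 */
int main(void)
{
	unsigned int x = 0xf;

	asm volatile("and %w0, %w0, %w1" : "+r" (x) : "K" (4294967295u));
	return (int)x;
}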
2019-08-29  arm64: atomics: avoid out-of-line ll/sc atomics  (Andrew Murray)
When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware or toolchain doesn't support it the existing code will fall back to ll/sc atomics. It achieves this by branching from inline assembly to a function that is built with special compile flags. Further, this results in the clobbering of registers even when the fallback isn't used, increasing register pressure.

Improve this by providing inline implementations of both LSE and ll/sc and using a static key to select between them, which allows the compiler to generate better atomics code. Put the LL/SC fallback atomics in their own subsection to improve icache performance.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
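A reduced userspace analogue of the selection (an assumption-laden sketch, not the kernel code: a plain boolean stands in for the static key, the subsection placement is omitted, and the LSE path assumes the assembler accepts ".arch_extension lse"):

#include <stdbool.h>

static bool system_has_lse;		/* the kernel uses a static key here */

static inline void llsc_add(int i, int *v)
{
	int result, fail;

	asm volatile(			/* exclusive load/store fallback */
	"1:	ldxr	%w0, %2\n"
	"	add	%w0, %w0, %w3\n"
	"	stxr	%w1, %w0, %2\n"
	"	cbnz	%w1, 1b"
	: "=&r" (result), "=&r" (fail), "+Q" (*v)
	: "Ir" (i));
}

static inline void lse_add(int i, int *v)
{
	asm volatile(			/* single ARMv8.1 LSE instruction */
	".arch_extension lse\n"
	"	stadd	%w1, %0"
	: "+Q" (*v)
	: "r" (i));
}

static inline void my_atomic_add(int i, int *v)
{
	if (system_has_lse)		/* static_branch_likely() in the kernel */
		lse_add(i, v);
	else
		llsc_add(i, v);
}

In the kernel the boolean test is a static branch patched once the CPU capabilities are known, so the unused flavour costs no more than a patched-out branch in the fast path.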
2019-08-29  arm64: Use correct ll/sc atomic constraints  (Andrew Murray)
The A64 ISA accepts distinct (but overlapping) ranges of immediates for:

 * add arithmetic instructions ('I' machine constraint)
 * sub arithmetic instructions ('J' machine constraint)
 * 32-bit logical instructions ('K' machine constraint)
 * 64-bit logical instructions ('L' machine constraint)

... but we currently use the 'I' constraint for many atomic operations using sub or logical instructions, which is not always valid. When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates to be passed to instructions, potentially resulting in a build failure. When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics always use a register as they have no visibility of the value passed by the caller.

This patch adds a constraint parameter to the ATOMIC_xx and __CMPXCHG_CASE macros so that we can pass appropriate constraints for each case, with uses updated accordingly. Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted '4294967295', so we must instead force the use of a register.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
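The shape of the change, sketched (names and layout assumed from the commit text rather than copied from the kernel): the generator macro gains a constraint parameter, and each instantiation passes the constraint matching its instruction's immediate range, pasted in front of "r" so a register remains a valid alternative when the constant doesn't fit.

#define GEN_ATOMIC_OP(op, asm_op, constraint)				\
static inline void my_atomic_##op(int i, int *v)			\
{									\
	int result, fail;						\
									\
	asm volatile(							\
	"1:	ldxr	%w0, %2\n"					\
	"	" #asm_op "	%w0, %w0, %w3\n"			\
	"	stxr	%w1, %w0, %2\n"					\
	"	cbnz	%w1, 1b"					\
	: "=&r" (result), "=&r" (fail), "+Q" (*v)			\
	: #constraint "r" (i));						\
}

GEN_ATOMIC_OP(add, add, I)	/* 'I': add-immediate range */
GEN_ATOMIC_OP(sub, sub, J)	/* 'J': sub-immediate range */
GEN_ATOMIC_OP(or,  orr, K)	/* 'K': 32-bit logical range, only once the toolchain is trusted */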
2019-07-08  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle are:

  - rwsem scalability improvements, phase #2, by Waiman Long, which are
    rather impressive:

      "On a 2-socket 40-core 80-thread Skylake system with 40 reader
       and writer locking threads, the min/mean/max locking operations
       done in a 5-second testing window before the patchset were:

         40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
         40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255

       After the patchset, they became:

         40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
         40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"

    There's a lot of changes to the locking implementation that makes
    it similar to qrwlock, including owner handoff for more fair
    locking.

    Another microbenchmark shows how across the spectrum the
    improvements are:

      "With a locking microbenchmark running on 5.1 based kernel, the
       total locking rates (in kops/s) on a 2-socket Skylake system
       with equal numbers of readers and writers (mixed) before and
       after this patchset were:

         # of Threads   Before Patch   After Patch
         ------------   ------------   -----------
                    2          2,618         4,193
                    4          1,202         3,726
                    8            802         3,622
                   16            729         3,359
                   32            319         2,826
                   64            102         2,744"

    The changes are extensive and the patch-set has been through
    several iterations addressing various locking workloads. There
    might be more regressions, but unless they are pathological I
    believe we want to use this new implementation as the baseline
    going forward.

  - jump-label optimizations by Daniel Bristot de Oliveira: the primary
    motivation was to remove IPI disturbance of isolated RT-workload
    CPUs, which resulted in the implementation of batched jump-label
    updates. Beyond the improvement of the real-time characteristics of
    the kernel, in one test this patchset improved static key update
    overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
    as well.

  - atomic64_t cross-arch type cleanups by Mark Rutland: over the last
    ~10 years of atomic64_t existence the various types used by the
    APIs only had to be self-consistent within each architecture -
    which means they became wildly inconsistent across architectures.
    Mark puts an end to this by reworking all the atomic64
    implementations to use 's64' as the base type for atomic64_t, and
    to ensure that this type is consistently used for parameters and
    return values in the API, avoiding further problems in this area.

  - A large set of small improvements to lockdep by Yuyang Du: type
    cleanups, output cleanups, function return type and other cleanups
    all around the place.

  - A set of percpu ops cleanups and fixes by Peter Zijlstra.

  - Misc other changes - please see the Git log for more details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  locking/lockdep: increase size of counters for lockdep statistics
  locking/atomics: Use sed(1) instead of non-standard head(1) option
  locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
  x86/jump_label: Make tp_vec_nr static
  x86/percpu: Optimize raw_cpu_xchg()
  x86/percpu, sched/fair: Avoid local_clock()
  x86/percpu, x86/irq: Relax {set,get}_irq_regs()
  x86/percpu: Relax smp_processor_id()
  x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
  locking/rwsem: Guard against making count negative
  locking/rwsem: Adaptive disabling of reader optimistic spinning
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Make rwsem->owner an atomic_long_t
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Clarify usage of owner's nonspinaable bit
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: More optimal RT task handling of null owner
  locking/rwsem: Always release wait_lock before waking up tasks
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  ...
2019-06-19  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234  (Thomas Gleixner)
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-03  locking/atomic, arm64: Use s64 for atomic64  (Mark Rutland)
As a step towards making the atomic64 API use consistent types treewide, let's have the arm64 atomic64 implementation use s64 as the underlying type for atomic64_t, rather than long, matching the generated headers. As atomic64_read() depends on the generic definition of atomic64_t, this still returns long. This will be converted in a subsequent patch. Note that in arch_atomic64_dec_if_positive(), the x0 variable is left as long, as this variable is also used to hold the pointer to the atomic64_t. Otherwise, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: aou@eecs.berkeley.edu Cc: arnd@arndb.de Cc: bp@alien8.de Cc: davem@davemloft.net Cc: fenghua.yu@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: ink@jurassic.park.msu.ru Cc: jhogan@kernel.org Cc: linux@armlinux.org.uk Cc: mattst88@gmail.com Cc: mpe@ellerman.id.au Cc: palmer@sifive.com Cc: paul.burton@mips.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: tony.luck@intel.com Cc: vgupta@synopsys.com Link: https://lkml.kernel.org/r/20190522132250.26499-8-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
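In outline (a reduced sketch, with a compiler builtin standing in for the arm64 assembly), the point is simply that the implementation's parameter and return types become s64:

#include <stdint.h>

typedef int64_t s64;			/* the kernel's s64 */

typedef struct {
	s64 counter;			/* underlying type is now s64, not long */
} atomic64_t;

/* parameters and return values consistently use s64 */
static inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
{
	/* stand-in for the LL/SC or LSE assembly */
	return __atomic_add_fetch(&v->counter, i, __ATOMIC_SEQ_CST);
}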
2019-02-11  Merge branch 'locking/atomics' into locking/core, to pick up WIP commits  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-07  arm64: cmpxchg: Use "K" instead of "L" for ll/sc immediate constraint  (Will Deacon)
The "L" AArch64 machine constraint, which we use for the "old" value in an LL/SC cmpxchg(), generates an immediate that is suitable for a 64-bit logical instruction. However, for cmpxchg() operations on types smaller than 64 bits, this constraint can result in an invalid instruction which is correctly rejected by GAS, such as EOR W1, W1, #0xffffffff. Whilst we could special-case the constraint based on the cmpxchg size, it's far easier to change the constraint to "K" and put up with using a register for large 64-bit immediates. For out-of-line LL/SC atomics, this is all moot anyway. Reported-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07  arm64: Avoid masking "old" for LSE cmpxchg() implementation  (Will Deacon)
The CAS instructions implicitly access only the relevant bits of the "old" argument, so there is no need for explicit masking via type-casting as there is in the LL/SC implementation. Move the casting into the LL/SC code and remove it altogether for the LSE implementation. Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07  arm64: Avoid redundant type conversions in xchg() and cmpxchg()  (Will Deacon)
Our atomic instructions (either LSE atomics or LDXR/STXR sequences) natively support byte, half-word, word and double-word memory accesses so there is no need to mask the data register prior to being stored. Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-01  arm64, locking/atomics: Use instrumented atomics  (Mark Rutland)
Now that the generic atomic headers provide instrumented wrappers of all the atomics implemented by arm64, let's migrate arm64 over to these. The additional instrumentation will help to find bugs (e.g. when fuzzing with Syzkaller). Mostly this change involves adding an arch_ prefix to a number of function names and macro definitions. When LSE atomics are used, the out-of-line LL/SC atomics will be named __ll_sc_arch_atomic_${OP}. Adding the arch_ prefix requires some whitespace fixups to keep things aligned. Some other unusual whitespace is fixed up at the same time (e.g. in the cmpxchg wrappers). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linuxdrivers@attotech.com Cc: dvyukov@google.com Cc: boqun.feng@gmail.com Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: glider@google.com Link: http://lkml.kernel.org/r/20180904104830.2975-7-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
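The wrapper pattern, in a reduced form (assumed shape; the real wrappers are generated into the asm-generic instrumented headers and call the kernel's KASAN check helpers):

#include <stddef.h>

/* stands in for the kernel's kasan_check_write() instrumentation hook */
static inline void instrument_write(const volatile void *addr, size_t size)
{
	(void)addr;
	(void)size;
}

/* raw arch_-prefixed implementation (builtin used as a stand-in for the arm64 asm) */
static inline void arch_atomic_add(int i, int *v)
{
	__atomic_fetch_add(v, i, __ATOMIC_RELAXED);
}

/* the unprefixed API adds the instrumentation, then calls the arch_ version */
static inline void atomic_add(int i, int *v)
{
	instrument_write(v, sizeof(*v));
	arch_atomic_add(i, v);
}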
2017-05-15  arm64: Remove redundant mov from LL/SC cmpxchg  (Robin Murphy)
The cmpxchg implementation introduced by commit c342f78217e8 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU") performs an apparently redundant register move of [old] to [oldval] in the success case - it always uses the same register width as [oldval] was originally loaded with, and is only executed when [old] and [oldval] are known to be equal anyway.

The only effect it seemingly does have is to take up a surprising amount of space in the kernel text, as removing it reveals:

       text     data      bss       dec      hex  filename
   12426658  1348614  4499749  18275021  116dacd  vmlinux.o.new
   12429238  1348614  4499749  18277601  116e4e1  vmlinux.o.old

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-06-16  locking/atomic, arch/arm64: Implement atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()  (Peter Zijlstra)
Implement FETCH-OP atomic primitives; these are very similar to the existing OP-RETURN primitives we already have, except they return the value of the atomic variable _before_ modification. This is especially useful for irreversible operations -- such as bitops (because it becomes impossible to reconstruct the state prior to modification). [wildea01: compile fixes for ll/sc] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
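A minimal LL/SC sketch of the FETCH-OP shape (assuming an AArch64 toolchain; it mirrors the structure the commit adds rather than the kernel source verbatim): the loaded value is preserved and returned, while the modified value is what gets stored.

static inline int fetch_add_llsc(int i, int *v)
{
	int old, new, fail;

	asm volatile(
	"1:	ldxr	%w0, %3\n"	/* old = *v			*/
	"	add	%w1, %w0, %w4\n"/* new = old + i		*/
	"	stxr	%w2, %w1, %3\n"	/* try to store new		*/
	"	cbnz	%w2, 1b"	/* retry if we lost exclusivity	*/
	: "=&r" (old), "=&r" (new), "=&r" (fail), "+Q" (*v)
	: "Ir" (i));

	return old;			/* pre-modification value */
}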
2015-11-05  arm64: cmpxchg_dbl: fix return value type  (Lorenzo Pieralisi)
The current arm64 __cmpxchg_double{_mb} implementations carry out the compare exchange by first comparing the old values passed in to the values read from the pointer provided and by stashing the cumulative bitwise difference in a 64-bit register. By comparing the register content against 0, it is possible to detect if the values read differ from the old values passed in, so that the compare exchange detects whether it has to bail out or carry on completing the operation with the exchange. Given the current implementation, to detect the cmpxchg operation status, the __cmpxchg_double{_mb} functions should return the 64-bit stashed bitwise difference so that the caller can detect cmpxchg failure by comparing the return value content against 0. The current implementation declares the return value as an int, which means that the 64-bit value stashing the bitwise difference is truncated before being returned to the __cmpxchg_double{_mb} callers, which means that any bitwise difference present in the top 32 bits goes undetected, triggering false positives and subsequent kernel failures. This patch fixes the issue by declaring the arm64 __cmpxchg_double{_mb} return values as a long, so that the bitwise difference is properly propagated on failure, restoring the expected behaviour. Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Cc: <stable@vger.kernel.org> # 4.3+ Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
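A two-line illustration of the truncation (plain C with hypothetical values, assuming a 64-bit long): if the double-word values only differ in their upper 32 bits, an int return value hides the failure while a long preserves it.

#include <stdio.h>

int main(void)
{
	/* bitwise difference between "old" and the values read back,
	 * confined to the upper 32 bits of the 64-bit register
	 */
	unsigned long diff = 0xdeadbeef00000000UL;

	int truncated = (int)diff;	/* == 0: cmpxchg failure goes undetected */
	long full     = (long)diff;	/* != 0: failure correctly reported */

	printf("as int: %d, as long: %ld\n", truncated, full);
	return 0;
}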
2015-10-12  arm64: atomics: implement native {relaxed, acquire, release} atomics  (Will Deacon)
Commit 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operation") introduced a relaxed atomic API to Linux that maps nicely onto the arm64 memory model, including the new ARMv8.1 atomic instructions. This patch hooks up the API to our relaxed atomic instructions, rather than have them all expand to the full-barrier variants as they do currently. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
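In sketch form (assumed shape, not the kernel macros; an AArch64 toolchain is required), the three orderings differ only in which exclusive load/store flavours the loop uses:

#define GEN_ADD_RETURN(name, ldop, stop)				\
static inline int atomic_add_return_##name(int i, int *v)		\
{									\
	int result, fail;						\
									\
	asm volatile(							\
	"1:	" #ldop "	%w0, %2\n"				\
	"	add	%w0, %w0, %w3\n"				\
	"	" #stop "	%w1, %w0, %2\n"				\
	"	cbnz	%w1, 1b"					\
	: "=&r" (result), "=&r" (fail), "+Q" (*v)			\
	: "Ir" (i)							\
	: "memory");	/* kept for all variants here; relaxed could drop it */ \
	return result;							\
}

GEN_ADD_RETURN(relaxed, ldxr,  stxr)	/* no ordering		*/
GEN_ADD_RETURN(acquire, ldaxr, stxr)	/* load-acquire		*/
GEN_ADD_RETURN(release, ldxr,  stlxr)	/* store-release	*/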
2015-08-04  arm64: make ll/sc __cmpxchg_case_##name asm consistent  (Mark Rutland)
The ll/sc __cmpxchg_case_##name assembly mostly uses symbolic names for operands, but in a single case uses %2 to refer to what is otherwise known as %[v]. This makes the code more painful to read than is necessary. Use %[v] instead. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: atomic64_dec_if_positive: fix incorrect branch condition  (Will Deacon)
If we attempt to atomic64_dec_if_positive on INT_MIN, we will underflow and incorrectly decide that the original parameter was positive. This patch fixes the broken condition code so that we handle this corner case correctly. Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: atomics: implement atomic{,64}_cmpxchg using cmpxchg  (Will Deacon)
We don't need duplicate cmpxchg implementations, so use cmpxchg to implement atomic{,64}_cmpxchg, like we do for xchg already. Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
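The resulting definition is just a forwarder; a reduced sketch (with a builtin-based stand-in for the arch-level cmpxchg()):

typedef struct { int counter; } atomic_t;

/* stand-in for the arch cmpxchg(): returns the value found at *ptr */
static inline int cmpxchg_stub(int *ptr, int old, int new)
{
	int expected = old;

	__atomic_compare_exchange_n(ptr, &expected, new, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}

/* atomic_cmpxchg() no longer carries its own LL/SC loop; it forwards to cmpxchg() */
static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
	return cmpxchg_stub(&v->counter, old, new);
}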
2015-07-27  arm64: atomics: prefetch the destination word for write prior to stxr  (Will Deacon)
The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch makes use of prfm to prefetch cachelines for write prior to ldxr/stxr loops when using the ll/sc atomic routines. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
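The change, sketched on a simple add loop (assuming an AArch64 toolchain; the prfm hint mirrors the one the commit adds to the ll/sc routines):

static inline void add_with_prefetch(int i, int *v)
{
	int result, fail;

	asm volatile(
	"	prfm	pstl1strm, %2\n"	/* prefetch the line for write	*/
	"1:	ldxr	%w0, %2\n"
	"	add	%w0, %w0, %w3\n"
	"	stxr	%w1, %w0, %2\n"
	"	cbnz	%w1, 1b"
	: "=&r" (result), "=&r" (fail), "+Q" (*v)
	: "Ir" (i));
}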
2015-07-27  arm64: cmpxchg: avoid memory barrier on comparison failure  (Will Deacon)
cmpxchg doesn't require memory barrier semantics when the value comparison fails, so make the barrier conditional on success. Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: cmpxchg: avoid "cc" clobber in ll/sc routines  (Will Deacon)
We can perform the cmpxchg comparison using eor and cbnz which avoids the "cc" clobber for the ll/sc case and consequently for the LSE case where we may have to fall back on the ll/sc code at runtime. Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
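The two cmpxchg entries above combine into a loop of roughly this shape (a sketch under the assumption of an AArch64 toolchain, close in spirit to the __cmpxchg_case code of that era, not a verbatim copy): eor + cbnz performs the comparison without touching the condition flags, and the dmb ish sits after the successful store so a failed comparison pays no barrier cost.

static inline int cmpxchg32_mb(int *ptr, int old, int new)
{
	int tmp, oldval;

	asm volatile(
	"1:	ldxr	%w1, %2\n"
	"	eor	%w0, %w1, %w3\n"	/* tmp = *ptr ^ old		*/
	"	cbnz	%w0, 2f\n"		/* mismatch: bail, no barrier	*/
	"	stlxr	%w0, %w4, %2\n"
	"	cbnz	%w0, 1b\n"		/* lost exclusivity: retry	*/
	"	dmb	ish\n"			/* full barrier only on success	*/
	"2:"
	: "=&r" (tmp), "=&r" (oldval), "+Q" (*ptr)
	: "r" (old), "r" (new)
	: "memory");

	return oldval;
}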
2015-07-27  arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU  (Will Deacon)
On CPUs which support the LSE atomic instructions introduced in ARMv8.1, it makes sense to use them in preference to ll/sc sequences. This patch introduces runtime patching of our cmpxchg_double primitives so that the LSE casp instruction is used instead. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: cmpxchg: patch in lse instructions when supported by the CPU  (Will Deacon)
On CPUs which support the LSE atomic instructions introduced in ARMv8.1, it makes sense to use them in preference to ll/sc sequences. This patch introduces runtime patching of our cmpxchg primitives so that the LSE cas instruction is used instead. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: atomics: patch in lse instructions when supported by the CPU  (Will Deacon)
On CPUs which support the LSE atomic instructions introduced in ARMv8.1, it makes sense to use them in preference to ll/sc sequences. This patch introduces runtime patching of atomic_t and atomic64_t routines so that the call-site for the out-of-line ll/sc sequences is patched with an LSE atomic instruction when we detect that the CPU supports it. If binutils is not recent enough to assemble the LSE instructions, then the ll/sc sequences are inlined as though CONFIG_ARM64_LSE_ATOMICS=n. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics  (Will Deacon)
In order to patch in the new atomic instructions at runtime, we need to generate wrappers around the out-of-line exclusive load/store atomics. This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS, which causes our atomic functions to branch to the out-of-line ll/sc implementations. To avoid the register spill overhead of the PCS, the out-of-line functions are compiled with specific compiler flags to force out-of-line save/restore of any registers that are usually caller-saved. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27  arm64: atomics: move ll/sc atomics into separate header file  (Will Deacon)
In preparation for the Large System Extension (LSE) atomic instructions introduced by ARM v8.1, move the current exclusive load/store (LL/SC) atomics into their own header file. Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>