Age | Commit message (Collapse) | Author |
|
Reduce system call overhead time (round trip time for invoking a
non-existent system call) by 25%.
With the removal of set_fs() [1] lazy control register handling was removed
in order to keep kernel entry and exit simple. However this made system
calls slower.
With the conversion to generic entry [2] and numerous follow up changes
which simplified the entry code significantly, adding support for lazy asce
handling doesn't add much complexity to the entry code anymore.
In particular this means:
- On kernel entry the primary asce is not modified and contains the user
asce
- Kernel accesses which require secondary-space mode (for example futex
operations) are surrounded by enable_sacf_uaccess() and
disable_sacf_uaccess() calls. enable_sacf_uaccess() sets the primary asce
to kernel asce so that the sacf instruction can be used to switch to
secondary-space mode. The primary asce is changed back to user asce with
disable_sacf_uaccess().
The state of the control register which contains the primary asce is
reflected with a new TIF_ASCE_PRIMARY bit. This is required on context
switch so that the correct asce is restored for the scheduled in process.
In result address spaces are now setup like this:
CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
-----------------------------|-----------|-----------|-----------
user space | user | user | kernel
kernel (no sacf) | user | user | kernel
kernel (during sacf uaccess) | kernel | user | kernel
kernel (kvm guest execution) | guest | user | kernel
In result cr1 control register content is not changed except for:
- futex system calls
- legacy s390 PCI system calls
- the kvm specific cmpxchg_user_key() uaccess helper
This leads to faster system call execution.
[1] 87d598634521 ("s390/mm: remove set_fs / rework address space handling")
[2] 56e62a737028 ("s390: convert to generic entry")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Use asm_inline for all inline assemblies which make use of the EX_TABLE or
ALTERNATIVE macros.
These macros expand to many lines and the compiler assumes the number of
lines within an inline assembly is the same as the number of instructions
within an inline assembly. This has an effect on inlining and loop
unrolling decisions.
In order to avoid incorrect assumptions use asm_inline, which tells the
compiler that an inline assembly has the smallest possible size.
In order to avoid confusion when asm_inline should be used or not, since a
couple of inline assemblies are quite large: the rule is to always use
asm_inline whenever the EX_TABLE or ALTERNATIVE macro is used. In specific
cases there may be reasons to not follow this guideline, but that should
be documented with the corresponding code.
Using the inline qualifier everywhere has only a small effect on the kernel
image size:
add/remove: 0/10 grow/shrink: 19/8 up/down: 1492/-1858 (-366)
The only location where this seems to matter is load_unaligned_zeropad()
from word-at-a-time.h where the compiler inlines more functions within the
dcache code, which is indeed code where performance matters.
Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Rework __clear_user() similar to raw_copy_from_user() / raw_copy_to_user()
and inline the function saving the overhead of branches.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
constant sizes
Avoid that the compiler generates an mvcos loop for constant sizes
smaller than 4096 bytes. The mvcos instruction copies between zero and
4096 bytes (effective length) with one operation. Therefore it is not
necessary to implement a loop for sizes smaller or equal to 4096
bytes.
This reduces the kernel text size by ~50kb (defconfig, gcc 14.2.0):
add/remove: 4/5 grow/shrink: 6/471 up/down: 2294/-51700 (-49406)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Inline copy_from_user() and copy_to_user(). With the shortened inline
assemblies of raw_copy_to_user() and raw_copy_from_user() the additional
kernel text size is acceptable, considering that this avoids function
calls on hot paths.
This increases the kernel text size by ~90kb (defconfig, gcc 14.2.0):
add/remove: 13/4 grow/shrink: 650/14 up/down: 93484/-3254 (90230)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Implement separate raw_copy_to_user_key() and raw_copy_from_user_key()
functions, which allows to remove the open-coded operand access control
handling from the normal raw_copy_to_user() / raw_copy_from_user()
functions - they are simplified to use immediate instructions to load
hard-coded operand access control values into register zero, which saves
one instruction.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
assemblies
Add specific exception handler for copy_to_user() / copy_from_user()
mvcos fault handling, which allows to shorten the inline assemblies to
three instructions.
On fault the exception handler adjusts the length used by the mvcos
instruction in a way that the instruction completes with condition code
zero, indicating the number of bytes copied with the input/output operand
'size'. This allows to calculate and return the number of bytes not copied,
if any, like required.
Loop and return value handling is changed to C so that the compiler may
optimize the code.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Rename get_put_user_noinstr_attributes to a more generic
uaccess_kmsan_or_inline name. This allows to use it for other non
put_user()/get_user() uaccess functions withour causing confusion.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The s390 implementations of raw_copy_from_user() and raw_copy_to_user() are
never inlined. However INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER are
still set. This leads to the odd situation that only the error handling
(memset to zero of the not copied bytes) of copy_from_user() is inlined,
while the actual fast path code is out-of-line.
This would make sense if raw_copy_from_user() and raw_copy_to_user() were
implemented in assembler files, where inlining is not possible. But the
current s390 setup does not make any sense.
Address this by moving the raw uaccess copy inline assemblies to the
uaccess header file, and remove INLINE_COPY_FROM_USER and
INLINE_COPY_TO_USER definitions. This way the uaccess code, but now
including error handling, is still out-of-line with the common code
_copy_from_user() and _copy_to_user() variants, which inline the raw
uaccess functions via _inline_copy_from_user() and _inline_copy_to_user().
This reduces the size of the kernel image by ~17kb.
(defconfig, gcc 14.2.0)
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Use asm goto if available for put_user() and get_user().
This generates slightly better code.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Remove usage of the operand access control specifier for put_user()
and get_user() (again). Instead hardcode the specifier for both inline
assemblies. This saves one instruction.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Remove EX_TABLE_UA_LOAD_MEM exception handling and replace it with
EX_TABLE_UA_FAULT. Open code the return code check, and also open code
setting of the destination to zero in case of an error. In almost all cases
the compiler is able to optimize the open coded checks away, since all
users of get_users() must check the return code, and are not supposed to
use the result in case of an error.
In addition this allows to change the get_user() inline assembly so that
the "Q" constraint can be used for the destination, instead of only an "a"
constraint. This generates slightly better code.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
constraints
Remove superfluous underscores, brackets, and early clobber to make
the code a bit more readable.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The __put_user_fn() and __get_user_fn() wrappers are leftovers from the
time where the kernel was compiled with or without mvcos support plus they
were later used to workaround the problems that came with asm register
constructs.
Both reasons do not exist anymore, therefore remove the wrappers and
shorten the code.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Keep put_user() and get_user() code separated.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Use asm goto for __mvc_kernel_nofault() if available. This generates
slightly better code, since the error checking happens implicitly with
the goto (aka exception) and the good path comes without any checks and
branches.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Use the mvc instruction in order to implement __get_kernel_nofault() and
__put_kernel_nofault(). Both functions have a source and destination
address where the code is supposed to read from and write to. Use the mvc
instruction to copy from source to destination instead of lg/stg like
instructions. This generates slightly better code.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Rename EX_TABLE_UA_STORE to a more generic EX_TABLE_UA_FAULT
name. This allows to use the extable type also for uaccess inline
assemblies which read from userspace, without causing confusion.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
To avoid lots of ifdefs in C code make s390_kernel_write() usable for
the decompressor: simply use memcpy() for this case since there is no
write protection enabled that early.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
uaccess.h uses instrument_get_user() and instrument_put_user(), which are
defined in linux/instrumented.h. Currently we get this header from
somewhere else by accident; prefer to be explicit about it and include it
directly.
Link: https://lkml.kernel.org/r/20240621113706.315500-36-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Suggested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <kasan-dev@googlegroups.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
put_user() uses inline assembly with precise constraints, so Clang is in
principle capable of instrumenting it automatically. Unfortunately, one
of the constraints contains a dereferenced user pointer, and Clang does
not currently distinguish user and kernel pointers. Therefore KMSAN
attempts to access shadow for user pointers, which is not a right thing to
do.
An obvious fix to add __no_sanitize_memory to __put_user_fn() does not
work, since it's __always_inline. And __always_inline cannot be removed
due to the __put_user_bad() trick.
A different obvious fix of using the "a" instead of the "+Q" constraint
degrades the code quality, which is very important here, since it's a hot
path.
Instead, repurpose the __put_user_asm() macro to define
__put_user_{char,short,int,long}_noinstr() functions and mark them with
__no_sanitize_memory. For the non-KMSAN builds make them __always_inline
in order to keep the generated code quality. Also define
__put_user_{char,short,int,long}() functions, which call the
aforementioned ones and which *are* instrumented, because they call KMSAN
hooks, which may be implemented as macros.
The same applies to get_user() as well.
Link: https://lkml.kernel.org/r/20240621113706.315500-35-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <kasan-dev@googlegroups.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Control register handling has nothing to do with low level SMP code.
Move it to a separate file.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
__cmpxchg_user_key() uses 128 bit types which, depending on compiler
and config options, may lead to an __ashlti3() library call.
Get rid of that by simply casting the 128 bit values to 32 bit values.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Fixes: 51098f0eb22e ("s390/cmpxchg: make loop condition for 1,2 byte cases precise")
Link: https://lore.kernel.org/all/4b96b112d5415d08a81d30657feec2c8c3000f7c.camel@linux.ibm.com/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
cmpxchg_user_key() for byte and short values is implemented via a one word
cmpxchg loop. Give up trying to perform the cmpxchg if it fails too often
because of contention on the cache line. This ensures that the thread
cannot become stuck in the kernel.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20221117100745.3253896-1-scgl@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The cmpxchg implementation for 1 and 2 bytes consists of a 4 byte
cmpxchg loop. Currently, the decision to retry is imprecise, looping if
bits outside the target byte(s) change instead of retrying until the
target byte(s) differ from the old value.
E.g. if an attempt to exchange (prev_left_0 old_bytes prev_right_0) is
made and it fails because the word at the address is
(prev_left_1 x prev_right_1) where both x != old_bytes and one of the
prev_*_1 values differs from the respective prev_*_0 value, the cmpxchg
is retried, even if by a semantic equivalent to a normal cmpxchg, the
exchange would fail.
Instead exit the loop if x != old_bytes and retry otherwise.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20221116144711.3811011-1-scgl@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add cmpxchg_user_key() which allows to execute a compare and exchange
on a user space address. This allows also to specify a storage key
which makes sure that key-controlled protection is considered.
This is based on a patch written by Janis Schoetterl-Glausch.
Link: https://lore.kernel.org/all/20220930210751.225873-2-scgl@linux.ibm.com
Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/Y2J8axs+bcQ2dO/l@osiris
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Rework copy_oldmem_page() callback to take an iov_iter.
This includes a few prerequisite updates and fixes to the oldmem
reading code.
- Rework cpufeature implementation to allow for various CPU feature
indications, which is not only limited to hardware capabilities, but
also allows CPU facilities.
- Use the cpufeature rework to autoload Ultravisor module when CPU
facility 158 is available.
- Add ELF note type for encrypted CPU state of a protected virtual CPU.
The zgetdump tool from s390-tools package will decrypt the CPU state
using a Customer Communication Key and overwrite respective notes to
make the data accessible for crash and other debugging tools.
- Use vzalloc() instead of vmalloc() + memset() in ChaCha20 crypto
test.
- Fix incorrect recovery of kretprobe modified return address in
stacktrace.
- Switch the NMI handler to use generic irqentry_nmi_enter() and
irqentry_nmi_exit() helper functions.
- Rework the cryptographic Adjunct Processors (AP) pass-through design
to support dynamic changes to the AP matrix of a running guest as
well as to implement more of the AP architecture.
- Minor boot code cleanups.
- Grammar and typo fixes to hmcdrv and tape drivers.
* tag 's390-5.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
Revert "s390/smp: enforce lowcore protection on CPU restart"
Revert "s390/smp: rework absolute lowcore access"
Revert "s390/smp,ptdump: add absolute lowcore markers"
s390/unwind: fix fgraph return address recovery
s390/nmi: use irqentry_nmi_enter()/irqentry_nmi_exit()
s390: add ELF note type for encrypted CPU state of a PV VCPU
s390/smp,ptdump: add absolute lowcore markers
s390/smp: rework absolute lowcore access
s390/setup: rearrange absolute lowcore initialization
s390/boot: cleanup adjust_to_uv_max() function
s390/smp: enforce lowcore protection on CPU restart
s390/tape: fix comment typo
s390/hmcdrv: fix Kconfig "its" grammar
s390/docs: fix warnings for vfio_ap driver doc
s390/docs: fix warnings for vfio_ap driver lock usage doc
s390/crash: support multi-segment iterators
s390/crash: use static swap buffer for copy_to_user_real()
s390/crash: move copy_to_user_real() to crash_dump.c
s390/zcore: fix race when reading from hardware system area
s390/crash: fix incorrect number of bytes to copy to user space
...
|
|
Function copy_to_user_real() does not really belong to maccess.c.
It is only used for copying oldmem to user space, so let's move
it to the friends.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Link: https://lore.kernel.org/r/e8de968d40202d87caa09aef12e9c67ec23a1c1a.1658206891.git.agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
it's inline and unlikely() inside of it (including the implicit one
in WARN_ON_ONCE()) suffice to convince the compiler that getting
false from check_copy_size() is unlikely.
Spotted-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Whitespace cleanup to get rid if some checkpatch findings, but mainly
to have consistent coding style within the header file again.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Historically the uaccess code pre-initializes the result of get_user()
(and now also __get_kernel_nofault()) to zero and uses the result as
input parameter for inline assemblies. This is different to what most,
if not all, other architectures are doing, which set the result to
zero within the exception handler in case of a fault.
Use the new extable mechanism and handle zeroing of the result within
the exception handler in case of a fault.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Make code easier to read by using symbolic names.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Raise minimum supported machine generation to z10, which comes with
various cleanups and code simplifications (usercopy/spectre
mitigation/etc).
- Rework extables and get rid of anonymous out-of-line fixups.
- Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
Covert pte_val()/pXd_val() macros to functions.
- Optimize kretprobe handling by avoiding extra kprobe on
__kretprobe_trampoline.
- Add support for CEX8 crypto cards.
- Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
- Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
group sections which simplifies kpatch support.
- Always use the packed stack layout and extend kernel unwinder tests.
- Add sanity checks for ftrace code patching.
- Add s390dbf debug log for the vfio_ap device driver.
- Various virtual vs physical address confusion fixes.
- Various small fixes and improvements all over the code.
* tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
s390/test_unwind: add kretprobe tests
s390/kprobes: Avoid additional kprobe in kretprobe handling
s390: convert ".insn" encoding to instruction names
s390: assume stckf is always present
s390/nospec: move to single register thunks
s390: raise minimum supported machine generation to z10
s390/uaccess: Add copy_from/to_user_key functions
s390/nospec: align and size extern thunks
s390/nospec: add an option to use thunk-extern
s390/nospec: generate single register thunks if possible
s390/pci: make zpci_set_irq()/zpci_clear_irq() static
s390: remove unused expoline to BC instructions
s390/irq: use assignment instead of cast
s390/traps: get rid of magic cast for per code
s390/traps: get rid of magic cast for program interruption code
s390/signal: fix typo in comments
s390/asm-offsets: remove unused defines
s390/test_unwind: avoid build warning with W=1
s390: remove .fixup section
s390/bpf: encode register within extable entry
...
|
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical
sections that last too long for very large guests backed by 4
KiB SPTEs.
- Zap invalid and defunct roots asynchronously via
concurrency-managed work queue.
- Allowing yielding when zapping TDP MMU roots in response to the
root's last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the
paging structure now holds RCU as a proxy for all vCPUs running
in the guest, i.e. to prolongs the grace period on their behalf.
It then kicks the the vCPUs out of guest mode before doing
rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that need
memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
KVM: use kvcalloc for array allocations
KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
kvm: x86: Require const tsc for RT
KVM: x86: synthesize CPUID leaf 0x80000021h if useful
KVM: x86: add support for CPUID leaf 0x80000021
KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
KVM: arm64: fix typos in comments
KVM: arm64: Generalise VM features into a set of flags
KVM: s390: selftests: Add error memop tests
KVM: s390: selftests: Add more copy memop tests
KVM: s390: selftests: Add named stages for memop test
KVM: s390: selftests: Add macro as abstraction for MEM_OP
KVM: s390: selftests: Split memop tests
KVM: s390x: fix SCK locking
RISC-V: KVM: Implement SBI HSM suspend call
RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
RISC-V: Add SBI HSM suspend related defines
RISC-V: KVM: Implement SBI v0.3 SRST extension
...
|
|
Machine generations up to z9 (released in May 2006) have been officially
out of service for several years now (z9 end of service - January 31, 2019).
No distributions build kernels supporting those old machine generations
anymore, except Debian, which seems to pick the oldest supported
generation. The team supporting Debian on s390 has been notified about
the change.
Raising minimum supported machine generation to z10 helps to reduce
maintenance cost and effectively remove code, which is not getting
enough testing coverage due to lack of older hardware and distributions
support. Besides that this unblocks some optimization opportunities and
allows to use wider instruction set in asm files for future features
implementation. Due to this change spectre mitigation and usercopy
implementations could be drastically simplified and many newer instructions
could be converted from ".insn" encoding to instruction names.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add copy_from/to_user_key functions, which perform storage key checking.
These functions can be used by KVM for emulating instructions that need
to be key checked.
These functions differ from their non _key counterparts in
include/linux/uaccess.h only in the additional key argument and must be
kept in sync with those.
Since the existing uaccess implementation on s390 makes use of move
instructions that support having an additional access key supplied,
we can implement raw_copy_from/to_user_key by enhancing the
existing implementation.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220211182215.2730017-2-scgl@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This is more or less a combination of commit 2e77a62cb3a6 ("arm64:
extable: add a dedicated uaccess handler") and commit 4b5305decc84
("x86/extable: Extend extable functionality").
To describe the problem that needs to solved let's cite the full arm64
commit message:
------
For inline assembly, we place exception fixups out-of-line in the
`.fixup` section such that these are out of the way of the fast path.
This has a few drawbacks:
* Since the fixup code is anonymous, backtraces will symbolize fixups
as offsets from the nearest prior symbol, currently
`__entry_tramp_text_end`. This is confusing, and painful to debug
without access to the relevant vmlinux.
* Since the exception handler adjusts the PC to execute the fixup, and
the fixup uses a direct branch back into the function it fixes,
backtraces of fixups miss the original function. This is confusing,
and violates requirements for RELIABLE_STACKTRACE (and therefore
LIVEPATCH).
* Inline assembly and associated fixups are generated from templates,
and we have many copies of logically identical fixups which only
differ in which specific registers are written to and which address
is branched to at the end of the fixup. This is potentially wasteful
of I-cache resources, and makes it hard to add additional logic to
fixups without significant bloat.
This patch address all three concerns for inline uaccess fixups by
adding a dedicated exception handler which updates registers in
exception context and subsequent returns back into the function which
faulted, removing the need for fixups specialized to each faulting
instruction.
Other than backtracing, there should be no functional change as a result
of this patch.
------
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Follow arm64 and riscv and move the EX_TABLE define to asm-extable.h
which is a lot less generic than the current linkage.h.
Also make sure that all files which contain EX_TABLE usages actually
include the new header file. This should make sure that the files
always compile and there won't be any random compile breakage due to
other header file dependencies.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.
Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.
For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.
Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.
Some architectures had trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Nine architectures are still missing __{get,put}_kernel_nofault:
alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa.
Add a generic version that lets everything use the normal
copy_{from,to}_kernel_nofault() code based on these, removing the last
use of get_fs()/set_fs() from architecture-independent code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add copy_from/to_user_key functions, which perform storage key checking.
These functions can be used by KVM for emulating instructions that need
to be key checked.
These functions differ from their non _key counterparts in
include/linux/uaccess.h only in the additional key argument and must be
kept in sync with those.
Since the existing uaccess implementation on s390 makes use of move
instructions that support having an additional access key supplied,
we can implement raw_copy_from/to_user_key by enhancing the
existing implementation.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220211182215.2730017-2-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
There is a confusion with regard to the source address of
memcpy_real() and calling functions. While the declared
type for a source assumes a virtual address, in fact it
always called with physical address of the source.
This confusion led to bugs in copy_oldmem_kernel() and
copy_oldmem_user() functions, where __pa() macro applied
mistakenly to physical addresses. It does not lead to a
real issue, since virtual and physical addresses are
currently the same.
Fix both the bugs and memcpy_real() prototype by making
type of source address consistent to the function name
and the way it actually used.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Compiling with e.g MARCH=z900 results in compile errors:
arch/s390/lib/uaccess.c: In function 'copy_from_user_mvcos':
>> arch/s390/lib/uaccess.c:65:15: error: variable 'spec' has initializer but incomplete type
65 | union oac spec = {
Therefore make definition of union oac visible for all MARCHs.
Reported-by: kernel test robot <lkp@intel.com>
Cc: Nico Boehr <nrb@linux.ibm.com>
Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Fixes: 012a224e1fa3 ("s390/uaccess: introduce bit field for OAC specifier")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Previously, we've used magic values to specify the OAC
(operand-access control) for mvcos.
Instead we introduce a bit field for it.
When using a bit field, we cannot use an immediate value with K
constraint anymore, since GCC older than 10 doesn't recognize
the bit field union as a compile time constant.
To make things work with older compilers,
load the OAC value through a register.
Bloat-o-meter reports a slight increase in kernel size with this change:
Total: Before=15692135, After=15693015, chg +0.01%
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Co-developed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220111100003.743116-1-scgl@linux.ibm.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
All users of compat_alloc_user_space() and copy_in_user() have been
removed from the kernel, only a few functions in sparc remain that can be
changed to calling arch_copy_in_user() instead.
Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The s390 variant of strncpy_from_user() is slightly faster than the
generic variant, however convert to the generic variant now to follow
most if not all other architectures.
Converting to the generic variant was already considered a couple of
years ago. See commit f5c8b9601036 ("s390/uaccess: use sane length for
__strncpy_from_user()").
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This patch converts s390 to use the generic entry infrastructure from
kernel/entry/*.
There are a few special things on s390:
- PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
know about our PIF flags in exit_to_user_mode_loop().
- The old code had several ways to restart syscalls:
a) PIF_SYSCALL_RESTART, which was only set during execve to force a
restart after upgrading a process (usually qemu-kvm) to pgste page
table extensions.
b) PIF_SYSCALL, which is set by do_signal() to indicate that the
current syscall should be restarted. This is changed so that
do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
PIF_SYSCALL doesn't work with the generic code, and changing it
to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART
more unique.
- On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
executing a svc instruction on the process stack which causes a fault.
While handling that fault the fault code sets PIF_SYSCALL to hand over
processing to the syscall code on exit to usermode.
The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
a return value for a syscall. The s390x ptrace ABI uses r2 both for the
syscall number and return value, so ptrace cannot set the syscall number +
return value at the same time. The flag makes handling that a bit easier.
do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
is set.
CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY.
CR1/7/13 will be checked both on kernel entry and exit to contain the
correct asces.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Verify on exit to user space that always
- the primary ASCE (cr1) is set to kernel ASCE
- the secondary ASCE (cr7) is set to user ASCE
If this is not the case: panic since something went terribly wrong.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|