summaryrefslogtreecommitdiff
path: root/arch/s390/include/asm/processor.h
AgeCommit message (Collapse)Author
2021-01-19s390: pass struct pt_regs instead of registers to syscallsSven Schnelle
Instead of fetching all registers from struct pt_regs and passing them to the syscall wrappers, let the system call wrappers only fetch the values really required. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19s390: convert to generic entrySven Schnelle
This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-12-16s390/idle: merge enabled_wait() and arch_cpu_idle()Heiko Carstens
The only caller of enabled_wait() besides arch_cpu_idle() was udelay(). Since that call doesn't exist anymore, merge enabled_wait() and arch_cpu_idle(). Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-16s390/delay: simplify udelayHeiko Carstens
udelay is implemented by using quite subtle details to make it possible to load an idle psw and waiting for an interrupt even in irq context or when interrupts are disabled. Also handling (or better: no handling) of softirqs is taken into account. All this is done to optimize for something which should in normal circumstances never happen: calling udelay to busy wait. Therefore get rid of the whole complexity and just busy loop like other architectures are doing it also. It could have been possible to use diag 0x44 instead of cpu_relax() in the busy loop, however we have seen too many bad things happen with diag 0x44 that it seems to be better to simply busy loop. Also note that with this new implementation kernel preemption does work when within the udelay loop. This did not work before. To get a feeling what the former code optimizes for: IPL'ing a kernel with 'defconfig' and afterwards compiling a kernel ends with a total of zero udelay calls. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-23s390/mm: remove set_fs / rework address space handlingHeiko Carstens
Remove set_fs support from s390. With doing this rework address space handling and simplify it. As a result address spaces are now setup like this: CPU running in | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE ----------------------------|-----------|-----------|----------- user space | user | user | kernel kernel, normal execution | kernel | user | kernel kernel, kvm guest execution | gmap | user | kernel To achieve this the getcpu vdso syscall is removed in order to avoid secondary address mode and a separate vdso address space in for user space. The getcpu vdso syscall will be implemented differently with a subsequent patch. The kernel accesses user space always via secondary address space. This happens in different ways: - with mvcos in home space mode and directly read/write to secondary address space - with mvcs/mvcp in primary space mode and copy from primary space to secondary space or vice versa - with e.g. cs in secondary space mode and access secondary space Switching translation modes happens with sacf before and after instructions which access user space, like before. Lazy handling of control register reloading is removed in the hope to make everything simpler, but at the cost of making kernel entry and exit a bit slower. That is: on kernel entry the primary asce is always changed to contain the kernel asce, and on kernel exit the primary asce is changed again so it contains the user asce. In kernel mode there is only one exception to the primary asce: when kvm guests are executed the primary asce contains the gmap asce (which describes the guest address space). The primary asce is reset to kernel asce whenever kvm guest execution is interrupted, so that this doesn't has to be taken into account for any user space accesses. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-11-09s390: remove unused s390_base_ext_handlerVasily Gorbik
s390_base_ext_handler_fn haven't been used since its introduction in commit ab14de6c37fa ("[S390] Convert memory detection into C code."). s390_base_ext_handler itself is currently falsely storing 16 registers at __LC_SAVE_AREA_ASYNC rewriting several following lowcore values: cpu_flags, return_psw, return_mcck_psw, sync_enter_timer and async_enter_timer. Besides that s390_base_ext_handler itself is only potentially hiding EXT interrupts which should not have happen in the first place. Any piece of code which requires EXT interrupts before fully functional ext_int_handler is enabled has to do it on its own, like this is done by sclp_early_cmd() which is doing EXT interrupts handling synchronously in sclp_early_wait_irq(). With s390_base_ext_handler removed unexpected EXT interrupt leads to disabled wait with the address 0x1b0 (__LC_EXT_NEW_PSW), which is currently setup in the decompressor. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-05-28s390: remove critical section cleanup from entry.SSven Schnelle
The current code is rather complex and caused a lot of subtle and hard to debug bugs in the past. Simplify the code by calling the system_call handler with interrupts disabled, save machine state, and re-enable them later. This requires significant changes to the machine check handling code as well. When the machine check interrupt arrived while being in kernel mode the new code will signal pending machine checks with a SIGP external call. When userspace was interrupted, the handler will switch to the kernel stack and directly execute s390_handle_mcck(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-04Merge tag 's390-5.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Update maintainers. Niklas Schnelle takes over zpci and Vineeth Vijayan common io code. - Extend cpuinfo to include topology information. - Add new extended counters for IBM z15 and sampling buffer allocation rework in perf code. - Add control over zeroing out memory during system restart. - CCA protected key block version 2 support and other fixes/improvements in crypto code. - Convert to new fallthrough; annotations. - Replace zero-length arrays with flexible-arrays. - QDIO debugfs and other small improvements. - Drop 2-level paging support optimization for compat tasks. Varios mm cleanups. - Remove broken and unused hibernate / power management support. - Remove fake numa support which does not bring any benefits. - Exclude offline CPUs from CPU topology masks to be more consistent with other architectures. - Prevent last branching instruction address leaking to userspace. - Other small various fixes and improvements all over the code. * tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (57 commits) s390/mm: cleanup init_new_context() callback s390/mm: cleanup virtual memory constants usage s390/mm: remove page table downgrade support s390/qdio: set qdio_irq->cdev at allocation time s390/qdio: remove unused function declarations s390/ccwgroup: remove pm support s390/ap: remove power management code from ap bus and drivers s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc s390/mm: cleanup arch_get_unmapped_area() and friends s390/ism: remove pm support s390/cio: use fallthrough; s390/vfio: use fallthrough; s390/zcrypt: use fallthrough; s390: use fallthrough; s390/cpum_sf: Fix wrong page count in error message s390/diag: fix display of diagnose call statistics s390/ap: Remove ap device suspend and resume callbacks s390/pci: Improve handling of unset UID s390/pci: Fix zpci_alloc_domain() over allocation s390/qdio: pass ISC as parameter to chsc_sadc() ...
2020-03-28s390/mm: cleanup virtual memory constants usageAlexander Gordeev
Remove duplicate definitions and consolidate usage of virutal and address translation constants. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-28s390/mm: remove page table downgrade supportAlexander Gordeev
This update consolidates page table handling code. Because there are hardly any 31-bit binaries left we do not need to optimize for that. No extra efforts are needed to ensure that a compat task does not map anything above 2GB. The TASK_SIZE limit for 31-bit tasks is 2GB already and the generic code does check that a resulting map address would not surpass that limit. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-10s390: prevent leaking kernel address in BEARSven Schnelle
When userspace executes a syscall or gets interrupted, BEAR contains a kernel address when returning to userspace. This make it pretty easy to figure out where the kernel is mapped even with KASLR enabled. To fix this, add lpswe to lowcore and always execute it there, so userspace sees only the lowcore address of lpswe. For this we have to extend both critical_cleanup and the SWITCH_ASYNC macro to also check for lpswe addresses in lowcore. Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)") Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-19s390: remove obsolete ieee_emulation_warningsStephen Kitt
s390 math emulation was removed with commit 5a79859ae0f3 ("s390: remove 31 bit support"), rendering ieee_emulation_warnings useless. The code still built because it was protected by CONFIG_MATHEMU, which was no longer selectable. This patch removes the sysctl_ieee_emulation_warnings declaration and the sysctl entry declaration. Link: https://lkml.kernel.org/r/20200214172628.3598516-1-steve@sk2.org Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Stephen Kitt <steve@sk2.org> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390: always inline disabled_waitVasily Gorbik
disabled_wait uses _THIS_IP_ and assumes that compiler would inline it. Make sure this assumption is always correct by utilizing __always_inline. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31s390: always inline current_stack_pointer()Heiko Carstens
This function must be inlined since any caller expects the current stack pointer; which wouldn't be true if the function isn't inlined. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-03s390/base: remove unused s390_base_mcck_handlerVasily Gorbik
s390_base_mcck_handler was used during system reset if diag308 set was not available. But after commit d485235b0054 ("s390: assume diag308 set always works") is a dead code and could be removed. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-07-16arch: replace _BITUL() in kernel-space headers with BIT()Masahiro Yamada
Now that BIT() can be used from assembly code, we can safely replace _BITUL() with equivalent BIT(). UAPI headers are still required to use _BITUL(), but there is no more reason to use it in kernel headers. BIT() is shorter. Link: http://lkml.kernel.org/r/20190609153941.17249-2-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-15processor: get rid of cpu_relax_yieldHeiko Carstens
stop_machine is the only user left of cpu_relax_yield. Given that it now has special semantics which are tied to stop_machine introduce a weak stop_machine_yield function which architectures can override, and get rid of the generic cpu_relax_yield implementation. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-06-15s390: improve wait logic of stop_machineMartin Schwidefsky
The stop_machine loop to advance the state machine and to wait for all affected CPUs to check-in calls cpu_relax_yield in a tight loop until the last missing CPUs acknowledged the state transition. On a virtual system where not all logical CPUs are backed by real CPUs all the time it can take a while for all CPUs to check-in. With the current definition of cpu_relax_yield a diagnose 0x44 is done which tells the hypervisor to schedule *some* other CPU. That can be any CPU and not necessarily one of the CPUs that need to run in order to advance the state machine. This can lead to a pretty bad diagnose 0x44 storm until the last missing CPU finally checked-in. Replace the undirected cpu_relax_yield based on diagnose 0x44 with a directed yield. Each CPU in the wait loop will pick up the next CPU in the cpumask of stop_machine. The diagnose 0x9c is used to tell the hypervisor to run this next CPU instead of the current one. If there is only a limited number of real CPUs backing the virtual CPUs we end up with the real CPUs passed around in a round-robin fashion. [heiko.carstens@de.ibm.com]: Use cpumask_next_wrap as suggested by Peter Zijlstra. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2019-05-02s390: simplify disabled_waitMartin Schwidefsky
The disabled_wait() function uses its argument as the PSW address when it stops the CPU with a wait PSW that is disabled for interrupts. The different callers sometimes use a specific number like 0xdeadbeef to indicate a specific failure, the early boot code uses 0 and some other calls sites use __builtin_return_address(0). At the time a dump is created the current PSW and the registers of a CPU are written to lowcore to make them avaiable to the dump analysis tool. For a CPU stopped with disabled_wait the PSW and the registers do not really make sense together, the PSW address does not point to the function the registers belong to. Simplify disabled_wait() by using _THIS_IP_ for the PSW address and drop the argument to the function. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390/unwind: introduce stack unwind APIMartin Schwidefsky
Rework the dump_trace() stack unwinder interface to support different unwinding algorithms. The new interface looks like this: struct unwind_state state; unwind_for_each_frame(&state, task, regs, start_stack) do_something(state.sp, state.ip, state.reliable); The unwind_bc.c file contains the implementation for the classic back-chain unwinder. One positive side effect of the new code is it now handles ftraced functions gracefully. It prints the real name of the return function instead of 'return_to_handler'. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-04-11s390: make __load_psw_mask work with clangArnd Bergmann
clang fails to use the %O and %R inline assembly modifiers the same way as gcc, leading to build failures with every use of __load_psw_mask(): /tmp/nmi-4a9f80.s: Assembler messages: /tmp/nmi-4a9f80.s:571: Error: junk at end of line: `+8(160(%r11))' /tmp/nmi-4a9f80.s:626: Error: junk at end of line: `+8(160(%r11))' Replace these with a more conventional way of passing the addresses that should work with both clang and gcc. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-04-11s390: fine-tune stack switch helperMartin Schwidefsky
The CALL_ON_STACK helper currently does not work with clang and for calls without arguments. It does not initialize r2 although the constraint is "+&d". Rework the CALL_FMT_x and the CALL_ON_STACK macros to work with clang and produce optimal code in all cases. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-05compiler: remove __no_sanitize_address_or_inline againMartin Schwidefsky
The __no_sanitize_address_or_inline and __no_kasan_or_inline defines are almost identical. The only difference is that __no_kasan_or_inline does not have the 'notrace' attribute. To be able to replace __no_sanitize_address_or_inline with the older definition, add 'notrace' to __no_kasan_or_inline and change to two users of __no_sanitize_address_or_inline in the s390 code. The 'notrace' option is necessary for e.g. the __load_psw_mask function in arch/s390/include/asm/processor.h. Without the option it is possible to trace __load_psw_mask which leads to kernel stack overflow. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Pointed-out-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31treewide: remove current_text_addrNick Desaulniers
Prefer _THIS_IP_ defined in linux/kernel.h. Most definitions of current_text_addr were the same as _THIS_IP_, but a few archs had inline assembly instead. This patch removes the final call site of current_text_addr, making all of the definitions dead code. [akpm@linux-foundation.org: fix arch/csky/include/asm/processor.h] Link: http://lkml.kernel.org/r/20180911182413.180715-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-09s390/kasan: reipl and kexec supportVasily Gorbik
Some functions from both arch/s390/kernel/ipl.c and arch/s390/kernel/machine_kexec.c are called without DAT enabled (or with and without DAT enabled code paths). There is no easy way to partially disable kasan for those files without a substantial rework. Disable kasan for both files for now. To avoid disabling kasan for arch/s390/kernel/diag.c DAT flag is enabled in diag308 call. pcpu_delegate which disables DAT is marked with __no_sanitize_address to disable instrumentation for that one function. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09s390/smp: kasan stack instrumentation supportVasily Gorbik
smp_start_secondary function is called without DAT enabled. To avoid disabling kasan instrumentation for entire arch/s390/kernel/smp.c smp_start_secondary has been split in 2 parts. smp_start_secondary has instrumentation disabled, it does minimal setup and enables DAT. Then instrumentated __smp_start_secondary is called to do the rest. __load_psw_mask function instrumentation has been disabled as well to be able to call it from smp_start_secondary. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09s390: unify stack size definitionsVasily Gorbik
Remove STACK_ORDER and STACK_SIZE in favour of identical THREAD_SIZE_ORDER and THREAD_SIZE definitions. THREAD_SIZE and THREAD_SIZE_ORDER naming is misleading since it is used as general kernel stack size information. But both those definitions are used in the common code and throughout architectures specific code, so changing the naming is problematic. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09s390: add support for virtually mapped kernel stacksMartin Schwidefsky
With virtually mapped kernel stacks the kernel stack overflow detection is now fault based, every stack has a guard page in the vmalloc space. The panic_stack is renamed to nodat_stack and is used for all function that need to run without DAT, e.g. memcpy_real or do_start_kdump. The main effect is a reduction in the kernel image size as with vmap stacks the old style overflow checking that adds two instructions per function is not needed anymore. Result from bloat-o-meter: add/remove: 20/1 grow/shrink: 13/26854 up/down: 2198/-216240 (-214042) In regard to performance the micro-benchmark for fork has a hit of a few microseconds, allocating 4 pages in vmalloc space is more expensive compare to an order-2 page allocation. But with real workload I could not find a noticeable difference. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-09s390: add stack switch helperMartin Schwidefsky
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-05s390: run user space and KVM guests with modified branch predictionMartin Schwidefsky
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary plumbing in entry.S to be able to run user space and KVM guests with limited branch prediction. To switch a user space process to limited branch prediction the s390_isolate_bp() function has to be call, and to run a vCPU of a KVM guest associated with the current task with limited branch prediction call s390_isolate_bp_guest(). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-05s390: add options to change branch prediction behaviour for the kernelMartin Schwidefsky
Add the PPA instruction to the system entry and exit path to switch the kernel to a different branch prediction behaviour. The instructions are added via CPU alternatives and can be disabled with the "nospec" or the "nobp=0" kernel parameter. If the default behaviour selected with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be used to enable the changed kernel branch prediction. Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-14s390: correct some inline assembly constraintsVasily Gorbik
Inline assembly code changed in this patch should really use "Q" constraint "Memory reference without index register and with short displacement". The kernel does not compile with kasan support enabled otherwise (due to stack instrumentation). Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14s390: remove all code using the access register modeMartin Schwidefsky
The vdso code for the getcpu() and the clock_gettime() call use the access register mode to access the per-CPU vdso data page with the current code. An alternative to the complicated AR mode is to use the secondary space mode. This makes the vdso faster and quite a bit simpler. The downside is that the uaccess code has to be changed quite a bit. Which instructions are used depends on the machine and what kind of uaccess operation is requested. The instruction dictates which ASCE value needs to be loaded into %cr1 and %cr7. The different cases: * User copy with MVCOS for z10 and newer machines The MVCOS instruction can copy between the primary space (aka user) and the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already loaded in %cr1. * User copy with MVCP/MVCS for older machines To be able to execute the MVCP/MVCS instructions the kernel needs to switch to primary mode. The control register %cr1 has to be set to the kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent on set_fs(KERNEL_DS) vs set_fs(USER_DS). * Data access in the user address space for strnlen / futex To use "normal" instruction with data from the user address space the secondary space mode is used. The kernel needs to switch to primary mode, %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the kernel ASCE, dependent on set_fs. To load a new value into %cr1 or %cr7 is an expensive operation, the kernel tries to be lazy about it. E.g. for multiple user copies in a row with MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is done only once. On return to user space a CPU bit is checked that loads the vdso ASCE again. To enable and disable the data access via the secondary space two new functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact that a context is in secondary space uaccess mode is stored in the mm_segment_t value for the task. The code of an interrupt may use set_fs as long as it returns to the previous state it got with get_fs with another call to set_fs. The code in finish_arch_post_lock_switch simply has to do a set_fs with the current mm_segment_t value for the task. For CPUs with MVCOS: CPU running in | %cr1 ASCE | %cr7 ASCE | --------------------------------------|-----------|-----------| user space | user | vdso | kernel, USER_DS, normal-mode | user | vdso | kernel, USER_DS, normal-mode, lazy | user | user | kernel, USER_DS, sacf-mode | kernel | user | kernel, KERNEL_DS, normal-mode | kernel | vdso | kernel, KERNEL_DS, normal-mode, lazy | kernel | kernel | kernel, KERNEL_DS, sacf-mode | kernel | kernel | For CPUs without MVCOS: CPU running in | %cr1 ASCE | %cr7 ASCE | --------------------------------------|-----------|-----------| user space | user | vdso | kernel, USER_DS, normal-mode | user | vdso | kernel, USER_DS, normal-mode lazy | kernel | user | kernel, USER_DS, sacf-mode | kernel | user | kernel, KERNEL_DS, normal-mode | kernel | vdso | kernel, KERNEL_DS, normal-mode, lazy | kernel | kernel | kernel, KERNEL_DS, sacf-mode | kernel | kernel | The lines with "lazy" refer to the state after a copy via the secondary space with a delayed reload of %cr1 and %cr7. There are three hardware address spaces that can cause a DAT exception, primary, secondary and home space. The exception can be related to four different fault types: user space fault, vdso fault, kernel fault, and the gmap faults. Dependent on the set_fs state and normal vs. sacf mode there are a number of fault combinations: 1) user address space fault via the primary ASCE 2) gmap address space fault via the primary ASCE 3) kernel address space fault via the primary ASCE for machines with MVCOS and set_fs(KERNEL_DS) 4) vdso address space faults via the secondary ASCE with an invalid address while running in secondary space in problem state 5) user address space fault via the secondary ASCE for user-copy based on the secondary space mode, e.g. futex_ops or strnlen_user 6) kernel address space fault via the secondary ASCE for user-copy with secondary space mode with set_fs(KERNEL_DS) 7) kernel address space fault via the primary ASCE for user-copy with secondary space mode with set_fs(USER_DS) on machines without MVCOS. 8) kernel address space fault via the home space ASCE Replace user_space_fault() with a new function get_fault_type() that can distinguish all four different fault types. With these changes the futex atomic ops from the kernel and the strnlen_user will get a little bit slower, as well as the old style uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as fast as before. On the positive side, the user space vdso code is a lot faster and Linux ceases to use the complicated AR mode. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: "Since Martin is on vacation you get the s390 pull request for the v4.15 merge window this time from me. Besides a lot of cleanups and bug fixes these are the most important changes: - a new regset for runtime instrumentation registers - hardware accelerated AES-GCM support for the aes_s390 module - support for the new CEX6S crypto cards - support for FORTIFY_SOURCE - addition of missing z13 and new z14 instructions to the in-kernel disassembler - generate opcode tables for the in-kernel disassembler out of a simple text file instead of having to manually maintain those tables - fast memset16, memset32 and memset64 implementations - removal of named saved segment support - hardware counter support for z14 - queued spinlocks and queued rwlocks implementations for s390 - use the stack_depth tracking feature for s390 BPF JIT - a new s390_sthyi system call which emulates the sthyi (store hypervisor information) instruction - removal of the old KVM virtio transport - an s390 specific CPU alternatives implementation which is used in the new spinlock code" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (88 commits) MAINTAINERS: add virtio-ccw.h to virtio/s390 section s390/noexec: execute kexec datamover without DAT s390: fix transactional execution control register handling s390/bpf: take advantage of stack_depth tracking s390: simplify transactional execution elf hwcap handling s390/zcrypt: Rework struct ap_qact_ap_info. s390/virtio: remove unused header file kvm_virtio.h s390: avoid undefined behaviour s390/disassembler: generate opcode tables from text file s390/disassembler: remove insn_to_mnemonic() s390/dasd: avoid calling do_gettimeofday() s390: vfio-ccw: Do not attempt to free no-op, test and tic cda. s390: remove named saved segment support s390/archrandom: Reconsider s390 arch random implementation s390/pci: do not require AIS facility s390/qdio: sanitize put_indicator s390/qdio: use atomic_cmpxchg s390/nmi: avoid using long-displacement facility s390: pass endianness info to sparse s390/decompressor: remove informational messages ...
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-28s390/topology: add detection of dedicated vs shared CPUsMartin Schwidefsky
The topology information returned by STSI 15.x.x contains a flag if the CPUs of a topology-list are dedicated or shared. Make this information available if the machine provides topology information. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28s390/guarded storage: simplify task exit handlingHeiko Carstens
Free data structures required for guarded storage from arch_release_task_struct(). This allows to simplify the code a bit, and also makes the semantics a bit easier: arch_release_task_struct() is never called from the task that is being removed. In addition this allows to get rid of exit_thread() in a later patch. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-09-28s390: convert release_thread() into a static inline functionHeiko Carstens
release_thread() is an empty function that gets called on every task exit. Move the function to a header file and force inlining of it, so that the compiler can optimize it away instead of generating a pointless function call. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The bulk of the s390 patches for 4.13. Some new things but mostly bug fixes and cleanups. Noteworthy changes: - The SCM block driver is converted to blk-mq - Switch s390 to 5 level page tables. The virtual address space for a user space process can now have up to 16EB-4KB. - Introduce a ELF phdr flag for qemu to avoid the global vm.alloc_pgste which forces all processes to large page tables - A couple of PCI improvements to improve error recovery - Included is the merge of the base support for proper machine checks for KVM" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits) s390/dasd: Fix faulty ENODEV for RO sysfs attribute s390/pci: recognize name clashes with uids s390/pci: provide more debug information s390/pci: fix handling of PEC 306 s390/pci: improve pci hotplug s390/pci: introduce clp_get_state s390/pci: improve error handling during fmb (de)registration s390/pci: improve unreg_ioat error handling s390/pci: improve error handling during interrupt deregistration s390/pci: don't cleanup in arch_setup_msi_irqs KVM: s390: Backup the guest's machine check info s390/nmi: s390: New low level handling for machine check happening in guest s390/fpu: export save_fpu_regs for all configs s390/kvm: avoid global config of vm.alloc_pgste=1 s390: rename struct psw_bits members s390: rename psw_bits enums s390/mm: use correct address space when enabling DAT s390/cio: introduce io_subchannel_type s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL s390/dumpstack: remove raw stack dump ...
2017-06-28arch: remove unused macro/function thread_saved_pc()Tobias Klauser
The only user of thread_saved_pc() in non-arch-specific code was removed in commit 8243d5597793 ("sched/core: Remove pointless printout in sched_show_task()"). Remove the implementations as well. Some architectures use thread_saved_pc() in their arch-specific code. Leave their thread_saved_pc() intact. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-28Merge tag 'nmiforkvm' of ↵Martin Schwidefsky
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into features Pull kvm patches from Christian Borntraeger: "s390,kvm: provide plumbing for machines checks when running guests" This provides the basic plumbing for handling machine checks when running guests
2017-06-27s390/nmi: s390: New low level handling for machine check happening in guestQingFeng Hao
Add the logic to check if the machine check happens when the guest is running. If yes, set the exit reason -EINTR in the machine check's interrupt handler. Refactor s390_do_machine_check to avoid panicing the host for some kinds of machine checks which happen when guest is running. Reinject the instruction processing damage's machine checks including Delayed Access Exception instead of damaging the host if it happens in the guest because it could be caused by improper update on TLB entry or other software case and impacts the guest only. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-12s390/mm: implement 5 level pages tablesMartin Schwidefsky
Add the logic to upgrade the page table for a 64-bit process to five levels. This increases the TASK_SIZE from 8PB to 16EB-4K. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-04-25s390/mm: make TASK_SIZE independent from the number of page table levelsMartin Schwidefsky
The TASK_SIZE for a process should be maximum possible size of the address space, 2GB for a 31-bit process and 8PB for a 64-bit process. The number of page table levels required for a given memory layout is a consequence of the mapped memory areas and their location. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-22s390: add a system call for guarded storageMartin Schwidefsky
This adds a new system call to enable the use of guarded storage for user space processes. The system call takes two arguments, a command and pointer to a guarded storage control block: s390_guarded_storage(int command, struct gs_cb *gs_cb); The second argument is relevant only for the GS_SET_BC_CB command. The commands in detail: 0 - GS_ENABLE Enable the guarded storage facility for the current task. The initial content of the guarded storage control block will be all zeros. After the enablement the user space code can use load-guarded-storage-controls instruction (LGSC) to load an arbitrary control block. While a task is enabled the kernel will save and restore the current content of the guarded storage registers on context switch. 1 - GS_DISABLE Disables the use of the guarded storage facility for the current task. The kernel will cease to save and restore the content of the guarded storage registers, the task specific content of these registers is lost. 2 - GS_SET_BC_CB Set a broadcast guarded storage control block. This is called per thread and stores a specific guarded storage control block in the task struct of the current task. This control block will be used for the broadcast event GS_BROADCAST. 3 - GS_CLEAR_BC_CB Clears the broadcast guarded storage control block. The guarded- storage control block is removed from the task struct that was established by GS_SET_BC_CB. 4 - GS_BROADCAST Sends a broadcast to all thread siblings of the current task. Every sibling that has established a broadcast guarded storage control block will load this control block and will be enabled for guarded storage. The broadcast guarded storage control block is used up, a second broadcast without a refresh of the stored control block with GS_SET_BC_CB will not have any effect. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-24s390: TASK_SIZE for kernel threadsMartin Schwidefsky
Return a sensible value if TASK_SIZE if called from a kernel thread. This gets us around an issue with copy_mount_options that does a magic size calculation "TASK_SIZE - (unsigned long)data" while in a kernel thread and data pointing to kernel space. Cc: <stable@vger.kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-23s390: restore address space when returning to user spaceHeiko Carstens
Unbalanced set_fs usages (e.g. early exit from a function and a forgotten set_fs(USER_DS) call) may lead to a situation where the secondary asce is the kernel space asce when returning to user space. This would allow user space to modify kernel space at will. This would only be possible with the above mentioned kernel bug, however we can detect this and fix the secondary asce before returning to user space. Therefore a new TIF_ASCE_SECONDARY which is used within set_fs. When returning to user space check if TIF_ASCE_SECONDARY is set, which would indicate a bug. If it is set print a message to the console, fixup the secondary asce, and then return to user space. This is similar to what is being discussed for x86 and arm: "[RFC] syscalls: Restore address limit after a syscall". Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-23s390: rename CIF_ASCE to CIF_ASCE_PRIMARYHeiko Carstens
This is just a preparation patch in order to keep the "restore address space after syscall" patch small. Rename CIF_ASCE to CIF_ASCE_PRIMARY to be unique and specific when introducing a second CIF_ASCE_SECONDARY CIF flag. Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - New entropy generation for the pseudo random number generator. - Early boot printk output via sclp to help debug crashes on boot. This needs to be enabled with a kernel parameter. - Add proper no-execute support with a bit in the page table entry. - Bug fixes and cleanups. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (65 commits) s390/syscall: fix single stepped system calls s390/zcrypt: make ap_bus explicitly non-modular s390/zcrypt: Removed unneeded debug feature directory creation. s390: add missing "do {} while (0)" loop constructs to multiline macros s390/mm: add cond_resched call to kernel page table dumper s390: get rid of MACHINE_HAS_PFMF and MACHINE_HAS_HPAGE s390/mm: make memory_block_size_bytes available for !MEMORY_HOTPLUG s390: replace ACCESS_ONCE with READ_ONCE s390: Audit and remove any remaining unnecessary uses of module.h s390: mm: Audit and remove any unnecessary uses of module.h s390: kernel: Audit and remove any unnecessary uses of module.h s390/kdump: Use "LINUX" ELF note name instead of "CORE" s390: add no-execute support s390: report new vector facilities s390: use correct input data address for setup_randomness s390/sclp: get rid of common response code handling s390/sclp: don't add new lines to each printed string s390/sclp: make early sclp code readable s390/sclp: disable early sclp code as soon as the base sclp driver is active s390/sclp: move early printk code to drivers ...
2017-02-17s390: add missing "do {} while (0)" loop constructs to multiline macrosHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>