summaryrefslogtreecommitdiff
path: root/arch/s390/kernel/nmi.c
AgeCommit message (Collapse)Author
2024-06-18s390/nmi: Remove duplicate get_lowcore() callsSven Schnelle
Assign the output from get_lowcore() to a local variable, so the code is easier to read. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-18s390: Replace S390_lowcore by get_lowcore()Sven Schnelle
Replace all S390_lowcore usages in arch/s390/ by get_lowcore(). Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-04-17s390/irq,nmi: Include <asm/vtime.h> header directlyAlexander Gordeev
update_timer_sys() and update_timer_mcck() are inlines used for CPU time accounting from the interrupt and machine-check handlers. These routines are specific to s390 architecture, but included via <linux/vtime.h> header implicitly. Avoid the extra loop and include <asm/vtime.h> header directly. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/3fb696637c0eb7e9d6ffd6cbf9e647d7c5986b3d.1712760275.git.agordeev@linux.ibm.com
2024-02-16s390/fpu: move, rename, and merge header filesHeiko Carstens
Move, rename, and merge the fpu and vx header files. This way fpu header files have a consistent naming scheme (fpu*.h). Also get rid of the fpu subdirectory and move header files to asm directory, so that all fpu and vx header files can be found at the same location. Merge internal.h header file into other header files, since the internal helpers are used at many locations. so those helper functions are really not internal. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16s390/nmi: remove register validation codeHeiko Carstens
Remove the historic machine check handler code which validates registers. Registers are automatically validated as part of the machine check handling sequence (see Principles of Operation, Machine-Check Handling chapter, Validation). Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-12s390/switch_to: use generic header fileHeiko Carstens
Move the switch_to() implementation to process.c and use the generic switch_to.h header file instead, like some other architectures. This addresses also the oddity that the old switch_to() implementation assigns the return value of __switch_to() to 'prev' instead of 'last', like it should. Remove also all includes of switch_to.h from C files, except process.c. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-12-11s390/fpu: get rid of MACHINE_HAS_VXHeiko Carstens
Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx() which is a short readable wrapper for "test_facility(129)". Facility bit 129 is set if the vector facility is present. test_facility() returns also true for all bits which are set in the architecture level set of the cpu that the kernel is compiled for. This means that test_facility(129) is a compile time constant which returns true for z13 and later, since the vector facility bit is part of the z13 kernel ALS. In result the compiled code will have less runtime checks, and less code. Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-11s390/nmi: implement and use local_mcck_save() / local_mcck_restore()Heiko Carstens
Instead of using local_mcck_disable() / local_mcck_enable() implement and use local_mcck_save() / local_mcck_restore() to disable machine checks, and restoring the previous state. The problem with using local_mcck_disable() / local_mcck_enable() is that there is an assumption that machine checks are always enabled. While this is currently the case the code still looks quite odd, readers need to double check if the code is correct. In order to increase readability save and then restore the old machine check mask bit, instead of assuming that it must have been enabled. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-09-19s390: use control register bit definesHeiko Carstens
Use control register bit defines instead of plain numbers where possible. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19s390/ctlreg: add struct ctlregHeiko Carstens
Add struct ctlreg to enforce strict type checking / usage for control register functions. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19s390/ctlreg: use local_ctl_load() and local_ctl_store() where possibleHeiko Carstens
Convert all single control register usages of __local_ctl_load() and __local_ctl_store() to local_ctl_load() and local_ctl_store(). This also requires to change the type of some struct lowcore members from __u64 to unsigned long. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19s390/ctlreg: add local and system prefix to some functionsHeiko Carstens
Add local and system prefix to some functions to clarify they change control register contents on either the local CPU or the on all CPUs. This results in the following API: Two defines which load and save multiple control registers. The defines correlate with the following C prototypes: void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high); void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high); Two functions which locally set or clear one bit for a specified control register: void local_ctl_set_bit(unsigned int cr, unsigned int bit); void local_ctl_clear_bit(unsigned int cr, unsigned int bit); Two functions which set or clear one bit for a specified control register on all CPUs: void system_ctl_set_bit(unsigned int cr, unsigned int bit); void system_ctl_clear_bit(unsigend int cr, unsigned int bit); Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19s390/ctlreg: rename ctl_reg.h to ctlreg.hHeiko Carstens
Rename ctl_reg.h to ctlreg.h so it matches not only ctlreg.c but also other control register related function, union, and structure names, which all come with a ctlreg prefix. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-02-28s390/mcck: cleanup user process termination pathAlexander Gordeev
If a machine check interrupt hits while user process is running __s390_handle_mcck() helper function is called directly from the interrupt handler and terminates the current process by calling make_task_dead() routine. The make_task_dead() is not allowed to be called from interrupt context which forces the machine check handler switch to the kernel stack and enable local interrupts first. The __s390_handle_mcck() could also be called to service pending work, but this time from the external interrupts handler. It is the machine check handler that establishes the work and schedules the external interrupt, therefore the machine check interrupt itself should be disabled while reading out the corresponding variable: local_mcck_disable(); mcck = *this_cpu_ptr(&cpu_mcck); memset(this_cpu_ptr(&cpu_mcck), 0, sizeof(mcck)); local_mcck_enable(); However, local_mcck_disable() does not have effect when __s390_handle_mcck() is called directly form the machine check handler, since the machine check interrupt is still disabled. Therefore, it is not the opening bracket to the following local_mcck_enable() function. Simplify the user process termination flow by scheduling the external interrupt and killing the affected process from the interrupt context. Assume a kernel-generated signal is always delivered and ignore a value returned by do_send_sig_info() funciton. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-27s390/nmi: fix virtual-physical address confusionNico Boehr
When a machine check is received while in SIE, it is reinjected into the guest in some cases. The respective code needs to access the sie_block, which is taken from the backed up R14. Since reinjection only occurs while we are in SIE (i.e. between the labels sie_entry and sie_leave in entry.S and thus if CIF_MCCK_GUEST is set), the backed up R14 will always contain a physical address in s390_backup_mcck_info. This currently works, because virtual and physical addresses are the same. Add phys_to_virt() to resolve the virtual-physical confusion. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20230216121208.4390-2-nrb@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-12-06s390/nmi: get rid of private slab cacheHeiko Carstens
Get rid of private "nmi_save_areas" slab cache. The only reason this was introduced years ago was that with some slab debugging options allocations would only guarantee a minimum alignment of ARCH_KMALLOC_MINALIGN, which was eight bytes back then. This is not sufficient for the extended machine check save area. However since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") kmalloc guarantees a power-of-two alignment even with debugging options enabled. Therefore the private slab cache can be removed. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-12-06s390/nmi: move storage error checking back to C, enter with DAT onHeiko Carstens
Checking for storage errors in machine check entry code was done in order to handle also storage errors on kernel page tables. However this is extremely unlikely and some basic assumptions what works on machine check entry are necessary anyway. In order to simplify machine check handling delay checking for storage errors to C code. With this also change the machine check new PSW to have DAT on, which simplifies the entry code even further. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-12-06s390/nmi: print machine check interruption code before stopping systemHeiko Carstens
In case a system will be stopped because of e.g. missing validity bits print the machine check interruption code before the system is stopped. This is helpful, since up to now no message was printed in such a case. Only a disabled wait PSW was loaded, which doesn't give a hint of what went wrong. Improve this by printing a message with debug information. Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-12-06s390/nmi: rework register validation handlingHeiko Carstens
If a machine check happens in kernel mode, and the machine check interruption code indicates that e.g. vector register contents in the machine check area are not valid, the logic is to kill current. The idea behind this was that if within kernel context vector registers are not used then it is sufficient to kill the current user space process to avoid that it continues with potentially corrupt register contents. This however does not necessarily work, since the current code does not take into account that a machine check can also happen when a kernel thread is running (= no user space context), and in addition there is no way to distinguish between the "previous" and "next" user process task, if the machine check happens when a task switch happens. Given that machine checks with invalid saved register contents in the machine check save area are extremely rare, simplify the logic: if register contents are invalid and the previous context was kernel mode, stop the whole machine. If the previous context was user mode, kill the corresponding task. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-12-06s390/nmi: use vector instruction macros instead of byte patternsHeiko Carstens
Use vector instruction macros instead of byte patterns to increase readability. The generated code is nearly identical: - 1e8: e7 0f 10 00 00 36 vlm %v0,%v15,0(%r1) - 1ee: e7 0f 11 00 0c 36 vlm %v16,%v31,256(%r1) + 1e8: e7 0f 10 00 30 36 vlm %v0,%v15,0(%r1),3 + 1ee: e7 0f 11 00 3c 36 vlm %v16,%v31,256(%r1),3 By using the VLM macro the alignment hint is automatically specified too. Even though from a performance perspective it doesn't matter at all for the machine check code, this shows yet another benefit when using the macros. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-09-07s390/boot: fix absolute zero lowcore corruption on bootAlexander Gordeev
Crash dump always starts on CPU0. In case CPU0 is offline the prefix page is not installed and the absolute zero lowcore is used. However, struct lowcore::mcesad is never assigned and stays zero. That leads to __machine_kdump() -> save_vx_regs() call silently stores vector registers to the absolute lowcore at 0x11b0 offset. Fixes: a62bc0739253 ("s390/kdump: add support for vector extension") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-07-28s390/nmi: use irqentry_nmi_enter()/irqentry_nmi_exit()Sven Schnelle
With generic entry in place switch the nmi handler to use the generic entry helper functions. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-05-09s390/pai: add support for cryptography countersThomas Richter
PMU device driver perf_pai_crypto supports Processor Activity Instrumentation (PAI), available with IBM z16: - maps a full page to lowcore address 0x1500. - uses CR0 bit 13 to turn PAI crypto counting on and off. - creates a sample with raw data on each context switch out when at context switch some mapped counters have a value of nonzero. This device driver only supports CPU wide context, no task context is allowed. Support for counting: - one or more counters can be specified using perf stat -e pai_crypto/xxx/ where xxx stands for the counter event name. Multiple invocation of this command is possible. The counter names are listed in /sys/devices/pai_crypto/events directory. - one special counters can be specified using perf stat -e pai_crypto/CRYPTO_ALL/ which returns the sum of all incremented crypto counters. - one event pai_crypto/CRYPTO_ALL/ is reserved for sampling. No multiple invocations are possible. The event collects data at context switch out and saves them in the ring buffer. Add qpaci assembly instruction to query supported memory mapped crypto counters. It returns the number of counters (no holes allowed in that range). The PAI crypto counter events are system wide and can not be executed in parallel. Therefore some restrictions documented in function paicrypt_busy apply. In particular event CRYPTO_ALL for sampling must run exclusive. Only counting events can run in parallel. PAI crypto counter events can not be created when a CPU hot plug add is processed. This means a CPU hot plug add does not get the necessary PAI event to record PAI cryptography counter increments on the newly added CPU. CPU hot plug remove removes the event and terminates the counting of PAI counters immediately. Co-developed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20220504062351.2954280-3-tmricht@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-02-06s390: remove invalid email address of Heiko CarstensHeiko Carstens
Remove my old invalid email address which can be found in a couple of files. Instead of updating it, just remove my contact data completely from source files. We have git and other tools which allow to figure out who is responsible for what with recent contact data. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-01-23s390/nmi: handle vector validity failures for KVM guestsChristian Borntraeger
The machine check validity bit tells about the context. If a KVM guest was running the bit tells about the guest validity and the host state is not affected. As a guest can disable the guest validity this might result in unwanted host errors on machine checks. Cc: stable@vger.kernel.org Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-23s390/nmi: handle guarded storage validity failures for KVM guestsChristian Borntraeger
machine check validity bits reflect the state of the machine check. If a guest does not make use of guarded storage, the validity bit might be off. We can not use the host CR bit to decide if the validity bit must be on. So ignore "invalid" guarded storage controls for KVM guests in the host and rely on the machine check being forwarded to the guest. If no other errors happen from a host perspective everything is fine and no process must be killed and the host can continue to run. Cc: stable@vger.kernel.org Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Reported-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Tested-by: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-01-17Merge branch 'signal-for-v5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal/exit/ptrace updates from Eric Biederman: "This set of changes deletes some dead code, makes a lot of cleanups which hopefully make the code easier to follow, and fixes bugs found along the way. The end-game which I have not yet reached yet is for fatal signals that generate coredumps to be short-circuit deliverable from complete_signal, for force_siginfo_to_task not to require changing userspace configured signal delivery state, and for the ptrace stops to always happen in locations where we can guarantee on all architectures that the all of the registers are saved and available on the stack. Removal of profile_task_ext, profile_munmap, and profile_handoff_task are the big successes for dead code removal this round. A bunch of small bug fixes are included, as most of the issues reported were small enough that they would not affect bisection so I simply added the fixes and did not fold the fixes into the changes they were fixing. There was a bug that broke coredumps piped to systemd-coredump. I dropped the change that caused that bug and replaced it entirely with something much more restrained. Unfortunately that required some rebasing. Some successes after this set of changes: There are few enough calls to do_exit to audit in a reasonable amount of time. The lifetime of struct kthread now matches the lifetime of struct task, and the pointer to struct kthread is no longer stored in set_child_tid. The flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is removed. Issues where task->exit_code was examined with signal->group_exit_code should been examined were fixed. There are several loosely related changes included because I am cleaning up and if I don't include them they will probably get lost. The original postings of these changes can be found at: https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org I trimmed back the last set of changes to only the obviously correct once. Simply because there was less time for review than I had hoped" * 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits) ptrace/m68k: Stop open coding ptrace_report_syscall ptrace: Remove unused regs argument from ptrace_report_syscall ptrace: Remove second setting of PT_SEIZED in ptrace_attach taskstats: Cleanup the use of task->exit_code exit: Use the correct exit_code in /proc/<pid>/stat exit: Fix the exit_code for wait_task_zombie exit: Coredumps reach do_group_exit exit: Remove profile_handoff_task exit: Remove profile_task_exit & profile_munmap signal: clean up kernel-doc comments signal: Remove the helper signal_group_exit signal: Rename group_exit_task group_exec_task coredump: Stop setting signal->group_exit_task signal: Remove SIGNAL_GROUP_COREDUMP signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process signal: Make coredump handling explicit in complete_signal signal: Have prepare_signal detect coredumps using signal->core_state signal: Have the oom killer detect coredumps using signal->core_state exit: Move force_uaccess back into do_exit exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit ...
2021-12-16s390/nmi: disable interrupts on extended save area updateAlexander Gordeev
Updating of the pointer to machine check extended save area on the IPL CPU needs the lowcore protection to be disabled. Disable interrupts while the protection is off to avoid unnoticed writes to the lowcore. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-12-13exit: Add and use make_task_dead.Eric W. Biederman
There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-06s390/nmi: add missing __pa/__va address conversion of extended save areaHeiko Carstens
Add missing __pa/__va address conversion of machine check extended save area designation, which is an absolute address. Note: this currently doesn't fix a real bug, since virtual addresses are indentical to physical ones. Reported-by: Vineeth Vijayan <vneethv@linux.ibm.com> Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-07-05s390/mcck: move register validation to C codeAlexander Gordeev
This update partially reverts commit 3037a52f9846 ("s390/nmi: do register validation as early as possible"). Storage error checks and control registers validation are left in the assembler code, since correct ASCEs and page tables are required to enable DAT - which is done before the C handler is entered. System damage, kernel instruction address and PSW MWP checks are left in the assembler code as well, since there is no way to proceed if one of these checks is failed. The getcpu vdso syscall reads CPU number from the programmable field of the TOD clock. Disregard the TOD programmable register validity bit and load the CPU number into the TOD programmable field unconditionally. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05s390/mcck: move storage error checks to assemblerAlexander Gordeev
The current storage errors tackling is wrong - the DAT is enabled in assembler code before the actual storage checks in C half are executed. In case the page tables themselves are damaged such approach is not going to work. With this update unrecoverable storage errors are not passed to C code for handling, but rather the machine is stopped right away. The only exception to this flow is when a machine check occurred in KVM guest - in this case the errors are reinjected by the handler. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-05s390/mcck: always enter C handler with DAT enabledAlexander Gordeev
The machine check handler must be entered with DAT disabled in case control registers are corrupted or a storage error happened and we can not tell if such error corresponds to a page table. Both of described conditions end up in stopping all CPUs and entering the disabled wait in C half of the handler. However, the storage errors are still checked after the DAT is enabled and C code is entered. In case a page table is damaged such flow is not expected to work. This update paves the way for moving the storage error checks from C to assembler half. All fatal errors that can only be handled with DAT disabled are handled in assembler half also. As result, the C half is only entered if the DAT is secured. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19s390: convert to generic entrySven Schnelle
This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-28s390: remove critical section cleanup from entry.SSven Schnelle
The current code is rather complex and caused a lot of subtle and hard to debug bugs in the past. Simplify the code by calling the system_call handler with interrupts disabled, save machine state, and re-enable them later. This requires significant changes to the machine check handling code as well. When the machine check interrupt arrived while being in kernel mode the new code will signal pending machine checks with a SIGP external call. When userspace was interrupted, the handler will switch to the kernel stack and directly execute s390_handle_mcck(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-05-02s390: simplify disabled_waitMartin Schwidefsky
The disabled_wait() function uses its argument as the PSW address when it stops the CPU with a wait PSW that is disabled for interrupts. The different callers sometimes use a specific number like 0xdeadbeef to indicate a specific failure, the early boot code uses 0 and some other calls sites use __builtin_return_address(0). At the time a dump is created the current PSW and the registers of a CPU are written to lowcore to make them avaiable to the dump analysis tool. For a CPU stopped with disabled_wait the PSW and the registers do not really make sense together, the PSW address does not point to the function the registers belong to. Simplify disabled_wait() by using _THIS_IP_ for the PSW address and drop the argument to the function. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-05headers: untangle kmemleak.h from mm.hRandy Dunlap
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious reason. It looks like it's only a convenience, so remove kmemleak.h from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't already #include it. Also remove <linux/kmemleak.h> from source files that do not use it. This is tested on i386 allmodconfig and x86_64 allmodconfig. It would be good to run it through the 0day bot for other $ARCHes. I have neither the horsepower nor the storage space for the other $ARCHes. Update: This patch has been extensively build-tested by both the 0day bot & kisskb/ozlabs build farms. Both of them reported 2 build failures for which patches are included here (in v2). [ slab.h is the second most used header file after module.h; kernel.h is right there with slab.h. There could be some minor error in the counting due to some #includes having comments after them and I didn't combine all of those. ] [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr] Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures] Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-24s390: kernel: add SPDX identifiers to the remaining filesGreg Kroah-Hartman
It's good to have SPDX identifiers in all files to make it easier to audit the kernel tree for correct licenses. Update the arch/s390/kernel/ files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Thomas Gleixner, Philippe Ombredanne, and Kate Stewart. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-14s390/nmi: remove unused codeHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-10-19s390/nmi: do register validation as early as possibleMartin Schwidefsky
The validation of the CPU registers in the machine check handler is currently split into two parts. The first part is done at the start of the low level mcck_int_handler function, this includes the CPU timer register and the general purpose registers. The second part is done a bit later in s390_do_machine_check for all the other registers, including the control registers, floating pointer control, vector or floating pointer registers, the access registers, the guarded storage registers, the TOD programmable registers and the clock comparator. This is working fine to far but in theory a future extensions could cause the C code to use registers that are not validated yet. A better approach is to validate all CPU registers in "safe" assembler code before any C function is called. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-19s390/nmi: allocation of the extended save areaMartin Schwidefsky
The machine check extended save area is needed to store the vector registers and the guarded storage control block when a CPU is interrupted by a machine check. Move the slab cache allocation of the full save area to nmi.c, for early boot use a static __initdata block. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-19s390/ctl_reg: use decoding unions in update_cr_regsMartin Schwidefsky
Add a decoding union for the bits in control registers 2 and use 'union ctlreg0' and 'union ctlreg2' in update_cr_regs to improve readability. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-10-19s390/nmi: use smp_emergency_stop instead of smp_send_stopMartin Schwidefsky
The smp_send_stop() function can be called from s390_handle_damage while DAT is off. This happens if a machine check indicates that kernel gprs or control registers can not be restored. The function smp_send_stop reenables DAT via __load_psw_mask. That should work for the case of lost kernel gprs and the system will do the expected stop of all CPUs. But if control registers are lost, in particular CR13 with the home space ASCE, interesting secondary crashes may occur. Make smp_emergency_stop callable from nmi.c and remove the cpumask argument. Replace the smp_send_stop call with smp_emergency_stop in the s390_handle_damage function. In addition add notrace and NOKPROBE_SYMBOL annotations for all functions required for the emergency shutdown. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-27KVM: s390: Backup the guest's machine check infoQingFeng Hao
When a machine check happens in the guest, related mcck info (mcic, external damage code, ...) is stored in the vcpu's lowcore on the host. Then the machine check handler's low-level part is executed, followed by the high-level part. If the high-level part's execution is interrupted by a new machine check happening on the same vcpu on the host, the mcck info in the lowcore is overwritten with the new machine check's data. If the high-level part's execution is scheduled to a different cpu, the mcck info in the lowcore is uncertain. Therefore, for both cases, the further reinjection to the guest will use the wrong data. Let's backup the mcck info in the lowcore to the sie page for further reinjection, so that the right data will be used. Add new member into struct sie_page to store related machine check's info of mcic, failing storage address and external damage code. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-27s390/nmi: s390: New low level handling for machine check happening in guestQingFeng Hao
Add the logic to check if the machine check happens when the guest is running. If yes, set the exit reason -EINTR in the machine check's interrupt handler. Refactor s390_do_machine_check to avoid panicing the host for some kinds of machine checks which happen when guest is running. Reinject the instruction processing damage's machine checks including Delayed Access Exception instead of damaging the host if it happens in the guest because it could be caused by improper update on TLB entry or other software case and impacts the guest only. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-03-22s390: add a system call for guarded storageMartin Schwidefsky
This adds a new system call to enable the use of guarded storage for user space processes. The system call takes two arguments, a command and pointer to a guarded storage control block: s390_guarded_storage(int command, struct gs_cb *gs_cb); The second argument is relevant only for the GS_SET_BC_CB command. The commands in detail: 0 - GS_ENABLE Enable the guarded storage facility for the current task. The initial content of the guarded storage control block will be all zeros. After the enablement the user space code can use load-guarded-storage-controls instruction (LGSC) to load an arbitrary control block. While a task is enabled the kernel will save and restore the current content of the guarded storage registers on context switch. 1 - GS_DISABLE Disables the use of the guarded storage facility for the current task. The kernel will cease to save and restore the content of the guarded storage registers, the task specific content of these registers is lost. 2 - GS_SET_BC_CB Set a broadcast guarded storage control block. This is called per thread and stores a specific guarded storage control block in the task struct of the current task. This control block will be used for the broadcast event GS_BROADCAST. 3 - GS_CLEAR_BC_CB Clears the broadcast guarded storage control block. The guarded- storage control block is removed from the task struct that was established by GS_SET_BC_CB. 4 - GS_BROADCAST Sends a broadcast to all thread siblings of the current task. Every sibling that has established a broadcast guarded storage control block will load this control block and will be enabled for guarded storage. The broadcast guarded storage control block is used up, a second broadcast without a refresh of the stored control block with GS_SET_BC_CB will not have any effect. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar
<linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-23s390/nmi: purge tlbs after control register validationHeiko Carstens
Play safe and purge all tlbs after the control registers that contain the primary, secondary and home space asces have been validated. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-23s390/nmi: fix order of register validationHeiko Carstens
When validating register contents first validate control registers since these control the availability of features later being validated. For example the control register 0 should be validated first, before the additional floating point (AFP) registers are validated, since control register 0 contains the AFP-register control bit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-17s390: kernel: Audit and remove any unnecessary uses of module.hPaul Gortmaker
Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. The advantage in doing so is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each change instance for the presence of either and replace as needed. Build testing revealed some implicit header usage that was fixed up accordingly. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>