summaryrefslogtreecommitdiff
path: root/arch/s390/kernel
AgeCommit message (Collapse)Author
2020-04-04Merge tag 's390-5.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Update maintainers. Niklas Schnelle takes over zpci and Vineeth Vijayan common io code. - Extend cpuinfo to include topology information. - Add new extended counters for IBM z15 and sampling buffer allocation rework in perf code. - Add control over zeroing out memory during system restart. - CCA protected key block version 2 support and other fixes/improvements in crypto code. - Convert to new fallthrough; annotations. - Replace zero-length arrays with flexible-arrays. - QDIO debugfs and other small improvements. - Drop 2-level paging support optimization for compat tasks. Varios mm cleanups. - Remove broken and unused hibernate / power management support. - Remove fake numa support which does not bring any benefits. - Exclude offline CPUs from CPU topology masks to be more consistent with other architectures. - Prevent last branching instruction address leaking to userspace. - Other small various fixes and improvements all over the code. * tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (57 commits) s390/mm: cleanup init_new_context() callback s390/mm: cleanup virtual memory constants usage s390/mm: remove page table downgrade support s390/qdio: set qdio_irq->cdev at allocation time s390/qdio: remove unused function declarations s390/ccwgroup: remove pm support s390/ap: remove power management code from ap bus and drivers s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc s390/mm: cleanup arch_get_unmapped_area() and friends s390/ism: remove pm support s390/cio: use fallthrough; s390/vfio: use fallthrough; s390/zcrypt: use fallthrough; s390: use fallthrough; s390/cpum_sf: Fix wrong page count in error message s390/diag: fix display of diagnose call statistics s390/ap: Remove ap device suspend and resume callbacks s390/pci: Improve handling of unset UID s390/pci: Fix zpci_alloc_domain() over allocation s390/qdio: pass ISC as parameter to chsc_sadc() ...
2020-04-03Merge tag 'spdx-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments
2020-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - GICv4.1 support - 32bit host removal PPC: - secure (encrypted) using under the Protected Execution Framework ultravisor s390: - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. - Initial work on making nested SVM event injection more similar to VMX, and less buggy. - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". - A large refactoring of CPUID features, which now use an array that parallels the core x86_features. - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. - New Tigerlake CPUID features. - More bugfixes, optimizations and cleanups. Generic: - selftests: cleanups, new MMU notifier stress test, steal-time test - CSV output for kvm_stat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits) x86/kvm: fix a missing-prototypes "vmread_error" KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y KVM: VMX: Add a trampoline to fix VMREAD error handling KVM: SVM: Annotate svm_x86_ops as __initdata KVM: VMX: Annotate vmx_x86_ops as __initdata KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup() KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes KVM: VMX: Configure runtime hooks using vmx_x86_ops KVM: VMX: Move hardware_setup() definition below vmx_x86_ops KVM: x86: Move init-only kvm_x86_ops to separate struct KVM: Pass kvm_init()'s opaque param to additional arch funcs s390/gmap: return proper error code on ksm unsharing KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move() KVM: Fix out of range accesses to memslots KVM: X86: Micro-optimize IPI fastpath delay KVM: X86: Delay read msr data iff writes ICR MSR KVM: PPC: Book3S HV: Add a capability for enabling secure guests KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs ...
2020-03-25s390: use fallthrough;Joe Perches
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-25s390/cpum_sf: Fix wrong page count in error messageThomas Richter
When perf record -e SF_CYCLES_BASIC_DIAG runs with very high frequency, the samples arrive faster than the perf process can save them to file. Eventually, for longer running processes, this leads to the siutation where the trace buffers allocated by perf slowly fills up. At one point the auxiliary trace buffer is full and the CPU Measurement sampling facility is turned off. Furthermore a warning is printed to the kernel log buffer: cpum_sf: The AUX buffer with 0 pages for the diagnostic-sampling mode is full The number of allocated pages for the auxiliary trace buffer is shown as zero pages. That is wrong. Fix this by saving the number of allocated pages before entering the work loop in the interrupt handler. When the interrupt handler processes the samples, it may detect the buffer full condition and stop sampling, reducing the buffer size to zero. Print the correct value in the error message: cpum_sf: The AUX buffer with 256 pages for the diagnostic-sampling mode is full Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-25s390/diag: fix display of diagnose call statisticsMichael Mueller
Show the full diag statistic table and not just parts of it. The issue surfaced in a KVM guest with a number of vcpus defined smaller than NR_DIAG_STAT. Fixes: 1ec2772e0c3c ("s390/diag: add a statistic for diagnose calls") Cc: stable@vger.kernel.org Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-25.gitignore: add SPDX License IdentifierMasahiro Yamada
Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-23s390: remove broken hibernate / power management supportHeiko Carstens
Hibernation is known to be broken for many years on s390. Given that there aren't any real use cases, remove the code instead of spending time to fix and maintain it. Without hibernate support it doesn't make too much sense to keep power management support; therefore remove it completely. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/cpuinfo: do not skip info for CPUs without MHz featureAlexander Gordeev
In the past there were no per-CPU information in /proc/cpuinfo other than CPU frequency. Hence, for machines without CPU MHz feature there were nothing to show. Now CPU topology and IDs still could be shown, so do not skip this information from the output. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: moved comparison] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/cpuinfo: fix wrong output when CPU0 is offlineAlexander Gordeev
/proc/cpuinfo should not print information about CPU 0 when it is offline. Fixes: 281eaa8cb67c ("s390/cpuinfo: simplify locking and skip offline cpus early") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: shortened commit message] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/cpuinfo: show number of online CPUs within a packageAlexander Gordeev
Show number of online CPUs within a package (which is the socket in case of s390). For what it worth, present that value as "siblings" field - just like x86 does. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/cpuinfo: show number of online coresAlexander Gordeev
Show number of cores that run at least one SMT thread Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/ipl: add support to control memory clearing for FCP and CCW re-IPLGerald Schaefer
Re-IPL for both CCW and FCP is currently done by using diag 308 with the "Load Clear" subcode, which means that all memory will be cleared. This can increase re-IPL duration considerably on very large machines. For CCW devices, there is also a "Load Normal" subcode that was only used for dump kernels so far. For FCP devices, a similar "Load Normal" subcode was introduced with z14. The "Load Normal" diag 308 subcode allows to re-IPL without clearing memory. This patch adds a new "clear" sysfs attribute to /sys/firmware/reipl for both the ccw and fcp subdirectories, which can be set to either "0" or "1" to disable or enable re-IPL with memory clearing. The default value is "0", which disables memory clearing. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/topology: remove offline CPUs from CPU topology masksAlexander Gordeev
The CPU topology masks on s390 contain also bits of CPUs which are offline. Currently this is already a problem, since common code scheduler expects e.g. cpu_smt_mask() to reflect reality. This update changes the described behaviour and s390 starts to behave like all other architectures. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/numa: remove redundant cpus_with_topology variableAlexander Gordeev
Variable cpus_with_topology is a leftover that became unneeded once the fake NUMA support has been removed. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/cpuinfo: show processor physical addressAlexander Gordeev
Show CPU physical address as reported by STAP instruction Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-12ima: add a new CONFIG for loading arch-specific policiesNayna Jain
Every time a new architecture defines the IMA architecture specific functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the IMA include file needs to be updated. To avoid this "noise", this patch defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option, allowing the different architectures to select it. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Philipp Rudo <prudo@linux.ibm.com> (s390) Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2020-03-11s390/traps: mark test_monitor_call __initHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-11s390/irq: make init_ext_interrupts staticHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-11s390/irq: replace setup_irq() by request_irq()afzal mohammed
request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready. Per tglx[1], setup_irq() existed in olden days when allocators were not ready by the time early interrupts were initialized. Hence replace setup_irq() by request_irq(). [1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos Signed-off-by: afzal mohammed <afzal.mohd.ma@gmail.com> Message-Id: <20200304005049.5291-1-afzal.mohd.ma@gmail.com> [heiko.carstens@de.ibm.com: replace pr_err with panic] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-10s390/cpuinfo: add system topology informationAlexander Gordeev
This update adjusts /proc/cpuinfo format to meet some user level programs expectations. It also makes the layout consistent with x86 where CPU topology is presented as blocks of key-value pairs. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-10s390: prevent leaking kernel address in BEARSven Schnelle
When userspace executes a syscall or gets interrupted, BEAR contains a kernel address when returning to userspace. This make it pretty easy to figure out where the kernel is mapped even with KASLR enabled. To fix this, add lpswe to lowcore and always execute it there, so userspace sees only the lowcore address of lpswe. For this we have to extend both critical_cleanup and the SWITCH_ASYNC macro to also check for lpswe addresses in lowcore. Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)") Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-10s390/cpum_cf: Add new extended counters for IBM z15Thomas Richter
Add CPU measurement counter facility event description for IBM z15. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-27s390/protvirt: Add sysfs firmware interface for Ultravisor informationJanosch Frank
That information, e.g. the maximum number of guests or installed Ultravisor facilities, is interesting for QEMU, Libvirt and administrators. Let's provide an easily parsable API to get that information. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27s390/mm: add (non)secure page access exceptions handlersVasily Gorbik
Add exceptions handlers performing transparent transition of non-secure pages to secure (import) upon guest access and secure pages to non-secure (export) upon hypervisor access. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> [frankja@linux.ibm.com: adding checks for failures] Signed-off-by: Janosch Frank <frankja@linux.ibm.com> [imbrenda@linux.ibm.com: adding a check for gmap fault] Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27s390/mm: provide memory management functions for protected KVM guestsClaudio Imbrenda
This provides the basic ultravisor calls and page table handling to cope with secure guests: - provide arch_make_page_accessible - make pages accessible after unmapping of secure guests - provide the ultravisor commands convert to/from secure - provide the ultravisor commands pin/unpin shared - provide callbacks to make pages secure (inacccessible) - we check for the expected pin count to only make pages secure if the host is not accessing them - we fence hugetlbfs for secure pages - add missing radix-tree include into gmap.h The basic idea is that a page can have 3 states: secure, normal or shared. The hypervisor can call into a firmware function called ultravisor that allows to change the state of a page: convert from/to secure. The convert from secure will encrypt the page and make it available to the host and host I/O. The convert to secure will remove the host capability to access this page. The design is that on convert to secure we will wait until writeback and page refs are indicating no host usage. At the same time the convert from secure (export to host) will be called in common code when the refcount or the writeback bit is already set. This avoids races between convert from and to secure. Then there is also the concept of shared pages. Those are kind of secure where the host can still access those pages. We need to be notified when the guest "unshares" such a page, basically doing a convert to secure by then. There is a call "pin shared page" that we use instead of convert from secure when possible. We do use PG_arch_1 as an optimization to minimize the convert from secure/pin shared. Several comments have been added in the code to explain the logic in the relevant places. Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27s390/protvirt: add ultravisor initializationVasily Gorbik
Before being able to host protected virtual machines, donate some of the memory to the ultravisor. Besides that the ultravisor might impose addressing limitations for memory used to back protected VM storage. Treat that limit as protected virtualization host's virtual memory limit. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27s390/protvirt: introduce host side setupVasily Gorbik
Add "prot_virt" command line option which controls if the kernel protected VMs support is enabled at early boot time. This has to be done early, because it needs large amounts of memory and will disable some features like STP time sync for the lpar. Extend ultravisor info definitions and expose it via uv_info struct filled in during startup. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27s390/mm: remove fake numa supportHeiko Carstens
It turned out that fake numa support is rather useless on s390, since there are no scenarios where there is any performance or other benefit when used. However it does provide maintenance cost and breaks from time to time. Therefore remove it. CONFIG_NUMA is still supported with a very small backend and only one node. This way userspace applications which require NUMA interfaces continue to work. Note that NODES_SHIFT is set to 1 (= 2 nodes) instead of 0 (= 1 node), since there is quite a bit of kernel code which assumes that more than one node is possible if CONFIG_NUMA is enabled. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-17s390/cpum_sf: Rework sampling buffer allocationThomas Richter
Adjust sampling buffer allocation depending on frequency and correct comments. Investigation on the interrupt handler revealed that almost always one interupt services one SDB, even when running with the maximum frequency of 100000. Very rarely there have been 2 SBD serviced per interrupt. Therefore reduce the number of SBD per CPU. Each SDB is one page in size. The new formula results in freq:4000 n_sdb:32 new:16 freq:10000 n_sdb:80 new:16 freq:20000 n_sdb:159 new:17 freq:40000 n_sdb:318 new:19 freq:50000 n_sdb:397 new:20 freq:62500 n_sdb:497 new:22 freq:83333 n_sdb:662 new:24 freq:100000 n_sdb:794 new:25 Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-05Merge tag 's390-5.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: "The second round of s390 fixes and features for 5.6: - Add KPROBES_ON_FTRACE support - Add EP11 AES secure keys support - PAES rework and prerequisites for paes-s390 ciphers selftests - Fix page table upgrade for hugetlbfs" * tag 's390-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pkey/zcrypt: Support EP11 AES secure keys s390/zcrypt: extend EP11 card and queue sysfs attributes s390/zcrypt: add new low level ep11 functions support file s390/zcrypt: ep11 structs rework, export zcrypt_send_ep11_cprb s390/zcrypt: enable card/domain autoselect on ep11 cprbs s390/crypto: enable clear key values for paes ciphers s390/pkey: Add support for key blob with clear key value s390/crypto: Rework on paes implementation s390: support KPROBES_ON_FTRACE s390/mm: fix dynamic pagetable upgrade for hugetlbfs
2020-01-31s390/boot: add dfltcc= kernel command line parameterMikhail Zaslonko
Add the new kernel command line parameter 'dfltcc=' to configure s390 zlib hardware support. Format: { on | off | def_only | inf_only | always } on: s390 zlib hardware support for compression on level 1 and decompression (default) off: No s390 zlib hardware support def_only: s390 zlib hardware support for deflate only (compression on level 1) inf_only: s390 zlib hardware support for inflate only (decompression) always: Same as 'on' but ignores the selected compression level always using hardware support (used for debugging) Link: http://lkml.kernel.org/r/20200103223334.20669-5-zaslonko@linux.ibm.com Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Sterba <dsterba@suse.com> Cc: Eduard Shishkin <edward6@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31mm/memblock: define memblock_physmem_add()Anshuman Khandual
On the s390 platform memblock.physmem array is being built by directly calling into memblock_add_range() which is a low level function not intended to be used outside of memblock. Hence lets conditionally add helper functions for physmem array when HAVE_MEMBLOCK_PHYS_MAP is enabled. Also use MAX_NUMNODES instead of 0 as node ID similar to memblock_add() and memblock_reserve(). Make memblock_add_range() a static function as it is no longer getting used outside of memblock. Link: http://lkml.kernel.org/r/1578283835-21969-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Collin Walling <walling@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-30s390: support KPROBES_ON_FTRACESven Schnelle
Instead of using our own kprobes-on-ftrace handling convert the code to support KPROBES_ON_FTRACE. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-29Merge tag 'threads-v5.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-29Merge branch 'work.openat2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29Merge tag 'tty-5.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here are the big set of tty and serial driver updates for 5.6-rc1 Included in here are: - dummy_con cleanups (touches lots of arch code) - sysrq logic cleanups (touches lots of serial drivers) - samsung driver fixes (wasn't really being built) - conmakeshash move to tty subdir out of scripts - lots of small tty/serial driver updates All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits) tty: n_hdlc: Use flexible-array member and struct_size() helper tty: baudrate: SPARC supports few more baud rates tty: baudrate: Synchronise baud_table[] and baud_bits[] tty: serial: meson_uart: Add support for kernel debugger serial: imx: fix a race condition in receive path serial: 8250_bcm2835aux: Document struct bcm2835aux_data serial: 8250_bcm2835aux: Use generic remapping code serial: 8250_bcm2835aux: Allocate uart_8250_port on stack serial: 8250_bcm2835aux: Suppress register_port error on -EPROBE_DEFER serial: 8250_bcm2835aux: Suppress clk_get error on -EPROBE_DEFER serial: 8250_bcm2835aux: Fix line mismatch on driver unbind serial_core: Remove unused member in uart_port vt: Correct comment documenting do_take_over_console() vt: Delete comment referencing non-existent unbind_con_driver() arch/xtensa/setup: Drop dummy_con initialization arch/x86/setup: Drop dummy_con initialization arch/unicore32/setup: Drop dummy_con initialization arch/sparc/setup: Drop dummy_con initialization arch/sh/setup: Drop dummy_con initialization arch/s390/setup: Drop dummy_con initialization ...
2020-01-28Merge tag 's390-5.6-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add clang 10 build support. - Fix BUG() implementation to contain precise bug address, which is relevant for kprobes. - Make ftraced function appear in a stacktrace. - Minor perf improvements and refactoring. - Possible deadlock and recovery fixes in pci code. * tag 's390-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: fix __EMIT_BUG() macro s390/ftrace: generate traced function stack frame s390: adjust -mpacked-stack support check for clang 10 s390/jump_label: use "i" constraint for clang s390/cpum_sf: Use DIV_ROUND_UP s390/cpum_sf: Use kzalloc and minor changes s390/cpum_sf: Convert debug trace to common layout s390/pci: Fix possible deadlock in recover_store() s390/pci: Recover handle in clp_set_pci_fn()
2020-01-28Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...
2020-01-22s390: fix __EMIT_BUG() macroSven Schnelle
Setting a kprobe on getname_flags() failed: $ echo 'p:tmr1 getname_flags +0(%r2):ustring' > kprobe_events -bash: echo: write error: Invalid argument Debugging the kprobes code showed that the address of getname_flags() is contained in the __bug_table. Kprobes doesn't allow to set probes at BUG() locations. $ objdump -j __bug_table -x build/fs/namei.o [..] 0000000000000108 R_390_PC32 .text+0x00000000000075a8 000000000000010c R_390_PC32 .L223+0x0000000000000004 I was expecting getname_flags() to start with a BUG(), but: 7598: e3 20 10 00 00 04 lg %r2,0(%r1) 759e: c0 f4 00 00 00 00 jg 759e <putname+0x7e> 75a0: R_390_PLT32DBL kmem_cache_free+0x2 75a4: a7 f4 00 01 j 75a6 <putname+0x86> 00000000000075a8 <getname_flags>: 75a8: c0 04 00 00 00 00 brcl 0,75a8 <getname_flags> 75ae: eb 6f f0 48 00 24 stmg %r6,%r15,72(%r15) 75b4: b9 04 00 ef lgr %r14,%r15 75b8: e3 f0 ff a8 ff 71 lay %r15,-88(%r15) So the BUG() is actually the last opcode of the previous function. Fix this by switching to using the MONITOR CALL (MC) instruction, and set the entry in __bug_table to the beginning of that MC. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/ftrace: generate traced function stack frameVasily Gorbik
Currently backtrace from ftraced function does not contain ftraced function itself. e.g. for "path_openat": arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15e/0x3d8 ftrace_graph_caller+0x0/0x1c <-- ftrace code do_filp_open+0x7c/0xe8 <-- ftraced function caller do_open_execat+0x76/0x1b8 open_exec+0x52/0x78 load_elf_binary+0x180/0x1160 search_binary_handler+0x8e/0x288 load_script+0x2a8/0x2b8 search_binary_handler+0x8e/0x288 __do_execve_file.isra.39+0x6fa/0xb40 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Ftraced function is expected in the backtrace by ftrace kselftests, which are now failing. It would also be nice to have it for clarity reasons. "ftrace_caller" itself is called without stack frame allocated for it and does not store its caller (ftraced function). Instead it simply allocates a stack frame for "ftrace_trace_function" and sets backchain to point to ftraced function stack frame (which contains ftraced function caller in saved r14). To fix this issue make "ftrace_caller" allocate a stack frame for itself just to store ftraced function for the stack unwinder. As a result backtrace looks like the following: arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15e/0x3d8 ftrace_graph_caller+0x0/0x1c <-- ftrace code path_openat+0x6/0xd60 <-- ftraced function do_filp_open+0x7c/0xe8 <-- ftraced function caller do_open_execat+0x76/0x1b8 open_exec+0x52/0x78 load_elf_binary+0x180/0x1160 search_binary_handler+0x8e/0x288 load_script+0x2a8/0x2b8 search_binary_handler+0x8e/0x288 __do_execve_file.isra.39+0x6fa/0xb40 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Reported-by: Sven Schnelle <sven.schnelle@ibm.com> Tested-by: Sven Schnelle <sven.schnelle@ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/cpum_sf: Use DIV_ROUND_UPThomas Richter
Use macro DIV_ROUND_UP() for calculation of number of SDBT SDBT pages required for index pages. This macro is already used throughout the file. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/cpum_sf: Use kzalloc and minor changesThomas Richter
Use kzalloc() to allocate auxiliary buffer structure initialized with all zeroes to avoid random value in trace output. Avoid double access to SBD hardware flags. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/cpum_sf: Convert debug trace to common layoutThomas Richter
Convert debug traces to print the head/alert/empty marks consistently as decimal numbers. Add some trace statements to enable easier debugging during auxiliary tracing. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-18open: introduce openat2(2) syscallAleksa Sarai
/* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). It also only contains u64s (even though ->mode arguably should be a u16) to avoid having padding fields which are never used in the future. Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). After input from Florian Weimer, the new open_how and flag definitions are inside a separate header from uapi/linux/fcntl.h, to avoid problems that glibc has with importing that header. /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. Another possible avenue of future work would be some kind of CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace which openat2(2) flags and fields are supported by the current kernel (to avoid userspace having to go through several guesses to figure it out). [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ [6]: https://youtu.be/ggD-eb3yPVs Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-14arch/s390/setup: Drop dummy_con initializationArvind Sankar
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset. Drop it from arch setup code. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20191218214506.49252-20-nivedita@alum.mit.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13arch: wire up pidfd_getfd syscallSargun Dhillon
This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-09s390/setup: Fix secure ipl messagePhilipp Rudo
The new machine loader on z15 always creates an IPL Report block and thus sets the IPL_PL_FLAG_IPLSR even when secure boot is disabled. This causes the wrong message being printed at boot. Fix this by checking for IPL_PL_FLAG_SIPL instead. Fixes: 9641b8cc733f ("s390/ipl: read IPL report at early boot") Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-25Merge tag 'v5.5-rc3' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-18s390/ftrace: save traced function callerVasily Gorbik
A typical backtrace acquired from ftraced function currently looks like the following (e.g. for "path_openat"): arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15a/0x3b8 ftrace_graph_caller+0x0/0x1c 0x3e0007e3c98 <- ftraced function caller (should be do_filp_open+0x7c/0xe8) do_open_execat+0x70/0x1b8 __do_execve_file.isra.0+0x7d8/0x860 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Note random "0x3e0007e3c98" stack value as ftraced function caller. This value causes either imprecise unwinder result or unwinding failure. That "0x3e0007e3c98" comes from r14 of ftraced function stack frame, which it haven't had a chance to initialize since the very first instruction calls ftrace code ("ftrace_caller"). (ftraced function might never save r14 as well). Nevertheless according to s390 ABI any function is called with stack frame allocated for it and r14 contains return address. "ftrace_caller" itself is called with "brasl %r0,ftrace_caller". So, to fix this issue simply always save traced function caller onto ftraced function stack frame. Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>