summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)Author
2020-01-22s390/ftrace: generate traced function stack frameVasily Gorbik
Currently backtrace from ftraced function does not contain ftraced function itself. e.g. for "path_openat": arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15e/0x3d8 ftrace_graph_caller+0x0/0x1c <-- ftrace code do_filp_open+0x7c/0xe8 <-- ftraced function caller do_open_execat+0x76/0x1b8 open_exec+0x52/0x78 load_elf_binary+0x180/0x1160 search_binary_handler+0x8e/0x288 load_script+0x2a8/0x2b8 search_binary_handler+0x8e/0x288 __do_execve_file.isra.39+0x6fa/0xb40 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Ftraced function is expected in the backtrace by ftrace kselftests, which are now failing. It would also be nice to have it for clarity reasons. "ftrace_caller" itself is called without stack frame allocated for it and does not store its caller (ftraced function). Instead it simply allocates a stack frame for "ftrace_trace_function" and sets backchain to point to ftraced function stack frame (which contains ftraced function caller in saved r14). To fix this issue make "ftrace_caller" allocate a stack frame for itself just to store ftraced function for the stack unwinder. As a result backtrace looks like the following: arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15e/0x3d8 ftrace_graph_caller+0x0/0x1c <-- ftrace code path_openat+0x6/0xd60 <-- ftraced function do_filp_open+0x7c/0xe8 <-- ftraced function caller do_open_execat+0x76/0x1b8 open_exec+0x52/0x78 load_elf_binary+0x180/0x1160 search_binary_handler+0x8e/0x288 load_script+0x2a8/0x2b8 search_binary_handler+0x8e/0x288 __do_execve_file.isra.39+0x6fa/0xb40 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Reported-by: Sven Schnelle <sven.schnelle@ibm.com> Tested-by: Sven Schnelle <sven.schnelle@ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390: adjust -mpacked-stack support check for clang 10Vasily Gorbik
clang 10 introduces -mpacked-stack compiler option implementation. At the same time currently it does not support a combination of -mpacked-stack and -mbackchain. This leads to the following build error: clang: error: unsupported option '-mpacked-stack with -mbackchain' for target 's390x-ibm-linux' If/when clang adds support for a combination of -mpacked-stack and -mbackchain it would also require -msoft-float (like gcc does). According to Ulrich Weigand "stack slot assigned to the kernel backchain overlaps the stack slot assigned to the FPR varargs (both are required to be placed immediately after the saved r15 slot if present)." Extend -mpacked-stack compiler option support check to include all 3 options -mpacked-stack -mbackchain -msoft-float which must present to support -mpacked-stack with -mbackchain. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/jump_label: use "i" constraint for clangVasily Gorbik
Currently kernel build fails under clang if jump labels are enabled. The problem is "X" constraint usage "Any operand whatsoever is allowed", for which clang produces the following: .pushsection __jump_table,"aw" .balign 8 .long 0b-.,.Ltmp577-. .quad %r0+0-. # %r0 is not allowed here .popsection Under gcc constraints "X" or "jdd" (gcc > 9) are used for static keys. Ideally, we'd have used "i" for gcc, but it doesn't work in all cases with -fPIC code. This is gcc-specific problem that doesn't exist in llvm. Since clang does not have "jdd" simply always use "i" constraint for it. Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/cpum_sf: Use DIV_ROUND_UPThomas Richter
Use macro DIV_ROUND_UP() for calculation of number of SDBT SDBT pages required for index pages. This macro is already used throughout the file. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/cpum_sf: Use kzalloc and minor changesThomas Richter
Use kzalloc() to allocate auxiliary buffer structure initialized with all zeroes to avoid random value in trace output. Avoid double access to SBD hardware flags. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/cpum_sf: Convert debug trace to common layoutThomas Richter
Convert debug traces to print the head/alert/empty marks consistently as decimal numbers. Add some trace statements to enable easier debugging during auxiliary tracing. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/pci: Fix possible deadlock in recover_store()Niklas Schnelle
With zpci_disable() working, lockdep detected a potential deadlock (lockdep output at the end). The deadlock is between recovering a PCI function via the /sys/bus/pci/devices/<dev>/recover attribute vs powering it off via /sys/bus/pci/slots/<slot>/power. The fix is analogous to the changes in commit 0ee223b2e1f6 ("scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock") that fixed a potential deadlock on removing a SCSI device via sysfs. [ 204.830107] ====================================================== [ 204.830109] WARNING: possible circular locking dependency detected [ 204.830111] 5.5.0-rc2-06072-gbc03ecc9a672 #6 Tainted: G W [ 204.830112] ------------------------------------------------------ [ 204.830113] bash/1034 is trying to acquire lock: [ 204.830115] 0000000192a1a610 (kn->count#200){++++}, at: kernfs_remove_by_name_ns+0x5c/0xa8 [ 204.830122] but task is already holding lock: [ 204.830123] 00000000c16134a8 (pci_rescan_remove_lock){+.+.}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48 [ 204.830128] which lock already depends on the new lock. [ 204.830129] the existing dependency chain (in reverse order) is: [ 204.830130] -> #1 (pci_rescan_remove_lock){+.+.}: [ 204.830134] validate_chain+0x93a/0xd08 [ 204.830136] __lock_acquire+0x4ae/0x9d0 [ 204.830137] lock_acquire+0x114/0x280 [ 204.830140] __mutex_lock+0xa2/0x960 [ 204.830142] mutex_lock_nested+0x32/0x40 [ 204.830145] recover_store+0x4c/0xa8 [ 204.830147] kernfs_fop_write+0xe6/0x218 [ 204.830151] vfs_write+0xb0/0x1b8 [ 204.830152] ksys_write+0x6c/0xf8 [ 204.830154] system_call+0xd8/0x2d8 [ 204.830155] -> #0 (kn->count#200){++++}: [ 204.830187] check_noncircular+0x1e6/0x240 [ 204.830189] check_prev_add+0xfc/0xdb0 [ 204.830190] validate_chain+0x93a/0xd08 [ 204.830192] __lock_acquire+0x4ae/0x9d0 [ 204.830193] lock_acquire+0x114/0x280 [ 204.830194] __kernfs_remove.part.0+0x2e4/0x360 [ 204.830196] kernfs_remove_by_name_ns+0x5c/0xa8 [ 204.830198] remove_files.isra.0+0x4c/0x98 [ 204.830199] sysfs_remove_group+0x66/0xc8 [ 204.830201] sysfs_remove_groups+0x46/0x68 [ 204.830204] device_remove_attrs+0x52/0x90 [ 204.830207] device_del+0x182/0x418 [ 204.830208] pci_remove_bus_device+0x8a/0x130 [ 204.830210] pci_stop_and_remove_bus_device_locked+0x3a/0x48 [ 204.830212] disable_slot+0x68/0x100 [ 204.830213] power_write_file+0x7c/0x130 [ 204.830215] kernfs_fop_write+0xe6/0x218 [ 204.830217] vfs_write+0xb0/0x1b8 [ 204.830218] ksys_write+0x6c/0xf8 [ 204.830220] system_call+0xd8/0x2d8 [ 204.830221] other info that might help us debug this: [ 204.830223] Possible unsafe locking scenario: [ 204.830224] CPU0 CPU1 [ 204.830225] ---- ---- [ 204.830226] lock(pci_rescan_remove_lock); [ 204.830227] lock(kn->count#200); [ 204.830229] lock(pci_rescan_remove_lock); [ 204.830231] lock(kn->count#200); [ 204.830233] *** DEADLOCK *** [ 204.830234] 4 locks held by bash/1034: [ 204.830235] #0: 00000001b6fbc498 (sb_writers#4){.+.+}, at: vfs_write+0x158/0x1b8 [ 204.830239] #1: 000000018c9f5090 (&of->mutex){+.+.}, at: kernfs_fop_write+0xaa/0x218 [ 204.830242] #2: 00000001f7da0810 (kn->count#235){.+.+}, at: kernfs_fop_write+0xb6/0x218 [ 204.830245] #3: 00000000c16134a8 (pci_rescan_remove_lock){+.+.}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48 [ 204.830248] stack backtrace: [ 204.830250] CPU: 2 PID: 1034 Comm: bash Tainted: G W 5.5.0-rc2-06072-gbc03ecc9a672 #6 [ 204.830252] Hardware name: IBM 8561 T01 703 (LPAR) [ 204.830253] Call Trace: [ 204.830257] [<00000000c05e10c0>] show_stack+0x88/0xf0 [ 204.830260] [<00000000c112dca4>] dump_stack+0xa4/0xe0 [ 204.830261] [<00000000c0694c06>] check_noncircular+0x1e6/0x240 [ 204.830263] [<00000000c0695bec>] check_prev_add+0xfc/0xdb0 [ 204.830264] [<00000000c06971da>] validate_chain+0x93a/0xd08 [ 204.830266] [<00000000c06994c6>] __lock_acquire+0x4ae/0x9d0 [ 204.830267] [<00000000c069867c>] lock_acquire+0x114/0x280 [ 204.830269] [<00000000c09ca15c>] __kernfs_remove.part.0+0x2e4/0x360 [ 204.830270] [<00000000c09cb5c4>] kernfs_remove_by_name_ns+0x5c/0xa8 [ 204.830272] [<00000000c09cee14>] remove_files.isra.0+0x4c/0x98 [ 204.830274] [<00000000c09cf2ae>] sysfs_remove_group+0x66/0xc8 [ 204.830276] [<00000000c09cf356>] sysfs_remove_groups+0x46/0x68 [ 204.830278] [<00000000c0e3dfe2>] device_remove_attrs+0x52/0x90 [ 204.830280] [<00000000c0e40382>] device_del+0x182/0x418 [ 204.830281] [<00000000c0dcfd7a>] pci_remove_bus_device+0x8a/0x130 [ 204.830283] [<00000000c0dcfe92>] pci_stop_and_remove_bus_device_locked+0x3a/0x48 [ 204.830285] [<00000000c0de7190>] disable_slot+0x68/0x100 [ 204.830286] [<00000000c0de6514>] power_write_file+0x7c/0x130 [ 204.830288] [<00000000c09cc846>] kernfs_fop_write+0xe6/0x218 [ 204.830290] [<00000000c08f3480>] vfs_write+0xb0/0x1b8 [ 204.830291] [<00000000c08f378c>] ksys_write+0x6c/0xf8 [ 204.830293] [<00000000c1154374>] system_call+0xd8/0x2d8 [ 204.830294] INFO: lockdep is turned off. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-22s390/pci: Recover handle in clp_set_pci_fn()Niklas Schnelle
When we try to recover a PCI function using echo 1 > /sys/bus/pci/devices/<id>/recover or manually with echo 1 > /sys/bus/pci/devices/<id>/remove echo 0 > /sys/bus/pci/slots/<slot>/power echo 1 > /sys/bus/pci/slots/<slot>/power clp_disable_fn() / clp_enable_fn() call clp_set_pci_fn() to first disable and then reenable the function. When the function is already in the requested state we may be left with an invalid function handle. To get a new valid handle we do a clp_list_pci() call. For this we need both the function ID and function handle in clp_set_pci_fn() so pass the zdev and get both. To simplify things also pull setting the refreshed function handle into clp_set_pci_fn() Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-20Merge tag 'v5.5-rc7' into efi/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-18open: introduce openat2(2) syscallAleksa Sarai
/* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). It also only contains u64s (even though ->mode arguably should be a u16) to avoid having padding fields which are never used in the future. Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). After input from Florian Weimer, the new open_how and flag definitions are inside a separate header from uapi/linux/fcntl.h, to avoid problems that glibc has with importing that header. /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. Another possible avenue of future work would be some kind of CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace which openat2(2) flags and fields are supported by the current kernel (to avoid userspace having to go through several guesses to figure it out). [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ [6]: https://youtu.be/ggD-eb3yPVs Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-14arch/s390/setup: Drop dummy_con initializationArvind Sankar
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset. Drop it from arch setup code. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20191218214506.49252-20-nivedita@alum.mit.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13arch: wire up pidfd_getfd syscallSargun Dhillon
This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-10Merge branch 'x86/mm' into efi/core, to pick up dependenciesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-09s390/setup: Fix secure ipl messagePhilipp Rudo
The new machine loader on z15 always creates an IPL Report block and thus sets the IPL_PL_FLAG_IPLSR even when secure boot is disabled. This causes the wrong message being printed at boot. Fix this by checking for IPL_PL_FLAG_SIPL instead. Fixes: 9641b8cc733f ("s390/ipl: read IPL report at early boot") Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-09crypto: remove propagation of CRYPTO_TFM_RES_* flagsEric Biggers
The CRYPTO_TFM_RES_* flags were apparently meant as a way to make the ->setkey() functions provide more information about errors. But these flags weren't actually being used or tested, and in many cases they weren't being set correctly anyway. So they've now been removed. Also, if someone ever actually needs to start better distinguishing ->setkey() errors (which is somewhat unlikely, as this has been unneeded for a long time), we'd be much better off just defining different return values, like -EINVAL if the key is invalid for the algorithm vs. -EKEYREJECTED if the key was rejected by a policy like "no weak keys". That would be much simpler, less error-prone, and easier to test. So just remove CRYPTO_TFM_RES_MASK and all the unneeded logic that propagates these flags around. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-09crypto: remove CRYPTO_TFM_RES_BAD_KEY_LENEric Biggers
The CRYPTO_TFM_RES_BAD_KEY_LEN flag was apparently meant as a way to make the ->setkey() functions provide more information about errors. However, no one actually checks for this flag, which makes it pointless. Also, many algorithms fail to set this flag when given a bad length key. Reviewing just the generic implementations, this is the case for aes-fixed-time, cbcmac, echainiv, nhpoly1305, pcrypt, rfc3686, rfc4309, rfc7539, rfc7539esp, salsa20, seqiv, and xcbc. But there are probably many more in arch/*/crypto/ and drivers/crypto/. Some algorithms can even set this flag when the key is the correct length. For example, authenc and authencesn set it when the key payload is malformed in any way (not just a bad length), the atmel-sha and ccree drivers can set it if a memory allocation fails, and the chelsio driver sets it for bad auth tag lengths, not just bad key lengths. So even if someone actually wanted to start checking this flag (which seems unlikely, since it's been unused for a long time), there would be a lot of work needed to get it working correctly. But it would probably be much better to go back to the drawing board and just define different return values, like -EINVAL if the key is invalid for the algorithm vs. -EKEYREJECTED if the key was rejected by a policy like "no weak keys". That would be much simpler, less error-prone, and easier to test. So just remove this flag. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-04mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand
We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-03compat: provide compat_ptr() on all architecturesArnd Bergmann
In order to avoid needless #ifdef CONFIG_COMPAT checks, move the compat_ptr() definition to linux/compat.h where it can be seen by any file regardless of the architecture. Only s390 needs a special definition, this can use the self-#define trick we have elsewhere. Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-25Merge tag 'v5.5-rc3' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-18s390/ftrace: save traced function callerVasily Gorbik
A typical backtrace acquired from ftraced function currently looks like the following (e.g. for "path_openat"): arch_stack_walk+0x15c/0x2d8 stack_trace_save+0x50/0x68 stack_trace_call+0x15a/0x3b8 ftrace_graph_caller+0x0/0x1c 0x3e0007e3c98 <- ftraced function caller (should be do_filp_open+0x7c/0xe8) do_open_execat+0x70/0x1b8 __do_execve_file.isra.0+0x7d8/0x860 __s390x_sys_execve+0x56/0x68 system_call+0xdc/0x2d8 Note random "0x3e0007e3c98" stack value as ftraced function caller. This value causes either imprecise unwinder result or unwinding failure. That "0x3e0007e3c98" comes from r14 of ftraced function stack frame, which it haven't had a chance to initialize since the very first instruction calls ftrace code ("ftrace_caller"). (ftraced function might never save r14 as well). Nevertheless according to s390 ABI any function is called with stack frame allocated for it and r14 contains return address. "ftrace_caller" itself is called with "brasl %r0,ftrace_caller". So, to fix this issue simply always save traced function caller onto ftraced function stack frame. Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-18s390/unwind: stop gracefully at user mode pt_regs in irq stackVasily Gorbik
Consider reaching user mode pt_regs at the bottom of irq stack graceful unwinder termination. This is the case when irq/mcck/ext interrupt arrives while in user mode. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-18s390/purgatory: do not build purgatory with kcov, kasan and friendsChristian Borntraeger
the purgatory must not rely on functions from the "old" kernel, so we must disable kasan and friends. We also need to have a separate copy of string.c as the default does not build memcmp with KASAN. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-18s390/purgatory: Make sure we fail the build if purgatory has missing symbolsHans de Goede
Since we link purgatory with -r aka we enable "incremental linking" no checks for unresolved symbols are done while linking the purgatory. This commit adds an extra check for unresolved symbols by calling ld without -r before running objcopy to generate purgatory.ro. This will help us catch missing symbols in the purgatory sooner. Note this commit also removes --no-undefined from LDFLAGS_purgatory as that has no effect. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/lkml/20191212205304.191610-1-hdegoede@redhat.com Tested-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-18s390/ftrace: fix endless recursion in function_graph tracerSven Schnelle
The following sequence triggers a kernel stack overflow on s390x: mount -t tracefs tracefs /sys/kernel/tracing cd /sys/kernel/tracing echo function_graph > current_tracer [crash] This is because preempt_count_{add,sub} are in the list of traced functions, which can be demonstrated by: echo preempt_count_add >set_ftrace_filter echo function_graph > current_tracer [crash] The stack overflow happens because get_tod_clock_monotonic() gets called by ftrace but itself calls preempt_{disable,enable}(), which leads to a endless recursion. Fix this by using preempt_{disable,enable}_notrace(). Fixes: 011620688a71 ("s390/time: ensure get_clock_monotonic() returns monotonic values") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-13scripts/sorttable: Rename 'sortextable' to 'sorttable'Shile Zhang
Use a more generic name for additional table sorting usecases, such as the upcoming ORC table sorting feature. This tool is not tied to exception table sorting anymore. No functional changes intended. [ mingo: Rewrote the changelog. ] Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-11s390/kasan: add KASAN_VMALLOC supportVasily Gorbik
Add KASAN_VMALLOC support which now enables vmalloc memory area access checks as well as enables usage of VMAP_STACK under kasan. KASAN_VMALLOC changes the way vmalloc and modules areas shadow memory is handled. With this new approach only top level page tables are pre-populated and lower levels are filled dynamically upon memory allocation. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11s390: remove last diag 0x44 callerHeiko Carstens
diag 0x44 is a voluntary undirected yield of a virtual CPU. This has caused a lot of performance issues in the past. There is only one caller left, and that one is only executed if diag 0x9c (directed yield) is not present. Given that all hypervisors implement diag 0x9c anyway, remove the last diag 0x44 to avoid that more callers will be added. Worst case that could happen now, if diag 0x9c is not present, is that a virtual CPU would loop a bit instead of giving its time slice up. diag 0x44 statistics in debugfs are kept and will always be zero, so that user space can tell that there are no calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11s390/uv: use EOPNOTSUPP instead of ENOTSUPPChristian Borntraeger
ENOTSUP is just an internal kernel error and should never reach userspace. The return value of the share function is not exported to userspace, but to avoid giving bad examples let us use EOPNOTSUPP: Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11s390/cpum_sf: Avoid SBD overflow condition in irq handlerThomas Richter
The s390 CPU Measurement sampling facility has an overflow condition which fires when all entries in a SBD are used. The measurement alert interrupt is triggered and reads out all samples in this SDB. It then tests the successor SDB, if this SBD is not full, the interrupt handler does not read any samples at all from this SDB The design waits for the hardware to fill this SBD and then trigger another meassurement alert interrupt. This scheme works nicely until an perf_event_overflow() function call discards the sample due to a too high sampling rate. The interrupt handler has logic to read out a partially filled SDB when the perf event overflow condition in linux common code is met. This causes the CPUM sampling measurement hardware and the PMU device driver to operate on the same SBD's trailer entry. This should not happen. This can be seen here using this trace: cpumsf_pmu_add: tear:0xb5286000 hw_perf_event_update: sdbt 0xb5286000 full 1 over 0 flush_all:0 hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0 above shows 1. interrupt hw_perf_event_update: sdbt 0xb5286008 full 1 over 0 flush_all:0 hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0 above shows 2. interrupt ... this goes on fine until... hw_perf_event_update: sdbt 0xb5286068 full 1 over 0 flush_all:0 perf_push_sample1: overflow one or more samples read from the IRQ handler are rejected by perf_event_overflow() and the IRQ handler advances to the next SDB and modifies the trailer entry of a partially filled SDB. hw_perf_event_update: sdbt 0xb5286070 full 0 over 0 flush_all:1 timestamp: 14:32:52.519953 Next time the IRQ handler is called for this SDB the trailer entry shows an overflow count of 19 missed entries. hw_perf_event_update: sdbt 0xb5286070 full 1 over 19 flush_all:1 timestamp: 14:32:52.970058 Remove access to a follow on SDB when event overflow happened. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11s390/cpum_sf: Adjust sampling interval to avoid hitting sample limitsThomas Richter
Function perf_event_ever_overflow() and perf_event_account_interrupt() are called every time samples are processed by the interrupt handler. However function perf_event_account_interrupt() has checks to avoid being flooded with interrupts (more then 1000 samples are received per task_tick). Samples are then dropped and a PERF_RECORD_THROTTLED is added to the perf data. The perf subsystem limit calculation is: maximum sample frequency := 100000 --> 1 samples per 10 us task_tick = 10ms = 10000us --> 1000 samples per task_tick The work flow is measurement_alert() uses SDBT head and each SBDT points to 511 SDB pages, each with 126 sample entries. After processing 8 SBDs and for each valid sample calling: perf_event_overflow() perf_event_account_interrupts() there is a considerable amount of samples being dropped, especially when the sample frequency is very high and near the 100000 limit. To avoid the high amount of samples being dropped near the end of a task_tick time frame, increment the sampling interval in case of dropped events. The CPU Measurement sampling facility on the s390 supports only intervals, specifiing how many CPU cycles have to be executed before a sample is generated. Increase the interval when the samples being generated hit the task_tick limit. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11s390/test_unwind: fix spelling mistake "reqister" -> "register"Colin Ian King
There is a spelling mistake in a pr_info message. Fix it. Link: https://lkml.kernel.org/r/20191202090215.28766-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11s390/spinlock: remove confusing comment in arch_spin_lock_waitVasily Gorbik
arch_spin_lock_wait does not take steal time into consideration. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-10mm/vmalloc: Add empty <asm/vmalloc.h> headers and use them from ↵Ingo Molnar
<linux/vmalloc.h> In the x86 MM code we'd like to untangle various types of historic header dependency spaghetti, but for this we'd need to pass to the generic vmalloc code various vmalloc related defines that customarily come via the <asm/page.h> low level arch header. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08sched/rt, s390: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the preemption and entry code over to use CONFIG_PREEMPTION. Add PREEMPT_RT output to die(). [bigeasy: +Kconfig, dumpstack.c] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: linux-s390@vger.kernel.org Link: https://lore.kernel.org/r/20191015191821.11479-18-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-06Merge tag 'powerpc-5.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "A few commits splitting the KASAN instrumented bitops header in three, to match the split of the asm-generic bitops headers. This is needed on powerpc because we use the generic bitops for the non-atomic case only, whereas the existing KASAN instrumented bitops assume all the underlying operations are provided by the arch as arch_foo() versions. Thanks to: Daniel Axtens & Christophe Leroy" * tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: docs/core-api: Remove possibly confusing sub-headings from Bit Operations powerpc: support KASAN instrumentation of bitops kasan: support instrumented bitops combined with generic bitops
2019-12-04arch: ipcbuf.h: make uapi asm/ipcbuf.h self-containedMasahiro Yamada
Userspace cannot compile <asm/ipcbuf.h> due to some missing type definitions. For example, building it for x86 fails as follows: CC usr/include/asm/ipcbuf.h.s In file included from usr/include/asm/ipcbuf.h:1:0, from <command-line>:32: usr/include/asm-generic/ipcbuf.h:21:2: error: unknown type name `__kernel_key_t' __kernel_key_t key; ^~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:22:2: error: unknown type name `__kernel_uid32_t' __kernel_uid32_t uid; ^~~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:23:2: error: unknown type name `__kernel_gid32_t' __kernel_gid32_t gid; ^~~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:24:2: error: unknown type name `__kernel_uid32_t' __kernel_uid32_t cuid; ^~~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:25:2: error: unknown type name `__kernel_gid32_t' __kernel_gid32_t cgid; ^~~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:26:2: error: unknown type name `__kernel_mode_t' __kernel_mode_t mode; ^~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:28:35: error: `__kernel_mode_t' undeclared here (not in a function) unsigned char __pad1[4 - sizeof(__kernel_mode_t)]; ^~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:31:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t __unused1; ^~~~~~~~~~~~~~~~ usr/include/asm-generic/ipcbuf.h:32:2: error: unknown type name `__kernel_ulong_t' __kernel_ulong_t __unused2; ^~~~~~~~~~~~~~~~ It is just a matter of missing include directive. Include <linux/posix_types.h> to make it self-contained, and add it to the compile-test coverage. Link: http://lkml.kernel.org/r/20191030063855.9989-1-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-03Merge tag 'pci-v5.5-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Warn if a host bridge has no NUMA info (Yunsheng Lin) - Add PCI_STD_NUM_BARS for the number of standard BARs (Denis Efremov) Resource management: - Fix boot-time Embedded Controller GPE storm caused by incorrect resource assignment after ACPI Bus Check Notification (Mika Westerberg) - Protect pci_reassign_bridge_resources() against concurrent addition/removal (Benjamin Herrenschmidt) - Fix bridge dma_ranges resource list cleanup (Rob Herring) - Add "pci=hpmmiosize" and "pci=hpmmioprefsize" parameters to control the MMIO and prefetchable MMIO window sizes of hotplug bridges independently (Nicholas Johnson) - Fix MMIO/MMIO_PREF window assignment that assigned more space than desired (Nicholas Johnson) - Only enforce bus numbers from bridge EA if the bridge has EA devices downstream (Subbaraya Sundeep) - Consolidate DT "dma-ranges" parsing and convert all host drivers to use shared parsing (Rob Herring) Error reporting: - Restore AER capability after resume (Mayurkumar Patel) - Add PoisonTLPBlocked AER counter (Rajat Jain) - Use for_each_set_bit() to simplify AER code (Andy Shevchenko) - Fix AER kernel-doc (Andy Shevchenko) - Add "pcie_ports=dpc-native" parameter to allow native use of DPC even if platform didn't grant control over AER (Olof Johansson) Hotplug: - Avoid returning prematurely from sysfs requests to enable or disable a PCIe hotplug slot (Lukas Wunner) - Don't disable interrupts twice when suspending hotplug ports (Mika Westerberg) - Fix deadlocks when PCIe ports are hot-removed while suspended (Mika Westerberg) Power management: - Remove unnecessary ASPM locking (Bjorn Helgaas) - Add support for disabling L1 PM Substates (Heiner Kallweit) - Allow re-enabling Clock PM after it has been disabled (Heiner Kallweit) - Add sysfs attributes for controlling ASPM link states (Heiner Kallweit) - Remove CONFIG_PCIEASPM_DEBUG, including "link_state" and "clk_ctl" sysfs files (Heiner Kallweit) - Avoid AMD FCH XHCI USB PME# from D0 defect that prevents wakeup on USB 2.0 or 1.1 connect events (Kai-Heng Feng) - Move power state check out of pci_msi_supported() (Bjorn Helgaas) - Fix incorrect MSI-X masking on resume and revert related nvme quirk for Kingston NVME SSD running FW E8FK11.T (Jian-Hong Pan) - Always return devices to D0 when thawing to fix hibernation with drivers like mlx4 that used legacy power management (previously we only did it for drivers with new power management ops) (Dexuan Cui) - Clear PCIe PME Status even for legacy power management (Bjorn Helgaas) - Fix PCI PM documentation errors (Bjorn Helgaas) - Use dev_printk() for more power management messages (Bjorn Helgaas) - Apply D2 delay as milliseconds, not microseconds (Bjorn Helgaas) - Convert xen-platform from legacy to generic power management (Bjorn Helgaas) - Removed unused .resume_early() and .suspend_late() legacy power management hooks (Bjorn Helgaas) - Rearrange power management code for clarity (Rafael J. Wysocki) - Decode power states more clearly ("4" or "D4" really refers to "D3cold") (Bjorn Helgaas) - Notice when reading PM Control register returns an error (~0) instead of interpreting it as being in D3hot (Bjorn Helgaas) - Add missing link delays required by the PCIe spec (Mika Westerberg) Virtualization: - Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI (Bjorn Helgaas) - Allow VFs to use PRI (the PF PRI is shared by the VFs, but the code previously didn't recognize that) (Kuppuswamy Sathyanarayanan) - Allow VFs to use PASID (the PF PASID capability is shared by the VFs, but the code previously didn't recognize that) (Kuppuswamy Sathyanarayanan) - Disconnect PF and VF ATS enablement, since ATS in PFs and associated VFs can be enabled independently (Kuppuswamy Sathyanarayanan) - Cache PRI and PASID capability offsets (Kuppuswamy Sathyanarayanan) - Cache the PRI PRG Response PASID Required bit (Bjorn Helgaas) - Consolidate ATS declarations in linux/pci-ats.h (Krzysztof Wilczynski) - Remove unused PRI and PASID stubs (Bjorn Helgaas) - Removed unnecessary EXPORT_SYMBOL_GPL() from ATS, PRI, and PASID interfaces that are only used by built-in IOMMU drivers (Bjorn Helgaas) - Hide PRI and PASID state restoration functions used only inside the PCI core (Bjorn Helgaas) - Add a DMA alias quirk for the Intel VCA NTB (Slawomir Pawlowski) - Serialize sysfs sriov_numvfs reads vs writes (Pierre Crégut) - Update Cavium ACS quirk for ThunderX2 and ThunderX3 (George Cherian) - Fix the UPDCR register address in the Intel ACS quirk (Steffen Liebergeld) - Unify ACS quirk implementations (Bjorn Helgaas) Amlogic Meson host bridge driver: - Fix meson PERST# GPIO polarity problem (Remi Pommarel) - Add DT bindings for Amlogic Meson G12A (Neil Armstrong) - Fix meson clock names to match DT bindings (Neil Armstrong) - Add meson support for Amlogic G12A SoC with separate shared PHY (Neil Armstrong) - Add meson extended PCIe PHY functions for Amlogic G12A USB3+PCIe combo PHY (Neil Armstrong) - Add arm64 DT for Amlogic G12A PCIe controller node (Neil Armstrong) - Add commented-out description of VIM3 USB3/PCIe mux in arm64 DT (Neil Armstrong) Broadcom iProc host bridge driver: - Invalidate iProc PAXB address mapping before programming it (Abhishek Shah) - Fix iproc-msi and mvebu __iomem annotations (Ben Dooks) Cadence host bridge driver: - Refactor Cadence PCIe host controller to use as a library for both host and endpoint (Tom Joseph) Freescale Layerscape host bridge driver: - Add layerscape LS1028a support (Xiaowei Bao) Intel VMD host bridge driver: - Add VMD bus 224-255 restriction decode (Jon Derrick) - Add VMD 8086:9A0B device ID (Jon Derrick) - Remove Keith from VMD maintainer list (Keith Busch) Marvell ARMADA 3700 / Aardvark host bridge driver: - Use LTSSM state to build link training flag since Aardvark doesn't implement the Link Training bit (Remi Pommarel) - Delay before training Aardvark link in case PERST# was asserted before the driver probe (Remi Pommarel) - Fix Aardvark issues with Root Control reads and writes (Remi Pommarel) - Don't rely on jiffies in Aardvark config access path since interrupts may be disabled (Remi Pommarel) - Fix Aardvark big-endian support (Grzegorz Jaszczyk) Marvell ARMADA 370 / XP host bridge driver: - Make mvebu_pci_bridge_emul_ops static (Ben Dooks) Microsoft Hyper-V host bridge driver: - Add hibernation support for Hyper-V virtual PCI devices (Dexuan Cui) - Track Hyper-V pci_protocol_version per-hbus, not globally (Dexuan Cui) - Avoid kmemleak false positive on hv hbus buffer (Dexuan Cui) Mobiveil host bridge driver: - Change mobiveil csr_read()/write() function names that conflict with riscv arch functions (Kefeng Wang) NVIDIA Tegra host bridge driver: - Fix Tegra CLKREQ dependency programming (Vidya Sagar) Renesas R-Car host bridge driver: - Remove unnecessary header include from rcar (Andrew Murray) - Tighten register index checking for rcar inbound range programming (Marek Vasut) - Fix rcar inbound range alignment calculation to improve packing of multiple entries (Marek Vasut) - Update rcar MACCTLR setting to match documentation (Yoshihiro Shimoda) - Clear bit 0 of MACCTLR before PCIETCTLR.CFINIT per manual (Yoshihiro Shimoda) - Add Marek Vasut and Yoshihiro Shimoda as R-Car maintainers (Simon Horman) Rockchip host bridge driver: - Make rockchip 0V9 and 1V8 power regulators non-optional (Robin Murphy) Socionext UniPhier host bridge driver: - Set uniphier to host (RC) mode always (Kunihiko Hayashi) Endpoint drivers: - Fix endpoint driver sign extension problem when shifting page number to phys_addr_t (Alan Mikhak) Misc: - Add NumaChip SPDX header (Krzysztof Wilczynski) - Replace EXTRA_CFLAGS with ccflags-y (Krzysztof Wilczynski) - Remove unused includes (Krzysztof Wilczynski) - Removed unused sysfs attribute groups (Ben Dooks) - Remove PTM and ASPM dependencies on PCIEPORTBUS (Bjorn Helgaas) - Add PCIe Link Control 2 register field definitions to replace magic numbers in AMDGPU and Radeon CIK/SI (Bjorn Helgaas) - Fix incorrect Link Control 2 Transmit Margin usage in AMDGPU and Radeon CIK/SI PCIe Gen3 link training (Bjorn Helgaas) - Use pcie_capability_read_word() instead of pci_read_config_word() in AMDGPU and Radeon CIK/SI (Frederick Lawler) - Remove unused pci_irq_get_node() Greg Kroah-Hartman) - Make asm/msi.h mandatory and simplify PCI_MSI_IRQ_DOMAIN Kconfig (Palmer Dabbelt, Michal Simek) - Read all 64 bits of Switchtec part_event_bitmap (Logan Gunthorpe) - Fix erroneous intel-iommu dependency on CONFIG_AMD_IOMMU (Bjorn Helgaas) - Fix bridge emulation big-endian support (Grzegorz Jaszczyk) - Fix dwc find_next_bit() usage (Niklas Cassel) - Fix pcitest.c fd leak (Hewenliang) - Fix typos and comments (Bjorn Helgaas) - Fix Kconfig whitespace errors (Krzysztof Kozlowski)" * tag 'pci-v5.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (160 commits) PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist asm-generic: Make msi.h a mandatory include/asm header Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T" PCI/MSI: Fix incorrect MSI-X masking on resume PCI/MSI: Move power state check out of pci_msi_supported() PCI/MSI: Remove unused pci_irq_get_node() PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer PCI: hv: Change pci_protocol_version to per-hbus PCI: hv: Add hibernation support PCI: hv: Reorganize the code in preparation of hibernation MAINTAINERS: Remove Keith from VMD maintainer PCI/ASPM: Remove PCIEASPM_DEBUG Kconfig option and related code PCI/ASPM: Add sysfs attributes for controlling ASPM link states PCI: Fix indentation drm/radeon: Prefer pcie_capability_read_word() drm/radeon: Replace numbers with PCI_EXP_LNKCTL2 definitions drm/radeon: Correct Transmit Margin masks drm/amdgpu: Prefer pcie_capability_read_word() PCI: uniphier: Set mode register to host mode drm/amdgpu: Replace numbers with PCI_EXP_LNKCTL2 definitions ...
2019-12-03Merge tag 's390-5.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Make stack unwinder reliable and suitable for livepatching. Add unwinder testing module. - Fixes for CALL_ON_STACK helper used for stack switching. - Fix unwinding from bpf code. - Fix getcpu and remove compat support in vdso code. - Fix address space control registers initialization. - Save KASLR offset for early dumps. - Handle new FILTERED_BY_HYPERVISOR reply code in crypto code. - Minor perf code cleanup and potential memory leak fix. - Add couple of error messages for corner cases during PCI device creation. * tag 's390-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (33 commits) s390: remove compat vdso code s390/livepatch: Implement reliable stack tracing for the consistency model s390/unwind: add stack pointer alignment sanity checks s390/unwind: filter out unreliable bogus %r14 s390/unwind: start unwinding from reliable state s390/test_unwind: add program check context tests s390/test_unwind: add irq context tests s390/test_unwind: print verbose unwinding results s390/test_unwind: add CALL_ON_STACK tests s390: fix register clobbering in CALL_ON_STACK s390/test_unwind: require that unwinding ended successfully s390/unwind: add a test for the internal API s390/unwind: always inline get_stack_pointer s390/pci: add error message on device number limit s390/pci: add error message for UID collision s390/cpum_sf: Check for SDBT and SDB consistency s390/cpum_sf: Use TEAR_REG macro consistantly s390/cpum_sf: Remove unnecessary check for pending SDBs s390/cpum_sf: Replace function name in debug statements s390/kaslr: store KASLR offset for early dumps ...
2019-12-01s390: remove compat vdso codeHeiko Carstens
Remove compat vdso code, since there is hardly any compat user space left. Still existing compat user space will have to use system calls instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30Merge tag 'seccomp-v5.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE, but there are cleanups as well. - implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner) - fixes to selftests (Christian Brauner) - remove secure_computing() argument (Christian Brauner)" * tag 'seccomp-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test seccomp: simplify secure_computing() seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE seccomp: avoid overflow in implicit constant conversion
2019-11-30s390/livepatch: Implement reliable stack tracing for the consistency modelMiroslav Benes
The livepatch consistency model requires reliable stack tracing architecture support in order to work properly. In order to achieve this, two main issues have to be solved. First, reliable and consistent call chain backtracing has to be ensured. Second, the unwinder needs to be able to detect stack corruptions and return errors. The "zSeries ELF Application Binary Interface Supplement" says: "The stack pointer points to the first word of the lowest allocated stack frame. If the "back chain" is implemented this word will point to the previously allocated stack frame (towards higher addresses), except for the first stack frame, which shall have a back chain of zero (NULL). The stack shall grow downwards, in other words towards lower addresses." "back chain" is optional. GCC option -mbackchain enables it. Quoting Martin Schwidefsky [1]: "The compiler is called with the -mbackchain option, all normal C function will store the backchain in the function prologue. All functions written in assembler code should do the same, if you find one that does not we should fix that. The end result is that a task that *voluntarily* called schedule() should have a proper backchain at all times. Dependent on the use case this may or may not be enough. Asynchronous interrupts may stop the CPU at the beginning of a function, if kernel preemption is enabled we can end up with a broken backchain. The production kernels for IBM Z are all compiled *without* kernel preemption. So yes, we might get away without the objtool support. On a side-note, we do have a line item to implement the ORC unwinder for the kernel, that includes the objtool support. Once we have that we can drop the -mbackchain option for the kernel build. That gives us a nice little performance benefit. I hope that the change from backchain to the ORC unwinder will not be too hard to implement in the livepatch tools." Since -mbackchain is enabled by default when the kernel is compiled, the call chain backtracing should be currently ensured and objtool should not be necessary for livepatch purposes. Regarding the second issue, stack corruptions and non-reliable states have to be recognized by the unwinder. Mainly it means to detect preemption or page faults, the end of the task stack must be reached, return addresses must be valid text addresses and hacks like function graph tracing and kretprobes must be properly detected. Unwinding a running task's stack is not a problem, because there is a livepatch requirement that every checked task is blocked, except for the current task. Due to that, the implementation can be much simpler compared to the existing non-reliable infrastructure. We can consider a task's kernel/thread stack only and skip the other stacks. [1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org] Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/unwind: add stack pointer alignment sanity checksMiroslav Benes
ABI requires SP to be aligned 8 bytes, report unwinding error otherwise. Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/unwind: filter out unreliable bogus %r14Vasily Gorbik
Currently unwinder unconditionally returns %r14 from the first frame pointed by %r15 from pt_regs. A task could be interrupted when a function already allocated this frame (if it needs it) for its callees or to store local variables. In that case this frame would contain random values from stack or values stored there by a callee. As we are only interested in %r14 to get potential return address, skip bogus return addresses which doesn't belong to kernel text. This helps to avoid duplicating filtering logic in unwider users, most of which use unwind_get_return_address() and would choke on bogus 0 address returned by it otherwise. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/unwind: start unwinding from reliable stateVasily Gorbik
A comment in arch/s390/include/asm/unwind.h says: > If 'first_frame' is not zero unwind_start skips unwind frames until it > reaches the specified stack pointer. > The end of the unwinding is indicated with unwind_done, this can be true > right after unwind_start, e.g. with first_frame!=0 that can not be found. > unwind_next_frame skips to the next frame. > Once the unwind is completed unwind_error() can be used to check if there > has been a situation where the unwinder could not correctly understand > the tasks call chain. With this change backchain unwinder now comply with behaviour described. As well as matches orc unwinder implementation. Now unwinder starts from reliable state, i.e. __unwind_start own stack frame is taken or stack frame generated by __switch_to (ksp) - both known to be valid. In case of pt_regs %r15 is better match for pt_regs psw, than sometimes random "sp" caller passed. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/test_unwind: add program check context testsVasily Gorbik
Add unwinding from program check handler tests. Unwinder should be able to unwind through pt_regs stored by program check handler on task stack. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/test_unwind: add irq context testsVasily Gorbik
Add unwinding from irq context tests. Unwinder should be able to unwind through irq stack to task stack up to task pt_regs. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/test_unwind: print verbose unwinding resultsVasily Gorbik
Add stack name, sp and reliable information into test unwinding results. Also consider ip outside of kernel text as failure if the state is reported reliable. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/test_unwind: add CALL_ON_STACK testsVasily Gorbik
Add CALL_ON_STACK helper testing. Tests make sure that we can unwind from switched stack to original one up to task pt_regs (nodat -> task stack). UWM_SWITCH_STACK could not be used together with UWM_THREAD because get_stack_info explicitly restricts unwinding to task stack if task != current. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390: fix register clobbering in CALL_ON_STACKVasily Gorbik
CALL_ON_STACK defines and initializes register variables. Inline assembly which follows might trigger compiler to generate memory access for "stack" argument (e.g. in case of S390_lowcore.nodat_stack). This memory access produces a function call under kasan with outline instrumentation which clobbers registers. Switch "stack" argument in CALL_ON_STACK helper to use memory reference constraint and perform load instead. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390/test_unwind: require that unwinding ended successfullyVasily Gorbik
Currently unwinder test passes if unwinding results contain unwindme_func2 and unwindme_func1 functions. Now that unwinder reports success upon reaching task pt_regs, check that unwinding ended successfully in every test. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>