summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2018-04-05powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bitNicholas Piggin
The pkey code added a CPU_FTR_PKEY bit, but did not add it to the dt_cpu_ftrs feature set. Although capability is supported by all processors in the base dt_cpu_ftrs set for 64s, it's a significant and sufficiently well defined feature to make it optional. So add it as a quirk for now, which can be versioned out then controlled by the firmware (once dt_cpu_ftrs gains versioning support). Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bitsNicholas Piggin
Presently the dt_cpu_ftrs restore_cpu will only add bits to the LPCR for secondaries, but some bits must be removed (e.g., UPRT for HPT). Not clearing these bits on secondaries causes checkstops when booting with disable_radix. restore_cpu can not just set LPCR, because it is also called by the idle wakeup code which relies on opal_slw_set_reg to restore the value of LPCR, at least on P8 which does not save LPCR to stack in the idle code. Fix this by including a mask of bits to clear from LPCR as well, which is used by restore_cpu. This is a little messy now, but it's a minimal fix that can be backported. Longer term, the idle SPR save/restore code can be reworked to completely avoid calls to restore_cpu, then restore_cpu would be able to unconditionally set LPCR to match boot processor environment. Fixes: 5a61ef74f269f ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"Michael Ellerman
As described in that commit: When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. This is true, except in the case of an NMI interrupt (sreset or machine check) interrupting the instruction. In that case, the NMI gets an "interrupt occurred while the processor was in power-saving mode" indication. The power-save wakeup code uses that bit to decide whether to restore some registers (e.g., LR). Because these are no longer saved, this causes random register corruption. It may be possible to restore this optimisation by detecting the case of no register loss on the wakeup side, and avoid restoring in that case, but that's not a minor fix because the wakeup code itself uses some registers that would be live (e.g., LR). Fixes: b9ee31e100e7 ("powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}Logan Gunthorpe
These functions will be introduced into the generic iomap.c so they can deal with PIO accesses in hi-lo/lo-hi variants. Thus, the powerpc version of iomap.c will need to provide the same functions even though, in this arch, they are identical to the regular io{read|write}64 functions. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-05powerpc: io.h: move iomap.h include so that it can use readq/writeq defsLogan Gunthorpe
Subsequent patches in this series makes use of the readq and writeq defines in iomap.h. However, as is, they get missed on the powerpc platform seeing the include comes before the define. This patch moves the include down to fix this. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports itNaveen N. Rao
We get the below warning if we try to use kexec on P9: kexec_core: Starting new kernel WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140 [snip] NIP __set_breakpoint+0xb4/0x140 LR kexec_prepare_cpus_wait+0x58/0x150 Call Trace: 0xc0000000ee70fb20 (unreliable) 0xc0000000ee70fb20 default_machine_kexec+0x234/0x2c0 machine_kexec+0x84/0x90 kernel_kexec+0xd8/0xe0 SyS_reboot+0x214/0x2c0 system_call+0x58/0x6c This happens since we are trying to clear hw breakpoint on POWER9, though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint() within hw_breakpoint_disable() with ppc_breakpoint_available() to address this. Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/radix: Update command line parsing for disable_radixAneesh Kumar K.V
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. prom_init parsing is not supporting these options. Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/radix: Parse disable_radix commandline correctly.Aneesh Kumar K.V
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. When using the later format we get below error. `Malformed early option 'disable_radix'` Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlbAneesh Kumar K.V
With 64k page size, we have hugetlb pte entries at the pmd and pud level for book3s64. We don't need to create a separate page table cache for that. With 4k we need to make sure hugepd page table cache for 16M is placed at PUD level and 16G at the PGD level. Simplify all these by not using HUGEPD_PD_SHIFT which is confusing for book3s64. Without this patch, with 64k page size we create pagetable caches with shift value 10 and 7 which are not used at all. Fixes: 419df06eea5b ("powerpc: Reduce the PTE_INDEX_SIZE") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/radix: Update pte fragment count from 16 to 256 on radixAneesh Kumar K.V
With split PTL (page table lock) config, we allocate the level 4 (leaf) page table using pte fragment framework instead of slab cache like other levels. This was done to enable us to have split page table lock at the level 4 of the page table. We use page->plt backing the all the level 4 pte fragment for the lock. Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes which means we use only 4k out of the allocated 64K page wasting 60k of the allocated memory. This was done earlier to keep it closer to hash. This patch update the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests shows really low impact even with THP disabled. With THP disabled we will be contenting further less on level 4 ptl and hence the impact should be further low. 256 threads: without patch (10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100) median = 15678.5 stdev = 42.1209 with patch: median = 15354 stdev = 194.743 This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/mm/keys: Update documentation and remove unnecessary checkAneesh Kumar K.V
Adds more code comments. We also remove an unnecessary pkey check after we check for pkey error in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overheadNicholas Piggin
When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. In particular NV GPRs do not have to be saved, and MSR does not have to be switched back to kernel MSR. So move the test for EC=ESL=0 sleep states out to power9_idle_stop, and return directly to the caller after stop in that case. This improves performance for ping-pong benchmark with the stop0_lite idle state by 2.54% for 2 threads in the same core, and 2.57% for different cores. Performance increase with HV_POSSIBLE defined will be improved further by avoiding the hwsync. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-04powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()Michael Ellerman
Commit 3d4fbffdd703 ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug") that added power9_offline_stop() was written before commit 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode"). When merging the former I failed to notice that it caused us to skip the force-SMT4 logic for offline CPUs. The result is that offlined CPUs will not correctly participate in the force-SMT4 logic, which presumably will result in badness (not tested). Reconcile the two commits by making power9_offline_stop() a pre-cursor to power9_idle_stop(), so that they share the force-SMT4 logic. This is based on an original commit from Nick, all breakage is my own. Fixes: 3d4fbffdd703 ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2018-04-03Merge tag 'kbuild-v4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - add a shell script to get Clang version - improve portability of build scripts - drop always-enabled CONFIG_THIN_ARCHIVE and remove unused code - rename built-in.o which is now thin archive to built-in.a - process clean/build targets one by one to get along with -j option - simplify ld-option - improve building with CONFIG_TRIM_UNUSED_KSYMS - define KBUILD_MODNAME even for objects shared among multiple modules - avoid linking multiple instances of same objects from composite objects - move <linux/compiler_types.h> to c_flags to include it only for C files - clean-up various Makefiles * tag 'kbuild-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits) kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h> kbuild: clean up link rule of composite modules kbuild: clean up archive rule of built-in.a kbuild: remove partial section mismatch detection for built-in.a net: liquidio: clean up Makefile for simpler composite object handling lib: zstd: clean up Makefile for simpler composite object handling kbuild: link $(real-obj-y) instead of $(obj-y) into built-in.a kbuild: rename real-objs-y/m to real-obj-y/m kbuild: move modname and modname-multi close to modname_flags kbuild: simplify modname calculation kbuild: fix modname for composite modules kbuild: define KBUILD_MODNAME even if multiple modules share objects kbuild: remove unnecessary $(subst $(obj)/, , ...) in modname-multi kbuild: Use ls(1) instead of stat(1) to obtain file size kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS kbuild: move include/config/ksym/* to include/ksym/* kbuild: move CONFIG_TRIM_UNUSED_KSYMS code unneeded for external module kbuild: restore autoksyms.h touch to the top Makefile kbuild: move 'scripts' target below kbuild: remove wrong 'touch' in adjust_autoksyms.sh ...
2018-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds
Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is keyed based access to virtual memory, and if the key encoded in the virual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations
2018-04-03powerpc/powernv: Always stop secondaries before reboot/shutdownNicholas Piggin
Currently powernv reboot and shutdown requests just leave secondaries to do their own things. This is undesirable because they can trigger any number of watchdogs while waiting for reboot, but also we don't know what else they might be doing -- they might be causing trouble, trampling memory, etc. The opal scheduled flash update code already ran into watchdog problems due to flashing taking a long time, and it was fixed with 2196c6f1ed ("powerpc/powernv: Return secondary CPUs to firmware before FW update"), which returns secondaries to opal. It's been found that regular reboots can take over 10 seconds, which can result in the hard lockup watchdog firing, reboot: Restarting system [ 360.038896709,5] OPAL: Reboot request... Watchdog CPU:0 Hard LOCKUP Watchdog CPU:44 detected Hard LOCKUP other CPUS:16 Watchdog CPU:16 Hard LOCKUP watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0] This patch removes the special case for flash update, and calls smp_send_stop in all cases before calling reboot/shutdown. smp_send_stop could return CPUs to OPAL, the main reason not to is that the request could come from a NMI that interrupts OPAL code, so re-entry to OPAL can cause a number of problems. Putting secondaries into simple spin loops improves the chances of a successful reboot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: hard disable irqs in smp_send_stop loopNicholas Piggin
The hard lockup watchdog can fire under local_irq_disable on platforms with irq soft masking. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: use NMI IPI for smp_send_stopNicholas Piggin
Use the NMI IPI rather than smp_call_function for smp_send_stop. Have stopped CPUs hard disable interrupts rather than just soft disable. This function is used in crash/panic/shutdown paths to bring other CPUs down as quickly and reliably as possible, and minimizing their potential to cause trouble. Avoiding the Linux smp_call_function infrastructure and (if supported) using true NMI IPIs makes this more robust. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc/powernv: Fix SMT4 forcing idle codeNicholas Piggin
The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not have the XER[SO] bug. Fix this by storing up-front, outside the workaround code. The initial test is not required because it is a slow path. The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to match pnv_power9_force_smt4_catch() where it is used. Drop the comment on pnv_power9_force_smt4_catch() as it's no longer true. Fixes: 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc/pseries: Restore default security feature flags on setupMauricio Faria de Oliveira
After migration the security feature flags might have changed (e.g., destination system with unpatched firmware), but some flags are not set/clear again in init_cpu_char_feature_flags() because it assumes the security flags to be the defaults. Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then init_cpu_char_feature_flags() does not run again, which potentially might leave the system in an insecure or sub-optimal configuration. So, just restore the security feature flags to the defaults assumed by init_cpu_char_feature_flags() so it can set/clear them correctly, and to ensure safe settings are in place in case the hypercall fail. Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags") Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags") Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: Move default security feature flagsMauricio Faria de Oliveira
This moves the definition of the default security feature flags (i.e., enabled by default) closer to the security feature flags. This can be used to restore current flags to the default flags. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: Don't write to DABR on >= Power8 if DAWR is disabledNicholas Piggin
flush_thread() calls __set_breakpoint() via set_debug_reg_defaults() without checking ppc_breakpoint_available(). On Power8 or later CPUs which have the DAWR feature disabled that will cause a write to the DABR which is incorrect as those CPUs don't have a DABR. Fix it two ways, by checking ppc_breakpoint_available() in set_debug_reg_defaults(), and also by reworking __set_breakpoint() to only write to DABR on Power7 or earlier. Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rework the logic in __set_breakpoint()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03KVM: PPC: Book3S HV: Fix ppc_breakpoint_available compile errorNicholas Piggin
arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_h_set_mode’: arch/powerpc/kvm/book3s_hv.c:745:8: error: implicit declaration of function ‘ppc_breakpoint_available’ if (!ppc_breakpoint_available()) ^~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 398e712c007f ("KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc: Fix oops due to bad access of lppaca on bare metalAneesh Kumar K.V
Commit 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") removed allocation of lppaca on bare metal platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the lppaca on bare metal in some code paths. Fix this but adding runtime checks for SPLPAR (shared processor LPAR). Fixes: 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-02Merge branch 'syscalls-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_*() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_*() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...
2018-04-02Merge branch 'x86-dma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 dma mapping updates from Ingo Molnar: "This tree, by Christoph Hellwig, switches over the x86 architecture to the generic dma-direct and swiotlb code, and also unifies more of the dma-direct code between architectures. The now unused x86-only primitives are removed" * 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS dma/swiotlb: Remove swiotlb_{alloc,free}_coherent() dma/direct: Handle force decryption for DMA coherent buffers in common code dma/direct: Handle the memory encryption bit in common code dma/swiotlb: Remove swiotlb_set_mem_attributes() set_memory.h: Provide set_memory_{en,de}crypted() stubs x86/dma: Remove dma_alloc_coherent_gfp_flags() iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent() iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() x86/dma/amd_gart: Use dma_direct_{alloc,free}() x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA x86/dma: Use generic swiotlb_ops x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y) x86/dma: Remove dma_alloc_coherent_mask()
2018-04-02mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()Dominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_readahead() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_readahead(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()Dominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_mmap_pgoff(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()Dominik Brodowski
Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as ksys_fadvise64_64(). Some compat stubs called sys_fadvise64(), which then just passed through the arguments to sys_fadvise64_64(). Get rid of this indirection, and call ksys_fadvise64_64() directly. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()Dominik Brodowski
Using the ksys_fallocate() wrapper allows us to get rid of in-kernel calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_fallocate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscallsDominik Brodowski
Using the ksys_p{read,write}64() wrappers allows us to get rid of in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_p{read,write}64(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()Dominik Brodowski
Using the ksys_truncate() wrapper allows us to get rid of in-kernel calls to the sys_truncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_truncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscallDominik Brodowski
Using this helper allows us to avoid the in-kernel calls to the sys_sync_file_range() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_sync_file_range(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate()Dominik Brodowski
Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_ftruncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - Modernize the kprobe and uprobe creation/destruction tooling ABIs: The existing text based APIs (kprobe_events and uprobe_events in tracefs), are naive, limited ABIs in that they require user-space to clean up after themselves, which is both difficult and fragile if the tool is buggy or exits unexpectedly. In other words they are not really suited for modern, robust tooling. So introduce a modern, file descriptor based ABI that does not have these limitations: introduce the 'perf_kprobe' and 'perf_uprobe' PMUs and extend the perf_event_open() syscall to create events with a kprobe/uprobe attached to them. These [k,u]probe are associated with this file descriptor, so they are not available in tracefs. (Song Liu) - Intel Cannon Lake CPU support (Harry Pan) - Intel PT cleanups (Alexander Shishkin) - Improve the performance of pinned/flexible event groups by using RB trees (Alexey Budankov) - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification of hardware breakpoints, which new ABI variant massively speeds up existing tooling that uses hardware breakpoints to instrument (and debug) memory usage. (Milind Chabbi, Jiri Olsa) - Various Intel PEBS handling fixes and improvements, and other Intel PMU improvements (Kan Liang) - Various perf core improvements and optimizations (Peter Zijlstra) - ... misc cleanups, fixes and updates. There's over 200 tooling commits, here's an (imperfect) list of highlights: - 'perf annotate' improvements: * Recognize and handle jumps to other functions as calls, which improves the navigation along jumps and back. (Arnaldo Carvalho de Melo) * Add the 'P' hotkey in TUI annotation to dump annotation output into a file, to ease e-mail reporting of annotation details. (Arnaldo Carvalho de Melo) * Add an IPC/cycles column to the TUI (Jin Yao) * Improve s390 assembly annotation (Thomas Richter) * Refactor the output formatting logic to better separate it into interactive and non-interactive features and add the --stdio2 output variant to demonstrate this. (Arnaldo Carvalho de Melo) - 'perf script' improvements: * Add Python 3 support (Jaroslav Škarvada) * Add --show-round-event (Jiri Olsa) - 'perf c2c' improvements: * Add NUMA analysis support (Jiri Olsa) - 'perf trace' improvements: * Improve PowerPC support (Ravi Bangoria) - 'perf inject' improvements: * Integrate ARM CoreSight traces (Robert Walker) - 'perf stat' improvements: * Add the --interval-count option (yuzhoujian) * Add the --timeout option (yuzhoujian) - 'perf sched' improvements (Changbin Du) - Vendor events improvements : * Add IBM s390 vendor events (Thomas Richter) * Add and improve arm64 vendor events (John Garry, Ganapatrao Kulkarni) * Update POWER9 vendor events (Sukadev Bhattiprolu) - Intel PT tooling improvements (Adrian Hunter) - PMU handling improvements (Agustin Vega-Frias) - Record machine topology in perf.data (Jiri Olsa) - Various overwrite related cleanups (Kan Liang) - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet) - ... and lots of other changes, cleanups and fixes, see the shortlog and Git history for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits) perf/x86/intel: Enable C-state residency events for Cannon Lake perf/x86/intel: Add Cannon Lake support for RAPL profiling perf/x86/pt, coresight: Clean up address filter structure perf vendor events s390: Add JSON files for IBM z14 perf vendor events s390: Add JSON files for IBM z13 perf vendor events s390: Add JSON files for IBM zEC12 zBC12 perf vendor events s390: Add JSON files for IBM z196 perf vendor events s390: Add JSON files for IBM z10EC z10BC perf mmap: Be consistent when checking for an unmaped ring buffer perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done() perf build: Fix check-headers.sh opts assignment perf/x86: Update rdpmc_always_available static key to the modern API perf annotate: Use absolute addresses to calculate jump target offsets perf annotate: Defer searching for comma in raw line till it is needed perf annotate: Support jumping from one function to another perf annotate: Add "_local" to jump/offset validation routines perf python: Reference Py_None before returning it perf annotate: Mark jumps to outher functions with the call arrow perf annotate: Pass function descriptor to its instruction parsing routines perf annotate: No need to calculate notes->start twice ...
2018-04-01powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXTMathieu Malaterre
In commit 9690c1574268 ("powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT") an issue was discovered where `mm->context.id` was being truncated to an `unsigned int`, while the PID is actually an `unsigned long`. Update the earlier patch by fixing one remaining occurrence. Discovered during a compilation with W=1: arch/powerpc/mm/tlb-radix.c:702:19: error: comparison is always false due to limited range of data type [-Werror=type-limits] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAPMatt Evans
When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the user context when single_step_exception() prepares the SIGTRAP delivery. The resulting branch-trap-within-the-SIGTRAP-handler isn't healthy. Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call to clear_single_step() which only clears MSR_SE. This patch adds a new helper, clear_br_trace(), which clears the debug trap before invoking the signal handler. This helper is a NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Add POWER9 CPU type selectionNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Refine feature sets for little endian buildsNicholas Piggin
This reduces vmlinux text size by 1kB and data by 1.5kB with a small build! Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add the recently added CPU_FTRS_POWER9_DD2_2 to the little endian possible mask as noticed by Nick.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64: Add GENERIC_CPU support for little endianNicholas Piggin
Add GENERIC_CPU support for little-endian rather than using POWER8 specific selection for POWER9 and above. Restrict GENERIC_CPU to POWER8 and above on little endian. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Duplicate GENERIC_CPU to avoid a kbuild warning about the prompt being redefined. Spell out that GENERIC means >= POWER4 for BE.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Remove POWER4 supportNicholas Piggin
POWER4 has been broken since at least the change 49d09bf2a6 ("powerpc/64s: Optimise MSR handling in exception handling"), which requires mtmsrd L=1 support. This was introduced in ISA v2.01, and POWER4 supports ISA v2.00. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc: Remove unused CPU_FTR_ARCH_201Nicholas Piggin
The last usage was removed in c17b98cf6028 ("KVM: PPC: Book3S HV: Remove code for PPC970 processors") (Dec 2014). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU featuresNicholas Piggin
The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and above (which is what the cputable setup does). Fix DT CPU features quirk setup to match. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Merge with upstream changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Set assembler machine type to POWER4Nicholas Piggin
Rather than override the machine type in .S code (which can hide wrong or ambiguous code generation for the target), set the type to power4 for all assembly. This also means we need to be careful not to build power4-only code when we're not building for Book3S, such as the "power7" versions of copyuser/page/memcpy. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Explicitly add vector features to CPU_FTRS_POSSIBLENicholas Piggin
ALTIVEC and VSX features are not added by to default to the POWERx CPU feature sets because they are intended to be enabled by firmware. Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in other the set for other CPUs, eg. PPC970. But they should be added individually to the CPU_FTRS_POSSIBLE set, because if we reduce the set of CPUs that are built-for they may disappear from the possible mask. It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features should be used because they won't be present if compiled out. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Add all POWER9 features to CPU_FTRS_ALWAYSNicholas Piggin
It's not a bug to have features missing in CPU_FTR_ALWAYS, but it is a missed opportunity for optimisation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/boot: Remove duplicate typedefs from libfdt_env.hMark Greer
When building a uImage or zImage using ppc6xx_defconfig and some other defconfigs, the following error occurs with GCC 4.5.1: /arch/powerpc/boot/libfdt_env.h:10:13: error: redefinition of typedef 'uint32_t' /arch/powerpc/boot/types.h:21:13: note: previous declaration of 'uint32_t' was here /arch/powerpc/boot/libfdt_env.h:11:13: error: redefinition of typedef 'uint64_t' /arch/powerpc/boot/types.h:22:13: note: previous declaration of 'uint64_t' was here The problem is that commit 656ad58ef19e (powerpc/boot: Add OPAL console to epapr wrappers) adds typedefs for uint32_t and uint64_t to type.h but doesn't remove the pre-existing (and now duplicate) typedefs from libfdt_env.h. Fix the error by removing the duplicate typedefs from libfdt_env.h Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s/idle: avoid sync for KVM state when waking from idleNicholas Piggin
When waking from a CPU idle instruction (e.g., nap or stop), the sync for ordering the KVM secondary thread state can be avoided if there wakeup is coming from a kernel context rather than KVM context. This improves performance for ping-pong benchmark with the stop0 idle state by 0.46% for 2 threads in the same core, and 1.02% for different cores. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplugNicholas Piggin
Implement a new function to invoke stop, power9_offline_stop, which is like power9_idle_stop but used by the cpu hotplug code. Move KVM secondary state manipulation code to the offline case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: sreset panic if there is no debugger or crash dump handlersNicholas Piggin
system_reset_exception does most of its own crash handling now, invoking the debugger or crash dumps if they are registered. If not, then it goes through to die() to print stack traces, and then is supposed to panic (according to comments). However after die() prints oopses, it does its own handling which doesn't allow system_reset_exception to panic (e.g., it may just kill the current process). This patch causes sreset exceptions to return from die after it prints messages but before acting. This also stops die from invoking the debugger on 0x100 crashes. system_reset_exception similarly calls the debugger. It had been thought this was harmless (because if the debugger was disabled, neither call would fire, and if it was enabled the first call would return). However in some cases like xmon 'X' command, the debugger returns 0, which currently causes it to be entered again (first in system_reset_exception, then in die), which is confusing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>