summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2017-11-13Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Yet another big pile of changes: - More year 2038 work from Arnd slowly reaching the point where we need to think about the syscalls themself. - A new timer function which allows to conditionally (re)arm a timer only when it's either not running or the new expiry time is sooner than the armed expiry time. This allows to use a single timer for multiple timeout requirements w/o caring about the first expiry time at the call site. - A new NMI safe accessor to clock real time for the printk timestamp work. Can be used by tracing, perf as well if required. - A large number of timer setup conversions from Kees which got collected here because either maintainers requested so or they simply got ignored. As Kees pointed out already there are a few trivial merge conflicts and some redundant commits which was unavoidable due to the size of this conversion effort. - Avoid a redundant iteration in the timer wheel softirq processing. - Provide a mechanism to treat RTC implementations depending on their hardware properties, i.e. don't inflict the write at the 0.5 seconds boundary which originates from the PC CMOS RTC to all RTCs. No functional change as drivers need to be updated separately. - The usual small updates to core code clocksource drivers. Nothing really exciting" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits) timers: Add a function to start/reduce a timer pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() timer: Prepare to change all DEFINE_TIMER() callbacks netfilter: ipvs: Convert timers to use timer_setup() scsi: qla2xxx: Convert timers to use timer_setup() block/aoe: discover_timer: Convert timers to use timer_setup() ide: Convert timers to use timer_setup() drbd: Convert timers to use timer_setup() mailbox: Convert timers to use timer_setup() crypto: Convert timers to use timer_setup() drivers/pcmcia: omap1: Fix error in automated timer conversion ARM: footbridge: Fix typo in timer conversion drivers/sgi-xp: Convert timers to use timer_setup() drivers/pcmcia: Convert timers to use timer_setup() drivers/memstick: Convert timers to use timer_setup() drivers/macintosh: Convert timers to use timer_setup() hwrng/xgene-rng: Convert timers to use timer_setup() auxdisplay: Convert timers to use timer_setup() sparc/led: Convert timers to use timer_setup() mips: ip22/32: Convert timers to use timer_setup() ...
2017-11-13Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "Note that in this cycle most of the x86 topics interacted at a level that caused them to be merged into tip:x86/asm - but this should be a temporary phenomenon, hopefully we'll back to the usual patterns in the next merge window. The main changes in this cycle were: Hardware enablement: - Add support for the Intel UMIP (User Mode Instruction Prevention) CPU feature. This is a security feature that disables certain instructions such as SGDT, SLDT, SIDT, SMSW and STR. (Ricardo Neri) [ Note that this is disabled by default for now, there are some smaller enhancements in the pipeline that I'll follow up with in the next 1-2 days, which allows this to be enabled by default.] - Add support for the AMD SEV (Secure Encrypted Virtualization) CPU feature, on top of SME (Secure Memory Encryption) support that was added in v4.14. (Tom Lendacky, Brijesh Singh) - Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela) Other changes: - A big series of entry code simplifications and enhancements (Andy Lutomirski) - Make the ORC unwinder default on x86 and various objtool enhancements. (Josh Poimboeuf) - 5-level paging enhancements (Kirill A. Shutemov) - Micro-optimize the entry code a bit (Borislav Petkov) - Improve the handling of interdependent CPU features in the early FPU init code (Andi Kleen) - Build system enhancements (Changbin Du, Masahiro Yamada) - ... plus misc enhancements, fixes and cleanups" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits) x86/build: Make the boot image generation less verbose selftests/x86: Add tests for the STR and SLDT instructions selftests/x86: Add tests for User-Mode Instruction Prevention x86/traps: Fix up general protection faults caused by UMIP x86/umip: Enable User-Mode Instruction Prevention at runtime x86/umip: Force a page fault when unable to copy emulated result to user x86/umip: Add emulation code for UMIP instructions x86/cpufeature: Add User-Mode Instruction Prevention definitions x86/insn-eval: Add support to resolve 16-bit address encodings x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode x86/insn-eval: Add wrapper function for 32 and 64-bit addresses x86/insn-eval: Add support to resolve 32-bit address encodings x86/insn-eval: Compute linear address in several utility functions resource: Fix resource_size.cocci warnings X86/KVM: Clear encryption attribute when SEV is active X86/KVM: Decrypt shared per-cpu variables when SEV is active percpu: Introduce DEFINE_PER_CPU_DECRYPTED x86: Add support for changing memory encryption attribute in early boot x86/io: Unroll string I/O when SEV is active x86/boot: Add early boot support when running with SEV active ...
2017-11-07resource: Provide resource struct in resource walk callbackTom Lendacky
In preperation for a new function that will need additional resource information during the resource walk, update the resource walk callback to pass the resource structure. Since the current callback start and end arguments are pulled from the resource structure, the callback functions can obtain them from the resource structure directly. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lkml.kernel.org/r/20171020143059.3291-10-brijesh.singh@amd.com
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-03Merge tag 'powerpc-4.14-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.14. This is bigger than I like to send at rc7, but that's at least partly because I didn't send any fixes last week. If it wasn't for the IMC driver, which is new and getting heavy testing, the diffstat would look a bit better. I've also added ftrace on big endian to my test suite, so we shouldn't break that again in future. - A fix to the handling of misaligned paste instructions (P9 only), where a change to a #define has caused the check for the instruction to always fail. - The preempt handling was unbalanced in the radix THP flush (P9 only). Though we don't generally use preempt we want to keep it working as much as possible. - Two fixes for IMC (P9 only), one when booting with restricted number of CPUs and one in the error handling when initialisation fails due to firmware etc. - A revert to fix function_graph on big endian machines, and then a rework of the reverted patch to fix kprobes blacklist handling on big endian machines. Thanks to: Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan, Naveen N. Rao, Nicholas Piggin, Paul Mackerras" * tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf: Fix core-imc hotplug callback failure during imc initialization powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text Revert "powerpc64/elfv1: Only dereference function descriptor for non-text symbols" powerpc/64s/radix: Fix preempt imbalance in TLB flush powerpc: Fix check for copy/paste instructions in alignment handler powerpc/perf: Fix IMC allocation routine
2017-11-02powerpc/watchdog: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01powerpc/kprobes: Dereference function pointers only if the address does not ↵Naveen N. Rao
belong to kernel text This makes the changes introduced in commit 83e840c770f2c5 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") to be specific to the kprobe subsystem. We previously changed ppc_function_entry() to always check the provided address to confirm if it needed to be dereferenced. This is actually only an issue for kprobe blacklisted asm labels (through use of _ASM_NOKPROBE_SYMBOL) and can cause other issues with ftrace. Also, the additional checks are not really necessary for our other uses. As such, move this check to the kprobes subsystem. Fixes: 83e840c770f2 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-25powerpc: Fix check for copy/paste instructions in alignment handlerPaul Mackerras
Commit 07d2a628bc00 ("powerpc/64s: Avoid cpabort in context switch when possible", 2017-06-09) changed the definition of PPC_INST_COPY and in so doing inadvertently broke the check for copy/paste instructions in the alignment fault handler. The check currently matches no instructions. This fixes it by ANDing both sides of the comparison with the mask. Fixes: 07d2a628bc00 ("powerpc/64s: Avoid cpabort in context switch when possible") Cc: stable@vger.kernel.org # v4.13+ Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-24Merge tag 'v4.14-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18locking/arch, powerpc/rtas: Use arch_spin_lock() instead of ↵Will Deacon
arch_spin_lock_flags() arch_spin_lock_flags() is an internal part of the spinlock implementation and is no longer available when SMP=n and DEBUG_SPINLOCK=y, so the PPC RTAS code fails to compile in this configuration: arch/powerpc/kernel/rtas.c: In function 'lock_rtas': >> arch/powerpc/kernel/rtas.c:81:2: error: implicit declaration of function 'arch_spin_lock_flags' [-Werror=implicit-function-declaration] arch_spin_lock_flags(&rtas.lock, flags); ^~~~~~~~~~~~~~~~~~~~ Since there's no good reason to use arch_spin_lock_flags() here (the code in question already calls local_irq_save(flags)), switch it over to arch_spin_lock and get things building again. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508327469-20231-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-13Merge tag 'powerpc-4.14-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A fix for a bad bug (written by me) in our livepatch handler. Removal of an over-zealous lockdep_assert_cpus_held() in our topology code. A fix to the recently added emulation of cntlz[wd]. And three small fixes to the recently added IMC PMU driver. Thanks to: Anju T Sudhakar, Balbir Singh, Kamalesh Babulal, Naveen N. Rao, Sandipan Das, Santosh Sivaraj, Thiago Jung Bauermann" * tag 'powerpc-4.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf: Fix IMC initialization crash powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() powerpc/perf: Fix for core/nest imc call trace on cpuhotplug powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology() powerpc/lib/sstep: Fix count leading zeros instructions powerpc/livepatch: Fix livepatch stack access
2017-10-10powerpc/livepatch: Fix livepatch stack accessKamalesh Babulal
While running stress test with livepatch module loaded, kernel bug was triggered. cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60] 5:mon> t [c0000000eb9d3de0] c0000000eb9d3e30 (unreliable) [c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120 --- Exception: 501 (Hardware Interrupt) at c000000000053040 livepatch_handler+0x4c/0x74 [c0000000eb9d4120] 0000000057ac6e9d (unreliable) [d0000000089d9f78] 2e0965747962382e SP (965747962342e09) is in userspace When an interrupt occurs during the livepatch_handler execution, it's possible for the livepatch_stack and/or thread_info to be corrupted. eg: Task A Interrupt Handler ========= ================= livepatch_handler: mr r0, r1 ld r1, TI_livepatch_sp(r12) hardware_interrupt_common: do_IRQ+0x8: mflr r0 <- saved stack pointer is overwritten bl _mcount ... std r27,-40(r1) <- overwrite of thread_info() lis r2, STACK_END_MAGIC@h ori r2, r2, STACK_END_MAGIC@l ld r12, -8(r1) Fix the corruption by using r11 register for livepatch stack manipulation, instead of shuffling task stack and livepatch stack into r1 register. Using r11 register also avoids disabling/enabling irq's while setting up the livepatch stack. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-09Merge branch 'ppc-bundle' (bundle from Michael Ellerman)Linus Torvalds
Merge powerpc transactional memory fixes from Michael Ellerman: "I figured I'd still send you the commits using a bundle to make sure it works in case I need to do it again in future" This fixes transactional memory state restore for powerpc. * bundle'd patches from Michael Ellerman: powerpc/tm: Fix illegal TM state in signal handler powerpc/64s: Use emergency stack for kernel TM Bad Thing program checks
2017-10-06Merge tag 'powerpc-4.14-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Nine small fixes, really nothing that stands out. A work-around for a spurious MCE on Power9. A CXL fault handling fix, some fixes to the new XIVE code, and a fix to the new 32-bit STRICT_KERNEL_RWX code. Fixes for old code/stable: an fix to an incorrect TLB flush on boot but not on any current machines, a compile error on 4xx and a fix to memory hotplug when using radix (Power9). Thanks to: Anton Blanchard, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Guenter Roeck, Jeremy Kerr, Michael Neuling, Nicholas Piggin" * tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Increase memory block size to 1GB on radix powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled powerpc/xive: Clear XIVE internal structures when a CPU is removed powerpc/xive: Fix IPI reset powerpc/4xx: Fix compile error with 64K pages on 40x, 44x powerpc: Fix action argument for cpufeatures-based TLB flush cxl: Fix memory page not handled powerpc: Fix workaround for spurious MCE on POWER9 powerpc: Handle MCE on POWER9 with only DSISR bit 30 set
2017-10-06Merge branch 'core-watchdog-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull watchddog clean-up and fixes from Thomas Gleixner: "The watchdog (hard/softlockup detector) code is pretty much broken in its current state. The patch series addresses this by removing all duct tape and refactoring it into a workable state. The reasons why I ask for inclusion that late in the cycle are: 1) The code causes lockdep splats vs. hotplug locking which get reported over and over. Unfortunately there is no easy fix. 2) The risk of breakage is minimal because it's already broken 3) As 4.14 is a long term stable kernel, I prefer to have working watchdog code in that and the lockdep issues resolved. I wouldn't ask you to pull if 4.14 wouldn't be a LTS kernel or if the solution would be easy to backport. 4) The series was around before the merge window opened, but then got delayed due to the UP failure caused by the for_each_cpu() surprise which we discussed recently. Changes vs. V1: - Addressed your review points - Addressed the warning in the powerpc code which was discovered late - Changed two function names which made sense up to a certain point in the series. Now they match what they do in the end. - Fixed a 'unused variable' warning, which got not detected by the intel robot. I triggered it when trying all possible related config combinations manually. Randconfig testing seems not random enough. The changes have been tested by and reviewed by Don Zickus and tested and acked by Micheal Ellerman for powerpc" * 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) watchdog/core: Put softlockup_threads_initialized under ifdef guard watchdog/core: Rename some softlockup_* functions powerpc/watchdog: Make use of watchdog_nmi_probe() watchdog/core, powerpc: Lock cpus across reconfiguration watchdog/core, powerpc: Replace watchdog_nmi_reconfigure() watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently" watchdog/hardlockup/perf: Cure UP damage watchdog/hardlockup: Clean up hotplug locking mess watchdog/hardlockup/perf: Simplify deferred event destroy watchdog/hardlockup/perf: Use new perf CPU enable mechanism watchdog/hardlockup/perf: Implement CPU enable replacement watchdog/hardlockup/perf: Implement init time detection of perf watchdog/hardlockup/perf: Implement init time perf validation watchdog/core: Get rid of the racy update loop watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage watchdog/sysctl: Clean up sysctl variable name space watchdog/sysctl: Get rid of the #ifdeffery watchdog/core: Clean up header mess watchdog/core: Further simplify sysctl handling watchdog/core: Get rid of the thread teardown/setup dance ...
2017-10-06powerpc/tm: Fix illegal TM state in signal handlerGustavo Romero
Currently it's possible that on returning from the signal handler through the restore_tm_sigcontexts() code path (e.g. from a signal caught due to a `trap` instruction executed in the middle of an HTM block, or a deliberately constructed sigframe) an illegal TM state (like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets implicitly the MSR register from SRR1 register on return to userspace it causes a TM Bad Thing exception. That illegal state can be set (a) by a malicious user that disables the TM bit by tweaking the bits in uc_mcontext before returning from the signal handler or (b) by a sufficient number of context switches occurring such that the load_tm counter overflows and TM is disabled whilst in the signal handler. This commit fixes the illegal TM state by ensuring that TM bit is always enabled before we return from restore_tm_sigcontexts(). A small comment correction is made as well. Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-06powerpc/64s: Use emergency stack for kernel TM Bad Thing program checksCyril Bur
When using transactional memory (TM), the CPU can be in one of six states as far as TM is concerned, encoded in the Machine State Register (MSR). Certain state transitions are illegal and if attempted trigger a "TM Bad Thing" type program check exception. If we ever hit one of these exceptions it's treated as a bug, ie. we oops, and kill the process and/or panic, depending on configuration. One case where we can trigger a TM Bad Thing, is when returning to userspace after a system call or interrupt, using RFID. When this happens the CPU first restores the user register state, in particular r1 (the stack pointer) and then attempts to update the MSR. However the MSR update is not allowed and so we take the program check with the user register state, but the kernel MSR. This tricks the exception entry code into thinking we have a bad kernel stack pointer, because the MSR says we're coming from the kernel, but r1 is pointing to userspace. To avoid this we instead always switch to the emergency stack if we take a TM Bad Thing from the kernel. That way none of the user register values are used, other than for printing in the oops message. This is the fix for CVE-2017-1000255. Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Rewrite change log & comments, tweak asm slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/watchdog: Make use of watchdog_nmi_probe()Thomas Gleixner
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu() on powerpc because it is called multiple times for the boot CPU. The first call is via: start_wd_on_cpu+0x80/0x2f0 watchdog_nmi_reconfigure+0x124/0x170 softlockup_reconfigure_threads+0x110/0x130 lockup_detector_init+0xbc/0xe0 kernel_init_freeable+0x18c/0x37c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0xbc And then again via the CPU hotplug registration: start_wd_on_cpu+0x80/0x2f0 cpuhp_invoke_callback+0x194/0x620 cpuhp_thread_fun+0x7c/0x1b0 smpboot_thread_fn+0x290/0x2a0 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xbc This can be avoided by setting up the cpu hotplug state with nocalls and move the initialization to the watchdog_nmi_probe() function. That initializes the hotplug callbacks without invoking the callback and the following core initialization function then configures the watchdog for the online CPUs (in this case CPU0) via softlockup_reconfigure_threads(). Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org
2017-10-04watchdog/core, powerpc: Lock cpus across reconfigurationThomas Gleixner
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and threads and reaquiring for restart, the code and the protection rules become more obvious when holding cpu hotplug lock across the full reconfiguration. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
2017-10-04watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()Thomas Gleixner
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure() into two stages. One to stop the NMI and one to restart it after reconfiguration. That was done by adding a boolean 'run' argument to the code, which is functionally correct but not necessarily a piece of art. Replace it by two explicit functions: watchdog_nmi_stop() and watchdog_nmi_start(). Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage") Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
2017-10-03powerpc/4xx: Fix compile error with 64K pages on 40x, 44xChristian Lamparter
The mmu context on the 40x, 44x does not define pte_frag entry. This causes gcc abort the compilation due to: setup-common.c: In function ‘setup_arch’: setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’ This patch fixes the issue by removing the pte_frag initialization in setup-common.c. This is possible, because the compiler will do the initialization, since the mm_context is a sub struct of init_mm. init_mm is declared in mm_types.h as external linkage. According to C99 6.2.4.3: An object whose identifier is declared with external linkage [...] has static storage duration. C99 defines in 6.7.8.10 that: If an object that has static storage duration is not initialized explicitly, then: - if it has pointer type, it is initialized to a null pointer Fixes: b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-03powerpc: Fix action argument for cpufeatures-based TLB flushJeremy Kerr
Commit 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the cpufeatures code, specifying the number of sets to flush. However, these functions take an action argument, not a number of sets. This means we hit the BUG() in __flush_tlb_{206,300} when using cpufeatures-style configuration. This change passes TLB_INVAL_SCOPE_GLOBAL instead. Fixes: 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-29powerpc: Fix workaround for spurious MCE on POWER9Michael Neuling
In the recent commit d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") I screwed up the bit number. It should be bit 25 (IBM bit 38). Fixes: d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26powerpc: Handle MCE on POWER9 with only DSISR bit 30 setMichael Neuling
On POWER9 DD2.1 and below, it's possible for a paste instruction to cause a Machine Check Exception (MCE) where only DSISR bit 30 (IBM 33) is set. This will result in the MCE handler seeing an unknown event, which triggers linux to crash. We change this by detecting unknown events caused by load/stores in the MCE handler and marking them as handled so that we no longer crash. An MCE that occurs like this is spurious, so we don't need to do anything in terms of servicing it. If there is something that needs to be serviced, the CPU will raise the MCE again with the correct DSISR so that it can be serviced properly. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com Acked-by: Balbir Singh <bsingharora@gmail.com> [mpe: Expand comment with details from change log, use normal bit #s] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-21powerpc/eeh: Create PHB PEs after EEH is initializedBenjamin Herrenschmidt
Otherwise we end up not yet having computed the right diag data size on powernv where EEH initialization is delayed, thus causing memory corruption later on when calling OPAL. Fixes: 5cb1f8fdddb7 ("powerpc/powernv/pci: Dynamically allocate PHB diag data") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/kprobes: Update optprobes to use emulate_update_regs()Naveen N. Rao
Optprobes depended on an updated regs->nip from analyse_instr() to identify the location to branch back from the optprobes trampoline. However, since commit 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs"), analyse_instr() doesn't update the registers anymore. Due to this, we end up branching back from the optprobes trampoline to the same branch into the trampoline resulting in a loop. Fix this by calling out to emulate_update_regs() before using the nip. Additionally, explicitly compare the return value from analyse_instr() to 1, rather than just checking for !0 so as to guard against any future changes to analyse_instr() that may result in -1 being returned in more scenarios. Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20Merge branch 'next' of ↵Michael Ellerman
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes Merge one commit from Scott which I missed while away.
2017-09-20powerpc/tm: Flush TM only if CPU has TM featureGustavo Romero
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code to access TM SPRs in flush_tmregs_to_thread(). However flush_tmregs_to_thread() does not check if TM feature is available on CPU before trying to access TM SPRs in order to copy live state to thread structures. flush_tmregs_to_thread() is indeed guarded by CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without TM feature available, thus rendering the execution of TM instructions that are treated by the CPU as illegal instructions. The fix is just to add proper checking in flush_tmregs_to_thread() if CPU has the TM feature before accessing any TM-specific resource, returning immediately if TM is no available on the CPU. Adding that checking in flush_tmregs_to_thread() instead of in places where it is called, like in vsr_get() and vsr_set(), is better because avoids the same problem cropping up elsewhere. Cc: stable@vger.kernel.org # v4.13+ Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-15Merge tag 'powerpc-4.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Just one fix, for the handling of alignment interrupts on dcbz instructions. Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka" * tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix handling of alignment interrupt on dcbz instruction
2017-09-15powerpc: Fix handling of alignment interrupt on dcbz instructionPaul Mackerras
This fixes the emulation of the dcbz instruction in the alignment interrupt handler. The error was that we were comparing just the instruction type field of op.type rather than the whole thing, and therefore the comparison "type != CACHEOP + DCBZ" was always true. Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Tested-by: Michal Sojka <sojkam1@fel.cvut.cz> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-14watchdog/hardlockup: Clean up hotplug locking messThomas Gleixner
All watchdog thread related functions are delegated to the smpboot thread infrastructure, which handles serialization against CPU hotplug correctly. The sysctl interface is completely decoupled from anything which requires CPU hotplug protection. No need to protect the sysctl writes against cpu hotplug anymore. Remove it and add the now required protection to the powerpc arch_nmi_watchdog implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stageThomas Gleixner
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure() need to be done in two steps. 1) Stop all NMIs 2) Read the new parameters and start NMIs Right now watchdog_nmi_reconfigure() is a combination of both. To allow a clean reconfiguration add a 'run' argument and split the functionality in powerpc. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14watchdog/core: Remove broken suspend/resume interfacesThomas Gleixner
This interface has several issues: - It's causing recursive locking of the hotplug lock. - It's complete overkill to teardown all threads and then recreate them The same can be achieved with the simple hardlockup_detector_perf_stop / restart() interfaces. The abuse from the busy looping poweroff() loop of PARISC has been solved as well. Remove the cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13mm: treewide: remove GFP_TEMPORARY allocation flagMichal Hocko
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-08treewide: make "nr_cpu_ids" unsignedAlexey Dobriyan
First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-02powerpc/xive: add XIVE Exploitation Mode to CASCédric Le Goater
On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility or in XIVE exploitation mode. Now that we have initial guest support for the XIVE interrupt controller, let's inform the hypervisor what we can do. The platform advertises the XIVE Exploitation Mode support using the property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 : - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only - 0b10 XIVE legacy or exploitation mode The OS asks for XIVE Exploitation Mode support using the property "ibm,architecture-vec-5", byte 23 bits 0-1: - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc/iommu: Use permission-specific DEVICE_ATTR variantsJulia Lawall
Use DEVICE_ATTR_RW for read-write attributes. This simplifies the source code, improves readbility, and reduces the chance of inconsistencies. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc/eeh: Delete an error out of memory message at init timeMarkus Elfring
Omit an extra message for a memory allocation failure in eeh_dev_init(). This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [mpe: Do not drop the message that can happen at runtime and lead to an event not being handled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc/32: remove a NOP from memset()Christophe Leroy
memset() is patched after initialisation to activate the optimised part which uses cache instructions. Today we have a 'b 2f' to skip the optimised patch, which then gets replaced by a NOP, implying a useless cycle consumption. As we have a 'bne 2f' just before, we could use that instruction for the live patching, hence removing the need to have a dedicated 'b 2f' to be replaced by a NOP. This patch changes the 'bne 2f' by a 'b 2f'. During init, that 'b 2f' is then replaced by 'bne 2f' Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc: Use instruction emulation infrastructure to handle alignment faultsPaul Mackerras
This replaces almost all of the instruction emulation code in fix_alignment() with calls to analyse_instr(), emulate_loadstore() and emulate_dcbz(). The only emulation code left is the SPE emulation code; analyse_instr() etc. do not handle SPE instructions at present. One result of this is that we can now handle alignment faults on all the new VSX load and store instructions that were added in POWER9. VSX loads/stores will take alignment faults for unaligned accesses to cache-inhibited memory. Another effect is that we no longer rely on the DAR and DSISR values set by the processor. With this, we now need to include the instruction emulation code unconditionally. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Fix DAR reporting when alignment handler faultsMichael Ellerman
Anton noticed that if we fault part way through emulating an unaligned instruction, we don't update the DAR to reflect that. The DAR value is eventually reported back to userspace as the address in the SEGV signal, and if userspace is using that value to demand fault then it can be confused by us not setting the value correctly. This patch is ugly as hell, but is intended to be the minimal fix and back ports easily. Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31powerpc/smp: Add Power9 scheduler topologyOliver O'Halloran
In previous generations of Power processors each core had a private L2 cache. The Power 9 processor has a slightly different design where the L2 cache is shared among pairs of cores rather than being completely private. Making the scheduler aware of this cache sharing allows the scheduler to make better migration decisions. For example, if two CPU heavy tasks share a core then one task can be migrated to the paired core to improve throughput. Under the existing three level topology the task could be migrated to any core on the same chip, while with the new topology it would be preferentially migrated to the paired core so it remains cache-hot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/smp: Add cpu_l2_cache_mapOliver O'Halloran
We want to add an extra level to the CPU scheduler topology to account for cores which share a cache. To do this we need to build a cpumask for each CPU that indicates which CPUs share this cache to use as an input to the scheduler. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/smp: Rework CPU topology constructionOliver O'Halloran
The CPU scheduler topology is constructed from a number of per-cpu cpumasks which describe which sets of logical CPUs are related in some fashion. Current code that handles constructing these masks when CPUs are hot(un)plugged can be simplified a bit by exploiting the fact that the scheduler requires higher levels of the toplogy (e.g package level groupings) to be supersets of the lower levels (e.g. threas in a core). This patch reworks the cpumask construction to be simpler and easier to extend with extra topology levels. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Fix CONFIG_HOTPLUG_CPU=n build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/smp: Use cpu_to_chip_id() to find core siblingsOliver O'Halloran
When building the CPU scheduler topology the kernel uses the ibm,chipid property from the devicetree to group logical CPUs. Currently the DT search for this property is open-coded in smp.c and this functionality is a duplication of what's in cpu_to_chip_id() already. This patch removes the existing search in favor of that. It's worth mentioning that the semantics of the search are different in cpu_to_chip_id(). When there is no ibm,chipid in the CPUs node it will also search /cpus and / for the property, but this should not effect the output topology. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/asm: Convert .llong directives to .8byteTobin C. Harding
.llong is an undocumented PPC specific directive. The generic equivalent is .quad, but even better (because it's self describing) is .8byte. Convert all .llong directives to .8byte. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Squash lines for simple wrapper functionsMasahiro Yamada
Remove unneeded variables and assignments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/kernel: Change retrieval of pci_dnBryant G. Ly
For a PCI device it's pci_dn can be retrieved from pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list. Thus, we should just use the existing function pci_get_pdn_by_devfn to get the pci_dn. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>