summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2017-07-03Merge tag 'tty-4.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the large tty/serial patchset for 4.13-rc1. A lot of tty and serial driver updates are in here, along with some fixups for some __get/put_user usages that were reported. Nothing huge, just lots of development by a number of different developers, full details in the shortlog. All of these have been in linux-next for a while" * tag 'tty-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (71 commits) tty: serial: lpuart: add a more accurate baud rate calculation method tty: serial: lpuart: add earlycon support for imx7ulp tty: serial: lpuart: add imx7ulp support dt-bindings: serial: fsl-lpuart: add i.MX7ULP support tty: serial: lpuart: add little endian 32 bit register support tty: serial: lpuart: refactor lpuart32_{read|write} prototype tty: serial: lpuart: introduce lpuart_soc_data to represent SoC property serial: imx-serial - move DMA buffer configuration to DT serial: imx: Enable RTSD only when needed serial: imx: Remove unused members from imx_port struct serial: 8250: 8250_omap: Fix race b/w dma completion and RX timeout serial: 8250: Fix THRE flag usage for CAP_MINI tty/serial: meson_uart: update to stable bindings dt-bindings: serial: Add bindings for the Amlogic Meson UARTs serial: Delete dead code for CIR serial ports serial: sirf: make of_device_ids const serial/mpsc: switch to dma_alloc_attrs tty: serial: Add Actions Semi Owl UART earlycon dt-bindings: serial: Document Actions Semi Owl UARTs tty/serial: atmel: make the driver DT only ...
2017-07-04powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configsBalbir Singh
All code that patches kernel text has been moved over to using patch_instruction() and patch_instruction() is able to cope with the kernel text being read only. The linker script has been updated to ensure the read only data ends on a large page boundary, so it and the preceding kernel text can be marked R_X. We also have implementations of mark_rodata_ro() for Hash and Radix MMU modes. There are some corner-cases missing when the kernel is built relocatable, so for now make it depend on !RELOCATABLE. There's also a temporary workaround to depend on !HIBERNATION to avoid a build failure, that will be removed once we've merged with the PM tree. Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Make it depend on !RELOCATABLE, munge change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-04powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for RadixBalbir Singh
The Radix linear mapping code (create_physical_mapping()) tries to use the largest page size it can at each step. Currently the only reason it steps down to a smaller page size is if the start addr is unaligned (never happens in practice), or the end of memory is not aligned to a huge page boundary. To support STRICT_RWX we need to break the mapping at __init_begin, so that the text and rodata prior to that can be marked R_X and the regular pages after can be marked RW. Having done that we can now implement mark_rodata_ro() for Radix, knowing that we won't need to split any mappings. Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Split down to PAGE_SIZE, not 2MB, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-04powerpc/mm/hash: Implement mark_rodata_ro() for hashBalbir Singh
With hash we update the bolted pte to mark it read-only. We rely on the MMU_FTR_KERNEL_RO to generate the correct permissions for read-only text. The radix implementation just prints a warning in this implementation Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Make the warning louder when we don't have MMU_FTR_KERNEL_RO] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...
2017-07-03kill {__,}{get,put}_user_unaligned()Al Viro
no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-03Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - Continued work to add support for 5-level paging provided by future Intel CPUs. In particular we switch the x86 GUP code to the generic implementation. (Kirill A. Shutemov) - Continued work to add PCID CPU support to native kernels as well. In this round most of the focus is on reworking/refreshing the TLB flush infrastructure for the upcoming PCID changes. (Andy Lutomirski)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Delete a big outdated comment about TLB flushing x86/mm: Don't reenter flush_tlb_func_common() x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging x86/ftrace: Exclude functions in head64.c from function-tracing x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap x86/mm: Remove reset_lazy_tlbstate() x86/ldt: Simplify the LDT switching logic x86/boot/64: Put __startup_64() into .head.text x86/mm: Add support for 5-level paging for KASLR x86/mm: Make kernel_physical_mapping_init() support 5-level paging x86/mm: Add sync_global_pgds() for configuration with 5-level paging x86/boot/64: Add support of additional page table level during early boot x86/boot/64: Rename init_level4_pgt and early_level4_pgt x86/boot/64: Rewrite startup_64() in C x86/boot/compressed: Enable 5-level paging during decompression stage x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations x86/boot/efi: Cleanup initialization of GDT entries x86/asm: Fix comment in return_from_SYSCALL_64() x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation ...
2017-07-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
2017-07-03powerpc/vmlinux.lds: Align __init_begin to 16MBalbir Singh
For CONFIG_STRICT_KERNEL_RWX align __init_begin to 16M. We use 16M since its the larger of 2M on radix and 16M on hash for our linear mapping. The plan is to have .text, .rodata and everything upto __init_begin marked as RX. Note we still have executable read only data. We could further align rodata to another 16M boundary. I've used keeping text plus rodata as read-only-executable as a trade-off to doing read-only-executable for text and read-only for rodata. We don't use multi PT_LOAD in PHDRS because we are not sure if all bootloaders support them. This patch keeps PHDRS in vmlinux.lds.S as the same they are with just one PT_LOAD for all of the kernel marked as RWX (7). mpe: What this means is the added alignment bloats the resulting binary on disk, a powernv kernel goes from 17M to 22M. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/lib/code-patching: Use alternate map for patch_instruction()Balbir Singh
This patch creates the window using text_poke_area, allocated via get_vm_area(). text_poke_area is per CPU to avoid locking. text_poke_area for each cpu is setup using late_initcall, prior to setup of these alternate mapping areas, we continue to use direct write to change/modify kernel text. With the ability to use alternate mappings to write to kernel text, it provides us the freedom to then turn text read-only and implement CONFIG_STRICT_KERNEL_RWX. This code is CPU hotplug aware to ensure that the we have mappings for any new cpus as they come online and tear down mappings for any CPUs that go offline. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/xmon: Add patch_instruction() support for xmonBalbir Singh
Move from mwrite() to patch_instruction() for xmon for breakpoint addition and removal. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/kprobes/optprobes: Use patch_instruction()Balbir Singh
So that we can implement STRICT_RWX, use patch_instruction() in optprobes. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/kprobes: Move kprobes over to patch_instruction()Balbir Singh
arch_arm/disarm_probe() use direct assignment for copying instructions, replace them with patch_instruction(). We don't need to call flush_icache_range() because patch_instruction() does it for us. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/mm/radix: Fix execute permissions for interrupt_vectorsBalbir Singh
Commit 9abcc981de97 ("powerpc/mm/radix: Only add X for pages overlapping kernel text") changed the linear mapping on Radix to only mark the kernel text executable. However if the kernel is run relocated, for example as a kdump kernel, then the exception vectors are split from the kernel text, ie. they remain at real address 0. We tend to get away with it, because the kernel itself will usually be below 1G, which means the 1G page at 0-1G is marked executable and everything works OK. However if the kernel is loaded above 1G, or the system has less than 1G in total (meaning we can't use a 1G page), then the exception vectors will not be marked executable and the kernel will fail to boot. Fix it by also checking if the address range overlaps the exception vectors when deciding if we should add PAGE_KERNEL_X. Fixes: 9abcc981de97 ("powerpc/mm/radix: Only add X for pages overlapping kernel text") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Combine with the existing check, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()Balbir Singh
Once upon a time there were only two PP (page protection) bits. In ISA 2.03 an additional PP bit was added, but because of the layout of the HPTE it could not be made contiguous with the existing PP bits. The result is that we now have three PP bits, named pp0, pp1, pp2, where pp0 occupies bit 63 of dword 1 of the HPTE and pp1 and pp2 occupy bits 1 and 0 respectively. Until recently Linux hasn't used pp0, however with the addition of _PAGE_KERNEL_RO we started using it. The problem arises in the LPAR code, where we need to translate the PP bits into the argument for the H_PROTECT hypercall. Currently the code only passes bits 0-2 of newpp, which covers pp1, pp2 and N (no execute), meaning pp0 is not passed to the hypervisor at all. We can't simply pass it through in bit 63, as that would collide with a different field in the flags argument, as defined in PAPR. Instead we have to shift it down to bit 8 (IBM bit 55). Fixes: e58e87adc8bf ("powerpc/mm: Update _PAGE_KERNEL_RO") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Simplify the test, rework change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Blacklist rtas entry/exit from kprobesNaveen N. Rao
We can't take traps with relocation off, so blacklist enter_rtas() and rtas_return_loc(). However, instead of blacklisting all of enter_rtas(), introduce a new symbol __enter_rtas from where on we can't take a trap and blacklist that. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Blacklist functions invoked on a trapNaveen N. Rao
Blacklist all functions involved while handling a trap. We: - convert some of the symbols into private symbols, and - blacklist most functions involved while handling a trap. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Un-blacklist system_call() from kprobesNaveen N. Rao
It is actually safe to probe system_call() in entry_64.S, but only till we unset MSR_RI. To allow this, add a new symbol system_call_exit() after the mtmsrd and blacklist that. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Move system_call() symbol to just after setting MSR_EENaveen N. Rao
It is common to get a PMU interrupt right after the mtmsr instruction that enables interrupts. Due to this, the stack trace profile gets needlessly split across system_call_common() and system_call(). Previously, system_call() symbol was at the current place to hide a few earlier symbols which have since been made private or removed entirely. So, let's move system_call() slightly higher up, right after the mtmsr instruction that enables interrupts. Convert existing references to system_call to a local syscall symbol. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Blacklist system_call() and system_call_common() from kprobesNaveen N. Rao
Convert some of the symbols into private symbols and blacklist system_call_common() and system_call() from kprobes. We can't take a trap at parts of these functions as either MSR_RI is unset or the kernel stack pointer is not yet setup. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Don't convert system_call_common to _GLOBAL()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Convert .L__replay_interrupt_return to a local labelNaveen N. Rao
Commit b48bbb82e2b835 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") introduced __replay_interrupt_return symbol with '.L' prefix in hopes of keeping it private. However, due to the use of LOAD_REG_ADDR(), the assembler kept this symbol visible. Fix the same by instead using the local label '1'. Fixes: Commit b48bbb82e2b835 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc64/elfv1: Only dereference function descriptor for non-text symbolsNaveen N. Rao
Currently, we assume that the function pointer we receive in ppc_function_entry() points to a function descriptor. However, this is not always the case. In particular, assembly symbols without the right annotation do not have an associated function descriptor. Some of these symbols are added to the kprobe blacklist using _ASM_NOKPROBE_SYMBOL(). When such addresses are subsequently processed through arch_deref_entry_point() in populate_kprobe_blacklist(), we see the below errors during bootup: [ 0.663963] Failed to find blacklist at 7d9b02a648029b6c [ 0.663970] Failed to find blacklist at a14d03d0394a0001 [ 0.663972] Failed to find blacklist at 7d5302a6f94d0388 [ 0.663973] Failed to find blacklist at 48027d11e8610178 [ 0.663974] Failed to find blacklist at f8010070f8410080 [ 0.663976] Failed to find blacklist at 386100704801f89d [ 0.663977] Failed to find blacklist at 7d5302a6f94d00b0 Fix this by checking if the function pointer we receive in ppc_function_entry() already points to kernel text. If so, we just return it as is. If not, we assume that this is a function descriptor and proceed to dereference it. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03cxl: Export library to support IBM XSLChristophe Lombard
This patch exports a in-kernel 'library' API which can be called by other drivers to help interacting with an IBM XSL on a POWER9 system. The XSL (Translation Service Layer) is a stripped down version of the PSL (Power Service Layer) used in some cards such as the Mellanox CX5. Like the PSL, it implements the CAIA architecture, but has a number of differences, mostly in it's implementation dependent registers. The XSL also uses a special DMA cxl mode, which uses a slightly different init sequence for the CAPP and PHB. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch, a few of them are tripping people up while working on top of next, and we also have a dependency between the CXL fixes and new CXL code we want to merge into next.
2017-07-03powerpc/dts: Use #include "..." to include local DTMasahiro Yamada
Most of DT files in PowerPC use #include "..." to make pre-processor include DT in the same directory, but we have 3 exceptional files that use #include <...> for that. Fix them to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from dtc_cpp_flags. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD - Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending.
2017-07-03KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving codePaul Mackerras
This fixes a typo where the wrong loop index was used to index the kvmppc_xive_vcpu.queues[] array in xive_pre_save_scan(). The variable i contains the vcpu number; we need to index queues[] using j, which iterates from 0 to KVMPPC_XIVE_Q_COUNT-1. The effect of this bug is that things that save the interrupt controller state, such as "virsh dump", on a VM with more than 8 vCPUs, result in xive_pre_save_queue() getting called on a bogus queue structure, usually resulting in a crash like this: [ 501.821107] Unable to handle kernel paging request for data at address 0x00000084 [ 501.821212] Faulting instruction address: 0xc008000004c7c6f8 [ 501.821234] Oops: Kernel access of bad area, sig: 11 [#1] [ 501.821305] SMP NR_CPUS=1024 [ 501.821307] NUMA [ 501.821376] PowerNV [ 501.821470] Modules linked in: vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel kvm_hv nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc kvm tg3 ptp pps_core [ 501.822477] CPU: 3 PID: 3934 Comm: live_migration Not tainted 4.11.0-4.git8caa70f.el7.centos.ppc64le #1 [ 501.822633] task: c0000003f9e3ae80 task.stack: c0000003f9ed4000 [ 501.822745] NIP: c008000004c7c6f8 LR: c008000004c7c628 CTR: 0000000030058018 [ 501.822877] REGS: c0000003f9ed7980 TRAP: 0300 Not tainted (4.11.0-4.git8caa70f.el7.centos.ppc64le) [ 501.823030] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 501.823047] CR: 28022244 XER: 00000000 [ 501.823203] CFAR: c008000004c7c77c DAR: 0000000000000084 DSISR: 40000000 SOFTE: 1 [ 501.823203] GPR00: c008000004c7c628 c0000003f9ed7c00 c008000004c91450 00000000000000ff [ 501.823203] GPR04: c0000003f5580000 c0000003f559bf98 9000000000009033 0000000000000000 [ 501.823203] GPR08: 0000000000000084 0000000000000000 00000000000001e0 9000000000001003 [ 501.823203] GPR12: c00000000008a7d0 c00000000fdc1b00 000000000a9a0000 0000000000000000 [ 501.823203] GPR16: 00000000402954e8 000000000a9a0000 0000000000000004 0000000000000000 [ 501.823203] GPR20: 0000000000000008 c000000002e8f180 c000000002e8f1e0 0000000000000001 [ 501.823203] GPR24: 0000000000000008 c0000003f5580008 c0000003f4564018 c000000002e8f1e8 [ 501.823203] GPR28: 00003ff6e58bdc28 c0000003f4564000 0000000000000000 0000000000000000 [ 501.825441] NIP [c008000004c7c6f8] xive_get_attr+0x3b8/0x5b0 [kvm] [ 501.825671] LR [c008000004c7c628] xive_get_attr+0x2e8/0x5b0 [kvm] [ 501.825887] Call Trace: [ 501.825991] [c0000003f9ed7c00] [c008000004c7c628] xive_get_attr+0x2e8/0x5b0 [kvm] (unreliable) [ 501.826312] [c0000003f9ed7cd0] [c008000004c62ec4] kvm_device_ioctl_attr+0x64/0xa0 [kvm] [ 501.826581] [c0000003f9ed7d20] [c008000004c62fcc] kvm_device_ioctl+0xcc/0xf0 [kvm] [ 501.826843] [c0000003f9ed7d40] [c000000000350c70] do_vfs_ioctl+0xd0/0x8c0 [ 501.827060] [c0000003f9ed7de0] [c000000000351534] SyS_ioctl+0xd4/0xf0 [ 501.827282] [c0000003f9ed7e30] [c00000000000b8e0] system_call+0x38/0xfc [ 501.827496] Instruction dump: [ 501.827632] 419e0078 3b760008 e9160008 83fb000c 83db0010 80fb0008 2f280000 60000000 [ 501.827901] 60000000 60420000 419a0050 7be91764 <7d284c2c> 552a0ffe 7f8af040 419e003c [ 501.828176] ---[ end trace 2d0529a5bbbbafed ]--- Cc: stable@vger.kernel.org Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-07-02powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8Thiago Jung Bauermann
On POWER9 SMT8 the 24x7 API returns two result elements for physical core and virtual CPU events and we need to add their counts to get the final result. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Support v2 of the hypervisor APIThiago Jung Bauermann
POWER9 introduces a new version of the hypervisor API to access the 24x7 perf counters. The new version changed some of the structures used for requests and results. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Minor improvementsThiago Jung Bauermann
There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request. There's also an HV_PERF_DOMAIN_MAX constant, so use it in h_24x7_event_init. This makes the comment above the check redundant, so remove it. In add_event_to_24x7_request, a statement is terminated with a comma instead of a semicolon. Fix it. In hv-24x7.h, improve comments in struct hv_24x7_result. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Fix return value of hcallsThiago Jung Bauermann
The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix this in the code. The H_GET_24X7_DATA hcall can return a signed error code, so fix this in the code. Also, don't truncate it to 32 bit to use as return value for make_24x7_request. In case of error h_24x7_event_commit_txn passes that return value to generic code, so it should be a proper errno. The other caller of make_24x7_request is single_24x7_request, whose callers don't actually care which error code is returned so they are not affected by this change. Finally, h_24x7_get_value doesn't use the error code from single_24x7_request, so there's no need to store it. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc-perf/hx-24x7: Don't log failed hcall twiceThiago Jung Bauermann
make_24x7_request already calls log_24x7_hcall if it fails, so callers don't have to do it again. In fact, since the latter is now only called from the former, there's no need for a separate log_24x7_hcall anymore so remove it. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Properly iterate through resultsThiago Jung Bauermann
hv-24x7.h has a comment mentioning that result_buffer->results can't be indexed as a normal array because it may contain results of variable sizes, so fix the loop in h_24x7_event_commit_txn to take the variation into account when iterating through results. Another problem in that loop is that it sets h24x7hw->events[i] to NULL. This assumes that only the i'th result maps to the i'th request, but that is not guaranteed to be true. We need to leave the event in the array so that we don't dereference a NULL pointer in case more than one result maps to one request. We still assume that each result has only one result element, so warn if that assumption is violated. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer checkThiago Jung Bauermann
request_buffer can hold 254 requests, so if it already has that number of entries we can't add a new one. Also, define constant to show where the number comes from. Fixes: e3ee15dc5d19 ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()") Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/perf/hv-24x7: Fix passing of catalog version numberThiago Jung Bauermann
H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from the first catalog page obtained previously. This is a 64 bit number, but create_events_from_catalog truncates it to 32-bit. This worked on POWER8, but POWER9 actually uses the upper bits so the call fails with H_P3 because the hypervisor doesn't recognize the version. This patch also adds the hcall return code to the error message, which is helpful when debugging the problem. Fixes: 5c5cd7b50259 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events") Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/mm: Enable ZONE_DEVICE on powerpcOliver O'Halloran
Flip the switch. Running around and screaming "IT'S ALIVE" is optional, but recommended. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/mm: Wire up hpte_removebolted for powernvAnton Blanchard
Adds support for removing bolted (i.e kernel linear mapping) mappings on powernv. This is needed to support memory hot unplug operations which are required for the teardown of DAX/PMEM devices. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/mm: Add devmap support for ppc64Oliver O'Halloran
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S. This is used to differentiate device backed memory from transparent huge pages since they are handled in more or less the same manner by the core mm code. Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/vmemmap: Add altmap supportOliver O'Halloran
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An altmap is a driver provided region that is used to provide the backing storage for the struct pages of ZONE_DEVICE memory. In situations where large amount of ZONE_DEVICE memory is being added to the system the altmap reduces pressure on main system memory by allowing the mm/ metadata to be stored on the device itself rather in main memory. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/vmemmap: Reshuffle vmemmap_free()Oliver O'Halloran
Removes an indentation level and shuffles some code around to make the following patch cleaner. No functional changes. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/hugetlbfs: Export HPAGE_SHIFTOliver O'Halloran
Export it so it can be referenced inside a module. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc: use spin loop primitives in some functionsNicholas Piggin
Use the different spin loop primitives in some simple powerpc spin loops, including those which will spin as a common case. This will help to test the spin loop primitives before more conversions are done. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add some includes of <linux/processor.h>] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02powerpc/64: implement spin loop primitivesNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-01KVM: PPC: Book3S HV: Close race with testing for signals on guest entryPaul Mackerras
At present, interrupts are hard-disabled fairly late in the guest entry path, in the assembly code. Since we check for pending signals for the vCPU(s) task(s) earlier in the guest entry path, it is possible for a signal to be delivered before we enter the guest but not be noticed until after we exit the guest for some other reason. Similarly, it is possible for the scheduler to request a reschedule while we are in the guest entry path, and we won't notice until after we have run the guest, potentially for a whole timeslice. Furthermore, with a radix guest on POWER9, we can take the interrupt with the MMU on. In this case we end up leaving interrupts hard-disabled after the guest exit, and they are likely to stay hard-disabled until we exit to userspace or context-switch to another process. This was masking the fact that we were also not setting the RI (recoverable interrupt) bit in the MSR, meaning that if we had taken an interrupt, it would have crashed the host kernel with an unrecoverable interrupt message. To close these races, we need to check for signals and reschedule requests after hard-disabling interrupts, and then keep interrupts hard-disabled until we enter the guest. If there is a signal or a reschedule request from another CPU, it will send an IPI, which will cause a guest exit. This puts the interrupt disabling before we call kvmppc_start_thread() for all the secondary threads of this core that are going to run vCPUs. The reason for that is that once we have started the secondary threads there is no easy way to back out without going through at least part of the guest entry path. However, kvmppc_start_thread() includes some code for radix guests which needs to call smp_call_function(), which must be called with interrupts enabled. To solve this problem, this patch moves that code into a separate function that is called earlier. When the guest exit is caused by an external interrupt, a hypervisor doorbell or a hypervisor maintenance interrupt, we now handle these using the replay facility. __kvmppc_vcore_entry() now returns the trap number that caused the exit on this thread, and instead of the assembly code jumping to the handler entry, we return to C code with interrupts still hard-disabled and set the irq_happened flag in the PACA, so that when we do local_irq_enable() the appropriate handler gets called. With all this, we now have the interrupt soft-enable flag clear while we are in the guest. This is useful because code in the real-mode hypercall handlers that checks whether interrupts are enabled will now see that they are disabled, which is correct, since interrupts are hard-disabled in the real-mode code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-07-01KVM: PPC: Book3S HV: Simplify dynamic micro-threading codePaul Mackerras
Since commit b009031f74da ("KVM: PPC: Book3S HV: Take out virtual core piggybacking code", 2016-09-15), we only have at most one vcore per subcore. Previously, the fact that there might be more than one vcore per subcore meant that we had the notion of a "master vcore", which was the vcore that controlled thread 0 of the subcore. We also needed a list per subcore in the core_info struct to record which vcores belonged to each subcore. Now that there can only be one vcore in the subcore, we can replace the list with a simple pointer and get rid of the notion of the master vcore (and in fact treat every vcore as a master vcore). We can also get rid of the subcore_vm[] field in the core_info struct since it is never read. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-30Merge tag 'powerpc-4.12-8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Hopefully the last two powerpc fixes for 4.12. The CXL one is larger than I'd usually send at rc7, but it fixes new code this cycle, so better to have it working for the release. It was actually sent a few weeks back but got blocked in testing behind another fix that was causing issues. We are still tracking one crash in v4.12-rc7, but only one person has reproduced it and the commit identified by bisect doesn't touch any of the relevant code, so I think it's 50/50 whether that commit is actually the problem or it's some code layout / toolchain issue. Two fixes for code we merged this cycle: - cxl: Fixes for Coherent Accelerator Interface Architecture 2.0 - Avoid miscompilation w/GCC 4.6.3 on 32-bit - don't inline copy_to/from_user() Thanks to Al Viro, Larry Finger, Christophe Lombard" * tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user() cxl: Fixes for Coherent Accelerator Interface Architecture 2.0
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge tag 'kvmarm-for-4.13' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.13 - vcpu request overhaul - allow timer and PMU to have their interrupt number selected from userspace - workaround for Cavium erratum 30115 - handling of memory poisonning - the usual crop of fixes and cleanups Conflicts: arch/s390/include/asm/kvm_host.h
2017-06-30kbuild: thin archives make default for all archsNicholas Piggin
Make thin archives build the default, but keep the config option to allow exemptions if any breakage can't be quickly solved. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-06-28arch: remove unused macro/function thread_saved_pc()Tobias Klauser
The only user of thread_saved_pc() in non-arch-specific code was removed in commit 8243d5597793 ("sched/core: Remove pointless printout in sched_show_task()"). Remove the implementations as well. Some architectures use thread_saved_pc() in their arch-specific code. Leave their thread_saved_pc() intact. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>