summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2017-08-28powerpc/configs: Drop unneeded CONFIG_CRYPTO_ANSI_CPRNGMichael Ellerman
Since commit 401e4238f35c ("crypto: rng - Make DRBG the default RNG") we no longer need to set CONFIG_CRYPTO_ANSI_CPRNG in our defconfigs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/configs: Update for symbol movement onlyMichael Ellerman
Update defconfigs for symbols that have moved around, without their value changing. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/oops: Line up NIP & MSR with other rowsMichael Ellerman
This is purely cosmetic, but does look nicer IMHO: Before: task: c000000001453400 task.stack: c000000001c6c000 NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220 REGS: c0000001fffef820 TRAP: 0300 Not tainted (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88088242 XER: 00000000 CFAR: c0000000000b3488 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0 After: task: c000000001453400 task.stack: c000000001c6c000 NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220 REGS: c0000001fffef820 TRAP: 0300 Not tainted (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81-dirty) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88088242 XER: 00000000 CFAR: c0000000000b34a4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/oops: Print CR/XER on same line as MSRMichael Ellerman
Somehow we missed this when the pr_cont() changes went in. Fix CR/XER to go on the same line as MSR, as they have historically, eg: MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 4804408a XER: 20000000 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/oops: Use IS_ENABLED() for oops markersMichael Ellerman
Just because it looks less gross. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/oops: Print the kernel's endian in the oopsMichael Ellerman
Although the MSR tells you what endian you're in it's possible that isn't the same endian the kernel was built for, and if that happens you're usually having a very bad day. So print a marker to make it 100% clear which endian the kernel was built for. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/oops: Fix the oops markers to use pr_cont()Michael Ellerman
When we oops we print a few markers for significant config options such as PREEMPT, SMP etc. Currently these appear on separate lines because we're not using pr_cont() properly. Fix it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-28powerpc/powernv: Fix build error in opal-imc.c when NUMA=nLABBE Corentin
When building a random powerpc kernel I hit this build error: arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment discards « const » qualifier from pointer target type [-Werror=discarded-qualifiers] l_cpumask = cpumask_of_node(nid); ^ This happens because when CONFIG_NUMA=n cpumask_of_node() returns a const pointer. This patch simply adds const to l_cpumask to fix this issue. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24powerpc/powernv: Enable removal of memory for in memory tracingRashmica Gupta
The hardware trace macro feature requires access to a chunk of real memory. This patch provides a debugfs interface to do this. By writing an integer containing the size of memory to be unplugged into /sys/kernel/debug/powerpc/memtrace/enable, the code will attempt to remove that much memory from the end of each NUMA node. This patch also adds additional debugsfs files for each node that allows the tracer to interact with the removed memory, as well as a trace file that allows userspace to read the generated trace. Note that this patch does not invoke the hardware trace macro, it only allows memory to be removed during runtime for the trace macro to utilise. Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> [mpe: Minor formatting etc fixups] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24powerpc/uprobes: Implement arch_uretprobe_is_alive()Naveen N. Rao
This helper is used to detect if a uprobe'd function has returned through a setjmp/longjmp, rather than branching to the LR that was updated previously by us. This fixes a SIGSEGV that gets generated when programs use setjmp/longjmp with uretprobes. We use the arm64 model (arch/arm64/kernel/probes/uprobes.c: arch_uretprobe_is_alive()) for detecting when stack frames have been removed from under us. Reference: https://marc.info/?l=linux-kernel&m=143748610330073 commit 7b868e4802a86 ("uprobes/x86: Reimplement arch_uretprobe_is_alive()") commit db087ef69a2b1 ("uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever") Tested with the test program from: https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/bz5274.c;hb=HEAD And this script: $ cat test.sh #!/bin/bash perf probe -x ./bz5274 -a bz5274_main_return=main%return perf probe -x ./bz5274 -a bz5274_funca_return=funca%return perf probe -x ./bz5274 -a bz5274_funcb_return=funcb%return perf probe -x ./bz5274 -a bz5274_funcc_return=funcc%return perf probe -x ./bz5274 -a bz5274_funcd_return=funcd%return perf record -e 'probe_bz5274:*' -aR ./bz5274 Reported-by: Gustavo Luiz Duarte <gduarte@redhat.com> Reported-by: zsun@redhat.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24powerpc/kprobes: Don't save/restore DAR/DSISR to/from pt_regs for optprobesNaveen N. Rao
We don't save/restore these across a trap, or with KPROBES_ON_FTRACE. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-24powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()Cédric Le Goater
When called from xive_irq_startup(), the size of the cpumask can be larger than nr_cpu_ids. This can result in a WARN_ON such as: WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0 ... NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0 LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0 Call Trace: xive_find_target_in_mask+0x74/0x2f0 (unreliable) xive_pick_irq_target.isra.1+0x200/0x230 xive_irq_startup+0x60/0x180 irq_startup+0x70/0xd0 __setup_irq+0x7bc/0x880 request_threaded_irq+0x14c/0x2c0 request_event_sources_irqs+0x100/0x180 __machine_initcall_pseries_init_ras_IRQ+0x104/0x134 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x290/0x374 kernel_init+0x24/0x170 ret_from_kernel_thread+0x5c/0x74 This happens because we're being called with our affinity mask set to irq_default_affinity. That in turn was populated using cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a mask which has > nr_cpu_ids bits set. Fix it by limiting the value returned by cpumask_weight(). Signed-off-by: Cédric Le Goater <clg@kaod.org> [mpe: Add change log details on actual cause] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64: Optimise set/clear of CTRL[RUN] (runlatch)Nicholas Piggin
On modern CPUs the CTRL register is read-only except bit 63 which is the run latch control. This means it can be updated with a mtspr rather than mfspr/mtspr. To accomodate older CPUs (Cell at least), where there are other bits in the register, we still do a read/modify/write on pre 2.06 CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Update change log to mention 2.06 workaround] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Remove spurious IRQ reason in IRQ replayNicholas Piggin
HVI interrupts have always used 0x500, so remove the dead branch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64: Remove redundant instruction in interrupt replayNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Use the HV handler for external IRQ replay in HV mode on POWER9Nicholas Piggin
POWER9 host external interrupts use the h_virt_irq_common handler, so use that to replay them rather than using the hardware_interrupt_common handler. Both call do_IRQ, but using the correct handler reduces i-cache footprint. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Merge HV and non-HV paths for doorbell IRQ replayNicholas Piggin
This results in smaller code, and fewer branches. This relies on the fact that both the 0xe80 and 0xa00 handlers call the same upper level code, namely doorbell_exception(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64: Cleanup __check_irq_replay()Nicholas Piggin
Move the clearing of irq_happened bits into the condition where they were found to be set. This reduces instruction count slightly, and reduces stores into irq_happened. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: masked_interrupt() returns to kernel so avoid restoring r13Nicholas Piggin
Places in the kernel where r13 is not the PACA pointer must have maskable interrupts disabled, so r13 does not have to be restored when returning from a soft-masked interrupt. We should never have interrupts soft disabled when we're in user space. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Optimise clearing of MSR_EE in masked_[H]interrupt()Nicholas Piggin
MSR_EE is always enabled in SRR1 for masked interrupts, so we can use xor to clear it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Avoid a branch in masked_[H]interrupt()Nicholas Piggin
Interrupts which do not require EE to be cleared can all be tested with a single bitwise test. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/mm: Make switch_mm_irqs_off() out of lineBenjamin Herrenschmidt
It's too big to be inline, there is no reason to keep it that way. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rework to incorporate the comment changes via fixes branch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/mm: Optimize detection of thread local mm'sBenjamin Herrenschmidt
Instead of comparing the whole CPU mask every time, let's keep a counter of how many bits are set in the mask. Thus testing for a local mm only requires testing if that counter is 1 and the current CPU bit is set in the mask. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/mm: Use mm_is_thread_local() instread of open-codingBenjamin Herrenschmidt
We open-code testing for the mm being local to the current CPU in a few places. Use our existing helper instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/mm: Avoid double irq save/restore in activate_mmBenjamin Herrenschmidt
It calls switch_mm() which already does the irq save/restore these days. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/mm: Move pgdir setting into a helperBenjamin Herrenschmidt
Makes switch_mm_irqs_off() a bit more readable Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Fix replay interrupt return label nameMichael Ellerman
In __replay_interrupt() we take the address of a local label so we can return to it later. However the assembler turns the local label into a symbol with a name like ".L1^B42" - where "^B" is literally "\002". This does not make for pleasant stack traces. Fix it by giving the label a sensible name. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc: pseries: remove dlpar_attach_node dependency on full pathRob Herring
In preparation to stop storing the full node path in full_name, remove the dependency on full_name from dlpar_attach_node(). Callers of dlpar_attach_node() already have the parent device_node, so just pass the parent node into dlpar_attach_node instead of the path. This avoids doing a lookup of the parent node by the path. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc: Convert to using %pOF instead of full_nameRob Herring
Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anatolij Gustschin <agust@denx.de> Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/vio: Use device_type to detect familyMichael Ellerman
Currently in the vio.c code we use a comparision against the parent device node's full path to decide if the device is a PFO or VIO family device. Both the ibm,platform-facilities and vdevice nodes are defined by PAPR, and must have a matching device_type. So instead of using the path we can instead compare the device_type. I've checked Qemu and kvmtool both do this correctly, and all the PowerVM systems I have access to do also. So it seems to be safe. This removes the dependency on full_name, which is being removed upstream. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23Merge branch 'fixes' into nextMichael Ellerman
There's a non-trivial dependency between some commits we want to put in next and the KVM prefetch work around that went into fixes. So merge fixes into next.
2017-08-18powerpc/mm: Ensure cpumask update is orderedBenjamin Herrenschmidt
There is no guarantee that the various isync's involved with the context switch will order the update of the CPU mask with the first TLB entry for the new context being loaded by the HW. Be safe here and add a memory barrier to order any subsequent load/store which may bring entries into the TLB. The corresponding barrier on the other side already exists as pte updates use pte_xchg() which uses __cmpxchg_u64 which has a sync after the atomic operation. Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comments in the code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/mm/cxl: Add the fault handling cpu to mm cpumaskAneesh Kumar K.V
We use mm cpumask for serializing against lockless page table walk. Anybody who is doing a lockless page table walk is expected to disable irq and only cpus in mm cpumask is expected do the lockless walk. This ensure that a THP split can send IPI to only cpus in the mm cpumask, to make sure there are no parallel lockless page table walk. Add the CAPI fault handling cpu to the mm cpumask so that we can do the lockless page table walk while inserting hash page table entries. Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/mm: Don't send IPI to all cpus on THP updatesAneesh Kumar K.V
Now that we made sure that lockless walk of linux page table is mostly limitted to current task(current->mm->pgdir) we can update the THP update sequence to only send IPI to CPUs on which this task has run. This helps in reducing the IPI overload on systems with large number of CPUs. WRT kvm even though kvm is walking page table with vpc->arch.pgdir, it is done only on secondary CPUs and in that case we have primary CPU added to task's mm cpumask. Sending an IPI to primary will force the secondary to do a vm exit and hence this mm cpumask usage is safe here. WRT CAPI, we still end up walking linux page table with capi context MM. For now the pte lookup serialization sends an IPI to all CPUs in CPI is in use. We can further improve this by adding the CAPI interrupt handling CPU to task mm cpumask. That will be done in a later patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Bring in the commit to rename find_linux_pte_or_hugepte() which touches arch and KVM code, and might need to be merged with the kvmppc tree to avoid conflicts.
2017-08-17powerpc/mm: Rename find_linux_pte_or_hugepte()Aneesh Kumar K.V
Add newer helpers to make the function usage simpler. It is always recommended to use find_current_mm_pte() for walking the page table. If we cannot use find_current_mm_pte(), it should be documented why the said usage of __find_linux_pte() is safe against a parallel THP split. For now we have KVM code using __find_linux_pte(). This is because kvm code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but with PACA soft_enabled = 1. We may want to fix that later and make sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that we can switch kvm to use find_linux_pte(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/bpf: Use memset32() to pre-fill traps in BPF page(s)Naveen N. Rao
Use the newly introduced memset32() to pre-fill BPF page(s) with trap instructions. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/string: Implement optimized memset variantsNaveen N. Rao
Based on Matthew Wilcox's patches for other architectures. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/perf: Fix usage of nest_imc_refcMadhavan Srinivasan
nest_imc_refc is a reference count struct, used to track number of active perf sessions using the nest units. Currently the code accesses nest_imc_refc using node_id, which is incorrect, the array is indexed by node number. Meaning in the case of sparse node ids we index off the end of the array. Fix it to use get_nest_pmu_ref() which uses the existing per-cpu variable local_nest_imc_refc. Fixes: 885dcd709ba91 ('powerpc/perf: Add nest IMC PMU support') Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc: Add const to bin_attribute structuresBhumika Goyal
Declare bin_attribute structures as const as they are only passed as an argument to the function sysfs_create_bin_file. This argument is of type const, so declare the structure as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc: Remove more redundant VSX save/testsBenjamin Herrenschmidt
__giveup_vsx/save_vsx are completely equivalent to testing MSR_FP and MSR_VEC and calling the corresponding giveup/save function so just remove the spurious VSX cases. Also add WARN_ONs checking that we never have VSX enabled without the two other. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc: Remove redundant clear of MSR_VSX in __giveup_vsx()Benjamin Herrenschmidt
__giveup_fpu() already does it and we cannot have MSR_VSX set without having MSR_FP also set. This also adds a warning to check we indeed do Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc: Remove redundant FP/Altivec giveup codeBenjamin Herrenschmidt
__giveup_vsx() already calls those two functions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc: Fix missing newline before {Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc/topology: Remove the unused parent_node() macroDou Liyang
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in POWERPC platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VECBenjamin Herrenschmidt
VSX uses a combination of the old vector registers, the old FP registers and new "second halves" of the FP registers. Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we'll use the VSX in the kernel (enable_kernel_vsx()) we need to ensure they are all flushed into the thread struct if either of them is individually enabled. Unfortunately we only tested if the whole VSX was enabled, not if they were individually enabled. Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc/mm/hugetlb: Allow runtime allocation of 16G.Aneesh Kumar K.V
Now that we have GIGANTIC_PAGE enabled on powerpc, use this for 16G hugepages with hash translation mode. Depending on the total system memory we have, we may be able to allocate 16G hugepages runtime. This also remove the hugetlb setup difference between hash/radix translation mode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel ↵Aneesh Kumar K.V
command line With commit aa888a74977a8 ("hugetlb: support larger than MAX_ORDER") we added support for allocating gigantic hugepages via kernel command line. Switch ppc64 arch specific code to use that. W.r.t FSL support, we now limit our allocation range using BOOTMEM_ALLOC_ACCESSIBLE. We use the kernel command line to do reservation of hugetlb pages on powernv platforms. On pseries hash mmu mode the supported gigantic huge page size is 16GB and that can only be allocated with hypervisor assist. For pseries the command line option doesn't do the allocation. Instead pseries does gigantic hugepage allocation based on hypervisor hint that is specified via "ibm,expected#pages" property of the memory node. Cc: Scott Wood <oss@buserror.net> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/hugetlb: fix page rights verification in gup_hugepte()Christophe Leroy
gup_hugepte() checks if pages are present and readable, and when 'write' is set, also checks if the pages are writable. Initially this was done by checking if _PAGE_PRESENT and _PAGE_READ were set. In addition, _PAGE_WRITE was verified for write accesses. The problem is that we have to handle the three following cases: 1/ The target defines __PAGE_READ and __PAGE_WRITE 2/ The target defines __PAGE_RW 3/ The target defines __PAGE_RO In case 1/, this is obvious In case 2/, __PAGE_READ is defined as 0 and __PAGE_WRITE as __PAGE_RW so it works as well. But in case 3, __PAGE_RW is defined as 0, which means __PAGE_WRITE is 0 and then the test returns true (page writable) in all cases. A first correction was attempted in commit 6b8cb66a6a7cc ("powerpc: Fix usage of _PAGE_RO in hugepage"), but that fix is wrong: instead of checking that the page is writable when write is requested, it checks that the page is NOT writable when write is NOT requested. This patch adds a new pte_read() helper to check whether a page is readable or not. This avoids handling all possible cases in gup_hugepte(). Then gup_hugepte() is modified to use pte_present(), pte_read() and pte_write() instead of the raw flags. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Simplify __set_fixmap()Christophe Leroy
__set_fixmap() uses __fix_to_virt() then does the boundary checks by it self. Instead, we can use fix_to_virt() which does the verification at build time. For this, we need to use it inline so that GCC can see the real value of idx at buildtime. In the meantime, we remove the 'fixmaps' variable. This variable is set but has never been used from the beginning (commit 2c419bdeca1d9 ("[POWERPC] Port fixmap from x86 and use for kmap_atomic")) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>