summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2016-03-01powerpc/ps3: gelic_udbg: use struct udphdr from <linux/udp.h>Luis Henriques
Instead of defining a local version of struct udphdr use the standard definition from <linux/udp.h>. The 'src' field is named 'source' in the <linux/udp.h> definition. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01powerpc/ps3: gelic_udbg: use struct iphdr from <linux/ip.h>Luis Henriques
Instead of defining a local version of struct iphdr use the standard definition from <linux/ip.h>. Several fields in the <linux/ip.h> definition have different names: - proto -> protocol - src -> saddr - dest -> daddr - total_length -> tot_len - checksum -> check Also, 'ver_len' is composed by 'version' and 'ihl' in <linux/ip.h>. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01powerpc/ps3: gelic_udbg: use struct vlan_hdr from <linux/if_vlan.h>Luis Henriques
Instead of defining the local struct vlantag use the standard definition of vlan_hdr from <linux/if_vlan.h>. The fields in the <linux/if_vlan.h> definition have different names: - vlan -> h_vlan_TCI - subtype -> h_vlan_encapsulated_proto While there, use also the ETH_P_IP macro instead of an hard-coded 0x0800 value. Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01powerpc/ps3: gelic_udbg: use struct ethhdr from <linux/if_ether.h>Luis Henriques
Instead of defining a local version of struct ethhdr use the standard definition from <linux/if_ether.h>. The fields in the <linux/if_ether.h> definition have different names: - dest -> h_dest - src -> h_source - type -> h_proto While there, use a few other standard functions/macros: - eth_broadcast_addr (instead of a memset) - ETH_ALEN - ETH_P_8021Q Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Expand the real page number field of the Linux PTEPaul Mackerras
Now that other PTE fields have been moved out of the way, we can expand the RPN field of the PTE on 64-bit Book 3S systems and align it with the RPN field in the radix PTE format used by PowerISA v3.0 CPUs in radix mode. For 64k page size, this means we need to move the _PAGE_COMBO and _PAGE_4K_PFN bits. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Move software-used bits in PTEPaul Mackerras
This moves the _PAGE_SPECIAL and _PAGE_SOFT_DIRTY bits in the Linux PTE on 64-bit Book 3S systems to bit positions which are designated for software use in the radix PTE format used by PowerISA v3.0 CPUs in radix mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Shuffle read, write, execute and user bits in PTEPaul Mackerras
This moves the _PAGE_EXEC, _PAGE_RW and _PAGE_USER bits around in the Linux PTE on 64-bit Book 3S systems to correspond with the bit positions used in radix mode by PowerISA v3.0 CPUs. This also adds a _PAGE_READ bit corresponding to the read permission bit in the radix PTE. _PAGE_READ is currently unused but could possibly be used in future to improve pte_protnone(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Move HPTE-related bits in PTE to upper endPaul Mackerras
This moves the _PAGE_HASHPTE, _PAGE_F_GIX and _PAGE_F_SECOND fields in the Linux PTE on 64-bit Book 3S systems to the most significant byte. Of the 5 bits, one is a software-use bit and the other four are reserved bit positions in the PowerISA v3.0 radix PTE format. Using these bits is OK because these bits are all to do with tracking the HPTE(s) associated with the Linux PTE, and therefore won't be needed in radix mode. This frees up bit positions in the lower two bytes. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Move _PAGE_PTE to 2nd most significant bitPaul Mackerras
This changes _PAGE_PTE for 64-bit Book 3S processors from 0x1 to 0x4000_0000_0000_0000, because that bit is used as the L (leaf) bit by PowerISA v3.0 CPUs in radix mode. The "leaf" bit indicates that the PTE points to a page directly rather than another radix level, which is what the _PAGE_PTE bit means. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Move _PAGE_PRESENT to the most significant bitPaul Mackerras
This changes _PAGE_PRESENT for 64-bit Book 3S processors from 0x2 to 0x8000_0000_0000_0000, because that is where PowerISA v3.0 CPUs in radix mode will expect to find it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29powerpc/mm/book3s-64: Use physical addresses in upper page table tree levelsPaul Mackerras
This changes the Linux page tables to store physical addresses rather than kernel virtual addresses in the upper levels of the tree (pgd, pud and pmd) for 64-bit Book 3S machines. This also changes the hugepd pointers used to implement hugepages when the base page size is 4k to store physical addresses rather than virtual addresses (again just for 64-bit Book3S machines). This frees up some high order bits, and will be needed with PowerISA v3.0 machines which read the page table tree in hardware in radix mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29Merge tag 'v4.5-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵Ingo Molnar
applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29KVM: PPC: Book3S HV: Add tunable to control H_IPI redirectionSuresh E. Warrier
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core running in the host is usually a good idea, most workloads seemed to benefit. However, in one heavily interrupt-driven SMT1 workload, some regression was observed. This patch adds a kvm_hv module parameter called h_ipi_redirect to control this feature. The default value for this tunable is 1 - that is enable the feature. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Send IPI to host core to wake VCPUSuresh E. Warrier
This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by: - Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm(). - Any PPC_MSG_RM_HOST_ACTION messages must be executed first before any other PPC_MSG_CALL_FUNCTION messages. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVMSuresh Warrier
This patch adds the support for the kick VCPU operation for kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function provides the function to be invoked for a host side operation when poked by the real mode KVM. This is initiated by KVM by sending an IPI to any free host core. KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and rm_data to point to the VCPU to be woken up before sending the IPI. Note that we have allocated one kvmppc_host_rm_core structure per core. The above values need to be set in the structure corresponding to the core to which the IPI will be sent. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUsSuresh Warrier
The kvmppc_host_rm_ops structure keeps track of which cores are are in the host by maintaining a bitmask of active/runnable online CPUs that have not entered the guest. This patch adds support to manage the bitmask when a CPU is offlined or onlined in the host. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Manage core host stateSuresh Warrier
Update the core host state in kvmppc_host_rm_ops whenever the primary thread of the core enters the guest or returns back. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29KVM: PPC: Book3S HV: Host-side RM data structuresSuresh Warrier
This patch defines the data structures to support the setting up of host side operations while running in real mode in the guest, and also the functions to allocate and free it. The operations are for now limited to virtual XICS operations. Currently, we have only defined one operation in the data structure: - Wake up a VCPU sleeping in the host when it receives a virtual interrupt The operations are assigned at the core level because PowerKVM requires that the host run in SMT off mode. For each core, we will need to manage its state atomically - where the state is defined by: 1. Is the core running in the host? 2. Is there a Real Mode (RM) operation pending on the host? Currently, core state is only managed at the whole-core level even when the system is in split-core mode. This just limits the number of free or "available" cores in the host to perform any host-side operations. The kvmppc_host_rm_core.rm_data allows any data to be passed by KVM in real mode to the host core along with the operation to be performed. The kvmppc_host_rm_ops structure is allocated the very first time a guest VM is started. Initial core state is also set - all online cores are in the host. This structure is never deleted, not even when there are no active guests. However, it needs to be freed when the module is unloaded because the kvmppc_host_rm_ops_hv can contain function pointers to kvm-hv.ko functions for the different supported host operations. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29powerpc/xics: Add icp_native_cause_ipi_rmSuresh Warrier
Function to cause an IPI by directly updating the MFFR register in the XICS. The function is meant for real-mode callers since they cannot use the smp_ops->cause_ipi function which uses an ioremapped address. Normal usage is for the the KVM real mode code to set the IPI message using smp_muxed_ipi_message_pass and then invoke icp_native_cause_ipi_rm to cause the actual IPI. The function requires kvm_hstate.xics_phys to have been initialized with the physical address of XICS. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29powerpc/smp: Add smp_muxed_ipi_set_messageSuresh Warrier
smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which uses an ioremapped address to access registers on the XICS interrupt controller to cause the IPI. Because of this real mode callers cannot call smp_muxed_ipi_message_pass() for IPI messaging. This patch creates a separate function smp_muxed_ipi_set_message just to set the IPI message without the cause_ipi routine. After calling this function to set the IPI message, real mode callers must cause the IPI by writing to the XICS registers directly. As part of this, we also change smp_muxed_ipi_message_pass to call smp_muxed_ipi_set_message to set the message instead of doing it directly inside the routine. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29powerpc/smp: Support more IPI messagesSuresh Warrier
This patch increases the number of demuxed messages for a controller with a single ipi to 8 for 64-bit systems. This is required because we want to use the IPI mechanism to send messages from a CPU running in KVM real mode in a guest to a CPU in the host to take some action. Currently, we only support 4 messages and all 4 are already taken. Define a fifth message PPC_MSG_RM_HOST_ACTION for this purpose. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-27mm: ASLR: use get_random_long()Daniel Cashman
Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27powerpc/mm/book3s-64: Free up 7 high-order bits in the Linux PTEPaul Mackerras
This frees up bits 57-63 in the Linux PTE on 64-bit Book 3S machines. In the 4k page case, this is done just by reducing the size of the RPN field to 39 bits, giving 51-bit real addresses. In the 64k page case, we had 10 unused bits in the middle of the PTE, so this moves the RPN field down 10 bits to make use of those unused bits. This means the RPN field is now 3 bits larger at 37 bits, giving 53-bit real addresses in the normal case, or 49-bit real addresses for the special 4k PFN case. We are doing this in order to be able to move some other PTE bits into the positions where PowerISA V3.0 processors will expect to find them in radix-tree mode. Ultimately we will be able to move the RPN field to lower bit positions and make it larger. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-27powerpc/mm/book3s-64: Clean up some obsolete or misleading commentsPaul Mackerras
No code changes. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-25Merge tag 'powerpc-4.5-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - eeh: Fix partial hotplug criterion from Gavin Shan - mm: Clear the invalid slot information correctly from Aneesh Kumar K.V * tag 'powerpc-4.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm/hash: Clear the invalid slot information correctly powerpc/eeh: Fix partial hotplug criterion
2016-02-25net: Facility to report route quality of connected socketsTom Herbert
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The purpose is to allow an application to give feedback to the kernel about the quality of the network path for a connected socket. The value argument indicates the type of quality report. For this initial patch the only supported advice is a value of 1 which indicates "bad path, please reroute"-- the action taken by the kernel is to call dst_negative_advice which will attempt to choose a different ECMP route, reset the TX hash for flow label and UDP source port in encapsulation, etc. This facility should be useful for connected UDP sockets where only the application can provide any feedback about path quality. It could also be useful for TCP applications that have additional knowledge about the path outside of the normal TCP control loop. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25Merge tag 'powerpc-4.5-4' into nextMichael Ellerman
Pull in our current fixes from 4.5, in particular the "Fix Multi hit ERAT" bug is causing folks some grief when testing next.
2016-02-25KVM: Use simple waitqueue for vcpu->wqMarcelo Tosatti
The problem: On -rt, an emulated LAPIC timer instances has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Daniel writes: Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency benchmark on mainline. The test was run 1000 times on tip/sched/core 4.4.0-rc8-01134-g0905f04: ./x86-run x86/tscdeadline_latency.flat -cpu host with idle=poll. The test seems not to deliver really stable numbers though most of them are smaller. Paolo write: "Anything above ~10000 cycles means that the host went to C1 or lower---the number means more or less nothing in that case. The mean shows an improvement indeed." Before: min max mean std count 1000.000000 1000.000000 1000.000000 1000.000000 mean 5162.596000 2019270.084000 5824.491541 20681.645558 std 75.431231 622607.723969 89.575700 6492.272062 min 4466.000000 23928.000000 5537.926500 585.864966 25% 5163.000000 1613252.750000 5790.132275 16683.745433 50% 5175.000000 2281919.000000 5834.654000 23151.990026 75% 5190.000000 2382865.750000 5861.412950 24148.206168 max 5228.000000 4175158.000000 6254.827300 46481.048691 After min max mean std count 1000.000000 1000.00000 1000.000000 1000.000000 mean 5143.511000 2076886.10300 5813.312474 21207.357565 std 77.668322 610413.09583 86.541500 6331.915127 min 4427.000000 25103.00000 5529.756600 559.187707 25% 5148.000000 1691272.75000 5784.889825 17473.518244 50% 5160.000000 2308328.50000 5832.025000 23464.837068 75% 5172.000000 2393037.75000 5853.177675 24223.969976 max 5222.000000 3922458.00000 6186.720500 42520.379830 [Patch was originaly based on the swait implementation found in the -rt tree. Daniel ported it to mainline's version and gathered the benchmark numbers for tscdeadline_latency test.] Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-24powerpc: Fix BUG_ON() reporting in real modeBalbir Singh
I ran into this issue while debugging an early boot problem. The system hit a BUG_ON() but report bug failed to print the line number and file name. The reason being that the system was running in real mode and report_bug() searches for addresses in the PAGE_OFFSET+ region. Suggested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-24powerpc: Use BUILD_BUG_ON_MSG() for unsupported {cmp}xchg sizespan xinhui
__xchg_called_with_bad_pointer() can't tell us which code uses {cmp}xchg with an unsupported size, and no error is reported until the link stage. To make such problems easier to debug, use BUILD_BUG_ON_MSG() instead. Signed-off-by: pan xinhui <xinhui.pan@linux.vnet.ibm.com> [mpe: Tweak change log wording & add relaxed/acquire] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> fixup
2016-02-24powerpc/powernv: Add AST graphics driver to powernv_defconfigJeremy Kerr
Most current OpenPOWER platforms have an AST BMC, so add graphics support via the AST DRM driver. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-24powerpc/powernv: Add powernv firmware interface drivers to powernv_defconfigJeremy Kerr
There are a few firmware-provided interfaces for OpenPOWER platforms: the PRD infrastructure, IPMI support, and MTD access to the PNOR flash. This change adds these to powernv_defconfig Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-24powerpc/powernv: Add powernv_defconfigJeremy Kerr
This change adds a defconfig for the non-virtualised power platforms, based on pseries_defconfig, but without pseries, and little-endian, and no OF trampoline. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Acked-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22powerpc: Add POWER9 cputable entryMichael Neuling
Add a cputable entry for POWER9. More code is required to actually boot and run on a POWER9 but this gets the base piece in which we can start building on. Copies over from POWER8 except for: - Adds a new CPU_FTR_ARCH_300 bit to start hanging new architecture features from (in subsequent patches). - Advertises new user features bits PPC_FEATURE2_ARCH_3_00 & HAS_IEEE128 when on POWER9. - Drops CPU_FTR_SUBCORE. - Drops PMU code and machine check. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22powerpc: Use defines for __init_tlb_power[78]Michael Neuling
Use defines for literals __init_tlb_power[78] rather than hand coding them. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22powerpc/powernv: Create separate subcores CPU feature bitMichael Neuling
Subcores isn't really part of the 2.07 architecture but currently we turn it on using the 2.07 feature bit. Subcores is really a POWER8 specific feature. This adds a new CPU_FTR bit just for subcores and moves the subcore init code over to use this. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22powerpc/powernv: don't create OPAL msglog sysfs entry if memcons init failsAndrew Donnellan
When initialising OPAL interfaces, there is a possibility that opal_msglog_init() may fail to initialise the msglog/memory console. Fix opal_msglog_sysfs_init() so it doesn't try to create sysfs entry for the msglog if this occurs. Suggested-by: Joel Stanley <joel@jms.id.au> Fixes: 9b4fffa14906 ("powerpc/powernv: new function to access OPAL msglog") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22powerpc/mm/hash: Clear the invalid slot information correctlyAneesh Kumar K.V
We can get a hash pte fault with 4k base page size and find the pte already inserted with 64K base page size. In that case we need to clear the existing slot information from the old pte. Fix this correctly With THP, we also clear the slot information with respect to all the 64K hash pte mapping that 16MB page. They are all invalid now. This make sure we don't find the slot valid when we fault with 4k base page size. Finding the slot valid should not result in any wrong behavior because we do check again in hash page table for the validity. But we can avoid that check completely. Fixes: a43c0eb8364c022 ("powerpc/mm: Convert 4k hash insert to C") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22powerpc/eeh: Fix partial hotplug criterionGavin Shan
During error recovery, the device could be removed as part of the partial hotplug. The criterion used to come with partial hotplug is: if the device driver provides error_detected(), slot_reset() and resume() callbacks, it's immune from hotplug. Otherwise, it's going to experience partial hotplug during EEH recovery. But the criterion isn't correct enough: mlx4_core driver for Mellanox adapters provides error_detected(), slot_reset() callbacks, but resume() isn't there. Those Mellanox adapters won't be to involved in the partial hotplug. This fixes the criterion to a practical one: adpater with driver that provides error_detected(), slot_reset() will be immune from partial hotplug. resume() isn't mandatory. Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion") Cc: stable@vger.kernel.org #v4.4+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-20Merge tag 'powerpc-4.5-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix build error on 32-bit with checkpoint restart from Aneesh Kumar - Fix dedotify for binutils >= 2.26 from Andreas Schwab - Don't trace hcalls on offline CPUs from Denis Kirjanov - eeh: Fix stale cached primary bus from Gavin Shan - eeh: Fix stale PE primary bus from Gavin Shan - mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V - ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy * tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/ioda: Set "read" permission when "write" is set powerpc/mm: Fix Multi hit ERAT cause by recent THP update powerpc/powernv: Fix stale PE primary bus powerpc/eeh: Fix stale cached primary bus powerpc/pseries: Don't trace hcalls on offline CPUs powerpc: Fix dedotify for binutils >= 2.26 powerpc/book3s_32: Fix build error with checkpoint restart
2016-02-18mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()Dave Hansen
This plumbs a protection key through calc_vm_flag_bits(). We could have done this in calc_vm_prot_bits(), but I did not feel super strongly which way to go. It was pretty arbitrary which one to use. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arve Hjønnevåg <arve@android.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Airlie <airlied@linux.ie> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Geliang Tang <geliangtang@163.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Leon Romanovsky <leon@leon.nu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Riley Andrews <riandrews@android.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: devel@driverdev.osuosl.org Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160212210231.E6F1F0D6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18mm/core, x86/mm/pkeys: Differentiate instruction fetchesDave Hansen
As discussed earlier, we attempt to enforce protection keys in software. However, the code checks all faults to ensure that they are not violating protection key permissions. It was assumed that all faults are either write faults where we check PKRU[key].WD (write disable) or read faults where we check the AD (access disable) bit. But, there is a third category of faults for protection keys: instruction faults. Instruction faults never run afoul of protection keys because they do not affect instruction fetches. So, plumb the PF_INSTR bit down in to the arch_vma_access_permitted() function where we do the protection key checks. We also add a new FAULT_FLAG_INSTRUCTION. This is because handle_mm_fault() is not passed the architecture-specific error_code where we keep PF_INSTR, so we need to encode the instruction fetch information in to the arch-generic fault flags. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18mm/core: Do not enforce PKEY permissions on remote mm accessDave Hansen
We try to enforce protection keys in software the same way that we do in hardware. (See long example below). But, we only want to do this when accessing our *own* process's memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then tried to PTRACE_POKE a target process which just happened to have some mprotect_pkey(pkey=6) memory, we do *not* want to deny the debugger access to that memory. PKRU is fundamentally a thread-local structure and we do not want to enforce it on access to _another_ thread's data. This gets especially tricky when we have workqueues or other delayed-work mechanisms that might run in a random process's context. We can check that we only enforce pkeys when operating on our *own* mm, but delayed work gets performed when a random user context is active. We might end up with a situation where a delayed-work gup fails when running randomly under its "own" task but succeeds when running under another process. We want to avoid that. To avoid that, we use the new GUP flag: FOLL_REMOTE and add a fault flag: FAULT_FLAG_REMOTE. They indicate that we are walking an mm which is not guranteed to be the same as current->mm and should not be subject to protection key enforcement. Thanks to Jerome Glisse for pointing out this scenario. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Dominik Vogt <vogt@linux.vnet.ibm.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Geliang Tang <geliangtang@163.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Low <jason.low2@hp.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: iommu@lists.linux-foundation.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keysDave Hansen
Today, for normal faults and page table walks, we check the VMA and/or PTE to ensure that it is compatible with the action. For instance, if we get a write fault on a non-writeable VMA, we SIGSEGV. We try to do the same thing for protection keys. Basically, we try to make sure that if a user does this: mprotect(ptr, size, PROT_NONE); *ptr = foo; they see the same effects with protection keys when they do this: mprotect(ptr, size, PROT_READ|PROT_WRITE); set_pkey(ptr, size, 4); wrpkru(0xffffff3f); // access disable pkey 4 *ptr = foo; The state to do that checking is in the VMA, but we also sometimes have to do it on the page tables only, like when doing a get_user_pages_fast() where we have no VMA. We add two functions and expose them to generic code: arch_pte_access_permitted(pte_flags, write) arch_vma_access_permitted(vma, write) These are, of course, backed up in x86 arch code with checks against the PTE or VMA's protection key. But, there are also cases where we do not want to respect protection keys. When we ptrace(), for instance, we do not want to apply the tracer's PKRU permissions to the PTEs from the process being traced. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Dominik Vogt <vogt@linux.vnet.ibm.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Low <jason.low2@hp.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18powerpc: atomic: Implement acquire/release/relaxed variants for cmpxchgBoqun Feng
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on which _release variants can be built. To avoid superfluous barriers in _acquire variants, we implement these operations with assembly code rather use __atomic_op_acquire() to build them automatically. For the same reason, we keep the assembly implementation of fully ordered cmpxchg operations. However, we don't do the similar for _release, because that will require putting barriers in the middle of ll/sc loops, which is probably a bad idea. Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not compiler barriers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-18powerpc: atomic: Implement acquire/release/relaxed variants for xchgBoqun Feng
Implement xchg{,64}_relaxed and atomic{,64}_xchg_relaxed, based on these _relaxed variants, release/acquire variants and fully ordered versions can be built. Note that xchg{,64}_relaxed and atomic_{,64}_xchg_relaxed are not compiler barriers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-18powerpc: atomic: Implement atomic{, 64}_*_return_* variantsBoqun Feng
On powerpc, acquire and release semantics can be achieved with lightweight barriers("lwsync" and "ctrl+isync"), which can be used to implement __atomic_op_{acquire,release}. For release semantics, since we only need to ensure all memory accesses that issue before must take effects before the -store- part of the atomics, "lwsync" is what we only need. On the platform without "lwsync", "sync" should be used. Therefore in __atomic_op_release() we use PPC_RELEASE_BARRIER. For acquire semantics, "lwsync" is what we only need for the similar reason. However on the platform without "lwsync", we can use "isync" rather than "sync" as an acquire barrier. Therefore in __atomic_op_acquire() we use PPC_ACQUIRE_BARRIER, which is barrier() on UP, "lwsync" if available and "isync" otherwise. Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other variants with these helpers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-18powerpc: Fix kgdb on little endian ppc64leBalbir Singh
I spent some time trying to use kgdb and debugged my inability to resume from kgdb_handle_breakpoint(). NIP is not incremented and that leads to a loop in the debugger. I've tested this lightly on a virtual instance with KDB enabled. After the patch, I am able to get the "go" command to work as expected. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-17powerpc/ioda: Set "read" permission when "write" is setAlexey Kardashevskiy
Quite often drivers set only "write" permission assuming that this includes "read" permission as well and this works on plenty of platforms. However IODA2 is strict about this and produces an EEH when "read" permission is not set and reading happens. This adds a workaround in the IODA code to always add the "read" bit when the "write" bit is set. Fixes: 10b35b2b7485 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: stable@vger.kernel.org # 4.2+ Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>