Age | Commit message (Collapse) | Author |
|
The exception table fixup adjusts a failed page fault's interrupt return
location if it was taken at an address specified in the exception table,
to a corresponding fixup handler address.
Introduce a variation of that idea which adds a fixup table for NMIs and
soft-masked asynchronous interrupts. This will be used to protect
certain critical sections that are sensitive to being clobbered by
interrupts coming in (due to using the same SPRs and/or irq soft-mask
state).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-10-npiggin@gmail.com
|
|
The next patch would like to move interrupt return assembly code to a low
location before general text, so move it into its own file and include via
head_64.S
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-7-npiggin@gmail.com
|
|
When an interrupt is taken, the SRR registers are set to return to where
it left off. Unless they are modified in the meantime, or the return
address or MSR are modified, there is no need to reload these registers
when returning from interrupt.
Introduce per-CPU flags that track the validity of SRR and HSRR
registers. These are cleared when returning from interrupt, when
using the registers for something else (e.g., OPAL calls), when
adjusting the return address or MSR of a context, and when context
switching (which changes the return address and MSR).
This improves the performance of interrupt returns.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fixup patch from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com
|
|
The msr argument is not used, remove it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-3-npiggin@gmail.com
|
|
Pull in some more ppc KVM patches we are keeping in our topic branch.
In particular this brings in the series to add H_RPT_INVALIDATE.
|
|
Enable support for process-scoped invalidations from nested
guests and partition-scoped invalidations for nested guests.
Process-scoped invalidations for any level of nested guests
are handled by implementing H_RPT_INVALIDATE handler in the
nested guest exit path in L0.
Partition-scoped invalidation requests are forwarded to the
right nested guest, handled there and passed down to L0
for eventual handling.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
[aneesh: Nested guest partition-scoped invalidation changes]
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Squash in fixup patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-5-bharata@linux.ibm.com
|
|
H_RPT_INVALIDATE does two types of TLB invalidations:
1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
This is currently not used in KVM as GTSE is not usually
disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
behalf of an L2 guest. This is currently handled
by H_TLB_INVALIDATE hcall and this new replaces the old that.
This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.
Process scoped tlbie invalidations from L1 and nested guests
need RS register for TLBIE instruction to contain both PID and
LPID. This patch introduces primitives that execute tlbie
instruction with both PID and LPID set in prepartion for
H_RPT_INVALIDATE hcall.
A description of H_RPT_INVALIDATE follows:
int64 /* H_Success: Return code on successful completion */
/* H_Busy - repeat the call with the same */
/* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
translation
lookaside information */
uint64 id, /* PID/LPID to invalidate */
uint64 target, /* Invalidation target */
uint64 type, /* Type of lookaside information */
uint64 pg_sizes, /* Page sizes */
uint64 start, /* Start of Effective Address (EA)
range (inclusive) */
uint64 end) /* End of EA range (exclusive) */
Invalidation targets (target)
-----------------------------
Core MMU 0x01 /* All virtual processors in the
partition */
Core local MMU 0x02 /* Current virtual processor */
Nest MMU 0x04 /* All nest/accelerator agents
in use by the partition */
A combination of the above can be specified,
except core and core local.
Type of translation to invalidate (type)
---------------------------------------
NESTED 0x0001 /* invalidate nested guest partition-scope */
TLB 0x0002 /* Invalidate TLB */
PWC 0x0004 /* Invalidate Page Walk Cache */
PRT 0x0008 /* Invalidate caching of Process Table
Entries if NESTED is clear */
PAT 0x0008 /* Invalidate caching of Partition Table
Entries if NESTED is set */
A combination of the above can be specified.
Page size mask (pages)
----------------------
4K 0x01
64K 0x02
2M 0x04
1G 0x08
All sizes (-1UL)
A combination of the above can be specified.
All page sizes can be selected with -1.
Semantics: Invalidate radix tree lookaside information
matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
end should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not in running in radix
translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
addresses. Else start and end should be aligned to 4kB (lower 11
bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
information. Else pid specifies a nested LPID, and the invalidation
is performed on nested guest partition table and nested guest
partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
and quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
Those which are partially covered are considered outside
invalidation range, which allows a caller to optimally invalidate
ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com
|
|
Add a field to mmu_psize_def to store the page size encodings
of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
AP encodings. This will be used when invalidating with required
page size encoding in the hcall.
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-3-bharata@linux.ibm.com
|
|
The type values H_RPTI_TYPE_PRT and H_RPTI_TYPE_PAT indicate
invalidating the caching of process and partition scoped entries
respectively.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-2-bharata@linux.ibm.com
|
|
This is a simple native ICS backend that matches the layout of
the Microwatt implementation of ICS.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Add empty ics_native_init() to unbreak non-microwatt builds]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fixup-ics
Link: https://lore.kernel.org/r/YMwW8cxrwB2W5EUN@thinks.paulus.ozlabs.org
|
|
In addition to the set_memory_xx() functions which allows to change
the memory attributes of not (yet) used memory regions, implement a
set_memory_attr() function to:
- set the final memory protection after init on currently used
kernel regions.
- enable/disable kernel memory regions in the scope of DEBUG_PAGEALLOC.
Unlike the set_memory_xx() which can act in three step as the regions
are unused, this function must modify 'on the fly' as the kernel is
executing from them. At the moment only PPC32 will use it and changing
page attributes on the fly is not an issue.
Reported-by: kbuild test robot <lkp@intel.com>
[ruscur: cast "data" to unsigned long instead of int]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-9-jniethe5@gmail.com
|
|
Make module_alloc() use PAGE_KERNEL protections instead of
PAGE_KERNEL_EXEX if Strict Module RWX is enabled.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-4-jniethe5@gmail.com
|
|
setup_text_poke_area() is a late init call so it runs before
mark_rodata_ro() and after the init calls. This lets all the init code
patching simply write to their locations. In the future, kprobes is
going to allocate its instruction pages RO which means they will need
setup_text__poke_area() to have been already called for their code
patching. However, init_kprobes() (which allocates and patches some
instruction pages) is an early init call so it happens before
setup_text__poke_area().
start_kernel() calls poking_init() before any of the init calls. On
powerpc, poking_init() is currently a nop. setup_text_poke_area() relies
on kernel virtual memory, cpu hotplug and per_cpu_areas being setup.
setup_per_cpu_areas(), boot_cpu_hotplug_init() and mm_init() are called
before poking_init().
Turn setup_text_poke_area() into poking_init().
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Fold in missing prototype for poking_init() from lkp]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-3-jniethe5@gmail.com
|
|
The set_memory_{ro/rw/nx/x}() functions are required for
STRICT_MODULE_RWX, and are generally useful primitives to have. This
implementation is designed to be generic across powerpc's many MMUs.
It's possible that this could be optimised to be faster for specific
MMUs.
This implementation does not handle cases where the caller is attempting
to change the mapping of the page it is executing from, or if another
CPU is concurrently using the page being altered. These cases likely
shouldn't happen, but a more complex implementation with MMU-specific code
could safely handle them.
On hash, the linear mapping is not kept in the linux pagetable, so this
will not change the protection if used on that range. Currently these
functions are not used on the linear map so just WARN for now.
apply_to_existing_page_range() does not work on huge pages so for now
disallow changing the protection of huge pages.
[jpn: - Allow set memory functions to be used without Strict RWX
- Hash: Disallow certain regions
- Have change_page_attr() take function pointers to manipulate ptes
- Radix: Add ptesync after set_pte_at()]
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-2-jniethe5@gmail.com
|
|
This allows the hypervisor / firmware to describe this workarounds to
the guest.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-4-npiggin@gmail.com
|
|
Rather than tying this mitigation to RFI L1D flush requirement, add a
new bit for it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-3-npiggin@gmail.com
|
|
H_GET_CPU_CHARACTERISTICS
This allows the hypervisor / firmware to describe these workarounds to
the guest.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-2-npiggin@gmail.com
|
|
The POWER9 vCPU TLB management code assumes all threads in a core share
a TLB, and that TLBIEL execued by one thread will invalidate TLBs for
all threads. This is not the case for SMT8 capable POWER9 and POWER10
(big core) processors, where the TLB is split between groups of threads.
This results in TLB multi-hits, random data corruption, etc.
Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine
which siblings share TLBs, and use that in the guest TLB flushing code.
[npiggin@gmail.com: add changelog and comment]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com
|
|
This patch adds VAS window allocatioa/close with the corresponding
hcalls. Also changes to integrate with the existing user space VAS
API and provide register/unregister functions to NX pseries driver.
The driver register function is used to create the user space
interface (/dev/crypto/nx-gzip) and unregister to remove this entry.
The user space process opens this device node and makes an ioctl
to allocate VAS window. The close interface is used to deallocate
window.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e8d956bace3f182c4d2e66e343ff37cb0391d1fd.camel@linux.ibm.com
|
|
PowerVM introduces two different type of credits: Default and Quality
of service (QoS).
The total number of default credits available on each LPAR depends
on CPU resources configured. But these credits can be shared or
over-committed across LPARs in shared mode which can result in
paste command failure (RMA_busy). To avoid NX HW contention, the
hypervisor ntroduces QoS credit type which makes sure guaranteed
access to NX esources. The system admins can assign QoS credits
or each LPAR via HMC.
Default credit type is used to allocate a VAS window by default as
on PowerVM implementation. But the process can pass
VAS_TX_WIN_FLAG_QOS_CREDIT flag with VAS_TX_WIN_OPEN ioctl to open
QoS type window.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa950b7b8e8077364267720274a7b9ec34e76e73.camel@linux.ibm.com
|
|
This patch adds hcalls and other definitions. Also define structs
that are used in VAS implementation on PowerVM.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b4b8c594c27ee4aa6be9dc6dc4ee7331571cbbe8.camel@linux.ibm.com
|
|
Many elements in vas_struct are used on PowerNV and PowerVM
platforms. vas_window is used for both TX and RX windows on
PowerNV and for TX windows on PowerVM. So some elements are
specific to these platforms.
So this patch defines common vas_window and platform
specific window structs (pnv_vas_window on PowerNV). Also adds
the corresponding changes in PowerNV vas code.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1698c35c158dfe52c6d2166667823d3d4a463353.camel@linux.ibm.com
|
|
If a coprocessor encounters an error translating an address, the
VAS will cause an interrupt in the host. The kernel processes
the fault by updating CSB. This functionality is same for both
powerNV and pseries. So this patch moves these functions to
common vas-api.c and the actual functionality is not changed.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bf8d5b0770fa1ef5cba88c96580caa08d999d3b5.camel@linux.ibm.com
|
|
Take pid and mm references when each window opens and drops during
close. This functionality is needed for powerNV and pseries. So
this patch defines the existing code as functions in common book3s
platform vas-api.c
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2fa40df962250a737c804e58202924717b39e381.camel@linux.ibm.com
|
|
PowerNV uses registers to open/close VAS windows, and getting the
paste address. Whereas the hypervisor calls are used on PowerVM.
This patch adds the platform specific user space window operations
and register with the common VAS user space interface.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f85091f4ace67f951ac04d60394d67b21e2f5d3c.camel@linux.ibm.com
|
|
powerNV and pseries drivers register / unregister to the corresponding
platform specific VAS separately. Then these VAS functions call the
common API with the specific window operations. So rename powerNV VAS
API register/unregister functions.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9db00d58dbdcb7cfc07a1df95f3d2a9e3e5d746a.camel@linux.ibm.com
|
|
The pseries platform will share vas and nx code and interfaces
with the PowerNV platform, so create the
arch/powerpc/platforms/book3s/ directory and move VAS API code
there. Functionality is not changed.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e05c8db17b9eabe3545b902d034238e4c6c08180.camel@linux.ibm.com
|
|
Merge some powerpc KVM patches from our topic branch.
In particular this brings in Nick's big series rewriting parts of the
guest entry/exit path in C.
Conflicts:
arch/powerpc/kernel/security.c
arch/powerpc/kvm/book3s_hv_rmhandlers.S
|
|
linux/processor.h has exactly same defination for spin_until_cond.
Drop the redundant defination in asm/processor.h
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fff2054e5dfc00329804dbd3f2a91667c9a8aff.1623438544.git.christophe.leroy@csgroup.eu
|
|
update_power8_hid0() is used only by powernv platform subcore.c
Move it there.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/37f41d74faa0c66f90b373e243e8b1ee37a1f6fa.1623219019.git.christophe.leroy@csgroup.eu
|
|
proc_trap() has never been used, remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/827944ea12d470c2f862635f48b5ee6c1520351f.1623217909.git.christophe.leroy@csgroup.eu
|
|
Don't duplicate swapper_pg_dir[] in each platform's head.S
Define it in mm/pgtable.c
Define MAX_PTRS_PER_PGD because on book3s/64 PTRS_PER_PGD is
not a constant.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5e3f1b8a4695c33ccc80aa3870e016bef32b85e1.1623063174.git.christophe.leroy@csgroup.eu
|
|
ppc8xx already has set_context() in C.
Other ones have it in assembly. The only thing it does is to
write the context id into SPRN_PID.
Do it in C.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5d0759064f3831c6b88af49ef5d3b05ba1c4dad.1622712515.git.christophe.leroy@csgroup.eu
|
|
All KUAP helpers defined in asm/kup.h are single line functions
that should be inlined. But on book3s/32 build, we get many
instances of <prevent_write_to_user.constprop.0>.
Force inlining of those helpers.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8479a862e165a57a855292d47e24c259a578f5a0.1622711627.git.christophe.leroy@csgroup.eu
|
|
prevent_user_access() doesn't use anymore to/from/size parameters.
Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b7113662fd2c26e4c33e9d705de324bd3860822e.1622708530.git.christophe.leroy@csgroup.eu
|
|
book3s/32 was the only user of KUAP_CURRENT_XXX.
After rework of book3s/32 KUAP, it is not used anymore.
Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/549214ecf6887d965645e664520d4886663c5ffb.1622708530.git.christophe.leroy@csgroup.eu
|
|
On book3s/32, KUAP is provided by toggling Ks bit in segment registers.
One segment register addresses 256M of virtual memory.
At the time being, KUAP implements a complex logic to apply the
unlock/lock on the exact number of segments covering the user range
to access, with saving the boundaries of the range of segments in
a member of thread struct.
But most if not all user accesses are within a single segment.
Rework KUAP with a different approach:
- Open only one segment, the one corresponding to the starting
address of the range to be accessed.
- If a second segment is involved, it will generate a page fault. The
segment will then be open by the page fault handler.
The kuap member of thread struct will now contain:
- The start address of the current on going user access, that will be
used to know which segment to lock at the end of the user access.
- ~0 when no user access is open
- ~1 when additionnal segments are opened by a page fault.
Then, at lock time
- When only one segment is open, close it.
- When several segments are open, close all user segments.
Almost 100% of the time, only one segment will be involved.
In interrupts, inline the function that unlock/lock all segments,
because not inlining them implies a lot of register save/restore.
With the patch, writing value 128 in userspace in perf_copy_attr() is
done with 16 instructions:
3890: 93 82 04 dc stw r28,1244(r2)
3894: 7d 20 e5 26 mfsrin r9,r28
3898: 55 29 00 80 rlwinm r9,r9,0,2,0
389c: 7d 20 e1 e4 mtsrin r9,r28
38a0: 4c 00 01 2c isync
38a4: 39 20 00 80 li r9,128
38a8: 91 3c 00 00 stw r9,0(r28)
38ac: 81 42 04 dc lwz r10,1244(r2)
38b0: 39 00 ff ff li r8,-1
38b4: 91 02 04 dc stw r8,1244(r2)
38b8: 2c 0a ff fe cmpwi r10,-2
38bc: 41 82 00 88 beq 3944 <perf_copy_attr+0x36c>
38c0: 7d 20 55 26 mfsrin r9,r10
38c4: 65 29 40 00 oris r9,r9,16384
38c8: 7d 20 51 e4 mtsrin r9,r10
38cc: 4c 00 01 2c isync
...
3944: 48 00 00 01 bl 3944 <perf_copy_attr+0x36c>
3944: R_PPC_REL24 kuap_lock_all_ool
Before the patch it was 118 instructions. In reality only 42 are
executed in most cases, but GCC is not able to see that a properly
aligned user access cannot involve more than one segment.
5060: 39 1d 00 04 addi r8,r29,4
5064: 3d 20 b0 00 lis r9,-20480
5068: 7c 08 48 40 cmplw r8,r9
506c: 40 81 00 08 ble 5074 <perf_copy_attr+0x2cc>
5070: 3d 00 b0 00 lis r8,-20480
5074: 39 28 ff ff addi r9,r8,-1
5078: 57 aa 00 06 rlwinm r10,r29,0,0,3
507c: 55 29 27 3e rlwinm r9,r9,4,28,31
5080: 39 29 00 01 addi r9,r9,1
5084: 7d 29 53 78 or r9,r9,r10
5088: 91 22 04 dc stw r9,1244(r2)
508c: 7d 20 ed 26 mfsrin r9,r29
5090: 55 29 00 80 rlwinm r9,r9,0,2,0
5094: 7c 08 50 40 cmplw r8,r10
5098: 40 81 00 c0 ble 5158 <perf_copy_attr+0x3b0>
509c: 7d 46 50 f8 not r6,r10
50a0: 7c c6 42 14 add r6,r6,r8
50a4: 54 c6 27 be rlwinm r6,r6,4,30,31
50a8: 7d 20 51 e4 mtsrin r9,r10
50ac: 3c ea 10 00 addis r7,r10,4096
50b0: 39 29 01 11 addi r9,r9,273
50b4: 7f 88 38 40 cmplw cr7,r8,r7
50b8: 55 29 02 06 rlwinm r9,r9,0,8,3
50bc: 40 9d 00 9c ble cr7,5158 <perf_copy_attr+0x3b0>
50c0: 2f 86 00 00 cmpwi cr7,r6,0
50c4: 41 9e 00 4c beq cr7,5110 <perf_copy_attr+0x368>
50c8: 2f 86 00 01 cmpwi cr7,r6,1
50cc: 41 9e 00 2c beq cr7,50f8 <perf_copy_attr+0x350>
50d0: 2f 86 00 02 cmpwi cr7,r6,2
50d4: 41 9e 00 14 beq cr7,50e8 <perf_copy_attr+0x340>
50d8: 7d 20 39 e4 mtsrin r9,r7
50dc: 39 29 01 11 addi r9,r9,273
50e0: 3c e7 10 00 addis r7,r7,4096
50e4: 55 29 02 06 rlwinm r9,r9,0,8,3
50e8: 7d 20 39 e4 mtsrin r9,r7
50ec: 39 29 01 11 addi r9,r9,273
50f0: 3c e7 10 00 addis r7,r7,4096
50f4: 55 29 02 06 rlwinm r9,r9,0,8,3
50f8: 7d 20 39 e4 mtsrin r9,r7
50fc: 3c e7 10 00 addis r7,r7,4096
5100: 39 29 01 11 addi r9,r9,273
5104: 7f 88 38 40 cmplw cr7,r8,r7
5108: 55 29 02 06 rlwinm r9,r9,0,8,3
510c: 40 9d 00 4c ble cr7,5158 <perf_copy_attr+0x3b0>
5110: 7d 20 39 e4 mtsrin r9,r7
5114: 39 29 01 11 addi r9,r9,273
5118: 3c c7 10 00 addis r6,r7,4096
511c: 55 29 02 06 rlwinm r9,r9,0,8,3
5120: 7d 20 31 e4 mtsrin r9,r6
5124: 39 29 01 11 addi r9,r9,273
5128: 3c c6 10 00 addis r6,r6,4096
512c: 55 29 02 06 rlwinm r9,r9,0,8,3
5130: 7d 20 31 e4 mtsrin r9,r6
5134: 39 29 01 11 addi r9,r9,273
5138: 3c c7 30 00 addis r6,r7,12288
513c: 55 29 02 06 rlwinm r9,r9,0,8,3
5140: 7d 20 31 e4 mtsrin r9,r6
5144: 3c e7 40 00 addis r7,r7,16384
5148: 39 29 01 11 addi r9,r9,273
514c: 7f 88 38 40 cmplw cr7,r8,r7
5150: 55 29 02 06 rlwinm r9,r9,0,8,3
5154: 41 9d ff bc bgt cr7,5110 <perf_copy_attr+0x368>
5158: 4c 00 01 2c isync
515c: 39 20 00 80 li r9,128
5160: 91 3d 00 00 stw r9,0(r29)
5164: 38 e0 00 00 li r7,0
5168: 90 e2 04 dc stw r7,1244(r2)
516c: 7d 20 ed 26 mfsrin r9,r29
5170: 65 29 40 00 oris r9,r9,16384
5174: 40 81 00 c0 ble 5234 <perf_copy_attr+0x48c>
5178: 7d 47 50 f8 not r7,r10
517c: 7c e7 42 14 add r7,r7,r8
5180: 54 e7 27 be rlwinm r7,r7,4,30,31
5184: 7d 20 51 e4 mtsrin r9,r10
5188: 3d 4a 10 00 addis r10,r10,4096
518c: 39 29 01 11 addi r9,r9,273
5190: 7c 08 50 40 cmplw r8,r10
5194: 55 29 02 06 rlwinm r9,r9,0,8,3
5198: 40 81 00 9c ble 5234 <perf_copy_attr+0x48c>
519c: 2c 07 00 00 cmpwi r7,0
51a0: 41 82 00 4c beq 51ec <perf_copy_attr+0x444>
51a4: 2c 07 00 01 cmpwi r7,1
51a8: 41 82 00 2c beq 51d4 <perf_copy_attr+0x42c>
51ac: 2c 07 00 02 cmpwi r7,2
51b0: 41 82 00 14 beq 51c4 <perf_copy_attr+0x41c>
51b4: 7d 20 51 e4 mtsrin r9,r10
51b8: 39 29 01 11 addi r9,r9,273
51bc: 3d 4a 10 00 addis r10,r10,4096
51c0: 55 29 02 06 rlwinm r9,r9,0,8,3
51c4: 7d 20 51 e4 mtsrin r9,r10
51c8: 39 29 01 11 addi r9,r9,273
51cc: 3d 4a 10 00 addis r10,r10,4096
51d0: 55 29 02 06 rlwinm r9,r9,0,8,3
51d4: 7d 20 51 e4 mtsrin r9,r10
51d8: 3d 4a 10 00 addis r10,r10,4096
51dc: 39 29 01 11 addi r9,r9,273
51e0: 7c 08 50 40 cmplw r8,r10
51e4: 55 29 02 06 rlwinm r9,r9,0,8,3
51e8: 40 81 00 4c ble 5234 <perf_copy_attr+0x48c>
51ec: 7d 20 51 e4 mtsrin r9,r10
51f0: 39 29 01 11 addi r9,r9,273
51f4: 3c ea 10 00 addis r7,r10,4096
51f8: 55 29 02 06 rlwinm r9,r9,0,8,3
51fc: 7d 20 39 e4 mtsrin r9,r7
5200: 39 29 01 11 addi r9,r9,273
5204: 3c e7 10 00 addis r7,r7,4096
5208: 55 29 02 06 rlwinm r9,r9,0,8,3
520c: 7d 20 39 e4 mtsrin r9,r7
5210: 39 29 01 11 addi r9,r9,273
5214: 3c ea 30 00 addis r7,r10,12288
5218: 55 29 02 06 rlwinm r9,r9,0,8,3
521c: 7d 20 39 e4 mtsrin r9,r7
5220: 3d 4a 40 00 addis r10,r10,16384
5224: 39 29 01 11 addi r9,r9,273
5228: 7c 08 50 40 cmplw r8,r10
522c: 55 29 02 06 rlwinm r9,r9,0,8,3
5230: 41 81 ff bc bgt 51ec <perf_copy_attr+0x444>
5234: 4c 00 01 2c isync
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Export the ool handlers to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
|
|
PPC64 uses MMU features to enable/disable KUAP at boot time.
But feature fixups are applied way too early on PPC32.
Now that all KUAP related actions are in C following the
conversion of KUAP initial setup and context switch in C,
static branches can be used to enable/disable KUAP.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Export disable_kuap_key to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cd79e8008455fba5395d099f9bb1305c039b931c.1622708530.git.christophe.leroy@csgroup.eu
|
|
PPC64 uses MMU features to enable/disable KUEP at boot time.
But feature fixups are applied way too early on PPC32.
Now that all KUEP related actions are in C following the
conversion of KUEP initial setup and context switch in C,
static branches can be used to enable/disable KUEP.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7745a2c3a08ec46302920a3f48d1cb9b5469dbbb.1622708530.git.christophe.leroy@csgroup.eu
|
|
segment register has VSID on bits 8-31.
Bits 4-7 are reserved, there is no requirement to set them to 0.
VSIDs are calculated from VSID of SR0 by adding 0x111.
Even with highest possible VSID which would be 0xFFFFF0,
adding 16 times 0x111 results in 0x1001100.
So, the reserved bits are never overflowed, no need to clear
the reserved bits after each calculation.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ddc1cfd2ec8f3b2395c6a4d7f2b0c1aa1b1e64fb.1622708530.git.christophe.leroy@csgroup.eu
|
|
switch_mmu_context() does things that can easily be done in C.
For updating user segments, we have update_user_segments().
As mentionned in commit b5efec00b671 ("powerpc/32s: Move KUEP
locking/unlocking in C"), update_user_segments() has the loop
unrolled which is a significant performance gain.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/05c0875ad8220c03452c3a334946e207c6ca04d6.1622708530.git.christophe.leroy@csgroup.eu
|
|
In order to reuse it in switch_mmu_context(), this
patch moves CTX_TO_VSID() macro into asm/book3s/32/mmu-hash.h
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/26b36ef2939234a04b37baf6ffe50cba81f5d1b7.1622708530.git.christophe.leroy@csgroup.eu
|
|
KUEP implements the update of user segment registers.
Move it into mmu-hash.h in order to use it from other places.
And inline kuep_lock() and kuep_unlock(). Inlining kuep_lock() is
important for system_call_exception(), otherwise system_call_exception()
has to save into stack the system call parameters that are used just
after, and doing that takes more instructions than kuep_lock() itself.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24591ca480d14a62ef910e38a5273d551262c4a2.1622708530.git.christophe.leroy@csgroup.eu
|
|
PPC64 uses MMU features to enable/disable KUAP at boot time.
But feature fixups are applied way too early on PPC32.
But since commit c16728835eec ("powerpc/32: Manage KUAP in C"),
all KUAP is in C so it is now possible to use static branches.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3dca510ce555335261a47c4799167da698f569c0.1622782111.git.christophe.leroy@csgroup.eu
|
|
Powerpc 44x has two bits for exec protection in TLBs: one
for user (UX) and one for superviser (SX).
Clear SX on user pages in TLB miss handlers to provide KUEP.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/169310e08152aa1d96c979770291d165ec6896ae.1622616032.git.christophe.leroy@csgroup.eu
|
|
Use PPC_RAW_ macros to simplify the code.
And use PPC_LO/PPC_HI instead of IMM_L/IMM_H which are for
internal use inside ppc-opcode.h
Those macros are self explanatory, comments can go as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5a167b8ba4d33a5c09cd504f0c862e25ffe85459.1621516826.git.christophe.leroy@csgroup.eu
|
|
ppc_inst() ppc_inst_prefixed() ppc_inst_swab() can easily be made common
to both PPC32 and PPC64.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d54c63dcac6d190e1cc0d2fe3259d6e621928cdf.1621516826.git.christophe.leroy@csgroup.eu
|
|
'struct ppc_inst' is an internal representation of an instruction, but
in-memory instructions are and will remain a table of 'u32' forever.
Replace all 'struct ppc_inst *' used for locating an instruction in
memory by 'u32 *'. This removes a lot of undue casts to 'struct
ppc_inst *'.
It also helps locating ab-use of 'struct ppc_inst' dereference.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix ppc_inst_next(), use u32 instead of unsigned int]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
|
|
instr_is_branch_to_addr() is only used in code-patching.c
Make it static.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5f6b9c8c83170ed310953eac2f5b14539bfc964a.1621516826.git.christophe.leroy@csgroup.eu
|
|
Avoid casting/dereferencing ppc_inst() as u64* , check each member
of the struct when relevant.
And remove the 0xff initialisation of the suffix for non
prefixed instruction. An instruction with 0xff as a suffix
might be invalid, but still is a prefixed instruction and
has to be considered as this.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d8b155e930b7a9708ca110e8ff0ace6713a7af75.1621516826.git.christophe.leroy@csgroup.eu
|