summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-09-28selftests/powerpc: Update bhrb filter sampling test for multiple branch filtersAthira Rajeev
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type, ie branch filters are supported. The testcase "bhrb_filter_map_test" tests the valid and invalid filter maps in different powerpc platforms. Update this testcase to include scenario to cover multiple branch filters at sametime. Since powerpc doesn't support multiple filters at sametime, expect failure during perf_event_open. Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921145255.20972-3-atrajeev@linux.vnet.ibm.com
2022-09-28powerpc/perf: Fix branch_filter support for multiple filtersAthira Rajeev
For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type ie branch filters are supported. The branch filters are requested via event attribute "branch_sample_type". Multiple branch filters can be passed in event attribute. eg: $ perf record -b -o- -B --branch-filter any,ind_call true None of the Power PMUs support having multiple branch filters at the same time. Branch filters for branch stack sampling is set via MMCRA IFM bits [32:33]. But currently when requesting for multiple filter types, the "perf record" command does not report any error. eg: $ perf record -b -o- -B --branch-filter any,save_type true $ perf record -b -o- -B --branch-filter any,ind_call true The "bhrb_filter_map" function in PMU driver code does the validity check for supported branch filters. But this check is done for single filter. Hence "perf record" will proceed here without reporting any error. Fix power_pmu_event_init() to return EOPNOTSUPP when multiple branch filters are requested in the event attr. After the fix: $ perf record --branch-filter any,ind_call -- ls Error: cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel<disgoel@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Tweak comment and change log wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921145255.20972-1-atrajeev@linux.vnet.ibm.com
2022-09-28powerpc/64s/interrupt: halt early boot interrupts if paca is not set upNicholas Piggin
Ensure r13 is zero from very early in boot until it gets set to the boot paca pointer. This allows early program and mce handlers to halt if there is no valid paca, rather than potentially run off into the weeds. This preserves register and memory contents for low level debugging tools. Nothing could be printed to console at this point in any case because even udbg is only set up after the boot paca is set, so this shouldn't be missed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-6-npiggin@gmail.com
2022-09-28powerpc/64: don't set boot CPU's r13 to paca until the structure is set upNicholas Piggin
The idea is to get to the point where if r13 is non-zero, then it should contain a reasonable paca. This can be used in early boot program check and machine check handlers to avoid running off into the weeds if they hit before r13 has a paca. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-5-npiggin@gmail.com
2022-09-28powerpc/64: avoid using r13 in relocateNicholas Piggin
relocate() uses r13 in early boot before it is used for the paca. Use a different register for this so r13 is kept unchanged until it is set to the paca pointer. Avoid r14 as well while we're here, there's no reason not to use the volatile registers which is a bit less surprising, and r14 could be used as another fixed reg one day. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-4-npiggin@gmail.com
2022-09-28powerpc/64s: early boot machine check handlerNicholas Piggin
Use the early boot interrupt fixup in the machine check handler to allow the machine check handler to run before interrupt endian is set up. Branch to an early boot handler that just does a basic crash, which allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot CPU earlier, and the machine check stack is temporarily set to the middle of the init task stack. This allows machine checks (e.g., due to invalid data access in real mode) to print something useful earlier in boot (as soon as udbg is set up, if CONFIG_PPC_EARLY_DEBUG=y). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-3-npiggin@gmail.com
2022-09-28powerpc/64s/interrupt: move early boot ILE fixup into a macroNicholas Piggin
In preparation for using this sequence in machine check interrupt, move it into a macro, with a small change to make it position independent. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-2-npiggin@gmail.com
2022-09-28powerpc/64e: provide an addressing macro for use with TOC in alternate registerNicholas Piggin
The interrupt entry code carefully saves a minimal number of registers, so in some places the TOC is required, it is loaded into a different register, so provide a macro that can supply an alternate TOC register. This continues to use got addressing because TOC-relative results in "got/toc optimization is not supported" messages by the linker. Having r2 be one of the saved registers and using that for TOC addressing may be the best way to avoid that and switch this to TOC addressing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com
2022-09-28powerpc/64: provide a helper macro to load r2 with the kernel TOCNicholas Piggin
A later change stops the kernel using r2 and loads it with a poison value. Provide a PACATOC loading abstraction which can hide this detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com
2022-09-28powerpc/64: switch asm helpers from GOT to TOC relative addressingNicholas Piggin
There is no need to use GOT addressing within the kernel. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-4-npiggin@gmail.com
2022-09-28powerpc/64: asm use consistent global variable declaration and accessNicholas Piggin
Use helper macros to access global variables, and place them in .data sections rather than in .toc. Putting addresses in TOC is not required because the kernel is linked with a single TOC. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-3-npiggin@gmail.com
2022-09-28powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKERNicholas Piggin
Using a 32-bit constant for this marker allows it to be loaded with two ALU instructions, like 32-bit. This avoids a TOC entry and a TOC load that depends on the r2 value that has just been loaded from the PACA. This changes the value for 32-bit as well, so both have the same value in the low 4 bytes and 64-bit has 0 in the top bytes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-2-npiggin@gmail.com
2022-09-28powerpc/64s: POWER10 CPU Kconfig build optionNicholas Piggin
This adds basic POWER10_CPU option, which builds with -mcpu=power10. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220923033004.536127-1-npiggin@gmail.com
2022-09-28powerpc/pseries: Move vas_migration_handler early during migrationHaren Myneni
When the migration is initiated, the hypervisor changes VAS mappings as part of pre-migration event. Then the OS gets the migration event which closes all VAS windows before the migration starts. NX generates continuous faults until windows are closed and the user space can not differentiate these NX faults coming from the actual migration. So to reduce this time window, close VAS windows first in pseries_migrate_partition(). Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d8efade91dda831c9ed4abb226dab627da594c5f.camel@linux.ibm.com
2022-09-28powerpc/64/irq: tidy soft-masked irq replay and improve documentationNicholas Piggin
irq replay is quite complicated because of softirq processing which itself enables and disables irqs. Several considerations need to be accounted for due to this, and they are not clearly documented. Refactor the irq replay code a bit to tidy and deduplicate some common functions. Add comments, debug checks. This has a minor functional change that irq tracing enable/disable is done after each interrupt replayed, rather than after a batch. It also re-sets state to IRQS_ALL_DISABLED after an interrupt, which doesn't matter much because interrupts are hard disabled at this point, but it is more consistent with how interrupt handlers are called. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-8-npiggin@gmail.com
2022-09-28powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entryNicholas Piggin
BUG/WARN are handled with a program interrupt which can turn into an infinite recursion when there are bugs in interrupt handler entry (which can be irritated by bugs in other parts of the code). There is one feeble attempt to avoid this recursion, but it misses several cases. Make a tidier macro for this and switch most bugs in the interrupt entry wrapper over to use it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-7-npiggin@gmail.com
2022-09-28powerpc/64s/interrupt: masked handler debug check for previous hard disableNicholas Piggin
Prior changes eliminated cases of masked PACA_IRQ_MUST_HARD_MASK interrupts that re-fire due to MSR[EE] being enabled while they are pending. Add a debug check in the masked interrupt handler to catch if this occurs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-6-npiggin@gmail.com
2022-09-28powerpc/64s: Fix irq state management in runlatch functionsNicholas Piggin
When irqs are soft-disabled, MSR[EE] is volatile and can change from 1 to 0 asynchronously (if a PACA_IRQ_MUST_HARD_MASK interrupt hits). So it can not be used to check hard IRQ enabled status, except to confirm it is disabled. ppc64_runlatch_on/off functions use MSR this way to decide whether to re-enable MSR[EE] after disabling it, which leads to MSR[EE] being enabled when it shouldn't be (when a PACA_IRQ_MUST_HARD_MASK had disabled it between reading the MSR and clearing EE). This has been tolerated in the kernel previously, and it doesn't seem to cause a problem, but it is unexpected and may trip warnings or cause other problems as we tighten up this state management. Fix this by only re-enabling if PACA_IRQ_HARD_DIS is clear. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-5-npiggin@gmail.com
2022-09-28powerpc/64/interrupt: Fix return to masked context after hard-mask irq ↵Nicholas Piggin
becomes pending If a synchronous interrupt (e.g., hash fault) is taken inside an irqs-disabled region which has MSR[EE]=1, then an asynchronous interrupt that is PACA_IRQ_MUST_HARD_MASK (e.g., PMI) is taken inside the synchronous interrupt handler, then the synchronous interrupt will return with MSR[EE]=1 and the asynchronous interrupt fires again. If the asynchronous interrupt is a PMI and the original context does not have PMIs disabled (only Linux IRQs), the asynchronous interrupt will fire despite having the PMI marked soft pending. This can confuse the perf code and cause warnings. This patch changes the interrupt return so that irqs-disabled MSR[EE]=1 contexts will be returned to with MSR[EE]=0 if a PACA_IRQ_MUST_HARD_MASK interrupt has become pending in the meantime. The longer explanation for what happens: 1. local_irq_disable() 2. Hash fault interrupt fires, do_hash_fault handler runs 3. interrupt_enter_prepare() sets IRQS_ALL_DISABLED 4. interrupt_enter_prepare() sets MSR[EE]=1 5. PMU interrupt fires, masked handler runs 6. Masked handler marks PMI pending 7. Masked handler returns with PACA_IRQ_HARD_DIS set, MSR[EE]=0 8. do_hash_fault interrupt return handler runs 9. interrupt_exit_kernel_prepare() clears PACA_IRQ_HARD_DIS 10. interrupt returns with MSR[EE]=1 11. PMU interrupt fires, perf handler runs Fixes: 4423eb5ae32e ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-4-npiggin@gmail.com
2022-09-28powerpc/64: mark irqs hard disabled in boot pacaNicholas Piggin
This prevents interrupts in early boot (e.g., program check) from enabling MSR[EE], potentially causing endian mismatch or other crashes when reporting early boot traps. Fixes: 4423eb5ae32ec ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-3-npiggin@gmail.com
2022-09-28powerpc/64/interrupt: Fix false warning in context tracking due to idle stateNicholas Piggin
Commit 171476775d32 ("context_tracking: Convert state to atomic_t") added a CONTEXT_IDLE state which can be encountered by interrupts from kernel mode in the idle thread, causing a false positive warning. Fixes: 171476775d32 ("context_tracking: Convert state to atomic_t") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-2-npiggin@gmail.com
2022-09-28powerpc/64s: Enable KFENCE on book3s64Nicholas Miehlbradt
KFENCE support was added for ppc32 in commit 90cbac0e995d ("powerpc: Enable KFENCE for PPC32"). Enable KFENCE on ppc64 architecture with hash and radix MMUs. It uses the same mechanism as debug pagealloc to protect/unprotect pages. All KFENCE kunit tests pass on both MMUs. KFENCE memory is initially allocated using memblock but is later marked as SLAB allocated. This necessitates the change to __pud_free to ensure that the KFENCE pages are freed appropriately. Based on previous work by Christophe Leroy and Jordan Niethe. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-4-nicholas@linux.ibm.com
2022-09-28powerpc/64s: Allow double call of kernel_[un]map_linear_page()Christophe Leroy
If the page is already mapped resp. already unmapped, bail out. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-3-nicholas@linux.ibm.com
2022-09-28powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utilsChristophe Leroy
debug_pagealloc_enabled() is always defined and constant folds to 'false' when CONFIG_DEBUG_PAGEALLOC is not enabled. Remove the #ifdefs, the code and associated static variables will be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is not defined. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-2-nicholas@linux.ibm.com
2022-09-28powerpc/64s: Add DEBUG_PAGEALLOC for radixNicholas Miehlbradt
There is support for DEBUG_PAGEALLOC on hash but not on radix. Add support on radix. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-1-nicholas@linux.ibm.com
2022-09-28powerpc/64s: update cpu selection optionsNicholas Piggin
Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so make that clear in the option name. The POWER5_CPU option is dropped because it's uncommon, and GENERIC_CPU covers it. -mtune= before power8 is dropped because the minimum gcc version supports power8, and tuning is made consistent between big and little endian. A 970 option is added for PowerPC 970 / G5 because they still have a user base, and -mtune=power8 does not generate good code for the 970. This also updates the ISA versions document to add Power4/Power4+ because I didn't realise Power4+ used 2.01. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921014103.587954-2-npiggin@gmail.com
2022-09-28powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5Nicholas Piggin
Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5. POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added the popcntb instruction which a compiler might use. Use -mcpu=power4. Fixes: 471d7ff8b51b ("powerpc/64s: Remove POWER4 support") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921014103.587954-1-npiggin@gmail.com
2022-09-28powerpc/64s: Make POWER10 and later use pause_short in cpu_relax loopsNicholas Piggin
We want to move away from using SMT priority updates for cpu_relax, and use a 'wait' instruction which is similar to x86. As well as being a much better fit for what everybody else uses and tests with, priority nops are stateful which is nasty (interrupts have to consider they might be taken at a different priority), and they're expensive to execute, similar to a mtSPR which can effect other threads in the pipe. This has shown to give results that are less affected by code alignment on benchmarks that cause a lot of spin waiting (e.g., rwsem contention on unixbench filesystem benchmarks) on POWER10. QEMU TCG only supports this instruction correctly since v7.1, versions without the fix may cause hangs whne running POWER10 CPUs. Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix checkpatch warnings RE the macros] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220920122259.363092-2-npiggin@gmail.com
2022-09-28powerpc: add ISA v3.0 / v3.1 wait opcode macroNicholas Piggin
The wait instruction encoding changed between ISA v2.07 and ISA v3.0. In v3.1 the instruction gained a new field. Update the PPC_WAIT macro to the current encoding. Rename the older incompatible one with a _v203 suffix as it was introduced in v2.03 (the WC field was introduced in v2.07 but the kernel only uses WC=0). Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220920122259.363092-1-npiggin@gmail.com
2022-09-28powerpc/time: avoid programming DEC at the start of the timer interruptNicholas Piggin
Setting DEC to maximum at the start of the timer interrupt is not necessary and can be avoided for performance when MSR[EE] is not enabled during the handler as explained in commit 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"), where this change was first attempted. The idea is that the timer interrupt runs with MSR[EE]=0, and at the end of the interrupt DEC is programmed to the next timer interval, so there is no need to clear the decrementer exception before then. When the above commit was merged, that was not quite true. The low res timer subsystem had some cases in the oneshot timer code where if the tick was to be stopped and no timers active, the clock device would not get the ->set_state_oneshot_stopped() call, so DEC would not get reprogrammed, and this would hang taking continual timer interrupts. So this was reverted in commit d2b9be1f4af5 ("powerpc/time: Always set decrementer in timer_interrupt()"), which was a partial revert of the above commit. Commit 62c1256d5447 ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped") was later merged to fix this missing case in the timer subsystem, so now the behaviour can be restored. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220909142457.278032-1-npiggin@gmail.com
2022-09-28powerpc: Add support for early debugging via Serial 16550 consolePali Rohár
Currently powerpc early debugging contains lot of platform specific options, but does not support standard UART / serial 16550 console. Later legacy_serial.c code supports registering UART as early debug console from device tree but it is not early during booting, but rather later after machine description code finishes. So for real early debugging via UART is current code unsuitable. Add support for new early debugging option CONFIG_PPC_EARLY_DEBUG_16550 which enable Serial 16550 console on address defined by new option CONFIG_PPC_EARLY_DEBUG_16550_PHYSADDR and by stride by option CONFIG_PPC_EARLY_DEBUG_16550_STRIDE. With this change it is possible to debug powerpc machine descriptor code. For example this early debugging code can print on serial console also "No suitable machine description found" error which is done before legacy_serial.c code. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220822231501.16827-1-pali@kernel.org
2022-09-28powerpc/64/kdump: Limit kdump base to 512MBHari Bathini
Since commit e641eb03ab2b0 ("powerpc: Fix up the kdump base cap to 128M") memory for kdump kernel has been reserved at an offset of 128MB. This held up well for a long time before running into boot failure on LPARs having a lot of cores. Commit 7c5ed82b800d8 ("powerpc: Set crashkernel offset to mid of RMA region") fixed this boot failure by moving the offset to mid of RMA region. This change meant the offset is either 256MB or 512MB on LPARs as ppc64_rma_size was 512MB or 1024MB owing to commit 103a8542cb35b ("powerpc/book3s64/ radix: Fix boot failure with large amount of guest memory"). But ppc64_rma_size can be larger as well with newer f/w. So, limit crashkernel reservation offset to 512MB to avoid running into boot failures during kdump kernel boot, due to RTAS or other allocation restrictions. Also, while here, use SZ_128M instead of opening coding it. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220912065031.57416-1-hbathini@linux.ibm.com
2022-09-28powerpc: Provide syscall wrapperRohan McLure
Implement syscall wrapper as per s390, x86, arm64. When enabled cause handlers to accept parameters from a stack frame rather than from user scratch register state. This allows for user registers to be safely cleared in order to reduce caller influence on speculation within syscall routine. The wrapper is a macro that emits syscall handler symbols that call into the target handler, obtaining its parameters from a struct pt_regs on the stack. As registers are already saved to the stack prior to calling system_call_exception, it appears that this function is executed more efficiently with the new stack-pointer convention than with parameters passed by registers, avoiding the allocation of a stack frame for this method. On a 32-bit system, we see >20% performance increases on the null_syscall microbenchmark, and on a Power 8 the performance gains amortise the cost of clearing and restoring registers which is implemented at the end of this series, seeing final result of ~5.6% performance improvement on null_syscall. Syscalls are wrapped in this fashion on all platforms except for the Cell processor as this commit does not provide SPU support. This can be quickly fixed in a successive patch, but requires spu_sys_callback to allocate a pt_regs structure to satisfy the wrapped calling convention. Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmai.com> [mpe: Make incompatible with COMPAT to retain clearing of high bits of args] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmclure@linux.ibm.com
2022-09-28powerpc: Change system_call_exception calling conventionRohan McLure
Change system_call_exception arguments to pass a pointer to a stack frame container caller state, as well as the original r0, which determines the number of the syscall. This has been observed to yield improved performance to passing them by registers, circumventing the need to allocate a stack frame. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Retain clearing of high bits of args for compat tasks] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-21-rmclure@linux.ibm.com
2022-09-28powerpc: Use common syscall handler typeRohan McLure
Cause syscall handlers to be typed as follows when called indirectly throughout the kernel. This is to allow for better type checking. typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); Since both 32 and 64-bit abis allow for at least the first six machine-word length parameters to a function to be passed by registers, even handlers which admit fewer than six parameters may be viewed as having the above type. Coercing syscalls to syscall_fn requires a cast to void* to avoid -Wcast-function-type. Fixup comparisons in VDSO to avoid pointer-integer comparison. Introduce explicit cast on systems with SPUs. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-19-rmclure@linux.ibm.com
2022-09-28powerpc: Enable compile-time check for syscall handlersRohan McLure
The table of syscall handlers and registered compatibility syscall handlers has in past been produced using assembly, with function references resolved at link time. This moves link-time errors to compile-time, by rewriting systbl.S in C, and including the linux/syscalls.h, linux/compat.h and asm/syscalls.h headers for prototypes. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-18-rmclure@linux.ibm.com
2022-09-28powerpc: Include all arch-specific syscall prototypesRohan McLure
Forward declare all syscall handler prototypes where a generic prototype is not provided in either linux/syscalls.h or linux/compat.h in asm/syscalls.h. This is required for compile-time type-checking for syscall handlers, which is implemented later in this series. 32-bit compatibility syscall handlers are expressed in terms of types in ppc32.h. Expose this header globally. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use standard include guard naming for syscalls_32.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-17-rmclure@linux.ibm.com
2022-09-28powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlersRohan McLure
Arch-specific implementations of syscall handlers are currently used over generic implementations for the following reasons: 1. Semantics unique to powerpc 2. Compatibility syscalls require 'argument padding' to comply with 64-bit argument convention in ELF32 abi. 3. Parameter types or order is different in other architectures. These syscall handlers have been defined prior to this patch series without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with custom input and output types. We remove every such direct definition in favour of the aforementioned macros. Also update syscalls.tbl in order to refer to the symbol names generated by each of these macros. Since ppc64_personality can be called by both 64 bit and 32 bit binaries through compatibility, we must generate both both compat_sys_ and sys_ symbols for this handler. As an aside: A number of architectures including arm and powerpc agree on an alternative argument order and numbering for most of these arch-specific handlers. A future patch series may allow for asm/unistd.h to signal through its defines that a generic implementation of these syscall handlers with the correct calling convention be emitted, through the __ARCH_WANT_COMPAT_SYS_... convention. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-16-rmclure@linux.ibm.com
2022-09-28powerpc: Provide do_ppc64_personality helperRohan McLure
Avoid duplication in future patch that will define the ppc64_personality syscall handler in terms of the SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE macros, by extracting the common body of ppc64_personality into a helper function. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-15-rmclure@linux.ibm.com
2022-09-28powerpc: Remove direct call to mmap2 syscall handlersRohan McLure
Syscall handlers should not be invoked internally by their symbol names, as these symbols defined by the architecture-defined SYSCALL_DEFINE macro. Move the compatibility syscall definition for mmap2 to syscalls.c, so that all mmap implementations can share a helper function. Remove 'inline' on static mmap helper. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix compat_sys_mmap2() prototype and offset handling] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-14-rmclure@linux.ibm.com
2022-09-28s390/pci: remove unused bus_next field from struct zpci_devNiklas Schnelle
This field was added in commit 44510d6fa0c0 ("s390/pci: Handling multifunctions") but is an unused remnant of an earlier version where the devices on the virtual bus were connected in a linked list instead of a fixed 256 entry array of pointers. It is also not used for the list of busses as that is threaded through struct zpci_bus not through struct zpci_dev. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-09-28s390/cio: remove unused ccw_device_force_console() declarationGaosheng Cui
ccw_device_force_console() has been removed by commit 8cc0dcfdc1c0 ("s390/cio: remove pm support from ccw bus driver"), so remove the declaration, too. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-09-28drm/framebuffer: convert to drm_dbg_kms()Simon Ser
Replace DRM_DEBUG_KMS() with drm_dbg_kms() which allows specifying the DRM device to provide more context. Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220905103559.118561-1-contact@emersion.fr
2022-09-28Merge branch 'sfc-tc-offload'David S. Miller
Edward Cree says: ==================== sfc: bare bones TC offload This series begins the work of supporting TC flower offload on EF100 NICs. This is the absolute minimum viable TC implementation to get traffic to VFs and allow them to be tested; it supports no match fields besides ingress port, no actions besides mirred and drop, and no stats. More matches, actions, and counters will be added in subsequent patches. Changed in v2: - Add missing 'static' on declarations (kernel test robot, sparse) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28sfc: bare bones TC offload on EF100Edward Cree
This is the absolute minimum viable TC implementation to get traffic to VFs and allow them to be tested; it supports no match fields besides ingress port, no actions besides mirred and drop, and no stats. Example usage: tc filter add dev $PF parent ffff: flower skip_sw \ action mirred egress mirror dev $VFREP tc filter add dev $VFREP parent ffff: flower skip_sw \ action mirred egress redirect dev $PF gives a VF unfiltered access to the network out the physical port ($PF acts here as a physical port representor). More matches, actions, and counters will be added in subsequent patches. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28sfc: interrogate MAE capabilities at probe timeEdward Cree
Different versions of EF100 firmware and FPGA bitstreams support different matching capabilities in the Match-Action Engine. Probe for these at start of day; subsequent patches will validate TC offload requests against the reported capabilities. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28sfc: add a hashtable for offloaded TC rulesEdward Cree
Nothing inserts into this table yet, but we have code to remove rules on FLOW_CLS_DESTROY or at driver teardown time, in both cases also attempting to remove the corresponding hardware rules. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28sfc: optional logging of TC offload errorsEdward Cree
TC offload support will involve complex limitations on what matches and actions a rule can do, in some cases potentially depending on rules already offloaded. So add an ethtool private flag "log-tc-errors" which controls reporting the reasons for un-offloadable TC rules at NETIF_INFO. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28sfc: bind indirect blocks for TC offload on EF100Edward Cree
Bind indirect blocks for recognised tunnel netdevices. Currently these connect to a stub efx_tc_flower() that only returns -EOPNOTSUPP; subsequent patches will implement flower offloads to the Match-Action Engine. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-28sfc: bind blocks for TC offload on EF100Edward Cree
Bind direct blocks for the MAE-admin PF and each VF representor. Currently these connect to a stub efx_tc_flower() that only returns -EOPNOTSUPP; subsequent patches will implement flower offloads to the Match-Action Engine. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>