summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2023-03-04Merge tag 'powerpc-6.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Drop orphaned VAS MAINTAINERS entry - Fix build errors with clang and KCSAN - Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together with recordmcount Thanks to Nathan Chancellor. * tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Avoid dead code/data elimination when using recordmcount powerpc/vmlinux.lds: Add .text.asan/tsan sections powerpc: Drop orphaned VAS MAINTAINERS entry
2023-02-28powerpc/vmlinux.lds: Add .text.asan/tsan sectionsMichael Ellerman
When KASAN/KCSAN are enabled clang generates .text.asan/tsan sections. Because they are not mentioned in the linker script warnings are generated, and when orphan handling is set to error that becomes a build error, eg: ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' ld.lld: error: vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fix it by adding the sections to our linker script, similar to the generic change made in 848378812e40 ("vmlinux.lds.h: Handle clang's module.{c,d}tor sections"). Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
2023-02-26Merge tag 'kbuild-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Change V=1 option to print both short log and full command log - Allow V=1 and V=2 to be combined as V=12 - Make W=1 detect wrong .gitignore files - Tree-wide cleanups for unused command line arguments passed to Clang - Stop using -Qunused-arguments with Clang - Make scripts/setlocalversion handle only correct release tags instead of any arbitrary annotated tag - Create Debian and RPM source packages without cleaning the source tree - Various cleanups for packaging * tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (74 commits) kbuild: rpm-pkg: remove unneeded KERNELRELEASE from modules/headers_install docs: kbuild: remove description of KBUILD_LDS_MODULE .gitattributes: use 'dts' diff driver for *.dtso files kbuild: deb-pkg: improve the usability of source package kbuild: deb-pkg: fix binary-arch and clean in debian/rules kbuild: tar-pkg: use tar rules in scripts/Makefile.package kbuild: make perf-tar*-src-pkg work without relying on git kbuild: deb-pkg: switch over to source format 3.0 (quilt) kbuild: deb-pkg: make .orig tarball a hard link if possible kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile kbuild: srcrpm-pkg: create source package without cleaning kbuild: rpm-pkg: build binary packages from source rpm kbuild: deb-pkg: create source package without cleaning kbuild: add a tool to list files ignored by git Documentation/llvm: add Chimera Linux, Google and Meta datacenters setlocalversion: use only the correct release tag for git-describe setlocalversion: clean up the construction of version output .gitignore: ignore *.cover and *.mbx kbuild: remove --include-dir MAKEFLAG from top Makefile kbuild: fix trivial typo in comment ...
2023-02-25Merge tag 'powerpc-6.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Support for configuring secure boot with user-defined keys on PowerVM LPARs - Simplify the replay of soft-masked IRQs by making it non-recursive - Add support for KCSAN on 64-bit Book3S - Improvements to the API & code which interacts with RTAS (pseries firmware) - Change 32-bit powermac to assign PCI bus numbers per domain by default - Some improvements to the 32-bit BPF JIT - Various other small features and fixes Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain, Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar Kuppusamy. * tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits) powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot powerpc/e500: Add missing prototype for 'relocate_init' powerpc/64: Fix unannotated intra-function call warning powerpc/epapr: Don't use wrteei on non booke powerpc: Pass correct CPU reference to assembler powerpc/mm: Rearrange if-else block to avoid clang warning powerpc/nohash: Fix build with llvm-as powerpc/nohash: Fix build error with binutils >= 2.38 powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags macintosh: windfarm: Use unsigned type for 1-bit bitfields powerpc/kexec_file: print error string on usable memory property update failure powerpc/machdep: warn when machine_is() used too early powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500 powerpc/eeh: Set channel state after notifying the drivers selftests/powerpc: Fix incorrect kernel headers search path powerpc/rtas: arch-wide function token lookup conversions powerpc/rtas: introduce rtas_function_token() API powerpc/pseries/lpar: convert to papr_sysparm API powerpc/pseries/hv-24x7: convert to papr_sysparm API ...
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-20Merge tag 'sched-core-2023-02-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve the scalability of the CFS bandwidth unthrottling logic with large number of CPUs. - Fix & rework various cpuidle routines, simplify interaction with the generic scheduler code. Add __cpuidle methods as noinstr to objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks. - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query previously issued registrations. - Limit scheduler slice duration to the sysctl_sched_latency period, to improve scheduling granularity with a large number of SCHED_IDLE tasks. - Debuggability enhancement on sys_exit(): warn about disabled IRQs, but also enable them to prevent a cascade of followup problems and repeat warnings. - Fix the rescheduling logic in prio_changed_dl(). - Micro-optimize cpufreq and sched-util methods. - Micro-optimize ttwu_runnable() - Micro-optimize the idle-scanning in update_numa_stats(), select_idle_capacity() and steal_cookie_task(). - Update the RSEQ code & self-tests - Constify various scheduler methods - Remove unused methods - Refine __init tags - Documentation updates - Misc other cleanups, fixes * tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) sched/rt: pick_next_rt_entity(): check list_entry sched/deadline: Add more reschedule cases to prio_changed_dl() sched/fair: sanitize vruntime of entity being placed sched/fair: Remove capacity inversion detection sched/fair: unlink misfit task from cpu overutilized objtool: mem*() are not uaccess safe cpuidle: Fix poll_idle() noinstr annotation sched/clock: Make local_clock() noinstr sched/clock/x86: Mark sched_clock() noinstr x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read() x86/atomics: Always inline arch_atomic64*() cpuidle: tracing, preempt: Squash _rcuidle tracing cpuidle: tracing: Warn about !rcu_is_watching() cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG cpuidle: drivers: firmware: psci: Dont instrument suspend code KVM: selftests: Fix build of rseq test exit: Detect and fix irq disabled state in oops cpuidle, arm64: Fix the ARM64 cpuidle logic cpuidle: mvebu: Fix duplicate flags assignment sched/fair: Limit sched slice duration ...
2023-02-17powerpc/64: Fix unannotated intra-function call warningSathvika Vasireddy
objtool throws the following warning: arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x6128: unannotated intra-function call Fix the warning by annotating start_initialization_book3s symbol with the SYM_FUNC_START_LOCAL and SYM_FUNC_END macros. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Fixes: 58f24eea5278 ("powerpc/64s: Refactor initialisation after prom") Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230217043226.1020041-1-sv@linux.ibm.com
2023-02-17powerpc/epapr: Don't use wrteei on non bookeChristophe Leroy
wrteei is only for booke. Use the standard mfmsr/ori/mtmsr when non booke. Reported-by: Jan-Benedict Glaw <jbglaw@lug-owl.de> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b29c7f1727433b003eae050e44072741c8ac223b.1671475543.git.christophe.leroy@csgroup.eu
2023-02-15powerpc/eeh: Set channel state after notifying the driversGanesh Goudar
When a PCI error is encountered 6th time in an hour we set the channel state to perm_failure and notify the driver about the permanent failure. However, after upstream commit 38ddc011478e ("powerpc/eeh: Make permanently failed devices non-actionable"), EEH handler stops calling any routine once the device is marked as permanent failure. This issue can lead to fatal consequences like kernel hang with certain PCI devices. Following log is observed with lpfc driver, with and without this change, Without this change kernel hangs, If PCI error is encountered 6 times for a device in an hour. Without the change EEH: Beginning: 'error_detected(permanent failure)' PCI 0132:60:00.0#600000: EEH: not actionable (1,1,1) PCI 0132:60:00.1#600000: EEH: not actionable (1,1,1) EEH: Finished:'error_detected(permanent failure)' With the change EEH: Beginning: 'error_detected(permanent failure)' EEH: Invoking lpfc->error_detected(permanent failure) EEH: lpfc driver reports: 'disconnect' EEH: Invoking lpfc->error_detected(permanent failure) EEH: lpfc driver reports: 'disconnect' EEH: Finished:'error_detected(permanent failure)' To fix the issue, set channel state to permanent failure after notifying the drivers. Fixes: 38ddc011478e ("powerpc/eeh: Make permanently failed devices non-actionable") Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230209105649.127707-1-ganeshgr@linux.ibm.com
2023-02-13powerpc/rtas: arch-wide function token lookup conversionsNathan Lynch
With the tokens for all implemented RTAS functions now available via rtas_function_token(), which is optimal and safe for arbitrary contexts, there is no need to use rtas_token() or cache its result. Most conversions are trivial, but a few are worth describing in more detail: * Error injection token comparisons for lockdown purposes are consolidated into a simple predicate: token_is_restricted_errinjct(). * A couple of special cases in block_rtas_call() do not use rtas_token() but perform string comparisons against names in the function table. These are converted to compare against token values instead, which is logically equivalent but less expensive. * The lookup for the ibm,os-term token can be deferred until needed, instead of caching it at boot to avoid device tree traversal during panic. * Since rtas_function_token() accesses a read-only data structure without taking any locks, xmon's lookup of set-indicator can be performed as needed instead of cached at startup. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-20-26929c8cce78@linux.ibm.com
2023-02-13powerpc/rtas: introduce rtas_function_token() APINathan Lynch
Users of rtas_token() supply a string argument that can't be validated at build time. A typo or misspelling has to be caught by inspection or by observing wrong behavior at runtime. Since the core RTAS code now has consolidated the names of all possible RTAS functions and mapped them to their tokens, token lookup can be implemented using symbolic constants to index a static array. So introduce rtas_function_token(), a replacement API which does that, along with a rtas_service_present()-equivalent helper, rtas_function_implemented(). Callers supply an opaque predefined function handle which is used internally to index the function table. Typos or other inappropriate arguments yield build errors, and the function handle is a type that can't be easily confused with RTAS tokens or other integer types. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-19-26929c8cce78@linux.ibm.com
2023-02-13powerpc/pseries: add RTAS work area allocatorNathan Lynch
Various pseries-specific RTAS functions take a temporary "work area" parameter - a buffer in memory accessible to RTAS. Typically such functions are passed the statically allocated rtas_data_buf buffer as the argument. This buffer is protected by a global spinlock. So users of rtas_data_buf cannot perform sleeping operations while accessing the buffer. Most RTAS functions that have a work area parameter can return a status (-2/990x) that indicates that the caller should retry. Before retrying, the caller may need to reschedule or sleep (see rtas_busy_delay() for details). This combination of factors leads to uncomfortable constructions like this: do { spin_lock(&rtas_data_buf_lock); rc = rtas_call(token, __pa(rtas_data_buf, ...); if (rc == 0) { /* parse or copy out rtas_data_buf contents */ } spin_unlock(&rtas_data_buf_lock); } while (rtas_busy_delay(rc)); Another unfortunately common way of handling this is for callers to blithely ignore the possibility of a -2/990x status and hope for the best. If users were allowed to perform blocking operations while owning a work area, the programming model would become less tedious and error-prone. Users could schedule away, sleep, or perform other blocking operations without having to release and re-acquire resources. We could continue to use a single work area buffer, and convert rtas_data_buf_lock to a mutex. But that would impose an unnecessarily coarse serialization on all users. As awkward as the current design is, it prevents longer running operations that need to repeatedly use rtas_data_buf from blocking the progress of others. There are more considerations. One is that while 4KB is fine for all current in-kernel uses, some RTAS calls can take much smaller buffers, and some (VPD, platform dumps) would likely benefit from larger ones. Another is that at least one RTAS function (ibm,get-vpd) has *two* work area parameters. And finally, we should expect the number of work area users in the kernel to increase over time as we introduce lockdown-compatible ABIs to replace less safe use cases based on sys_rtas/librtas. So a special-purpose allocator for RTAS work area buffers seems worth trying. Properties: * The backing memory for the allocator is reserved early in boot in order to satisfy RTAS addressing requirements, and then managed with genalloc. * Allocations can block, but they never fail (mempool-like). * Prioritizes first-come, first-serve fairness over throughput. * Early boot allocations before the allocator has been initialized are served via an internal static buffer. Intended to replace rtas_data_buf. New code that needs RTAS work area buffers should prefer this API. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-12-26929c8cce78@linux.ibm.com
2023-02-13powerpc/rtas: add tracepoints around RTAS entryNathan Lynch
Decompose the RTAS entry C code into tracing and non-tracing variants, calling the just-added tracepoints in the tracing-enabled path. Skip tracing in contexts known to be unsafe (real mode, CPU offline). Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-11-26929c8cce78@linux.ibm.com
2023-02-13powerpc/rtas: strengthen do_enter_rtas() type safety, drop inlineNathan Lynch
Make do_enter_rtas() take a pointer to struct rtas_args and do the __pa() conversion in one place instead of leaving it to callers. This also makes it possible to introduce enter/exit tracepoints that access the rtas_args struct fields. There's no apparent reason to force inlining of do_enter_rtas() either, and it seems to bloat the code a bit. Let the compiler decide. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-9-26929c8cce78@linux.ibm.com
2023-02-13powerpc/rtas: improve function information lookupsNathan Lynch
The core RTAS support code and its clients perform two types of lookup for RTAS firmware function information. First, mapping a known function name to a token. The typical use case invokes rtas_token() to retrieve the token value to pass to rtas_call(). rtas_token() relies on of_get_property(), which performs a linear search of the /rtas node's property list under a lock with IRQs disabled. Second, and less common: given a token value, looking up some information about the function. The primary example is the sys_rtas filter path, which linearly scans a small table to match the token to a rtas_filter struct. Another use case to come is RTAS entry/exit tracepoints, which will require efficient lookup of function names from token values. Currently there is no general API for this. We need something much like the existing rtas_filters table, but more general and organized to facilitate efficient lookups. Introduce: * A new rtas_function type, aggregating function name, token, and filter. Other function characteristics could be added in the future. * An array of rtas_function, where each element corresponds to a known RTAS function. All information in the table is static save the token values, which are derived from the device tree at boot. The array is sorted by function name to allow binary search. * A named constant for each known RTAS function, used to index the function array. These also will be used in a client-facing API to be added later. * An xarray that maps valid tokens to rtas_function objects. Fold the existing rtas_filter table into the new rtas_function array, with the appropriate adjustments to block_rtas_call(). Remove now-redundant fields from struct rtas_filter. Preserve the function of the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by introducing a per-function flag that is set for the function entries related to pseries LPAR migration. These have never had working users via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas: prevent suspend-related sys_rtas use on LE"). Convert rtas_token() to use a lockless binary search on the function table. Fall back to the old behavior for lookups against names that are not known to be RTAS functions, but issue a warning. rtas_token() is for function names; it is not a general facility for accessing arbitrary properties of the /rtas node. All known misuses of rtas_token() have been converted to more appropriate of_ APIs in preceding changes. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com
2023-02-13powerpc/rtas: ensure 4KB alignment for rtas_data_bufNathan Lynch
Some RTAS functions that have work area parameters impose alignment requirements on the work area passed to them by the OS. Examples include: - ibm,configure-connector - ibm,update-nodes - ibm,update-properties 4KB is the greatest alignment required by PAPR for such buffers. rtas_data_buf used to have a __page_aligned attribute in the arch/ppc64 days, but that was changed to __cacheline_aligned for unknown reasons by commit 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel"). That works out to 128-byte alignment on ppc64, which isn't right. This was found by inspection and I'm not aware of any real problems caused by this. Either current RTAS implementations don't enforce the alignment constraints, or rtas_data_buf is always being placed at a 4KB boundary by accident (or both, perhaps). Use __aligned(SZ_4K) to ensure the rtas_data_buf has alignment appropriate for all users. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-6-26929c8cce78@linux.ibm.com
2023-02-13powerpc/rtas: handle extended delays safely in early bootNathan Lynch
Some code that runs early in boot calls RTAS functions that can return -2 or 990x statuses, which mean the caller should retry. An example is pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but treats these benign statuses as errors instead of retrying. pSeries_cmo_feature_init() and similar code should be made to retry until they succeed or receive a real error, using the usual pattern: do { rc = rtas_call(token, etc...); } while (rtas_busy_delay(rc)); But rtas_busy_delay() will perform a timed sleep on any 990x status. This isn't safe so early in boot, before the CPU scheduler and timer subsystem have initialized. The -2 RTAS status is much more likely to occur during single-threaded boot than 990x in practice, at least on PowerVM. This is because -2 usually means that RTAS made progress but exhausted its self-imposed timeslice, while 990x is associated with concurrent requests from the OS causing internal contention. Regardless, according to the language in PAPR, the OS should be prepared to handle either type of status at any time. Add a fallback path to rtas_busy_delay() to handle this as safely as possible, performing a small delay on 990x. Include a counter to detect retry loops that aren't making progress and bail out. Add __ref to rtas_busy_delay() since it now conditionally calls an __init function. This was found by inspection and I'm not aware of any real failures. However, the implementation of rtas_busy_delay() before commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements") was not susceptible to this problem, so let's treat this as a regression. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-1-26929c8cce78@linux.ibm.com
2023-02-12powerpc/pseries: Pass PLPKS password on kexecRussell Currey
Before interacting with the PLPKS, we ask the hypervisor to generate a password for the current boot, which is then required for most further PLPKS operations. If we kexec into a new kernel, the new kernel will try and fail to generate a new password, as the password has already been set. Pass the password through to the new kernel via the device tree, in /chosen/ibm,plpks-pw. Check for the presence of this property before trying to generate a new password - if it exists, use the existing password and remove it from the device tree. This only works with the kexec_file_load() syscall, not the older kexec_load() syscall, however if you're using Secure Boot then you want to be using kexec_file_load() anyway. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-24-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Don't print error on ENOENT when reading variablesAndrew Donnellan
If attempting to read the size or data attributes of a non-existent variable (which will be possible after a later patch to expose the PLPKS via the secvar interface), don't spam the kernel log with error messages. Only print errors for return codes that aren't ENOENT. Reported-by: Sudhakar Kuppusamy <sudhakar@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-14-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Warn when PAGE_SIZE is smaller than max object sizeAndrew Donnellan
Due to sysfs constraints, when writing to a variable, we can only handle writes of up to PAGE_SIZE. It's possible that the maximum object size is larger than PAGE_SIZE, in which case, print a warning on boot so that the user is aware. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-13-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Allow backend to populate static list of variable namesAndrew Donnellan
Currently, the list of variables is populated by calling secvar_ops->get_next() repeatedly, which is explicitly modelled on the OPAL API (including the keylen parameter). For the upcoming PLPKS backend, we have a static list of variable names. It is messy to fit that into get_next(), so instead, let the backend put a NULL-terminated array of variable names into secvar_ops->var_names, which will be used if get_next() is undefined. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-12-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Extend sysfs to include config varsRussell Currey
The forthcoming pseries consumer of the secvar API wants to expose a number of config variables. Allowing secvar implementations to provide their own sysfs attributes makes it easy for consumers to expose what they need to. This is not being used by the OPAL secvar implementation at present, and the config directory will not be created if no attributes are set. Signed-off-by: Russell Currey <ruscur@russell.cc> Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-11-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Clean up init error messagesAndrew Donnellan
Remove unnecessary prefixes from error messages in secvar_sysfs_init() (the file defines pr_fmt, so putting "secvar:" in every message is unnecessary). Make capitalisation and punctuation more consistent. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-10-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Handle max object size in the consumerRussell Currey
Currently the max object size is handled in the core secvar code with an entirely OPAL-specific implementation, so create a new max_size() op and move the existing implementation into the powernv platform. Should be no functional change. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-9-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Handle format string in the consumerRussell Currey
The code that handles the format string in secvar-sysfs.c is entirely OPAL specific, so create a new "format" op in secvar_operations to make the secvar code more generic. No functional change. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-8-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Use sysfs_emit() instead of sprintf()Russell Currey
The secvar format string and object size sysfs files are both ASCII text, and should use sysfs_emit(). No functional change. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-7-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Warn and error if multiple secvar ops are setRussell Currey
The secvar code only supports one consumer at a time. Multiple consumers aren't possible at this point in time, but we'd want it to be obvious if it ever could happen. Signed-off-by: Russell Currey <ruscur@russell.cc> Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-6-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Use u64 in secvar_operationsMichael Ellerman
There's no reason for secvar_operations to use uint64_t vs the more common kernel type u64. The types are compatible, but they require different printk format strings which can lead to confusion. Change all the secvar related routines to use u64. Reviewed-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-5-ajd@linux.ibm.com
2023-02-12powerpc/secvar: Fix incorrect return in secvar_sysfs_load()Russell Currey
secvar_ops->get_next() returns -ENOENT when there are no more variables to return, which is expected behaviour. Fix this by returning 0 if get_next() returns -ENOENT. This fixes an issue introduced in commit bd5d9c743d38 ("powerpc: expose secure variables to userspace via sysfs"), but the return code of secvar_sysfs_load() was never checked so this issue never mattered. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230210080401.345462-4-ajd@linux.ibm.com
2023-02-12Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch to bring in some changes that conflict with upcoming next content.
2023-02-10powerpc/kcsan: Prevent recursive instrumentation with IRQ save/restoresRohan McLure
Instrumented memory accesses provided by KCSAN will access core-local memories (which will save and restore IRQs) as well as restoring IRQs directly. Avoid recursive instrumentation by applying __no_kcsan annotation to IRQ restore routines. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> [mpe: Resolve merge conflict with IRQ replay recursion changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206021801.105268-5-rmclure@linux.ibm.com
2023-02-10powerpc/kcsan: Exclude udelay to prevent recursive instrumentationRohan McLure
In order for KCSAN to increase its likelihood of observing a data race, it sets a watchpoint on memory accesses and stalls, allowing for detection of conflicting accesses by other kernel threads or interrupts. Stalls are implemented by injecting a call to udelay in instrumented code. To prevent recursive instrumentation, exclude udelay from being instrumented. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
2023-02-10powerpc/kcsan: Add exclusions from instrumentationRohan McLure
Exclude various incompatible compilation units from KCSAN instrumentation. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206021801.105268-2-rmclure@linux.ibm.com
2023-02-10powerpc: Skip stack validation checking alternate stacks if they are not ↵Nicholas Piggin
allocated Stack validation in early boot can just bail out of checking alternate stacks if they are not validated yet. Checking against a NULL stack could cause NULLish pointer values to be considered valid. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-5-npiggin@gmail.com
2023-02-10powerpc/64: Move paca allocation to early_setup()Nicholas Piggin
The early paca and boot cpuid dance is complicated and currently does not quite work as expected for boot cpuid != 0 cases. early_init_devtree() currently allocates the paca_ptrs and boot cpuid paca, but until that returns and early_setup() calls setup_paca(), this thread is currently still executing with smp_processor_id() == 0. One problem this causes is the paca_ptrs[smp_processor_id()] pointer is poisoned, so valid_emergency_stack() (any backtrace) and any similar users will crash. Another is that the hardware id which is set here will not be returned by get_hard_smp_processor_id(smp_processor_id()), but it would work correctly for boot_cpuid == 0, which could lead to difficult to reproduce or find bugs. The hard id does not seem to be used by the rest of early_init_devtree(), it just looks like all this code might have been put here to allocate somewhere to store boot CPU hardware id while scanning the devtree. Rearrange things so the hwid is put in a global variable like boot_cpuid, and do all the paca allocation and boot paca setup in the 64-bit early_setup() after we have everything ready to go. The paca_ptrs[0] re-poisoning code in early_setup does not seem to have ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid is not 0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix build error on 32-bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
2023-02-10powerpc/64: Fix task_cpu in early boot when booting non-zero cpuidNicholas Piggin
powerpc/64 can boot on a non-zero SMP processor id. Initially, the boot CPU is said to be "assumed to be 0" until early_init_devtree() discovers the id from the device tree. That is not a good description because the assumption can be wrong and that has to be handled, the better description is that 0 is used as a placeholder, and things are fixed after the real id is discovered. smp_processor_id() is set to the boot cpuid, but task_cpu(current) is not, which causes the smp_processor_id() == task_cpu(current) invariant to be broken until init_idle() in sched_init(). This is quite fragile and could lead to subtle bugs in future. One bug is that validate_sp_size uses task_cpu() to get the process stack, so any stack trace from the booting CPU between early_init_devtree() and sched_init() will have problems. Early on paca_ptrs[0] will be poisoned, so that can cause machine checks dereferencing that memory in real mode. Later, validating the current stack pointer against the idle task of a different secondary will probably cause no stack trace to be printed. Fix this by setting thread_info->cpu right after smp_processor_id() is set to the boot cpuid. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix SMP=n build as reported by sfr] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-3-npiggin@gmail.com
2023-02-10powerpc/64e: Simplify address calculation in secondary hold loopNicholas Piggin
As the earlier comment explains, __secondary_hold_spinloop does not have to be accessed at its virtual address, slightly simplifying code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230203113858.1152093-4-npiggin@gmail.com
2023-02-10powerpc/64s: Refactor initialisation after promNicholas Piggin
Move some basic Book3S initialisation after prom to a function similar to what Book3E looks like. Book3E returns from this function at the virtual address mapping, and Book3S will do the same in a later change, so making them look similar helps with that. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230203113858.1152093-3-npiggin@gmail.com
2023-02-10powerpc: Remove __kernel_text_address() in show_instructions()Christophe Leroy
That test was introducted in 2006 by commit 00ae36de49cc ("[POWERPC] Better check in show_instructions"). At that time, there was no BPF progs. As seen in message of commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops on tail call tests"), when a page fault occurs in test_bpf.ko for instance, the code is dumped as XXXXXXXXs. Allthough __kernel_text_address() checks is_bpf_text_address(), it seems it is not enough. Today, show_instructions() uses get_kernel_nofault() to read the code, so there is no real need for additional verifications. ARM64 and x86 don't do any additional check before dumping instructions. Do the same and remove __kernel_text_address() in show_instructions(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4fd69ef7945518c3e27f96b95046a5c1468d35bf.1675245773.git.christophe.leroy@csgroup.eu
2023-02-10powerpc/mce: log the error for all unrecoverable errorsGanesh Goudar
For all unrecoverable errors we are missing to log the error, Since machine_check_log_err() is not getting called for unrecoverable errors. machine_check_log_err() is called from deferred handler, To run deferred handlers we have to do irq work raise from the exception handler. For recoverable errors exception vector code takes care of running deferred handlers. For unrecoverable errors raise irq work in save_mce_event(), So that we log the error from MCE deferred handler. Log without this change MCE: CPU27: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4] MCE: CPU27: Initiator CPU MCE: CPU27: Unknown Log with this change MCE: CPU24: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48] MCE: CPU24: Initiator CPU MCE: CPU24: Unknown RTAS: event: 5, Type: Platform Error (224), Severity: 3 Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230201095933.129482-1-ganeshgr@linux.ibm.com
2023-02-09powerpc: mm: add VM_IOREMAP flag to the vmalloc areaBaoquan He
Currently, for vmalloc areas with flag VM_IOREMAP set, except of the specific alignment clamping in __get_vm_area_node(), they will be 1) Shown as ioremap in /proc/vmallocinfo; 2) Ignored by /proc/kcore reading via vread() So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP in flag to make it handled correctly as above. Link: https://lkml.kernel.org/r/20230206084020.174506-7-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-08powerpc/iommu: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230202141919.2298821-1-gregkh@linuxfoundation.org
2023-02-07powerpc/pci: Add option for using pci_to_OF_bus_mapPali Rohár
The "pci-OF-bus-map" property was declared deprecated in 2006 [1] and to the best of everyone's knowledge is not used by anything anymore [2]. The creation of the property was disabled on powermac (arch/powerpc) in 2005 by commit 35499c0195e4 ("powerpc: Merge in 64-bit powermac support."). But it is still created by default on CHRP. On powermac the actual map (pci_to_OF_bus_map) is still used by default, even though the device tree property is not created. Add an option to enable/disable use of the pci_to_OF_bus_map, and creation of the property (on CHRP). Disabling the option allows enabling CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT which allows "normal" bus numbering and more than 256 buses, like 64-bit and other architectures. Mark the new option as default n, the intention is that the option and the code will be removed in a future release. [1]: https://lore.kernel.org/linuxppc-dev/1148016268.13249.14.camel@localhost.localdomain/ [2]: https://lore.kernel.org/all/575f239205e8635add81c9f902b7d9db7beb83ea.camel@kernel.crashing.org/ Signed-off-by: Pali Rohár <pali@kernel.org> [mpe: Reword commit & help text, shrink option name, rework to fix build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206113902.1857123-1-mpe@ellerman.id.au
2023-02-07powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switchNicholas Piggin
The RFI and STF security mitigation options can flip the interrupt_exit_not_reentrant static branch condition concurrently with the interrupt exit code which tests that branch. Interrupt exit tests this condition to set MSR[EE|RI] for exit, then again in the case a soft-masked interrupt is found pending, to recover the MSR so the interrupt can be replayed before attempting to exit again. If the condition changes between these two tests, the MSR and irq soft-mask state will become corrupted, leading to warnings and possible crashes. For example, if the branch is initially true then false, MSR[EE] will be 0 but PACA_IRQ_HARD_DIS clear and EE may not get enabled, leading to warnings in irq_64.c. Fixes: 13799748b957 ("powerpc/64: use interrupt restart table to speed up return from interrupt") Cc: stable@vger.kernel.org # v5.14+ Reported-by: Sachin Sant <sachinp@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206042240.92103-1-npiggin@gmail.com
2023-02-05powerpc/vdso: Filter clang's auto var init zero enabler when linkingNathan Chancellor
After commit 8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with clang"), the PowerPC vDSO shows the following error with clang-13 and older when CONFIG_INIT_STACK_ALL_ZERO is enabled: clang: error: argument unused during compilation: '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang' [-Werror,-Wunused-command-line-argument] clang-14 added a change to make sure this flag never triggers -Wunused-command-line-argument, so it is fixed with newer releases. For older releases that the kernel still supports building with, just filter out this flag, as has been done for other flags. Fixes: f0a42fbab447 ("powerpc/vdso: Improve linker flags") Fixes: 8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with clang") Link: https://github.com/llvm/llvm-project/commit/ca6d5813d17598cd180995fb3bdfca00f364475f Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-31Merge tag 'v6.2-rc6' into sched/core, to pick up fixesIngo Molnar
Pick up fixes before merging another batch of cpuidle updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-30powerpc/64: Fix perf profiling asynchronous interrupt handlersNicholas Piggin
Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the hard irq disabled state. So when should_hard_irq_enable() returns true because we want PMI interrupts in irq handlers, MSR[EE] is enabled but PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before enabling MSR[EE]. This also tidies some of the warnings, no need to duplicate them in both should_hard_irq_enable() and do_hard_irq_enable(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
2023-01-30powerpc/64: Don't recurse irq replayNicholas Piggin
Interrupt handlers called by soft-pending irq replay code can run softirqs, softirq replay enables and disables local irqs, which allows interrupts to come in including soft-masked interrupts, and it can cause pending irqs to be replayed again. That makes the soft irq replay state machine and possible races more complicated and fragile than it needs to be. Use irq_enter/irq_exit around irq replay to prevent softirqs running while interrupts are being replayed. Softirqs will now be run at the irq_exit() call after all the irq replaying is done. This prevents irqs being replayed while irqs are being replayed, and should hopefully make things simpler and easier to think about and debug. A new PACA_IRQ_REPLAYING is added to prevent asynchronous interrupt handlers hard-enabling EE while pending irqs are being replayed, because that causes new pending irqs to arrive which is also a complexity. This means pending irqs won't be profiled quite so well because perf irqs can't be taken. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230121102618.2824429-1-npiggin@gmail.com
2023-01-30powerpc/rtas: upgrade internal arch spinlocksNathan Lynch
At the time commit f97bb36f705d ("powerpc/rtas: Turn rtas lock into a raw spinlock") was written, the spinlock lockup detection code called __delay(), which will not make progress if the timebase is not advancing. Since the interprocessor timebase synchronization sequence for chrp, cell, and some now-unsupported Power models can temporarily freeze the timebase through an RTAS function (freeze-time-base), the lock that serializes most RTAS calls was converted to arch_spinlock_t to prevent kernel hangs in the lockup detection code. However, commit bc88c10d7e69 ("locking/spinlock/debug: Remove spinlock lockup detection code") removed that inconvenient property from the lock debug code several years ago. So now it should be safe to reintroduce generic locks into the RTAS support code, primarily to increase lockdep coverage. Making rtas_lock a spinlock_t would violate lock type nesting rules because it can be acquired while holding raw locks, e.g. pci_lock and irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent reason not to upgrade timebase_lock as well. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230124140448.45938-5-nathanl@linux.ibm.com
2023-01-30powerpc/rtas: remove lock and args fields from global rtas structNathan Lynch
Only code internal to the RTAS subsystem needs access to the central lock and parameter block. Remove these from the globally visible 'rtas' struct and make them file-static in rtas.c. Some changed lines in rtas_call() lack appropriate spacing around operators and cause checkpatch errors; fix these as well. Suggested-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230124140448.45938-4-nathanl@linux.ibm.com