Age | Commit message | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Drop orphaned VAS MAINTAINERS entry
- Fix build errors with clang and KCSAN
- Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together
with recordmcount
Thanks to Nathan Chancellor.
* tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Avoid dead code/data elimination when using recordmcount
powerpc/vmlinux.lds: Add .text.asan/tsan sections
powerpc: Drop orphaned VAS MAINTAINERS entry
|
|
When KASAN/KCSAN are enabled, clang generates .text.asan/tsan sections.
Because they are not mentioned in the linker script, warnings are
generated, and when orphan handling is set to error that becomes a build
error, e.g.:
ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
ld.lld: error: vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
Fix it by adding the sections to our linker script, similar to the
generic change made in 848378812e40 ("vmlinux.lds.h: Handle clang's
module.{c,d}tor sections").
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Change V=1 option to print both short log and full command log
- Allow V=1 and V=2 to be combined as V=12
- Make W=1 detect wrong .gitignore files
- Tree-wide cleanups for unused command line arguments passed to Clang
- Stop using -Qunused-arguments with Clang
- Make scripts/setlocalversion handle only correct release tags instead
of any arbitrary annotated tag
- Create Debian and RPM source packages without cleaning the source
tree
- Various cleanups for packaging
* tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (74 commits)
kbuild: rpm-pkg: remove unneeded KERNELRELEASE from modules/headers_install
docs: kbuild: remove description of KBUILD_LDS_MODULE
.gitattributes: use 'dts' diff driver for *.dtso files
kbuild: deb-pkg: improve the usability of source package
kbuild: deb-pkg: fix binary-arch and clean in debian/rules
kbuild: tar-pkg: use tar rules in scripts/Makefile.package
kbuild: make perf-tar*-src-pkg work without relying on git
kbuild: deb-pkg: switch over to source format 3.0 (quilt)
kbuild: deb-pkg: make .orig tarball a hard link if possible
kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile
kbuild: srcrpm-pkg: create source package without cleaning
kbuild: rpm-pkg: build binary packages from source rpm
kbuild: deb-pkg: create source package without cleaning
kbuild: add a tool to list files ignored by git
Documentation/llvm: add Chimera Linux, Google and Meta datacenters
setlocalversion: use only the correct release tag for git-describe
setlocalversion: clean up the construction of version output
.gitignore: ignore *.cover and *.mbx
kbuild: remove --include-dir MAKEFLAG from top Makefile
kbuild: fix trivial typo in comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for configuring secure boot with user-defined keys on PowerVM
LPARs
- Simplify the replay of soft-masked IRQs by making it non-recursive
- Add support for KCSAN on 64-bit Book3S
- Improvements to the API & code which interacts with RTAS (pseries
firmware)
- Change 32-bit powermac to assign PCI bus numbers per domain by
default
- Some improvements to the 32-bit BPF JIT
- Various other small features and fixes
Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin
Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand,
Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain,
Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy
Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali
Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika
Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar
Kuppusamy.
* tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits)
powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries
powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot
powerpc/e500: Add missing prototype for 'relocate_init'
powerpc/64: Fix unannotated intra-function call warning
powerpc/epapr: Don't use wrteei on non booke
powerpc: Pass correct CPU reference to assembler
powerpc/mm: Rearrange if-else block to avoid clang warning
powerpc/nohash: Fix build with llvm-as
powerpc/nohash: Fix build error with binutils >= 2.38
powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags
macintosh: windfarm: Use unsigned type for 1-bit bitfields
powerpc/kexec_file: print error string on usable memory property update failure
powerpc/machdep: warn when machine_is() used too early
powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500
powerpc/eeh: Set channel state after notifying the drivers
selftests/powerpc: Fix incorrect kernel headers search path
powerpc/rtas: arch-wide function token lookup conversions
powerpc/rtas: introduce rtas_function_token() API
powerpc/pseries/lpar: convert to papr_sysparm API
powerpc/pseries/hv-24x7: convert to papr_sysparm API
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch series from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which performs some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
this series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Improve the scalability of the CFS bandwidth unthrottling logic with
a large number of CPUs.
- Fix & rework various cpuidle routines, simplify interaction with the
generic scheduler code. Add __cpuidle methods as noinstr to objtool's
noinstr detection and fix boatloads of cpuidle bugs & quirks.
- Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
previously issued registrations.
- Limit scheduler slice duration to the sysctl_sched_latency period, to
improve scheduling granularity with a large number of SCHED_IDLE
tasks.
- Debuggability enhancement on sys_exit(): warn about disabled IRQs,
but also enable them to prevent a cascade of followup problems and
repeat warnings.
- Fix the rescheduling logic in prio_changed_dl().
- Micro-optimize cpufreq and sched-util methods.
- Micro-optimize ttwu_runnable()
- Micro-optimize the idle-scanning in update_numa_stats(),
select_idle_capacity() and steal_cookie_task().
- Update the RSEQ code & self-tests
- Constify various scheduler methods
- Remove unused methods
- Refine __init tags
- Documentation updates
- Misc other cleanups, fixes
* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
sched/rt: pick_next_rt_entity(): check list_entry
sched/deadline: Add more reschedule cases to prio_changed_dl()
sched/fair: sanitize vruntime of entity being placed
sched/fair: Remove capacity inversion detection
sched/fair: unlink misfit task from cpu overutilized
objtool: mem*() are not uaccess safe
cpuidle: Fix poll_idle() noinstr annotation
sched/clock: Make local_clock() noinstr
sched/clock/x86: Mark sched_clock() noinstr
x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
x86/atomics: Always inline arch_atomic64*()
cpuidle: tracing, preempt: Squash _rcuidle tracing
cpuidle: tracing: Warn about !rcu_is_watching()
cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
cpuidle: drivers: firmware: psci: Dont instrument suspend code
KVM: selftests: Fix build of rseq test
exit: Detect and fix irq disabled state in oops
cpuidle, arm64: Fix the ARM64 cpuidle logic
cpuidle: mvebu: Fix duplicate flags assignment
sched/fair: Limit sched slice duration
...
|
|
objtool throws the following warning:
arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x6128:
unannotated intra-function call
Fix the warning by annotating start_initialization_book3s symbol with the
SYM_FUNC_START_LOCAL and SYM_FUNC_END macros.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Fixes: 58f24eea5278 ("powerpc/64s: Refactor initialisation after prom")
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230217043226.1020041-1-sv@linux.ibm.com
|
|
wrteei is only for booke. Use the standard mfmsr/ori/mtmsr
when non booke.
Reported-by: Jan-Benedict Glaw <jbglaw@lug-owl.de>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b29c7f1727433b003eae050e44072741c8ac223b.1671475543.git.christophe.leroy@csgroup.eu
|
|
When a PCI error is encountered for the 6th time in an hour, we
set the channel state to perm_failure and notify the
driver about the permanent failure.
However, after upstream commit 38ddc011478e ("powerpc/eeh:
Make permanently failed devices non-actionable"), the EEH handler
stops calling any driver routine once the device is marked as
permanently failed. This can lead to fatal consequences,
such as a kernel hang, with certain PCI devices.
The following logs were observed with the lpfc driver, with and
without this change. Without this change, the kernel hangs if a
PCI error is encountered 6 times for a device in an hour.
Without the change:
EEH: Beginning: 'error_detected(permanent failure)'
PCI 0132:60:00.0#600000: EEH: not actionable (1,1,1)
PCI 0132:60:00.1#600000: EEH: not actionable (1,1,1)
EEH: Finished:'error_detected(permanent failure)'
With the change:
EEH: Beginning: 'error_detected(permanent failure)'
EEH: Invoking lpfc->error_detected(permanent failure)
EEH: lpfc driver reports: 'disconnect'
EEH: Invoking lpfc->error_detected(permanent failure)
EEH: lpfc driver reports: 'disconnect'
EEH: Finished:'error_detected(permanent failure)'
To fix the issue, set channel state to permanent failure after
notifying the drivers.
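The ordering problem can be illustrated with a small userspace simulation. This is only a sketch, not the EEH code: the sim_* names, the two-state enum and the per-device counter are all hypothetical stand-ins, and the "not actionable" check is reduced to a single state comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the EEH structures. */
enum chan_state { CHAN_NORMAL, CHAN_PERM_FAILURE };

struct sim_dev {
	enum chan_state state;
	int notified;	/* how many times the simulated driver callback ran */
};

/* Mimics the "not actionable" check: permanently failed devices are skipped. */
static bool sim_dev_actionable(const struct sim_dev *d)
{
	return d->state != CHAN_PERM_FAILURE;
}

/* Run the simulated driver callback for every actionable device. */
static int sim_report_perm_failure(struct sim_dev *devs, size_t n)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		if (!sim_dev_actionable(&devs[i]))
			continue;
		devs[i].notified++;
		called++;
	}
	return called;
}

static void sim_mark_perm_failure(struct sim_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].state = CHAN_PERM_FAILURE;
}

/* Old ordering: mark first, so no driver is ever notified. */
static int sim_handle_old(struct sim_dev *devs, size_t n)
{
	sim_mark_perm_failure(devs, n);
	return sim_report_perm_failure(devs, n);
}

/* New ordering: notify first, then mark. */
static int sim_handle_new(struct sim_dev *devs, size_t n)
{
	int called = sim_report_perm_failure(devs, n);

	sim_mark_perm_failure(devs, n);
	return called;
}
```

With two devices, sim_handle_old() reaches no driver while sim_handle_new() reaches both, which corresponds to the difference between the two logs above.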
Fixes: 38ddc011478e ("powerpc/eeh: Make permanently failed devices non-actionable")
Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230209105649.127707-1-ganeshgr@linux.ibm.com
|
|
With the tokens for all implemented RTAS functions now available via
rtas_function_token(), which is optimal and safe for arbitrary
contexts, there is no need to use rtas_token() or cache its result.
Most conversions are trivial, but a few are worth describing in more
detail:
* Error injection token comparisons for lockdown purposes are
consolidated into a simple predicate: token_is_restricted_errinjct().
* A couple of special cases in block_rtas_call() do not use
rtas_token() but perform string comparisons against names in the
function table. These are converted to compare against token values
instead, which is logically equivalent but less expensive.
* The lookup for the ibm,os-term token can be deferred until needed,
instead of caching it at boot to avoid device tree traversal during
panic.
* Since rtas_function_token() accesses a read-only data structure
without taking any locks, xmon's lookup of set-indicator can be
performed as needed instead of cached at startup.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-20-26929c8cce78@linux.ibm.com
|
|
Users of rtas_token() supply a string argument that can't be validated
at build time. A typo or misspelling has to be caught by inspection or
by observing wrong behavior at runtime.
Since the core RTAS code now has consolidated the names of all
possible RTAS functions and mapped them to their tokens, token lookup
can be implemented using symbolic constants to index a static array.
So introduce rtas_function_token(), a replacement API which does that,
along with a rtas_service_present()-equivalent helper,
rtas_function_implemented(). Callers supply an opaque predefined
function handle which is used internally to index the function
table. Typos or other inappropriate arguments yield build errors, and
the function handle is a type that can't be easily confused with RTAS
tokens or other integer types.
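A minimal userspace sketch of the idea follows. It is not the kernel implementation: every identifier here (fn_handle_t, FN_*, fn_table, TOKEN_UNKNOWN) is hypothetical, and the table holds only three illustrative entries. It shows how wrapping a table index in a struct yields a handle type that the compiler will not silently mix with integer tokens.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_UNKNOWN (-1)	/* illustrative "not implemented" value */

/* Symbolic indexes into the function table. */
enum fn_index {
	FNIDX_GET_TIME,
	FNIDX_SET_INDICATOR,
	FNIDX_OS_TERM,
	FNIDX_COUNT
};

/* A struct wrapper keeps handles distinct from plain integers and tokens. */
typedef struct { enum fn_index index; } fn_handle_t;

#define FN_GET_TIME		((fn_handle_t){ FNIDX_GET_TIME })
#define FN_SET_INDICATOR	((fn_handle_t){ FNIDX_SET_INDICATOR })
#define FN_OS_TERM		((fn_handle_t){ FNIDX_OS_TERM })

struct fn_entry {
	const char *name;
	int token;
};

/* Names are static; tokens would be filled in from firmware data at boot. */
static struct fn_entry fn_table[FNIDX_COUNT] = {
	[FNIDX_GET_TIME]	= { "get-time-of-day", TOKEN_UNKNOWN },
	[FNIDX_SET_INDICATOR]	= { "set-indicator", TOKEN_UNKNOWN },
	[FNIDX_OS_TERM]		= { "ibm,os-term", TOKEN_UNKNOWN },
};

static void fn_set_token(fn_handle_t h, int token)
{
	assert(h.index < FNIDX_COUNT);
	fn_table[h.index].token = token;
}

/* Constant-time lookup: no string comparison, no locking. */
static int fn_token(fn_handle_t h)
{
	assert(h.index < FNIDX_COUNT);
	return fn_table[h.index].token;
}

static int fn_implemented(fn_handle_t h)
{
	return fn_token(h) != TOKEN_UNKNOWN;
}
```

In this sketch, a misspelled handle such as FN_OS_TREM fails to compile, and passing a bare int where fn_handle_t is expected is a type error.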
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-19-26929c8cce78@linux.ibm.com
|
|
Various pseries-specific RTAS functions take a temporary "work area"
parameter - a buffer in memory accessible to RTAS. Typically such
functions are passed the statically allocated rtas_data_buf buffer as
the argument. This buffer is protected by a global spinlock. So users
of rtas_data_buf cannot perform sleeping operations while accessing
the buffer.
Most RTAS functions that have a work area parameter can return a
status (-2/990x) that indicates that the caller should retry. Before
retrying, the caller may need to reschedule or sleep (see
rtas_busy_delay() for details). This combination of factors
leads to uncomfortable constructions like this:
    do {
            spin_lock(&rtas_data_buf_lock);
            rc = rtas_call(token, __pa(rtas_data_buf), ...);
            if (rc == 0) {
                    /* parse or copy out rtas_data_buf contents */
            }
            spin_unlock(&rtas_data_buf_lock);
    } while (rtas_busy_delay(rc));
Another unfortunately common way of handling this is for callers to
blithely ignore the possibility of a -2/990x status and hope for the
best.
If users were allowed to perform blocking operations while owning a
work area, the programming model would become less tedious and
error-prone. Users could schedule away, sleep, or perform other
blocking operations without having to release and re-acquire
resources.
We could continue to use a single work area buffer, and convert
rtas_data_buf_lock to a mutex. But that would impose an unnecessarily
coarse serialization on all users. As awkward as the current design
is, it prevents longer running operations that need to repeatedly use
rtas_data_buf from blocking the progress of others.
There are more considerations. One is that while 4KB is fine for all
current in-kernel uses, some RTAS calls can take much smaller buffers,
and some (VPD, platform dumps) would likely benefit from larger
ones. Another is that at least one RTAS function (ibm,get-vpd)
has *two* work area parameters. And finally, we should expect the
number of work area users in the kernel to increase over time as we
introduce lockdown-compatible ABIs to replace less safe use cases
based on sys_rtas/librtas.
So a special-purpose allocator for RTAS work area buffers seems worth
trying.
Properties:
* The backing memory for the allocator is reserved early in boot in
order to satisfy RTAS addressing requirements, and then managed with
genalloc.
* Allocations can block, but they never fail (mempool-like).
* Prioritizes first-come, first-served fairness over throughput.
* Early boot allocations before the allocator has been initialized are
served via an internal static buffer.
Intended to replace rtas_data_buf. New code that needs RTAS work area
buffers should prefer this API.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-12-26929c8cce78@linux.ibm.com
|
|
Decompose the RTAS entry C code into tracing and non-tracing variants,
calling the just-added tracepoints in the tracing-enabled path. Skip
tracing in contexts known to be unsafe (real mode, CPU offline).
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-11-26929c8cce78@linux.ibm.com
|
|
Make do_enter_rtas() take a pointer to struct rtas_args and do the
__pa() conversion in one place instead of leaving it to callers. This
also makes it possible to introduce enter/exit tracepoints that access
the rtas_args struct fields.
There's no apparent reason to force inlining of do_enter_rtas()
either, and it seems to bloat the code a bit. Let the compiler decide.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-9-26929c8cce78@linux.ibm.com
|
|
The core RTAS support code and its clients perform two types of lookup
for RTAS firmware function information.
First, mapping a known function name to a token. The typical use case
invokes rtas_token() to retrieve the token value to pass to
rtas_call(). rtas_token() relies on of_get_property(), which performs
a linear search of the /rtas node's property list under a lock with
IRQs disabled.
Second, and less common: given a token value, looking up some
information about the function. The primary example is the sys_rtas
filter path, which linearly scans a small table to match the token to
a rtas_filter struct. Another use case to come is RTAS entry/exit
tracepoints, which will require efficient lookup of function names
from token values. Currently there is no general API for this.
We need something much like the existing rtas_filters table, but more
general and organized to facilitate efficient lookups.
Introduce:
* A new rtas_function type, aggregating function name, token,
and filter. Other function characteristics could be added in the
future.
* An array of rtas_function, where each element corresponds to a known
RTAS function. All information in the table is static save the token
values, which are derived from the device tree at boot. The array is
sorted by function name to allow binary search.
* A named constant for each known RTAS function, used to index the
function array. These also will be used in a client-facing API to be
added later.
* An xarray that maps valid tokens to rtas_function objects.
Fold the existing rtas_filter table into the new rtas_function array,
with the appropriate adjustments to block_rtas_call(). Remove
now-redundant fields from struct rtas_filter. Preserve the function of
the CONFIG_CPU_BIG_ENDIAN guard in the current filter table by
introducing a per-function flag that is set for the function entries
related to pseries LPAR migration. These have never had working users
via sys_rtas on ppc64le; see commit de0f7349a0dd ("powerpc/rtas:
prevent suspend-related sys_rtas use on LE").
Convert rtas_token() to use a lockless binary search on the function
table. Fall back to the old behavior for lookups against names that
are not known to be RTAS functions, but issue a warning. rtas_token()
is for function names; it is not a general facility for accessing
arbitrary properties of the /rtas node. All known misuses of
rtas_token() have been converted to more appropriate of_ APIs in
preceding changes.
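The name-to-token direction of the lookup can be sketched in userspace with the standard bsearch(). This is an illustration under assumptions, not the kernel code: struct fn_desc, the three entries and their token values are made up, and the xarray-based token-to-function mapping is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fn_desc {
	const char *name;
	int token;	/* hypothetical values; -1 would mean "not provided" */
};

/* Must stay sorted by name (strcmp order) for bsearch() to work. */
static const struct fn_desc fn_descs[] = {
	{ "get-time-of-day", 3 },
	{ "ibm,os-term", 7 },
	{ "set-indicator", 11 },
};

/* bsearch() comparator: key is the name string, elem is a table entry. */
static int cmp_name(const void *key, const void *elem)
{
	const struct fn_desc *d = elem;

	return strcmp(key, d->name);
}

/* Lockless lookup over read-only data; returns NULL for unknown names. */
static const struct fn_desc *lookup_by_name(const char *name)
{
	return bsearch(name, fn_descs,
		       sizeof(fn_descs) / sizeof(fn_descs[0]),
		       sizeof(fn_descs[0]), cmp_name);
}

static int name_to_token(const char *name)
{
	const struct fn_desc *d = lookup_by_name(name);

	return d ? d->token : -1;
}
```

Because the table is never modified after initialization in this sketch, the search needs no lock, which is the property the commit relies on for rtas_token().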
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-8-26929c8cce78@linux.ibm.com
|
|
Some RTAS functions that have work area parameters impose alignment
requirements on the work area passed to them by the OS. Examples
include:
- ibm,configure-connector
- ibm,update-nodes
- ibm,update-properties
4KB is the greatest alignment required by PAPR for such
buffers. rtas_data_buf used to have a __page_aligned attribute in the
arch/ppc64 days, but that was changed to __cacheline_aligned for
unknown reasons by commit 033ef338b6e0 ("powerpc: Merge rtas.c into
arch/powerpc/kernel"). That works out to 128-byte alignment
on ppc64, which isn't right.
This was found by inspection and I'm not aware of any real problems
caused by this. Either current RTAS implementations don't enforce the
alignment constraints, or rtas_data_buf is always being placed at a
4KB boundary by accident (or both, perhaps).
Use __aligned(SZ_4K) to ensure the rtas_data_buf has alignment
appropriate for all users.
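As a userspace analogue, C11's alignas can play the role of __aligned(SZ_4K). The sketch below is an assumption-laden illustration: work_buf, WORK_BUF_SIZE, WORK_BUF_ALIGN and is_aligned() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define WORK_BUF_SIZE	4096
#define WORK_BUF_ALIGN	4096

/* Userspace analogue of declaring the buffer with __aligned(SZ_4K). */
static alignas(WORK_BUF_ALIGN) char work_buf[WORK_BUF_SIZE];

/* True if p is a multiple of align; align must be a power of two. */
static int is_aligned(const void *p, uintptr_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

An address that is only 128-byte aligned, such as work_buf + 128 here, passes is_aligned(p, 128) but fails is_aligned(p, 4096). That is the gap between cacheline alignment and the 4KB alignment described above.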
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-6-26929c8cce78@linux.ibm.com
|
|
Some code that runs early in boot calls RTAS functions that can return
-2 or 990x statuses, which mean the caller should retry. An example is
pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but
treats these benign statuses as errors instead of retrying.
pSeries_cmo_feature_init() and similar code should be made to retry
until they succeed or receive a real error, using the usual pattern:
    do {
            rc = rtas_call(token, etc...);
    } while (rtas_busy_delay(rc));
But rtas_busy_delay() will perform a timed sleep on any 990x
status. This isn't safe so early in boot, before the CPU scheduler and
timer subsystem have initialized.
The -2 RTAS status is much more likely to occur during single-threaded
boot than 990x in practice, at least on PowerVM. This is because -2
usually means that RTAS made progress but exhausted its self-imposed
timeslice, while 990x is associated with concurrent requests from the
OS causing internal contention. Regardless, according to the language
in PAPR, the OS should be prepared to handle either type of status at
any time.
Add a fallback path to rtas_busy_delay() to handle this as safely as
possible, performing a small delay on 990x. Include a counter to
detect retry loops that aren't making progress and bail out. Add __ref
to rtas_busy_delay() since it now conditionally calls an __init
function.
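The shape of such a bounded retry loop can be sketched in userspace as follows. This is not the kernel's rtas_busy_delay(): the firmware call is simulated, no actual delay is performed, the retry bound of 1000 is arbitrary, and the 9900..9905 range used for the 990x statuses is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define STATUS_BUSY		(-2)
#define STATUS_EXT_DELAY_MIN	9900	/* assumed range for "990x" */
#define STATUS_EXT_DELAY_MAX	9905
#define MAX_EARLY_RETRIES	1000	/* arbitrary illustrative bound */

static bool status_is_retryable(int status)
{
	return status == STATUS_BUSY ||
	       (status >= STATUS_EXT_DELAY_MIN &&
		status <= STATUS_EXT_DELAY_MAX);
}

/*
 * Early-boot flavoured busy-delay: returns true if the caller should
 * retry. A real implementation would delay briefly on 990x; this sketch
 * only counts retries and gives up when no progress is being made.
 */
static bool early_busy_delay(int status, unsigned int *retries)
{
	if (!status_is_retryable(status))
		return false;
	if (++(*retries) > MAX_EARLY_RETRIES)
		return false;
	return true;
}

/* Simulated firmware: reports busy_status for the first busy_calls calls. */
struct sim_fw {
	int busy_calls;
	int busy_status;
};

static int sim_fw_call(struct sim_fw *fw)
{
	if (fw->busy_calls > 0) {
		fw->busy_calls--;
		return fw->busy_status;
	}
	return 0;
}

/* The usual do/while retry pattern, with the bounded fallback above. */
static int call_with_retry(struct sim_fw *fw, unsigned int *attempts)
{
	unsigned int retries = 0;
	int rc;

	*attempts = 0;
	do {
		rc = sim_fw_call(fw);
		(*attempts)++;
	} while (early_busy_delay(rc, &retries));
	return rc;
}
```

When the simulated firmware never stops reporting a retryable status, call_with_retry() returns that status to the caller after the bound is exceeded instead of looping forever.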
This was found by inspection and I'm not aware of any real
failures. However, the implementation of rtas_busy_delay() before
commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
was not susceptible to this problem, so let's treat this as a
regression.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230125-b4-powerpc-rtas-queue-v3-1-26929c8cce78@linux.ibm.com
|
|
Before interacting with the PLPKS, we ask the hypervisor to generate a
password for the current boot, which is then required for most further
PLPKS operations.
If we kexec into a new kernel, the new kernel will try and fail to
generate a new password, as the password has already been set.
Pass the password through to the new kernel via the device tree, in
/chosen/ibm,plpks-pw. Check for the presence of this property before
trying to generate a new password - if it exists, use the existing
password and remove it from the device tree.
This only works with the kexec_file_load() syscall, not the older
kexec_load() syscall. However, if you're using Secure Boot then you want
to be using kexec_file_load() anyway.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-24-ajd@linux.ibm.com
|
|
If attempting to read the size or data attributes of a non-existent
variable (which will be possible after a later patch to expose the PLPKS
via the secvar interface), don't spam the kernel log with error messages.
Only print errors for return codes that aren't ENOENT.
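The filtering rule amounts to a one-line predicate. The userspace sketch below uses hypothetical names (should_log_error(), report_lookup()) and prints to stderr in place of the kernel log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* A missing variable (-ENOENT) is expected and not worth a log line. */
static bool should_log_error(int rc)
{
	return rc != 0 && rc != -ENOENT;
}

/* Hypothetical wrapper: logs only unexpected failures, passes rc through. */
static int report_lookup(const char *name, int rc)
{
	if (should_log_error(rc))
		fprintf(stderr, "error %d reading variable %s\n", rc, name);
	return rc;
}
```

The return code is still propagated unchanged, so callers see -ENOENT as before; only the log message is suppressed.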
Reported-by: Sudhakar Kuppusamy <sudhakar@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-14-ajd@linux.ibm.com
|
|
Due to sysfs constraints, when writing to a variable, we can only handle
writes of up to PAGE_SIZE.
It's possible that the maximum object size is larger than PAGE_SIZE, in
which case, print a warning on boot so that the user is aware.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-13-ajd@linux.ibm.com
|
|
Currently, the list of variables is populated by calling
secvar_ops->get_next() repeatedly, which is explicitly modelled on the
OPAL API (including the keylen parameter).
For the upcoming PLPKS backend, we have a static list of variable names.
It is messy to fit that into get_next(), so instead, let the backend put
a NULL-terminated array of variable names into secvar_ops->var_names,
which will be used if get_next() is undefined.
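The fallback can be sketched in userspace like this. The structure is a reduced, hypothetical stand-in for secvar_operations (the get_next() signature here is invented for the sketch), and the variable names in static_names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of a backend ops structure. */
struct sketch_ops {
	/* Iterator-style enumeration; may be NULL. Returns 0 while names remain. */
	int (*get_next)(size_t pos, const char **name);
	/* NULL-terminated list of names, used when get_next is NULL. */
	const char *const *var_names;
};

typedef void (*name_cb)(const char *name, void *ctx);

/* Visit every variable name, preferring get_next() when it is provided. */
static size_t for_each_var(const struct sketch_ops *ops, name_cb cb, void *ctx)
{
	size_t n = 0;
	const char *name;

	if (ops->get_next) {
		while (ops->get_next(n, &name) == 0) {
			cb(name, ctx);
			n++;
		}
		return n;
	}
	if (!ops->var_names)
		return 0;
	for (; ops->var_names[n]; n++)
		cb(ops->var_names[n], ctx);
	return n;
}

/* A backend with a static list and no get_next(). */
static const char *const static_names[] = { "PK", "KEK", "db", NULL };
static const struct sketch_ops static_backend = {
	.get_next = NULL,
	.var_names = static_names,
};

/* Example callback: counts the names it is shown. */
static void count_cb(const char *name, void *ctx)
{
	(void)name;
	(*(int *)ctx)++;
}
```

A backend with a fixed set of names only has to provide the array; it does not need to emulate an iterator.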
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-12-ajd@linux.ibm.com
|
|
The forthcoming pseries consumer of the secvar API wants to expose a
number of config variables. Allowing secvar implementations to provide
their own sysfs attributes makes it easy for consumers to expose what
they need to.
This is not being used by the OPAL secvar implementation at present, and
the config directory will not be created if no attributes are set.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-11-ajd@linux.ibm.com
|
|
Remove unnecessary prefixes from error messages in secvar_sysfs_init()
(the file defines pr_fmt, so putting "secvar:" in every message is
unnecessary). Make capitalisation and punctuation more consistent.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-10-ajd@linux.ibm.com
|
|
Currently the max object size is handled in the core secvar code with an
entirely OPAL-specific implementation, so create a new max_size() op and
move the existing implementation into the powernv platform. Should be
no functional change.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-9-ajd@linux.ibm.com
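A rough userspace sketch of moving a platform-specific query behind an op, as described above. The struct, the function names, the 4096 value and the -ENODEV return are all assumptions made for illustration and are not taken from the kernel source.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table with only the op discussed here. */
struct backend_ops_sketch {
	int (*max_size)(unsigned long long *max_size);
};

/* Platform-side implementation; the value is made up. */
static int example_max_size(unsigned long long *max_size)
{
	*max_size = 4096;
	return 0;
}

/* Core-side caller: knows nothing about how the platform finds the limit. */
static int core_max_size(const struct backend_ops_sketch *ops,
			 unsigned long long *out)
{
	if (!ops || !ops->max_size)
		return -ENODEV;	/* assumed error code */
	return ops->max_size(out);
}
```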
|
|
The code that handles the format string in secvar-sysfs.c is entirely
OPAL specific, so create a new "format" op in secvar_operations to make
the secvar code more generic. No functional change.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-8-ajd@linux.ibm.com
|
|
The secvar format string and object size sysfs files are both ASCII
text, and should use sysfs_emit(). No functional change.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-7-ajd@linux.ibm.com
|
|
The secvar code only supports one consumer at a time.
Multiple consumers aren't possible at this point in time, but we'd want
it to be obvious if it ever could happen.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-6-ajd@linux.ibm.com
|
|
There's no reason for secvar_operations to use uint64_t vs the more
common kernel type u64.
The types are compatible, but they require different printk format
strings which can lead to confusion.
Change all the secvar related routines to use u64.
Reviewed-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-5-ajd@linux.ibm.com
|
|
secvar_ops->get_next() returns -ENOENT when there are no more variables
to return, which is expected behaviour. However, secvar_sysfs_load()
passes that value back to its caller as if it were an error.
Fix this by returning 0 if get_next() returns -ENOENT.
This fixes an issue introduced in commit bd5d9c743d38 ("powerpc: expose
secure variables to userspace via sysfs"), but the return code of
secvar_sysfs_load() was never checked so this issue never mattered.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230210080401.345462-4-ajd@linux.ibm.com
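The return-code handling described above can be illustrated with a small userspace loop. The iterator functions and load_all() are hypothetical stand-ins; only the treatment of -ENOENT reflects the commit message.

```c
#include <errno.h>

static int remaining_vars;	/* hypothetical backend state */

/* Yields remaining_vars entries, then reports the end of the list. */
static int fake_get_next(void)
{
	if (remaining_vars == 0)
		return -ENOENT;
	remaining_vars--;
	return 0;
}

/* An iterator that fails for a real reason. */
static int fake_get_next_broken(void)
{
	return -EIO;
}

/* Walk the list; -ENOENT means "done" and is not reported as a failure. */
static int load_all(int (*get_next)(void), int *loaded)
{
	int rc;

	*loaded = 0;
	while ((rc = get_next()) == 0)
		(*loaded)++;

	if (rc == -ENOENT)
		rc = 0;
	return rc;
}
```

Any other error, such as the -EIO above, is still passed back to the caller unchanged.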
|
|
Merge our fixes branch to bring in some changes that conflict with
upcoming next content.
|
|
Instrumented memory accesses provided by KCSAN will access core-local
memories (which will save and restore IRQs) as well as restoring IRQs
directly. Avoid recursive instrumentation by applying __no_kcsan
annotation to IRQ restore routines.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
[mpe: Resolve merge conflict with IRQ replay recursion changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-5-rmclure@linux.ibm.com
|
|
In order for KCSAN to increase its likelihood of observing a data race,
it sets a watchpoint on memory accesses and stalls, allowing for
detection of conflicting accesses by other kernel threads or interrupts.
Stalls are implemented by injecting a call to udelay in instrumented code.
To prevent recursive instrumentation, exclude udelay from being instrumented.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-3-rmclure@linux.ibm.com
|
|
Exclude various incompatible compilation units from KCSAN
instrumentation.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206021801.105268-2-rmclure@linux.ibm.com
|
|
allocated
Stack validation in early boot can just bail out of checking alternate
stacks if they are not allocated yet. Checking against a NULL stack
could cause NULLish pointer values to be considered valid.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-5-npiggin@gmail.com
|
|
The early paca and boot cpuid dance is complicated and currently does
not quite work as expected for boot cpuid != 0 cases.
early_init_devtree() currently allocates the paca_ptrs and boot cpuid
paca, but until that returns and early_setup() calls setup_paca(), this
thread is currently still executing with smp_processor_id() == 0.
One problem this causes is the paca_ptrs[smp_processor_id()] pointer is
poisoned, so valid_emergency_stack() (any backtrace) and any similar
users will crash.
Another is that the hardware id which is set here will not be returned
by get_hard_smp_processor_id(smp_processor_id()), but it would work
correctly for boot_cpuid == 0, which could lead to difficult to
reproduce or find bugs. The hard id does not seem to be used by the rest
of early_init_devtree(), it just looks like all this code might have
been put here to allocate somewhere to store boot CPU hardware id while
scanning the devtree.
Rearrange things so the hwid is put in a global variable like
boot_cpuid, and do all the paca allocation and boot paca setup in the
64-bit early_setup() after we have everything ready to go.
The paca_ptrs[0] re-poisoning code in early_setup does not seem to have
ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid
is not 0.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix build error on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
|
|
powerpc/64 can boot on a non-zero SMP processor id. Initially, the boot
CPU is said to be "assumed to be 0" until early_init_devtree() discovers
the id from the device tree. That is not a good description because the
assumption can be wrong and that has to be handled. A better
description is that 0 is used as a placeholder, and things are fixed
after the real id is discovered.
smp_processor_id() is set to the boot cpuid, but task_cpu(current) is
not, which causes the smp_processor_id() == task_cpu(current) invariant
to be broken until init_idle() in sched_init().
This is quite fragile and could lead to subtle bugs in future. One bug
is that validate_sp_size uses task_cpu() to get the process stack, so
any stack trace from the booting CPU between early_init_devtree()
and sched_init() will have problems. Early on paca_ptrs[0] will be
poisoned, so that can cause machine checks dereferencing that memory
in real mode. Later, validating the current stack pointer against the
idle task of a different secondary will probably cause no stack trace
to be printed.
Fix this by setting thread_info->cpu right after smp_processor_id() is
set to the boot cpuid.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix SMP=n build as reported by sfr]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221216115930.2667772-3-npiggin@gmail.com
|
|
As the earlier comment explains, __secondary_hold_spinloop does not have
to be accessed at its virtual address, slightly simplifying code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-4-npiggin@gmail.com
|
|
Move some basic Book3S initialisation after prom to a function similar
to what Book3E looks like. Book3E returns from this function at the
virtual address mapping, and Book3S will do the same in a later change,
so making them look similar helps with that.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230203113858.1152093-3-npiggin@gmail.com
|
|
That test was introduced in 2006 by
commit 00ae36de49cc ("[POWERPC] Better check in show_instructions").
At that time, there were no BPF progs.
As seen in the message of commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops
on tail call tests"), when a page fault occurs in test_bpf.ko for
instance, the code is dumped as XXXXXXXXs. Although
__kernel_text_address() checks is_bpf_text_address(), it seems it is
not enough.
Today, show_instructions() uses get_kernel_nofault() to read the code,
so there is no real need for additional verifications.
ARM64 and x86 don't do any additional check before dumping
instructions. Do the same and remove __kernel_text_address()
in show_instructions().
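The idea of relying on a fault-tolerant read instead of an address check can be sketched in userspace as follows. fake_read() is a hypothetical stand-in for a read that may fail (it is not get_kernel_nofault()), and the address range it accepts is made up.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reader: succeeds below 0x1008, "faults" at or above it. */
static int fake_read(unsigned long addr, unsigned int *insn)
{
	if (addr >= 0x1008)
		return -1;
	*insn = 0x60000000;	/* powerpc nop */
	return 0;
}

/* Format one instruction word, or XXXXXXXX if it could not be read. */
static int format_insn(char *buf, size_t len, unsigned long addr)
{
	unsigned int insn;

	if (fake_read(addr, &insn))
		return snprintf(buf, len, "XXXXXXXX");
	return snprintf(buf, len, "%08x", insn);
}
```

With this shape, the only thing that decides whether a word is printed is whether the read itself succeeded.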
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4fd69ef7945518c3e27f96b95046a5c1468d35bf.1675245773.git.christophe.leroy@csgroup.eu
|
|
We currently fail to log unrecoverable errors, because
machine_check_log_err() is not called for them.
machine_check_log_err() is called from the deferred handler,
and to run deferred handlers we have to raise irq work
from the exception handler.
For recoverable errors, the exception vector code takes care of
running deferred handlers.
For unrecoverable errors, raise irq work in save_mce_event()
so that we log the error from the MCE deferred handler.
Log without this change:
MCE: CPU27: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4]
MCE: CPU27: Initiator CPU
MCE: CPU27: Unknown
Log with this change:
MCE: CPU24: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48]
MCE: CPU24: Initiator CPU
MCE: CPU24: Unknown
RTAS: event: 5, Type: Platform Error (224), Severity: 3
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230201095933.129482-1-ganeshgr@linux.ibm.com
|
|
Currently, vmalloc areas with the VM_IOREMAP flag set get, apart from
the specific alignment clamping in __get_vm_area_node(), the following
treatment:
1) They are shown as ioremap in /proc/vmallocinfo;
2) They are ignored by /proc/kcore reading via vread().
So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP in
flags so that it is handled correctly as above.
Link: https://lkml.kernel.org/r/20230206084020.174506-7-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time. To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230202141919.2298821-1-gregkh@linuxfoundation.org
|
|
The "pci-OF-bus-map" property was declared deprecated in 2006 [1] and to
the best of everyone's knowledge is not used by anything anymore [2].
The creation of the property was disabled on powermac (arch/powerpc) in
2005 by commit 35499c0195e4 ("powerpc: Merge in 64-bit powermac
support."). But it is still created by default on CHRP.
On powermac the actual map (pci_to_OF_bus_map) is still used by default,
even though the device tree property is not created.
Add an option to enable/disable use of the pci_to_OF_bus_map, and
creation of the property (on CHRP).
Disabling the option allows enabling CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT
which allows "normal" bus numbering and more than 256 buses, like 64-bit
and other architectures.
Mark the new option as default n, the intention is that the option and
the code will be removed in a future release.
[1]: https://lore.kernel.org/linuxppc-dev/1148016268.13249.14.camel@localhost.localdomain/
[2]: https://lore.kernel.org/all/575f239205e8635add81c9f902b7d9db7beb83ea.camel@kernel.crashing.org/
Signed-off-by: Pali Rohár <pali@kernel.org>
[mpe: Reword commit & help text, shrink option name, rework to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206113902.1857123-1-mpe@ellerman.id.au
|
|
The RFI and STF security mitigation options can flip the
interrupt_exit_not_reentrant static branch condition concurrently with
the interrupt exit code which tests that branch.
Interrupt exit tests this condition to set MSR[EE|RI] for exit, then
again in the case a soft-masked interrupt is found pending, to recover
the MSR so the interrupt can be replayed before attempting to exit
again. If the condition changes between these two tests, the MSR and irq
soft-mask state will become corrupted, leading to warnings and possible
crashes. For example, if the branch is initially true then false,
MSR[EE] will be 0 but PACA_IRQ_HARD_DIS clear and EE may not get
enabled, leading to warnings in irq_64.c.
Fixes: 13799748b957 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230206042240.92103-1-npiggin@gmail.com
|
|
After commit 8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with
clang"), the PowerPC vDSO shows the following error with clang-13 and
older when CONFIG_INIT_STACK_ALL_ZERO is enabled:
clang: error: argument unused during compilation: '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang' [-Werror,-Wunused-command-line-argument]
clang-14 added a change to make sure this flag never triggers
-Wunused-command-line-argument, so it is fixed with newer releases. For
older releases that the kernel still supports building with, just filter
out this flag, as has been done for other flags.
Fixes: f0a42fbab447 ("powerpc/vdso: Improve linker flags")
Fixes: 8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with clang")
Link: https://github.com/llvm/llvm-project/commit/ca6d5813d17598cd180995fb3bdfca00f364475f
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Pick up fixes before merging another batch of cpuidle updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the
hard irq disabled state. So when should_hard_irq_enable() returns true
because we want PMI interrupts in irq handlers, MSR[EE] is enabled but
PMIs just get soft-masked. Fix this by clearing IRQS_PMI_DISABLED before
enabling MSR[EE].
This also tidies some of the warnings, no need to duplicate them in
both should_hard_irq_enable() and do_hard_irq_enable().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121100156.2824054-1-npiggin@gmail.com
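The soft-mask manipulation described above comes down to clearing one bit. In this userspace sketch the constant names follow the commit message, but the numeric values and IRQS_DISABLED are assumptions for illustration, and allow_pmi() and pmi_is_soft_masked() are hypothetical helpers.

```c
/* Assumed bit layout of the soft mask, for illustration only. */
#define IRQS_DISABLED		1
#define IRQS_PMI_DISABLED	2
#define IRQS_ALL_DISABLED	(IRQS_DISABLED | IRQS_PMI_DISABLED)

/* Would a PMI arriving now be soft-masked? */
static int pmi_is_soft_masked(unsigned char soft_mask)
{
	return (soft_mask & IRQS_PMI_DISABLED) != 0;
}

/* Clear only the PMI bit so PMIs are taken once MSR[EE] is enabled. */
static unsigned char allow_pmi(unsigned char soft_mask)
{
	return (unsigned char)(soft_mask & ~IRQS_PMI_DISABLED);
}
```

Starting from IRQS_ALL_DISABLED, allow_pmi() leaves ordinary interrupts soft-masked while letting PMIs through.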
|
|
Interrupt handlers called by soft-pending irq replay code can run
softirqs. Softirq replay enables and disables local irqs, which allows
interrupts to come in, including soft-masked interrupts, and it can
cause pending irqs to be replayed again. That makes the soft irq replay
state machine and possible races more complicated and fragile than it
needs to be.
Use irq_enter/irq_exit around irq replay to prevent softirqs running
while interrupts are being replayed. Softirqs will now be run at the
irq_exit() call after all the irq replaying is done. This prevents irqs
being replayed while irqs are being replayed, and should hopefully make
things simpler and easier to think about and debug.
A new PACA_IRQ_REPLAYING is added to prevent asynchronous interrupt
handlers hard-enabling EE while pending irqs are being replayed, because
that causes new pending irqs to arrive which is also a complexity. This
means pending irqs won't be profiled quite so well because perf irqs
can't be taken.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230121102618.2824429-1-npiggin@gmail.com
|
|
At the time commit f97bb36f705d ("powerpc/rtas: Turn rtas lock into a
raw spinlock") was written, the spinlock lockup detection code called
__delay(), which will not make progress if the timebase is not
advancing. Since the interprocessor timebase synchronization sequence
for chrp, cell, and some now-unsupported Power models can temporarily
freeze the timebase through an RTAS function (freeze-time-base), the
lock that serializes most RTAS calls was converted to arch_spinlock_t
to prevent kernel hangs in the lockup detection code.
However, commit bc88c10d7e69 ("locking/spinlock/debug: Remove spinlock
lockup detection code") removed that inconvenient property from the
lock debug code several years ago. So now it should be safe to
reintroduce generic locks into the RTAS support code, primarily to
increase lockdep coverage.
Making rtas_lock a spinlock_t would violate lock type nesting rules
because it can be acquired while holding raw locks, e.g. pci_lock and
irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent
reason not to upgrade timebase_lock as well.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-5-nathanl@linux.ibm.com
|
|
Only code internal to the RTAS subsystem needs access to the central
lock and parameter block. Remove these from the globally visible
'rtas' struct and make them file-static in rtas.c.
Some changed lines in rtas_call() lack appropriate spacing around
operators and cause checkpatch errors; fix these as well.
Suggested-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230124140448.45938-4-nathanl@linux.ibm.com
|