summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2024-09-18Merge tag 'random-6.12-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "Originally I'd planned on sending each of the vDSO getrandom() architecture ports to their respective arch trees. But as we started to work on this, we found lots of interesting issues in the shared code and infrastructure, the fixes for which the various archs needed to base their work. So in the end, this turned into a nice collaborative effort fixing up issues and porting to 5 new architectures -- arm64, powerpc64, powerpc32, s390x, and loongarch64 -- with everybody pitching in and commenting on each other's code. It was a fun development cycle. This contains: - Numerous fixups to the vDSO selftest infrastructure, getting it running successfully on more platforms, and fixing bugs in it. - Additions to the vDSO getrandom & chacha selftests. Basically every time manual review unearthed a bug in a revision of an arch patch, or an ambiguity, the tests were augmented. By the time the last arch was submitted for review, s390x, v1 of the series was essentially fine right out of the gate. - Fixes to the the generic C implementation of vDSO getrandom, to build and run successfully on all archs, decoupling it from assumptions we had (unintentionally) made on x86_64 that didn't carry through to the other architectures. - Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by Huacai Chen. - Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked by Will Deacon. - Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit varieties, from Christophe Leroy and acked by Michael Ellerman. - Port of vDSO getrandom to S390X from Heiko Carstens, the arch maintainer. While it'd be natural for there to be things to fix up over the course of the development cycle, these patches got a decent amount of review from a fairly diverse crew of folks on the mailing lists, and, for the most part, they've been cooking in linux-next, which has been helpful for ironing out build issues. In terms of architectures, I think that mostly takes care of the important 64-bit archs with hardware still being produced and running production loads in settings where vDSO getrandom is likely to help. Arguably there's still RISC-V left, and we'll see for 6.13 whether they find it useful and submit a port" * tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits) selftests: vDSO: check cpu caps before running chacha test s390/vdso: Wire up getrandom() vdso implementation s390/vdso: Move vdso symbol handling to separate header file s390/vdso: Allow alternatives in vdso code s390/module: Provide find_section() helper s390/facility: Let test_facility() generate static branch if possible s390/alternatives: Remove ALT_FACILITY_EARLY s390/facility: Disable compile time optimization for decompressor code selftests: vDSO: fix vdso_config for s390 selftests: vDSO: fix ELF hash table entry size for s390x powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64 powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32 powerpc/vdso: Refactor CFLAGS for CVDSO build powerpc/vdso32: Add crtsavres mm: Define VM_DROPPABLE for powerpc/32 powerpc/vdso: Fix VDSO data access when running in a non-root time namespace selftests: vDSO: don't include generated headers for chacha test arm64: vDSO: Wire up getrandom() vDSO implementation arm64: alternative: make alternative_has_cap_likely() VDSO compatible selftests: vDSO: also test counter in vdso_test_chacha ...
2024-09-18Merge tag 'rcu.release.v6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Neeraj Upadhyay: "Context tracking: - rename context tracking state related symbols and remove references to "dynticks" in various context tracking state variables and related helpers - force context_tracking_enabled_this_cpu() to be inlined to avoid leaving a noinstr section CSD lock: - enhance CSD-lock diagnostic reports - add an API to provide an indication of ongoing CSD-lock stall nocb: - update and simplify RCU nocb code to handle (de-)offloading of callbacks only for offline CPUs - fix RT throttling hrtimer being armed from offline CPU rcutorture: - remove redundant rcu_torture_ops get_gp_completed fields - add SRCU ->same_gp_state and ->get_comp_state functions - add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU polled grace periods - add CFcommon.arch for arch-specific Kconfig options - print number of update types in rcu_torture_write_types() - add rcutree.nohz_full_patience_delay testing to the TREE07 scenario - add a stall_cpu_repeat module parameter to test repeated CPU stalls - add argument to limit number of CPUs a guest OS can use in torture.sh rcustall: - abbreviate RCU CPU stall warnings during CSD-lock stalls - Allow dump_cpu_task() to be called without disabling preemption - defer printing stall-warning backtrace when holding rcu_node lock srcu: - make SRCU gp seq wrap-around faster - add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and ->reschedule_count which are used in heuristics governing auto-expediting of normal SRCU grace periods and grace-period-state-machine delays - mark idle SRCU-barrier callbacks to help identify stuck SRCU-barrier callback rcu tasks: - remove RCU Tasks Rude asynchronous APIs as they are no longer used - stop testing RCU Tasks Rude asynchronous APIs - fix access to non-existent percpu regions - check processor-ID assumptions during chosen CPU calculation for callback enqueuing - update description of rtp->tasks_gp_seq grace-period sequence number - add rcu_barrier_cb_is_done() to identify whether a given rcu_barrier callback is stuck - mark idle Tasks-RCU-barrier callbacks - add *torture_stats_print() functions to print detailed diagnostics for Tasks-RCU variants - capture start time of rcu_barrier_tasks*() operation to help distinguish a hung barrier operation from a long series of barrier operations refscale: - add a TINY scenario to support tests of Tiny RCU and Tiny SRCU - optimize process_durations() operation rcuscale: - dump stacks of stalled rcu_scale_writer() instances and grace-period statistics when rcu_scale_writer() stalls - mark idle RCU-barrier callbacks to identify stuck RCU-barrier callbacks - print detailed grace-period and barrier diagnostics on rcu_scale_writer() hangs for Tasks-RCU variants - warn if async module parameter is specified for RCU implementations that do not have async primitives such as RCU Tasks Rude - make all writer tasks report upon hang - tolerate repeated GFP_KERNEL failure in rcu_scale_writer() - use special allocator for rcu_scale_writer() - NULL out top-level pointers to heap memory to avoid double-free bugs on modprobe failures - maintain per-task instead of per-CPU callbacks count to avoid any issues with migration of either tasks or callbacks - constify struct ref_scale_ops Fixes: - use system_unbound_wq for kfree_rcu work to avoid disturbing isolated CPUs Misc: - warn on unexpected rcu_state.srs_done_tail state - better define "atomic" for list_replace_rcu() and hlist_replace_rcu() routines - annotate struct kvfree_rcu_bulk_data with __counted_by()" * tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits) rcu: Defer printing stall-warning backtrace when holding rcu_node lock rcu/nocb: Remove superfluous memory barrier after bypass enqueue rcu/nocb: Conditionally wake up rcuo if not already waiting on GP rcu/nocb: Fix RT throttling hrtimer armed from offline CPU rcu/nocb: Simplify (de-)offloading state machine context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}() rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs() rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck() rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save() rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap rcu: Rename struct rcu_data .dynticks_snap into .watching_snap rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs() rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since() rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs() rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online() context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu() context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*() refscale: Constify struct ref_scale_ops ...
2024-09-17Merge tag 'smp-core-2024-09-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: - Prepare the core for supporting parallel hotplug on loongarch - A small set of cleanups and enhancements * tag 'smp-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Mark smp_prepare_boot_cpu() __init cpu: Fix W=1 build kernel-doc warning cpu/hotplug: Provide weak fallback for arch_cpuhp_init_parallel_bringup() cpu/hotplug: Make HOTPLUG_PARALLEL independent of HOTPLUG_SMT
2024-09-13powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64Christophe Leroy
Extend getrandom() vDSO implementation to VDSO64. Tested on QEMU on both ppc64_defconfig and ppc64le_defconfig. Results from a Power9 (PowerNV): ~ # ./vdso_test_getrandom bench-single    vdso: 25000000 times in 0.787943615 seconds    libc: 25000000 times in 14.101887252 seconds    syscall: 25000000 times in 14.047475082 seconds Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32Christophe Leroy
To be consistent with other VDSO functions, the function is called __kernel_getrandom() __arch_chacha20_blocks_nostack() fonction is implemented basically with 32 bits operations. It performs 4 QUARTERROUND operations in parallele. There are enough registers to avoid using the stack: On input: r3: output bytes r4: 32-byte key input r5: 8-byte counter input/output r6: number of 64-byte blocks to write to output During operation: stack: pointer to counter (r5) and non-volatile registers (r14-131) r0: counter of blocks (initialised with r6) r4: Value '4' after key has been read, used for indexing r5-r12: key r14-r15: block counter r16-r31: chacha state At the end: r0, r6-r12: Zeroised r5, r14-r31: Restored Performance on powerpc 885 (using kernel selftest): ~# ./vdso_test_getrandom bench-single vdso: 25000000 times in 62.938002291 seconds libc: 25000000 times in 535.581916866 seconds syscall: 25000000 times in 531.525042806 seconds Performance on powerpc 8321 (using kernel selftest): ~# ./vdso_test_getrandom bench-single vdso: 25000000 times in 16.899318858 seconds libc: 25000000 times in 131.050596522 seconds syscall: 25000000 times in 129.794790389 seconds This first patch adds support for VDSO32. As selftests cannot easily be generated only for VDSO32, and because the following patch brings support for VDSO64 anyway, this patch opts out all code in __arch_chacha20_blocks_nostack() so that vdso_test_chacha will not fail to compile and will not crash on PPC64/PPC64LE, allthough the selftest itself will fail. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Refactor CFLAGS for CVDSO buildChristophe Leroy
In order to avoid two much duplication when we add new VDSO functionnalities in C like getrandom, refactor common CFLAGS. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso32: Add crtsavresChristophe Leroy
Commit 08c18b63d965 ("powerpc/vdso32: Add missing _restgpr_31_x to fix build failure") added _restgpr_31_x to the vdso for gettimeofday, but the work on getrandom shows that we will need more of those functions. Remove _restgpr_31_x and link in crtsavres.o so that we get all save/restore functions when optimising the kernel for size. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Fix VDSO data access when running in a non-root time namespaceChristophe Leroy
When running in a non-root time namespace, the global VDSO data page is replaced by a dedicated namespace data page and the global data page is mapped next to it. Detailed explanations can be found at commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support"). When it happens, __kernel_get_syscall_map and __kernel_get_tbfreq and __kernel_sync_dicache don't work anymore because they read 0 instead of the data they need. To address that, clock_mode has to be read. When it is set to VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page and the global data is located on the following page. Add a macro called get_realdatapage which reads clock_mode and add PAGE_SIZE to the pointer provided by get_datapage macro when clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro instead of get_datapage macro except for time functions as they handle it internally. Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Closes: https://lore.kernel.org/all/ZtnYqZI-nrsNslwy@zx2c4.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-11Merge v6.11-rc7 into drm-nextSimona Vetter
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary") in drm-misc, so start the backmerge cascade. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-10powerpc/pseries/eeh: Fix pseries_eeh_err_injectNarayana Murty N
VFIO_EEH_PE_INJECT_ERR ioctl is currently failing on pseries due to missing implementation of err_inject eeh_ops for pseries. This patch implements pseries_eeh_err_inject in eeh_ops/pseries eeh_ops. Implements support for injecting MMIO load/store error for testing from user space. The check on PCI error type (bus type) code is moved to platform code, since the eeh_pe_inject_err can be allowed to more error types depending on platform requirement. Removal of the check for 'type' in eeh_pe_inject_err() doesn't impact PowerNV as pnv_eeh_err_inject() already has an equivalent check in place. Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240909140220.529333-1-nnmlinux@linux.ibm.com
2024-09-08smp: Mark smp_prepare_boot_cpu() __initBibo Mao
smp_prepare_boot_cpu() is only called during boot, hence mark it as __init. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/all/20240907082720.452148-1-maobibo@loongson.cn
2024-09-05powerpc: Stop using no_llseekMichael Ellerman
Since commit 868941b14441 ("fs: remove no_llseek"), no_llseek() is simply defined to be NULL, and a NULL llseek means seeking is unsupported. So for statically defined file_operations, such as all these, there's no need or benefit to set llseek = no_llseek. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240903111951.141376-1-mpe@ellerman.id.au
2024-09-05powerpc/64s: Remove the "fast endian switch" syscallMichael Ellerman
The non-standard "fast endian switch" syscall was added in 2008[1], but was never widely used. It was disabled by default in 2017[2], and there's no evidence it's ever been used since. Remove it entirely. A normal endian switch syscall was added in 2015[3]. [1]: 745a14cc264b ("[POWERPC] Add fast little-endian switch system call") [2]: 529d235a0e19 ("powerpc: Add a proper syscall for switching endianness") [3]: 727f13616c45 ("powerpc: Disable the fast-endian switch syscall by default") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823070830.1269033-1-mpe@ellerman.id.au
2024-09-05powerpc: Replace kretprobe code with rethook on powerpcAbhishek Dubey
This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") to powerpc. Rethook follows the existing kretprobe implementation, but separates it from kprobes so that it can be used by fprobe (ftrace-based function entry/exit probes). As such, this patch also enables fprobe to work on powerpc. The only other change compared to the existing kretprobe implementation is doing the return address fixup in arch_rethook_fixup_return(). Reference to other archs: commit b57c2f124098 ("riscv: add riscv rethook implementation") commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook") Note: ===== In future, rethook will be only for kretprobe, and kretprobe will be replaced by fprobe. https://lore.kernel.org/all/172000134410.63468.13742222887213469474.stgit@devnote2/ We will adapt the above implementation for powerpc once its upstream. Until then, we can have this implementation of rethook to serve current kretprobe usecases. Reviewed-by: Naveen Rao <naveen@kernel.org> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240830113131.7597-1-adubey@linux.ibm.com
2024-09-05powerpc: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init_and_add()/kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 7145 606 0 7751 1e47 arch/powerpc/kernel/cacheinfo.o 3663 384 16 4063 fdf arch/powerpc/kernel/secvar-sysfs.o After: ====== text data bss dec hex filename 7193 558 0 7751 1e47 arch/powerpc/kernel/cacheinfo.o 3663 384 16 4063 fdf arch/powerpc/kernel/secvar-sysfs.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-1-huangxiaojia2@huawei.com
2024-09-01powerpc/vdso: refactor error handlingMichael Ellerman
Linus noticed that the error handling in __arch_setup_additional_pages() fails to clear the mm VDSO pointer if _install_special_mapping() fails. In practice there should be no actual bug, because if there's an error the VDSO pointer is cleared later in arch_setup_additional_pages(). However it's no longer necessary to set the pointer before installing the mapping. Commit c1bab64360e6 ("powerpc/vdso: Move to _install_special_mapping() and remove arch_vma_name()") reworked the code so that the VMA name comes from the vm_special_mapping.name, rather than relying on arch_vma_name(). So rework the code to only set the VDSO pointer once the mappings have been installed correctly, and remove the stale comment. Link: https://lkml.kernel.org/r/20240812082605.743814-4-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/mm: handle VDSO unmapping via close() rather than arch_unmap()Michael Ellerman
Add a close() callback to the VDSO special mapping to handle unmapping of the VDSO. That will make it possible to remove the arch_unmap() hook entirely in a subsequent patch. Link: https://lkml.kernel.org/r/20240812082605.743814-2-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-30powerpc/vdso: Inconditionally use CFUNC macroChristophe Leroy
During merge of commit 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation") a fallback version of CFUNC macro was added at the last minute, so it can be used inconditionally. Fixes: 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/0fa863f2f69b2ca4094ae066fcf1430fb31110c9.1724313540.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/32: Implement validation of emergency stackChristophe Leroy
VMAP stack added an emergency stack on powerpc/32 for when there is a stack overflow, but failed to add stack validation for that emergency stack. That validation is required for show stack. Implement it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2439d50b019f758db4a6d7b238b06441ab109799.1724156805.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Inconditionally use task PGDIR in DTLB missesChristophe Leroy
At the time being, DATA TLB miss handlers use task PGDIR for user addresses and swapper_pg_dir for kernel addresses. Now that kernel part of swapper_pg_dir is copied into task PGDIR at PGD allocation, it is possible to avoid the above logic and always use task PGDIR. But new kernel PGD entries can still be created after init, in which case those PGD entries may miss in task PGDIR. This can be handled in DATA TLB error handler. However, it needs to be done in real mode because the missing entry might be related to the stack. So implement copy of missing PGD entry in DATA TLB miss handler just after detection of invalid PGD entry. Also replace comparison by same calculation as in previous patch to know if an address belongs to a kernel or user segment. Note that as mentioned in platforms/Kconfig.cputype, SMP is not supported on 603 processors so there is no risk of the PGD entry be populated during the fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/a2ba8eeb1c845eeb9e46b6fe3a5e9f841df9a033.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Inconditionally use task PGDIR in ITLB missesChristophe Leroy
Now that modules exec page tables are preallocated, the instruction TLBmiss handler can use task PGDIR inconditionally. Also revise the identification of user vs kernel user space by doing a calculation instead of a comparison: Get the segment number and subtract the number of the first kernel segment. The result is positive for kernel addresses and negative for user addresses, which means that upper 2 bits are 0 for kernel and 3 for user. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/9a3242162ad2faab8019c698e501b326a126ee9e.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Switch r0 and r3 in TLB miss handlersChristophe Leroy
In preparation of next patch that will perform some additional calculations to replace comparison, switch the use of r0 and r3 as r0 has some limitations in some instructions like 'addi/subi'. Also remove outdated comments about the meaning of each register. The registers are used for many things and it would be difficult to accurately describe all things done with a given register. The function is now small enough to get a global view without much description. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/566af5e87685b1a85d3182549c0d520ce2d8877a.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Inconditionally use task PGDIR in DTLB missesChristophe Leroy
At the time being, DATA TLB miss handlers use task PGDIR for user addresses and swapper_pg_dir for kernel addresses. Now that kernel part of swapper_pg_dir is copied into task PGDIR at PGD allocation, it is possible to avoid the above logic and always use task PGDIR. But new kernel PGD entries can still be created after init, in which case those PGD entries may miss in task PGDIR. This can be handled in DATA TLB error handler. However, it needs to be done in real mode because the missing entry might be related to the stack. So implement copy of missing PGD entry in the prolog of DATA TLB ERROR handler just after the fixup of DAR. Note that this is feasible because 8xx doesn't implement vmap or ioremap with 8Mbytes pages but only 512kbytes pages which are at PTE level. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/7a76a923d2a111f1d843d8b20b4df0c65d2f4a7b.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Inconditionally use task PGDIR in ITLB missesChristophe Leroy
Now that modules exec page tables are preallocated, the instruction TLBmiss handler can use task PGDIR inconditionally. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/774fd766a8b9bcb9173b5e677d5dad0df2d3970f.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30Revert "powerpc/8xx: Always pin kernel text TLB"Christophe Leroy
This reverts commit bccc58986a2f98e3af349c85c5f49aac7fb19ef2. When STRICT_KERNEL_RWX is selected, EXEC memory must stop where RW memory start. When pinning iTLBs it means an 8M alignment for RW data start. That may be acceptable on boards with a lot of memory but one of my supported boards only has 32 Mbytes and this forced alignment leads to a waste of almost 4 Mbytes with is more than 10% of the total memory. So revert commit bccc58986a2f ("powerpc/8xx: Always pin kernel text TLB") but don't restore previous behaviour in ITLB miss handler as now kernel PGD entries are copied into each process PGDIR. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/01b6780b860c8043b51a1ba9d83acfc6f2dde910.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Fix kernel vs user address comparisonChristophe Leroy
Since commit 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text"), module exec space is below PAGE_OFFSET so not only space above PAGE_OFFSET, but space above TASK_SIZE need to be seen as kernel space. Until now the problem went undetected because by default TASK_SIZE is 0x8000000 which means address space is determined by just checking upper address bit. But when TASK_SIZE is over 0x80000000, PAGE_OFFSET is used for comparison, leading to thinking module addresses are part of user space. Fix it by using TASK_SIZE instead of PAGE_OFFSET for address comparison. Fixes: 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/3f574c9845ff0a023b46cb4f38d2c45aecd769bd.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc: Remove obsoleted declaration for _get_SPGaosheng Cui
The implementation of _get_SP() was removed in commit f4db196717c6 ("[POWERPC] Remove _get_SP"), remove the now obsolete declaration. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Update change log to refer to correct commit per Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822130609.786431-2-cuigaosheng1@huawei.com
2024-08-27Merge v6.11-rc5 into drm-nextDaniel Vetter
amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-08-22powerpc/vdso: Don't discard rela sectionsChristophe Leroy
After building the VDSO, there is a verification that it contains no dynamic relocation, see commit aff69273af61 ("vdso: Improve cmd_vdso_check to check all dynamic relocations"). This verification uses readelf -r and doesn't work if rela sections are discarded. Fixes: 8ad57add77d3 ("powerpc/build: vdso linker warning for orphan sections") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/45c3e6fc76cad05ad2cac0f5b5dfb4fae86dc9d6.1724153239.git.christophe.leroy@csgroup.eu
2024-08-21powerpc/32: Convert patch_instruction() to patch_uint()Benjamin Gray
These changes are for patch_instruction() uses on data. Unlike ppc64 these should not be incorrect as-is, but using the patch_uint() alias better reflects what kind of data being patched and allows for benchmarking the effect of different patch_* implementations (e.g., skipping instruction flushing when patching data). Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Tested-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-5-bgray@linux.ibm.com
2024-08-21powerpc/64: Convert patch_instruction() to patch_u32()Benjamin Gray
This use of patch_instruction() is working on 32 bit data, and can fail if the data looks like a prefixed instruction and the extra write crosses a page boundary. Use patch_u32() to fix the write size. Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules") Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/ Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Tested-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-4-bgray@linux.ibm.com
2024-08-12powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUALChristophe Leroy
Booting with CONFIG_DEBUG_VIRTUAL leads to following warning when passing hugepage reservation on command line: Kernel command line: hugepagesz=1g hugepages=1 hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1 noreboot HugeTLB: allocating 1 of page size 1.00 GiB failed. Only allocated 0 hugepages. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/io.h:948 __alloc_bootmem_huge_page+0xd4/0x284 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc6-00396-g6b0e82791bd0-dirty #936 Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS NIP: c1020240 LR: c10201d0 CTR: 00000000 REGS: c13fdd30 TRAP: 0700 Not tainted (6.10.0-rc6-00396-g6b0e82791bd0-dirty) MSR: 00021000 <CE,ME> CR: 44084288 XER: 20000000 GPR00: c10201d0 c13fde20 c130b560 e8000000 e8001000 00000000 00000000 c1420000 GPR08: 00000000 00028001 00000000 00000004 44084282 01066ac0 c0eb7c9c efffe149 GPR16: c0fc4228 0000005f ffffffff c0eb7d0c c0eb7cc0 c0eb7ce0 ffffffff 00000000 GPR24: c1441cec efffe153 e8001000 c14240c0 00000000 c1441d64 00000000 e8000000 NIP [c1020240] __alloc_bootmem_huge_page+0xd4/0x284 LR [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 Call Trace: [c13fde20] [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 (unreliable) [c13fde50] [c10207b8] hugetlb_hstate_alloc_pages+0x8c/0x3e8 [c13fdeb0] [c1021384] hugepages_setup+0x240/0x2cc [c13fdef0] [c1000574] unknown_bootoption+0xfc/0x280 [c13fdf30] [c0078904] parse_args+0x200/0x4c4 [c13fdfa0] [c1000d9c] start_kernel+0x238/0x7d0 [c13fdff0] [c0000434] set_ivor+0x12c/0x168 Code: 554aa33e 7c042840 3ce0c142 80a7427c 5109a016 50caa016 7c9a2378 7fdcf378 4180000c 7c052040 41810160 7c095040 <0fe00000> 38c00000 40800108 3c60c0eb ---[ end trace 0000000000000000 ]--- This is due to virt_addr_valid() using high_memory before it is set. high_memory is set in mem_init() using max_low_pfn, but max_low_pfn is available long before, it is set in mem_topology_setup(). So just like commit daa9ada2093e ("powerpc/mm: Fix boot crash with FLATMEM") moved the setting of max_mapnr immediately after the call to mem_topology_setup(), the same can be done for high_memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/62b69c4baad067093f39e7e60df0fe27a86b8d2a.1723100702.git.christophe.leroy@csgroup.eu
2024-08-08Merge tag 'drm-misc-next-2024-08-01' of ↵Daniel Vetter
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-08-07powerpc/traps: Use backlight power constantsThomas Zimmermann
Replace FB_BLANK_ constants with their counterparts from the backlight subsystem. The values are identical, so there's no change in functionality or semantics. traps.c already includes backlight.h where the BACKLIGHT constants are defined. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731130720.1148872-2-tzimmermann@suse.de
2024-07-29Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-29treewide: context_tracking: Rename CONTEXT_* into CT_STATE_*Valentin Schneider
Context tracking state related symbols currently use a mix of the CONTEXT_ (e.g. CONTEXT_KERNEL) and CT_SATE_ (e.g. CT_STATE_MASK) prefixes. Clean up the naming and make the ctx_state enum use the CT_STATE_ prefix. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-07-21Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits) ia64: scrub ia64 from poison.h watchdog/perf: properly initialize the turbo mode timestamp and rearm counter tsacct: replace strncpy() with strscpy() lib/bch.c: use swap() to improve code test_bpf: convert comma to semicolon init/modpost: conditionally check section mismatch to __meminit* init: remove unused __MEMINIT* macros nilfs2: Constify struct kobj_type nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro math: rational: add missing MODULE_DESCRIPTION() macro lib/zlib: add missing MODULE_DESCRIPTION() macro fs: ufs: add MODULE_DESCRIPTION() lib/rbtree.c: fix the example typo ocfs2: add bounds checking to ocfs2_check_dir_entry() fs: add kernel-doc comments to ocfs2_prepare_orphan_dir() coredump: simplify zap_process() selftests/fpu: add missing MODULE_DESCRIPTION() macro compiler.h: simplify data_race() macro build-id: require program headers to be right after ELF header resource: add missing MODULE_DESCRIPTION() ...
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-19Merge tag 'powerpc-6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove support for 40x CPUs & platforms - Add support to the 64-bit BPF JIT for cpu v4 instructions - Fix PCI hotplug driver crash on powernv - Fix doorbell emulation for KVM on PAPR guests (nestedv2) - Fix KVM nested guest handling of some less used SPRs - Online NUMA nodes with no CPU/memory if they have a PCI device attached - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and Vaibhav Jain. * tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits) Documentation/powerpc: Mention 40x is removed powerpc: Remove 40x leftovers macintosh/therm_windtunnel: fix module unload. powerpc: Check only single values are passed to CPU/MMU feature checks powerpc/xmon: Fix disassembly CPU feature checks powerpc: Drop clang workaround for builtin constant checks powerpc64/bpf: jit support for signed division and modulo powerpc64/bpf: jit support for sign extended mov powerpc64/bpf: jit support for sign extended load powerpc64/bpf: jit support for unconditional byte swap powerpc64/bpf: jit support for 32bit offset jmp instruction powerpc/pci: Hotplug driver bridge support pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC powerpc: add missing MODULE_DESCRIPTION() macros macintosh/mac_hid: add MODULE_DESCRIPTION() KVM: PPC: add missing MODULE_DESCRIPTION() macros powerpc/kexec: Use of_property_read_reg() powerpc/64s/radix/kfence: map __kfence_pool at page granularity powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API ...
2024-07-17printk: Add a short description string to kmsg_dump()Jocelyn Falempe
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper callback. This patch adds a new struct kmsg_dump_detail, that will hold the reason and description, and pass it to the dump() callback. To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc() function and a macro for backward compatibility. I've written this for drm_panic, but it can be useful for other kmsg_dumper. It allows to see the panic reason, like "sysrq triggered crash" or "VFS: Unable to mount root fs on xxxx" on the drm panic screen. v2: * Use a struct kmsg_dump_detail to hold the reason and description pointer, for more flexibility if we want to add other parameters. (Kees Cook) * Fix powerpc/nvram_64 build, as I didn't update the forward declaration of oops_to_nvram() Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Kees Cook <kees@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
2024-07-12init/modpost: conditionally check section mismatch to __meminit*Masahiro Yamada
This reverts commit eb8f689046b8 ("Use separate sections for __dev/ _cpu/__mem code/data"). Check section mismatch to __meminit* only when CONFIG_MEMORY_HOTPLUG=n. With this change, the linker script and modpost become simpler, and we can get rid of the __ref annotations from the memory hotplug code. [sfr@canb.auug.org.au: remove MEM_KEEP from arch/powerpc/kernel/vmlinux.lds.S] Link: https://lkml.kernel.org/r/20240710093213.2aefb25f@canb.auug.org.au Link: https://lkml.kernel.org/r/20240706160511.2331061-2-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: use contiguous PMD instead of hugepdChristophe Leroy
e500 supports many page sizes among which the following size are implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G. On e500, TLB miss for hugepages is exclusively handled by SW even on e6500 which has HW assistance for 4k pages, so there are no constraints like on the 8xx. On e500/32, all are at PGD/PMD level and can be handled as cont-PMD. On e500/64, smaller ones are on PMD while bigger ones are on PUD. Again, they can easily be handled as cont-PMD and cont-PUD instead of hugepd. On e500/32, use the pagesize bits in PTE to know if it is a PMD or a leaf entry. This works because the pagesize bits are in the last 12 bits and page tables are 4k aligned. On e500/64, use highest bit which is always 1 on PxD (Because PxD contains virtual address of a kernel memory) and always 0 on PTEs because not all bits of RPN are used/possible. Link: https://lkml.kernel.org/r/dd085987816ed2a0c70adb7e34966cb833fc03e1.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: free r10 for FIND_PTEChristophe Leroy
Move r13 load after the call to FIND_PTE, and use r13 instead of r10 for storing fault address. This will allow using r10 freely in FIND_PTE in following patch to handle hugepage size. Link: https://lkml.kernel.org/r/a3ee563ad5b13c891a15d3aae6c136c44ce8aa63.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: don't pre-check write access on data TLB errorChristophe Leroy
Don't pre-check write access on read-only pages on data TLB error. Load the TLB anyway and take a DSI exception when it happens. This avoids reading SPRN_ESR at every data TLB error exception. Link: https://lkml.kernel.org/r/8525518e1657d6032b7e980c1888102828d66950.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)Christophe Leroy
At the time being when CONFIG_PTE_64BIT is selected, PTE entries are 64 bits but PGD entries are still 32 bits. In order to allow leaf PMD entries, switch the PGD to 64 bits entries. Link: https://lkml.kernel.org/r/ca85397df02564e5edc3a3c27b55cf43af3e4ef3.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/8xx: rework support for 8M pages using contiguous PTE entriesChristophe Leroy
In order to fit better with standard Linux page tables layout, add support for 8M pages using contiguous PTE entries in a standard page table. Page tables will then be populated with 1024 similar entries and two PMD entries will point to that page table. The PMD entries also get a flag to tell it is addressing an 8M page, this is required for the HW tablewalk assistance. Link: https://lkml.kernel.org/r/8693d9a0408371043ca63bf9e4a9c140667af63e.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/64e: drop unused TLB miss handlersMichael Ellerman
There are two possibilities for book3e_htw_mode, PPC_HTW_E6500 or PPC_HTW_NONE. The TLB miss handlers are patched to use, respectively: - exc_[data|indstruction]_tlb_miss_e6500_book3e - exc_[data|indstruction]_tlb_miss_bolted_book3e Which means the default handlers are never used. Remove those, and use the bolted handlers (PPC_HTW_NONE) by default. Link: https://lkml.kernel.org/r/9a670adc1771fb1871fba93ace5372f7eadc286f.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-12powerpc/64e: drop MMU_FTR_TYPE_FSL_E checks in 64-bit codeMichael Ellerman
All 64-bit Book3E have MMU_FTR_TYPE_FSL_E, since A2 was removed, so remove checks for it in 64-bit only code. Link: https://lkml.kernel.org/r/2b0b0bc9752e6cece222e4e2050358da70bb631d.1719928057.git.christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-06Merge tag 'powerpc-6.10-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix unnecessary copy to 0 when kernel is booted at address 0 - Fix usercopy crash when dumping dtl via debugfs - Avoid possible crash when PCI hotplug races with error handling - Fix kexec crash caused by scv being disabled before other CPUs call-in - Fix powerpc selftests build with USERCFLAGS set Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen, Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia. * tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix build with USERCFLAGS set powerpc/pseries: Fix scv instruction crash with kexec powerpc/eeh: avoid possible crash when edev->pdev changes powerpc/pseries: Whitelist dtl slub object for copying to userspace powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
2024-07-04powerpc/pci: Hotplug driver bridge supportKrishna Kumar
There is an issue with the hotplug operation when it's done on the bridge/switch slot. The bridge-port and devices behind the bridge, which become offline by hot-unplug operation, don't get hot-plugged/enabled by doing hot-plug operation on that slot. Only the first port of the bridge gets enabled and the remaining port/devices remain unplugged. The hot plug/unplug operation is done by the hotplug driver (drivers/pci/ hotplug/pnv_php.c). This behavior is due to missing code for the switch/bridge. The existing driver depends on pci_hp_add_devices() function for device enablement. This function calls pci_scan_slot() on only one device-node/port of the bridge, not on all the siblings' device-node/port. The missing code needs to be added which will find all the sibling device-nodes/bridge-ports and will run explicit pci_scan_slot() on those. A new function has been added for this purpose which is invoked from pci_hp_add_devices(). This new function traverse_siblings_and_scan_slot() gets all the sibling bridge-ports by traversal and explicitly invokes pci_scan_slot() on them. Tested-by: Shawn Anastasio <sanastasio@raptorengineering.com> Signed-off-by: Krishna Kumar <krishnak@linux.ibm.com> [mpe: Move the code into pci-hotplug.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240701074513.94873-3-krishnak@linux.ibm.com