Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support multiple hook locations for maint scripts of Debian package
- Remove 'cpio' from the build tool requirement
- Introduce gendwarfksyms tool, which computes CRCs for export symbols
based on the DWARF information
- Support CONFIG_MODVERSIONS for Rust
- Resolve all conflicts in the genksyms parser
- Fix several syntax errors in genksyms
* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
kbuild: Strip runtime const RELA sections correctly
kconfig: fix memory leak in sym_warn_unmet_dep()
kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
genksyms: fix syntax error for attribute before init-declarator
genksyms: fix syntax error for builtin (u)int*x*_t types
genksyms: fix syntax error for attribute after 'union'
genksyms: fix syntax error for attribute after 'struct'
genksyms: fix syntax error for attribute after abstact_declarator
genksyms: fix syntax error for attribute before nested_declarator
genksyms: fix syntax error for attribute before abstract_declarator
genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
genksyms: record attributes consistently for init-declarator
genksyms: restrict direct-declarator to take one parameter-type-list
genksyms: restrict direct-abstract-declarator to take one parameter-type-list
genksyms: remove Makefile hack
genksyms: fix last 3 shift/reduce conflicts
genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
genksyms: reduce type_qualifier directly to decl_specifier
genksyms: rename cvar_qualifier to type_qualifier
...
|
|
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
|
|
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
|
|
Use IS_ENABLED() for the device tree checks, so that more code is
checked by the compiler without having to build all the different
configurations.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241218113159.422821-1-mpe@ellerman.id.au
|
|
Power Hypervisor can possibily allocate MMIO window intersecting with
Dynamic DMA Window (DDW) range, which is over 32-bit addressing.
These MMIO pages needs to be marked as reserved so that IOMMU doesn't map
DMA buffers in this range.
The current code is not marking these pages correctly which is resulting
in LPAR to OOPS while booting. The stack is at below
BUG: Unable to handle kernel data access on read at 0xc00800005cd40000
Faulting instruction address: 0xc00000000005cdac
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: af_packet rfkill ibmveth(X) lpfc(+) nvmet_fc nvmet nvme_keyring crct10dif_vpmsum nvme_fc nvme_fabrics nvme_core be2net(+) nvme_auth rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse configfs ip_tables x_tables xfs libcrc32c dm_service_time ibmvfc(X) scsi_transport_fc vmx_crypto gf128mul crc32c_vpmsum dm_mirror dm_region_hash dm_log dm_multipath dm_mod sd_mod scsi_dh_emc scsi_dh_rdac scsi_dh_alua t10_pi crc64_rocksoft_generic crc64_rocksoft sg crc64 scsi_mod
Supported: Yes, External
CPU: 8 PID: 241 Comm: kworker/8:1 Kdump: loaded Not tainted 6.4.0-150600.23.14-default #1 SLE15-SP6 b44ee71c81261b9e4bab5e0cde1f2ed891d5359b
Hardware name: IBM,9080-M9S POWER9 (raw) 0x4e2103 0xf000005 of:IBM,FW950.B0 (VH950_149) hv:phyp pSeries
Workqueue: events work_for_cpu_fn
NIP: c00000000005cdac LR: c00000000005e830 CTR: 0000000000000000
REGS: c00001400c9ff770 TRAP: 0300 Not tainted (6.4.0-150600.23.14-default)
MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24228448 XER: 00000001
CFAR: c00000000005cdd4 DAR: c00800005cd40000 DSISR: 40000000 IRQMASK: 0
GPR00: c00000000005e830 c00001400c9ffa10 c000000001987d00 c00001400c4fe800
GPR04: 0000080000000000 0000000000000001 0000000004000000 0000000000800000
GPR08: 0000000004000000 0000000000000001 c00800005cd40000 ffffffffffffffff
GPR12: 0000000084228882 c00000000a4c4f00 0000000000000010 0000080000000000
GPR16: c00001400c4fe800 0000000004000000 0800000000000000 c00000006088b800
GPR20: c00001401a7be980 c00001400eff3800 c000000002a2da68 000000000000002b
GPR24: c0000000026793a8 c000000002679368 000000000000002a c0000000026793c8
GPR28: 000008007effffff 0000080000000000 0000000000800000 c00001400c4fe800
NIP [c00000000005cdac] iommu_table_reserve_pages+0xac/0x100
LR [c00000000005e830] iommu_init_table+0x80/0x1e0
Call Trace:
[c00001400c9ffa10] [c00000000005e810] iommu_init_table+0x60/0x1e0 (unreliable)
[c00001400c9ffa90] [c00000000010356c] iommu_bypass_supported_pSeriesLP+0x9cc/0xe40
[c00001400c9ffc30] [c00000000005c300] dma_iommu_dma_supported+0xf0/0x230
[c00001400c9ffcb0] [c00000000024b0c4] dma_supported+0x44/0x90
[c00001400c9ffcd0] [c00000000024b14c] dma_set_mask+0x3c/0x80
[c00001400c9ffd00] [c0080000555b715c] be_probe+0xc4/0xb90 [be2net]
[c00001400c9ffdc0] [c000000000986f3c] local_pci_probe+0x6c/0x110
[c00001400c9ffe40] [c000000000188f28] work_for_cpu_fn+0x38/0x60
[c00001400c9ffe70] [c00000000018e454] process_one_work+0x314/0x620
[c00001400c9fff10] [c00000000018f280] worker_thread+0x2b0/0x620
[c00001400c9fff90] [c00000000019bb18] kthread+0x148/0x150
[c00001400c9fffe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
There are 2 issues in the code
1. The index is "int" while the address is "unsigned long". This results in
negative value when setting the bitmap.
2. The DMA offset is page shifted but the MMIO range is used as-is (64-bit
address). MMIO address needs to be page shifted as well.
Fixes: 3c33066a2190 ("powerpc/kernel/iommu: Add new iommu_table_in_use() helper")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241206210039.93172-1-gbatra@linux.ibm.com
|
|
Adds a new format for MODVERSIONS which stores each field in a separate
ELF section. This initially adds support for variable length names, but
could later be used to add additional fields to MODVERSIONS in a
backwards compatible way if needed. Any new fields will be ignored by
old user tooling, unlike the current format where user tooling cannot
tolerate adjustments to the format (for example making the name field
longer).
Since PPC munges its version records to strip leading dots, we reproduce
the munging for the new format. Other architectures do not appear to
have architecture-specific usage of this information.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not
available, it passes a NULL instead. User callback function can access
some registers (including return address) via this ftrace_regs.
Note that the ftrace_regs can be NULL when the arch does NOT define:
HAVE_DYNAMIC_FTRACE_WITH_ARGS or HAVE_DYNAMIC_FTRACE_WITH_REGS.
More specifically, if HAVE_DYNAMIC_FTRACE_WITH_REGS is defined but
not the HAVE_DYNAMIC_FTRACE_WITH_ARGS, and the ftrace ops used to
register the function callback does not set FTRACE_OPS_FL_SAVE_REGS.
In this case, ftrace_regs can be NULL in user callback.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518990044.391279.17406984900626078579.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Get the ftrace recursion lock in the generic function_graph_enter()
instead of each architecture code.
This changes all function_graph tracer callbacks running in
non-preemptive state. On x86 and powerpc, this is by default, but
on the other architecutres, this will be new.
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173379653720.973433.18438622234884980494.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Remove hard-coded strings by using the str_on_off() helper function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241220191705.1446-2-thorsten.blum@linux.dev
|
|
Define preempt lazy bit for Powerpc. Use bit 9 which is free and within
16 bit range of NEED_RESCHED, so compiler can issue single andi.
Since Powerpc doesn't use the generic entry/exit, add lazy check at exit
to user. CONFIG_PREEMPTION is defined for lazy/full/rt so use it for
return to kernel.
Ran a few benchmarks and db workload on Power10. Performance is close to
preempt=none/voluntary.
Since Powerpc systems can have large core count and large memory,
preempt lazy is going to be helpful in avoiding soft lockup issues.
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241116192306.88217-2-sshegde@linux.ibm.com
|
|
VDSO text is fixed-up during init so it can't be const,
but it can be read-only after init.
Do the same as x86 in commit 018ef8dcf3de ("x86/vdso: Mark the vDSO
code read-only after init") and arm in commit 11bf9b865898 ("ARM/vdso:
Mark the vDSO code read-only after init"), move it into
ro_after_init section.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/e9892d288b646cbdfeef0b2b73edbaf6d3c6cabe.1734174500.git.christophe.leroy@csgroup.eu
|
|
For ELFv1 binaries (big endian), the ELF entry point isn't the address
of the first instruction, instead it points to the function descriptor
for the entry point. The address of the first instruction is in the
function descriptor.
That means the kernel has to fetch the address of the first instruction
from user memory.
Because start_thread() uses __get_user(), which has no access_ok()
checks, it looks like a malicious ELF binary could be crafted to point
the entry point address at kernel memory. The kernel would load 8 bytes
from kernel memory into the NIP and then start the process, it would
typically crash, but a debugger could observe the NIP value which would
be the result of reading from kernel memory.
However that's NOT possible, because there is a check in
load_elf_binary() that ensures the ELF entry point is < TASK_SIZE
(look for BAD_ADDR(elf_entry)).
However it's fragile for start_thread() to rely on a check elsewhere,
even if the ELF parser is unlikely to ever drop the check that elf_entry
is a user address.
Make it more robust by using get_user(), which checks that the address
points at userspace before doing the load. If the address doesn't point
at userspace it will just set the result to zero, and the userspace
program will crash at zero (which is fine because it's self-inflicted).
Note that it's also possible for a malicious binary to have a valid
ELF entry address, but with the first instruction address pointing into
the kernel. However that's OK, because it is blocked by the MMU, just
like any other attempt to jump into the kernel from userspace.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241216121706.26790-1-mpe@ellerman.id.au
|
|
Using mul_u64_u64_shr() provides similar calculation as mulhdu()
assembly function, but enables inlining by the compiler.
The home-made assembly function had special handling for when one of
the arguments is not a fully populated u64 but time functions use it
to multiply timebase by a calculated scale which is constructed to
have most significant bit set.
On mpc8xx sched_clock() runs 3% faster. On mpc83xx it is 2%.
As you can see below, sched_clock() is not much bigger than before:
c000cf68 <sched_clock>:
c000cf68: 7d 2d 42 a6 mftbu r9
c000cf6c: 7d 0c 42 a6 mftb r8
c000cf70: 7d 4d 42 a6 mftbu r10
c000cf74: 7c 09 50 40 cmplw r9,r10
c000cf78: 40 82 ff f0 bne c000cf68 <sched_clock>
c000cf7c: 3d 40 c1 37 lis r10,-16073
c000cf80: 38 8a b3 30 addi r4,r10,-19664
c000cf84: 80 ea b3 30 lwz r7,-19664(r10)
c000cf88: 80 64 00 14 lwz r3,20(r4)
c000cf8c: 39 40 00 00 li r10,0
c000cf90: 80 a4 00 04 lwz r5,4(r4)
c000cf94: 80 c4 00 10 lwz r6,16(r4)
c000cf98: 7c 63 40 10 subfc r3,r3,r8
c000cf9c: 80 84 00 08 lwz r4,8(r4)
c000cfa0: 7d 06 49 10 subfe r8,r6,r9
c000cfa4: 7c c7 19 d6 mullw r6,r7,r3
c000cfa8: 7d 25 18 16 mulhwu r9,r5,r3
c000cfac: 7c 08 29 d6 mullw r0,r8,r5
c000cfb0: 7c 67 18 16 mulhwu r3,r7,r3
c000cfb4: 7d 29 30 14 addc r9,r9,r6
c000cfb8: 7c a8 28 16 mulhwu r5,r8,r5
c000cfbc: 7c ca 51 14 adde r6,r10,r10
c000cfc0: 7d 67 41 d6 mullw r11,r7,r8
c000cfc4: 7d 29 00 14 addc r9,r9,r0
c000cfc8: 7c c6 01 94 addze r6,r6
c000cfcc: 7c 63 28 14 addc r3,r3,r5
c000cfd0: 7d 4a 51 14 adde r10,r10,r10
c000cfd4: 7c e7 40 16 mulhwu r7,r7,r8
c000cfd8: 7c 63 58 14 addc r3,r3,r11
c000cfdc: 7d 4a 01 94 addze r10,r10
c000cfe0: 7c 63 30 14 addc r3,r3,r6
c000cfe4: 7d 4a 39 14 adde r10,r10,r7
c000cfe8: 35 24 ff e0 addic. r9,r4,-32
c000cfec: 41 80 00 10 blt c000cffc <sched_clock+0x94>
c000cff0: 7c 63 48 30 slw r3,r3,r9
c000cff4: 38 80 00 00 li r4,0
c000cff8: 4e 80 00 20 blr
c000cffc: 21 04 00 1f subfic r8,r4,31
c000d000: 54 69 f8 7e srwi r9,r3,1
c000d004: 7d 4a 20 30 slw r10,r10,r4
c000d008: 7d 29 44 30 srw r9,r9,r8
c000d00c: 7c 64 20 30 slw r4,r3,r4
c000d010: 7d 23 53 78 or r3,r9,r10
c000d014: 4e 80 00 20 blr
Before this change:
c000d0bc <sched_clock>:
c000d0bc: 94 21 ff f0 stwu r1,-16(r1)
c000d0c0: 7c 08 02 a6 mflr r0
c000d0c4: 90 01 00 14 stw r0,20(r1)
c000d0c8: 93 e1 00 0c stw r31,12(r1)
c000d0cc: 7d 2d 42 a6 mftbu r9
c000d0d0: 7d 0c 42 a6 mftb r8
c000d0d4: 7d 4d 42 a6 mftbu r10
c000d0d8: 7c 09 50 40 cmplw r9,r10
c000d0dc: 40 82 ff f0 bne c000d0cc <sched_clock+0x10>
c000d0e0: 3f e0 c1 37 lis r31,-16073
c000d0e4: 3b ff b3 30 addi r31,r31,-19664
c000d0e8: 80 9f 00 14 lwz r4,20(r31)
c000d0ec: 80 7f 00 10 lwz r3,16(r31)
c000d0f0: 7c 84 40 10 subfc r4,r4,r8
c000d0f4: 80 bf 00 00 lwz r5,0(r31)
c000d0f8: 80 df 00 04 lwz r6,4(r31)
c000d0fc: 7c 63 49 10 subfe r3,r3,r9
c000d100: 48 00 37 85 bl c0010884 <mulhdu>
c000d104: 81 3f 00 08 lwz r9,8(r31)
c000d108: 35 49 ff e0 addic. r10,r9,-32
c000d10c: 41 80 00 20 blt c000d12c <sched_clock+0x70>
c000d110: 80 01 00 14 lwz r0,20(r1)
c000d114: 7c 83 50 30 slw r3,r4,r10
c000d118: 83 e1 00 0c lwz r31,12(r1)
c000d11c: 38 80 00 00 li r4,0
c000d120: 7c 08 03 a6 mtlr r0
c000d124: 38 21 00 10 addi r1,r1,16
c000d128: 4e 80 00 20 blr
c000d12c: 80 01 00 14 lwz r0,20(r1)
c000d130: 54 8a f8 7e srwi r10,r4,1
c000d134: 21 09 00 1f subfic r8,r9,31
c000d138: 83 e1 00 0c lwz r31,12(r1)
c000d13c: 7c 63 48 30 slw r3,r3,r9
c000d140: 7d 4a 44 30 srw r10,r10,r8
c000d144: 7c 84 48 30 slw r4,r4,r9
c000d148: 7d 43 1b 78 or r3,r10,r3
c000d14c: 7c 08 03 a6 mtlr r0
c000d150: 38 21 00 10 addi r1,r1,16
c000d154: 4e 80 00 20 blr
c0010884 <mulhdu>:
c0010884: 2c 06 00 00 cmpwi r6,0
c0010888: 2c 83 00 00 cmpwi cr1,r3,0
c001088c: 7c 8a 23 78 mr r10,r4
c0010890: 7c 84 28 16 mulhwu r4,r4,r5
c0010894: 41 82 00 14 beq c00108a8 <mulhdu+0x24>
c0010898: 7c 0a 30 16 mulhwu r0,r10,r6
c001089c: 7c ea 29 d6 mullw r7,r10,r5
c00108a0: 7c e0 38 14 addc r7,r0,r7
c00108a4: 7c 84 01 94 addze r4,r4
c00108a8: 4d 86 00 20 beqlr cr1
c00108ac: 7d 23 29 d6 mullw r9,r3,r5
c00108b0: 7d 43 28 16 mulhwu r10,r3,r5
c00108b4: 41 82 00 18 beq c00108cc <mulhdu+0x48>
c00108b8: 7c 03 31 d6 mullw r0,r3,r6
c00108bc: 7d 03 30 16 mulhwu r8,r3,r6
c00108c0: 7c e0 38 14 addc r7,r0,r7
c00108c4: 7c 84 41 14 adde r4,r4,r8
c00108c8: 7d 4a 01 94 addze r10,r10
c00108cc: 7c 84 48 14 addc r4,r4,r9
c00108d0: 7c 6a 01 94 addze r3,r10
c00108d4: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/f29e473c193c87bdbd36b209dfdee99d2f0c60dc.1733566130.git.christophe.leroy@csgroup.eu
|
|
On some powermacs `escc` nodes are missing `#size-cells` properties,
which is deprecated and now triggers a warning at boot since commit
045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells
handling").
For example:
Missing '#size-cells' in /pci@f2000000/mac-io@c/escc@13000
WARNING: CPU: 0 PID: 0 at drivers/of/base.c:133 of_bus_n_size_cells+0x98/0x108
Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
...
Call Trace:
of_bus_n_size_cells+0x98/0x108 (unreliable)
of_bus_default_count_cells+0x40/0x60
__of_get_address+0xc8/0x21c
__of_address_to_resource+0x5c/0x228
pmz_init_port+0x5c/0x2ec
pmz_probe.isra.0+0x144/0x1e4
pmz_console_init+0x10/0x48
console_init+0xcc/0x138
start_kernel+0x5c4/0x694
As powermacs boot via prom_init it's possible to add the missing
properties to the device tree during boot, avoiding the warning. Note
that `escc-legacy` nodes are also missing `#size-cells` properties, but
they are skipped by the macio driver, so leave them alone.
Depends-on: 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20241126025710.591683-1-mpe@ellerman.id.au
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Rework kfence support for the HPT MMU to work on systems with >= 16TB
of RAM.
- Remove the powerpc "maple" platform, used by the "Yellow Dog
Powerstation".
- Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.
- Add support for running KVM nested guests on Power11.
- Other small features, cleanups and fixes.
Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa
Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert
Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard,
Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek,
Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao,
Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash,
Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen
Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum,
Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao.
* tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits)
EDAC/powerpc: Remove PPC_MAPLE drivers
powerpc/perf: Add per-task/process monitoring to vpa_pmu driver
powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch
docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu
powerpc/perf: Add perf interface to expose vpa counters
MAINTAINERS: powerpc: Mark Maddy as "M"
powerpc/Makefile: Allow overriding CPP
powerpc-km82xx.c: replace of_node_put() with __free
ps3: Correct some typos in comments
powerpc/kexec: Fix return of uninitialized variable
macintosh: Use common error handling code in via_pmu_led_init()
powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type()
powerpc: remove dead config options for MPC85xx platform support
powerpc/xive: Use cpumask_intersects()
selftests/powerpc: Remove the path after initialization.
powerpc/xmon: symbol lookup length fixed
powerpc/ep8248e: Use %pa to format resource_size_t
powerpc/ps3: Reorganize kerneldoc parameter names
KVM: PPC: Book3S HV: Fix kmv -> kvm typo
powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection
algorithm. This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping
code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of
shadow entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in
the hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page
into small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to
do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio
size rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel
Butt removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations"
from Mike Rapoport teaches x86 to use large pages for
read-only-execute module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
tests over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a
single VMA, rather than requiring that multiple VMAs be created for
this. Improved efficiencies for userspace memory allocators are
expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP
from the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep
is enabled.
* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
mm/kfence: add a new kunit test test_use_after_free_read_nofault()
zram: fix NULL pointer in comp_algorithm_show()
memcg/hugetlb: add hugeTLB counters to memcg
vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
zram: ZRAM_DEF_COMP should depend on ZRAM
MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
Docs/mm/damon: recommend academic papers to read and/or cite
mm: define general function pXd_init()
kmemleak: iommu/iova: fix transient kmemleak false positive
mm/list_lru: simplify the list_lru walk callback function
mm/list_lru: split the lock to per-cgroup scope
mm/list_lru: simplify reparenting and initial allocation
mm/list_lru: code clean up for reparenting
mm/list_lru: don't export list_lru_add
mm/list_lru: don't pass unnecessary key parameters
kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples. Fix
the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys
address of the DTB as __pa() is not always valid to use. This fixes
a warning for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports
and endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever
possible"
* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
of/address: Rework bus matching to avoid warnings
of: WARN on deprecated #address-cells/#size-cells handling
of/fdt: Don't use default address cell sizes for address translation
dt-bindings: Enable dtc "interrupt_provider" warnings
of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
media: xilinx-tpg: use new of_graph functions
fbdev: omapfb: use new of_graph functions
gpu: drm: omapdrm: use new of_graph functions
ASoC: audio-graph-card2: use new of_graph functions
ASoC: audio-graph-card: use new of_graph functions
ASoC: test-component: use new of_graph functions
of: property: use new of_graph functions
of: property: add of_graph_get_next_port_endpoint()
of: property: add of_graph_get_next_port()
of: module: remove strlen() call in of_modalias()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Restructure the function graph shadow stack to prepare it for use
with kretprobes
With the goal of merging the shadow stack logic of function graph and
kretprobes, some more restructuring of the function shadow stack is
required.
Move out function graph specific fields from the fgraph
infrastructure and store it on the new stack variables that can pass
data from the entry callback to the exit callback.
Hopefully, with this change, the merge of kretprobes to use fgraph
shadow stacks will be ready by the next merge window.
- Make shadow stack 4k instead of using PAGE_SIZE.
Some architectures have very large PAGE_SIZE values which make its
use for shadow stacks waste a lot of memory.
- Give shadow stacks its own kmem cache.
When function graph is started, every task on the system gets a
shadow stack. In the future, shadow stacks may not be 4K in size.
Have it have its own kmem cache so that whatever size it becomes will
still be efficient in allocations.
- Initialize profiler graph ops as it will be needed for new updates to
fgraph
- Convert to use guard(mutex) for several ftrace and fgraph functions
- Add more comments and documentation
- Show function return address in function graph tracer
Add an option to show the caller of a function at each entry of the
function graph tracer, similar to what the function tracer does.
- Abstract out ftrace_regs from being used directly like pt_regs
ftrace_regs was created to store a partial pt_regs. It holds only the
registers and stack information to get to the function arguments and
return values. On several archs, it is simply a wrapper around
pt_regs. But some users would access ftrace_regs directly to get the
pt_regs which will not work on all archs. Make ftrace_regs an
abstract structure that requires all access to its fields be through
accessor functions.
- Show how long it takes to do function code modifications
When code modification for function hooks happen, it always had the
time recorded in how long it took to do the conversion. But this
value was never exported. Recently the code was touched due to new
ROX modification handling that caused a large slow down in doing the
modifications and had a significant impact on boot times.
Expose the timings in the dyn_ftrace_total_info file. This file was
created a while ago to show information about memory usage and such
to implement dynamic function tracing. It's also an appropriate file
to store the timings of this modification as well. This will make it
easier to see the impact of changes to code modification on boot up
timings.
- Other clean ups and small fixes
* tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits)
ftrace: Show timings of how long nop patching took
ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash()
ftrace: Use guard to take the ftrace_lock in release_probe()
ftrace: Use guard to lock ftrace_lock in cache_mod()
ftrace: Use guard for match_records()
fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph()
fgraph: Give ret_stack its own kmem cache
fgraph: Separate size of ret_stack from PAGE_SIZE
ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
selftests/ftrace: Fix check of return value in fgraph-retval.tc test
ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros
ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs
ftrace: Make ftrace_regs abstract from direct use
fgragh: No need to invoke the function call_filter_check_discard()
fgraph: Simplify return address printing in function graph tracer
function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr()
function_graph: Support recording and printing the function return address
ftrace: Have calltime be saved in the fgraph storage
ftrace: Use a running sleeptime instead of saving on shadow stack
fgraph: Use fgraph data to store subtime for profiler
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vdso data page handling updates from Thomas Gleixner:
"First steps of consolidating the VDSO data page handling.
The VDSO data page handling is architecture specific for historical
reasons, but there is no real technical reason to do so.
Aside of that VDSO data has become a dump ground for various
mechanisms and fail to provide a clear separation of the
functionalities.
Clean this up by:
- consolidating the VDSO page data by getting rid of architecture
specific warts especially in x86 and PowerPC.
- removing the last includes of header files which are pulling in
other headers outside of the VDSO namespace.
- seperating timekeeping and other VDSO data accordingly.
Further consolidation of the VDSO page handling is done in subsequent
changes scheduled for the next merge window.
This also lays the ground for expanding the VDSO time getters for
independent PTP clocks in a generic way without making every
architecture add support seperately"
* tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/vdso: Add missing brackets in switch case
vdso: Rename struct arch_vdso_data to arch_vdso_time_data
powerpc: Split systemcfg struct definitions out from vdso
powerpc: Split systemcfg data out of vdso data page
powerpc: Add kconfig option for the systemcfg page
powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors
powerpc/pseries/lparcfg: Fix printing of system_active_processors
powerpc/procfs: Propagate error of remap_pfn_range()
powerpc/vdso: Remove offset comment from 32bit vdso_arch_data
x86/vdso: Split virtual clock pages into dedicated mapping
x86/vdso: Delete vvar.h
x86/vdso: Access vdso data without vvar.h
x86/vdso: Move the rng offset to vsyscall.h
x86/vdso: Access rng vdso data without vvar.h
x86/vdso: Access timens vdso data without vvar.h
x86/vdso: Allocate vvar page from C code
x86/vdso: Access rng data from kernel without vvar
x86/vdso: Place vdso_data at beginning of vvar page
x86/vdso: Use __arch_get_vdso_data() to access vdso data
x86/mm/mmap: Remove arch_vma_name()
...
|
|
Pull xattr updates from Al Viro:
"Sanitize xattr and io_uring interactions with it, add *xattrat()
syscalls, sanitize struct filename handling in there"
* tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr: remove redundant check on variable err
fs/xattr: add *at family syscalls
new helpers: file_removexattr(), filename_removexattr()
new helpers: file_listxattr(), filename_listxattr()
replace do_getxattr() with saner helpers.
replace do_setxattr() with saner helpers.
new helper: import_xattr_name()
fs: rename struct xattr_ctx to kernel_xattr_ctx
xattr: switch to CLASS(fd)
io_[gs]etxattr_prep(): just use getname()
io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE
getname_maybe_null() - the third variant of pathname copy-in
teach filename_lookup() to treat NULL filename as ""
|
|
Unlike all other arches, powerpc doesn't allow the user to override CPP,
because it sets it unconditionally in the arch Makefile. This can lead
to strange build failures.
Instead add the required flags to KBUILD_CPPFLAGS, which are passed
to CPP, CC and AS invocations by the generic Makefile logic.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20240607061629.530301-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[mpe: Rebase, write change log, add Arnd's SoB as communicated privately]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241107112646.32401-1-mpe@ellerman.id.au
|
|
These offsets are not used anymore, delete them.
Fixes: c39b1dcf055d ("powerpc/vdso: Add a page for non-time data")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241113-vdso-powerpc-asm-offsets-v1-1-3f7e589f090d@linutronix.de
|
|
On a system with n CPUs and m interrupts, there will be n*m decimal
values yielded via seq_printf(.."%10u "..) which is less efficient
than seq_put_decimal_ull_width(), stress reading /proc/interrupts
indicates ~30% performance improvement with this patch.
Signed-off-by: David Wang <00107082@163.com>
[mpe: Flesh out change log based on original submission]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/all/20241103080552.4787-1-00107082@163.com
Link: https://patch.msgid.link/20241108162327.9887-1-00107082@163.com
|
|
As per the kernel documentation[1], hardlockup detector should
be disabled in KVM guests as it may give false positives. On
PPC, hardlockup detector is enabled inside KVM guests because
disable_hardlockup_detector() is marked as early_initcall and it
relies on kvm_guest static key (is_kvm_guest()) which is initialized
later during boot by check_kvm_guest(), which is a core_initcall.
check_kvm_guest() is also called in pSeries_smp_probe(), which is called
before initcalls, but it is skipped if KVM guest does not have doorbell
support or if the guest is launched with SMT=1.
Call check_kvm_guest() in disable_hardlockup_detector() so that
is_kvm_guest() check goes through fine and hardlockup detector can be
disabled inside the KVM guest.
[1]: Documentation/admin-guide/sysctl/kernel.rst
Fixes: 633c8e9800f3 ("powerpc/pseries: Enable hardlockup watchdog for PowerVM partitions")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241108094839.33084-1-gautam@linux.ibm.com
|
|
The param area is a memory region where the kernel places additional
command-line arguments for fadump kernel. Currently, the param memory
area is reserved in fadump kernel if it is above boot_mem_top. However,
it should be reserved if it is below boot_mem_top because the fadump
kernel already reserves memory from boot_mem_top to the end of DRAM.
Currently, there is no impact from not reserving param memory if it is
below boot_mem_top, as it is not used after the early boot phase of the
fadump kernel. However, if this changes in the future, it could lead to
issues in the fadump kernel.
Fixes: 3416c9daa6b1 ("powerpc/fadump: pass additional parameters when fadump is active")
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241107055817.489795-2-sourabhjain@linux.ibm.com
|
|
Memory for passing additional parameters to fadump capture kernel
is allocated during subsys_initcall level, using memblock. But
as slab is already available by this time, allocation happens via
the buddy allocator. This may work for radix MMU but is likely to
fail in most cases for hash MMU as hash MMU needs this memory in
the first memory block for it to be accessible in real mode in the
capture kernel (second boot). So, allocate memory for additional
parameters area as soon as MMU mode is obvious.
Fixes: 683eab94da75 ("powerpc/fadump: setup additional parameters for dump capture kernel")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Closes: https://lore.kernel.org/lkml/a70e4064-a040-447b-8556-1fd02f19383d@linux.vnet.ibm.com/T/#u
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241107055817.489795-1-sourabhjain@linux.ibm.com
|
|
Several architectures support text patching, but they name the header
files that declare patching functions differently.
Make all such headers consistently named text-patching.h and add an empty
header in asm-generic for architectures that do not support text patching.
Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and
removexattrat(). Those can be used to operate on extended attributes,
especially security related ones, either relative to a pinned directory
or on a file descriptor without read access, avoiding a
/proc/<pid>/fd/<fd> detour, requiring a mounted procfs.
One use case will be setfiles(8) setting SELinux file contexts
("security.selinux") without race conditions and without a file
descriptor opened with read access requiring SELinux read permission.
Use the do_{name}at() pattern from fs/open.c.
Pass the value of the extended attribute, its length, and for
setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added
struct xattr_args to not exceed six syscall arguments and not
merging the AT_* and XATTR_* flags.
[AV: fixes by Christian Brauner folded in, the entire thing rebased on
top of {filename,file}_...xattr() primitives, treatment of empty
pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling
is cheap, so f...(2) can use it]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
CC: x86@kernel.org
CC: linux-alpha@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
CC: linux-ia64@vger.kernel.org
CC: linux-m68k@lists.linux-m68k.org
CC: linux-mips@vger.kernel.org
CC: linux-parisc@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-s390@vger.kernel.org
CC: linux-sh@vger.kernel.org
CC: sparclinux@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: audit@vger.kernel.org
CC: linux-arch@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: linux-security-module@vger.kernel.org
CC: selinux@vger.kernel.org
[brauner: slight tweaks]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove hard-coded strings by using the str_enabled_disabled() helper
function.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241027222219.1173-2-thorsten.blum@linux.dev
|
|
Under certain conditions, the 64-bit '-mstack-protector-guard' flags may
end up in the 32-bit vDSO flags, resulting in build failures due to the
structure of clang's argument parsing of the stack protector options,
which validates the arguments of the stack protector guard flags
unconditionally in the frontend, choking on the 64-bit values when
targeting 32-bit:
clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2
clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2
make[3]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1
make[3]: *** [arch/powerpc/kernel/vdso/Makefile:87: arch/powerpc/kernel/vdso/vgetrandom-32.o] Error 1
Remove these flags by adding them to the CC32FLAGSREMOVE variable, which
already handles situations similar to this. Additionally, reformat and
align a comment better for the expanding CONFIG_CC_IS_CLANG block.
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030-powerpc-vdso-drop-stackp-flags-clang-v1-1-d95e7376d29c@kernel.org
|
|
The systemcfg data has nothing to do anymore with the vdso.
Split it into a dedicated header file.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-27-b64f0842d512@linutronix.de
|
|
The systemcfg data only has minimal overlap with the vdso data.
Splitting the two avoids mapping the implementation-defined vdso data
into /proc/ppc64/systemcfg.
It is also a preparation for the standardization of vdso data storage.
The only field actually used by both systemcfg and vdso is
tb_ticks_per_sec and it is only changed once during time_init().
Initialize it in both structures there.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-26-b64f0842d512@linutronix.de
|
|
The systemcfg page through procfs is only a backwards-compatible
interface for very old applications.
Make it possible to be disabled.
This also creates a convenient config #define to guard any accesses to
the systemcfg page.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-25-b64f0842d512@linutronix.de
|
|
If the operation fails and userspace is unaware it will access unmapped
memory, crashing the process.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-22-b64f0842d512@linutronix.de
|
|
Add support for DYNAMIC_FTRACE_WITH_DIRECT_CALLS similar to the arm64
implementation.
ftrace direct calls allow custom trampolines to be called into directly
from function ftrace call sites, bypassing the ftrace trampoline
completely. This functionality is currently utilized by BPF trampolines
to hook into kernel function entries.
Since we have limited relative branch range, we support ftrace direct
calls through support for DYNAMIC_FTRACE_WITH_CALL_OPS. In this
approach, ftrace trampoline is not entirely bypassed. Rather, it is
re-purposed into a stub that reads direct_call field from the associated
ftrace_ops structure and branches into that, if it is not NULL. For
this, it is sufficient if we can ensure that the ftrace trampoline is
reachable from all traceable functions.
When multiple ftrace_ops are associated with a call site, we utilize a
call back to set pt_regs->orig_gpr3 that can then be tested on the
return path from the ftrace trampoline to branch into the direct caller.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-16-hbathini@linux.ibm.com
|
|
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the
arm64 implementation.
This works by patching-in a pointer to an associated ftrace_ops
structure before each traceable function. If multiple ftrace_ops are
associated with a call site, then a special ftrace_list_ops is used to
enable iterating over all the registered ftrace_ops. If no ftrace_ops
are associated with a call site, then a special ftrace_nop_ops structure
is used to render the ftrace call as a no-op. ftrace trampoline can then
read the associated ftrace_ops for a call site by loading from an offset
from the LR, and branch directly to the associated function.
The primary advantage with this approach is that we don't have to
iterate over all the registered ftrace_ops for call sites that have a
single ftrace_ops registered. This is the equivalent of implementing
support for dynamic ftrace trampolines, which set up a special ftrace
trampoline for each registered ftrace_ops and have individual call sites
branch into those directly.
A secondary advantage is that this gives us a way to add support for
direct ftrace callers without having to resort to using stubs. The
address of the direct call trampoline can be loaded from the ftrace_ops
structure.
To support this, we reserve a nop before each function on 32-bit
powerpc. For 64-bit powerpc, two nops are reserved before each
out-of-line stub. During ftrace activation, we update this location with
the associated ftrace_ops pointer. Then, on ftrace entry, we load from
this location and call into ftrace_ops->func().
For 64-bit powerpc, we ensure that the out-of-line stub area is
doubleword aligned so that ftrace_ops address can be updated atomically.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-15-hbathini@linux.ibm.com
|
|
We are restricted to a .text size of ~32MB when using out-of-line
function profile sequence. Allow this to be extended up to the previous
limit of ~64MB by reserving space in the middle of .text.
A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is
introduced to specify the number of function stubs that are reserved in
.text. On boot, ftrace utilizes stubs from this area first before using
the stub area at the end of .text.
A ppc64le defconfig has ~44k functions that can be traced. A more
conservative value of 32k functions is chosen as the default value of
PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space
than necessary by default. If building a kernel that only has 32k
trace-able functions, we won't allot any more space at the end of .text
during the pass on vmlinux.o. Otherwise, only the remaining functions
get space for stubs at the end of .text. This default value should help
cover a .text size of ~48MB in total (including space reserved at the
end of .text which can cover up to 32MB), which should be sufficient for
most common builds. For a very small kernel build, this can be set to 0.
Or, this can be bumped up to a larger value to support vmlinux .text
size up to ~64MB.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-14-hbathini@linux.ibm.com
|
|
Function profile sequence on powerpc includes two instructions at the
beginning of each function:
mflr r0
bl ftrace_caller
The call to ftrace_caller() gets nop'ed out during kernel boot and is
patched in when ftrace is enabled.
Given the sequence, we cannot return from ftrace_caller with 'blr' as we
need to keep LR and r0 intact. This results in link stack (return
address predictor) imbalance when ftrace is enabled. To address that, we
would like to use a three instruction sequence:
mflr r0
bl ftrace_caller
mtlr r0
Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
reserve two instruction slots before the function. This results in a
total of five instruction slots to be reserved for ftrace use on each
function that is traced.
Move the function profile sequence out-of-line to minimize its impact.
To do this, we reserve a single nop at function entry using
-fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
the total number of functions that can be traced. This is then used to
generate a .S file reserving the appropriate amount of space for use as
ftrace stubs, which is built and linked into vmlinux.
On bootup, the stub space is split into separate stubs per function and
populated with the proper instruction sequence. A pointer to the
associated stub is maintained in dyn_arch_ftrace.
For modules, space for ftrace stubs is reserved from the generic module
stub space.
This is restricted to and enabled by default only on 64-bit powerpc,
though there are some changes to accommodate 32-bit powerpc. This is
done so that 32-bit powerpc could choose to opt into this based on
further tests and benchmarks.
As an example, after this patch, kernel functions will have a single nop
at function entry:
<kernel_clone>:
addis r2,r12,467
addi r2,r2,-16028
nop
mfocrf r11,8
...
When ftrace is enabled, the nop is converted to an unconditional branch
to the stub associated with that function:
<kernel_clone>:
addis r2,r12,467
addi r2,r2,-16028
b ftrace_ool_stub_text_end+0x11b28
mfocrf r11,8
...
The associated stub:
<ftrace_ool_stub_text_end+0x11b28>:
mflr r0
bl ftrace_caller
mtlr r0
b kernel_clone+0xc
...
This change showed an improvement of ~10% in null_syscall benchmark on a
Power 10 system with ftrace enabled.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-13-hbathini@linux.ibm.com
|
|
Move the ftrace stub used to cover inittext before _einittext so that it
is within kernel text, as seen through core_kernel_text(). This is
required for a subsequent change to ftrace.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-9-hbathini@linux.ibm.com
|
|
To simplify upcoming changes to ftrace, add a check to skip actual
instruction patching if the old and new instructions are the same. We
still validate that the instruction is what we expect, but don't
actually patch the same instruction again.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-8-hbathini@linux.ibm.com
|
|
Pointer to struct module is only relevant for ftrace records belonging
to kernel modules. Having this field in dyn_arch_ftrace wastes memory
for all ftrace records belonging to the kernel. Remove the same in
favour of looking up the module from the ftrace record address, similar
to other architectures.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-7-hbathini@linux.ibm.com
|
|
Minor refactor for converting #ifdef to IS_ENABLED().
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-6-hbathini@linux.ibm.com
|
|
On 32-bit powerpc, gcc generates a three instruction sequence for
function profiling:
mflr r0
stw r0, 4(r1)
bl _mcount
On kernel boot, the call to _mcount() is nop-ed out, to be patched back
in when ftrace is actually enabled. The 'stw' instruction therefore is
not necessary unless ftrace is enabled. Nop it out during ftrace init.
When ftrace is enabled, we want the 'stw' so that stack unwinding works
properly. Perform the same within the ftrace handler, similar to 64-bit
powerpc.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-5-hbathini@linux.ibm.com
|
|
Gcc v5.x emits a 3-instruction sequence for -mprofile-kernel:
mflr r0
std r0, 16(r1)
bl _mcount
Gcc v6.x moved to a simpler 2-instruction sequence by removing the 'std'
instruction. The store saved the return address in the LR save area in
the caller stack frame for stack unwinding. However, with dynamic
ftrace, we no longer have a call to _mcount on kernel boot when ftrace
is not enabled. When ftrace is enabled, that store is performed within
ftrace_caller(). As such, the additional 'std' instruction is redundant.
Nop it out on kernel boot.
With this change, we now use the same 2-instruction profiling sequence
with both -mprofile-kernel, as well as -fpatchable-function-entry on
64-bit powerpc.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-4-hbathini@linux.ibm.com
|
|
Rather than hard-coding the offset into a function to be used to
determine if a kprobe is at function entry, use ftrace_location() to
determine the ftrace location within the function and categorize all
instructions till that offset to be function entry.
For functions that cannot be traced, we fall back to using a fixed
offset of 8 (two instructions) to categorize a probe as being at
function entry for 64-bit elfv2, unless we are using pcrel.
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-3-hbathini@linux.ibm.com
|
|
So far, we have relied on the fact that gcc supports both
-mprofile-kernel, as well as -fpatchable-function-entry, and clang
supports neither. Our Makefile only checks for CONFIG_MPROFILE_KERNEL to
decide which files to build. Clang has a feature request out [*] to
implement -fpatchable-function-entry, and is unlikely to support
-mprofile-kernel.
Update our Makefile checks so that we pick up the correct files to build
once clang picks up support for -fpatchable-function-entry.
[*] https://github.com/llvm/llvm-project/issues/57031
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-2-hbathini@linux.ibm.com
|
|
__pa() is only intended to be used for linear map addresses and using
it for initial_boot_params which is in fixmap for arm64 will give an
incorrect value. Hence save the physical address when it is known at
boot time when calling early_init_dt_scan for arm64 and use it at kexec
time instead of converting the virtual address using __pa().
Note that arm64 doesn't need the FDT region reserved in the DT as the
kernel explicitly reserves the passed in FDT. Therefore, only a debug
warning is fixed with this change.
Reported-by: Breno Leitao <leitao@debian.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Fixes: ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20241023171426.452688-1-usamaarif642@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|