Age | Commit message (Collapse) | Author |
|
Recreating objtool errors can be a manual process. Kbuild removes the
object, so it has to be compiled or linked again before running objtool.
Then the objtool args need to be reversed engineered.
Make that all easier by automatically making a backup of the object file
on error, and print a modified version of the args which can be used to
recreate.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/7571e30636359b3e173ce6e122419452bb31882f.1741975349.git.jpoimboe@kernel.org
|
|
This is similar to GCC's behavior and makes it more obvious why the
build failed.
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/56f0565b15b4b4caa9a08953fa9c679dfa973514.1741975349.git.jpoimboe@kernel.org
|
|
Any objtool warning has the potential of reflecting (or triggering) a
major bug in the kernel or compiler which could result in crashing the
kernel or breaking the livepatch consistency model.
In preparation for failing the build on objtool errors/warnings, add a
new --Werror option.
[ jpoimboe: commit log, comments, error out on fatal errors too ]
Co-developed-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/e423ea4ec297f510a108aa6c78b52b9fe30fa8c1.1741975349.git.jpoimboe@kernel.org
|
|
Add option to allow writing the changed binary to a separate file rather
than changing it in place.
Libelf makes this suprisingly hard, so take the easy way out and just
copy the file before editing it.
Also steal the -o short option from --orc. Nobody will notice ;-)
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/0da308d42d82b3bbed16a31a72d6bde52afcd6bd.1741975349.git.jpoimboe@kernel.org
|
|
Force the user to fix their cmdline if they forget the '--link' option.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/8380bbf3a0fa86e03fd63f60568ae06a48146bc1.1741975349.git.jpoimboe@kernel.org
|
|
The option validations are a bit scattered, consolidate them.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/8f886502fda1d15f39d7351b70d4ebe5903da627.1741975349.git.jpoimboe@kernel.org
|
|
With unret validation enabled and IBT/LTO disabled, objtool runs on TUs
with --rethunk and on vmlinux.o with --unret. So this dependency isn't
valid as they don't always run on the same object.
This error never triggered before because --unret is always coupled with
--noinstr, so the first conditional in opts_valid() returns early due to
opts.noinstr being true.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/c6f5635784a28ed4b10ac4307b1858e015e6eff0.1741975349.git.jpoimboe@kernel.org
|
|
Increase the per-function WARN_FUNC() rate limit from 1 to 2. If the
number of warnings for a given function goes beyond 2, print "skipping
duplicate warning(s)". This helps root out additional warnings in a
function that might be hiding behind the first one.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/aec318d66c037a51c9f376d6fb0e8ff32812a037.1741975349.git.jpoimboe@kernel.org
|
|
Fix some outdated information in the objtool doc.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/2552ee8b48631127bf269359647a7389edf5f002.1741975349.git.jpoimboe@kernel.org
|
|
Clarify what needs to be done to resolve the missing __noreturn warning.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/ab835a35d00bacf8aff0b56257df93f14fdd8224.1741975349.git.jpoimboe@kernel.org
|
|
Make sure all fatal errors are funneled through the 'out' label with a
negative ret.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/0f49d6a27a080b4012e84e6df1e23097f44cc082.1741975349.git.jpoimboe@kernel.org
|
|
The CONFIG_X86_ESPFIX64 version of exc_double_fault() can return to its
caller, but the !CONFIG_X86_ESPFIX64 version never does. In the latter
case the compiler and/or objtool may consider it to be implicitly
noreturn.
However, due to the currently inflexible way objtool detects noreturns,
a function's noreturn status needs to be consistent across configs.
The current workaround for this issue is to suppress unreachable
warnings for exc_double_fault()'s callers. Unfortunately that can
result in ORC coverage gaps and potentially worse issues like inert
static calls and silently disabled CPU mitigations.
Instead, prevent exc_double_fault() from ever being implicitly marked
noreturn by forcing a return behind a never-taken conditional.
Until a more integrated noreturn detection method exists, this is likely
the least objectionable workaround.
Fixes: 55eeab2a8a11 ("objtool: Ignore exc_double_fault() __noreturn warnings")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/r/d1f4026f8dc35d0de6cc61f2684e0cb6484009d1.1741975349.git.jpoimboe@kernel.org
|
|
The objtool program need to analysis the control flow of each object file
generated by compiler toolchain, it needs to know all the locations that
a branch instruction may jump into, if a jump table is used, objtool has
to correlate the jump instruction with the table.
On x86 (which is the only port supported by objtool before LoongArch),
there is a relocation type on the jump instruction and directly points
to the table. But on LoongArch, the relocation is on another kind of
instruction prior to the jump instruction, and also with scheduling it
is not very easy to tell the offset of that instruction from the jump
instruction. Furthermore, because LoongArch has -fsection-anchors (often
enabled at -O1 or above) the relocation may actually points to a section
anchor instead of the table itself.
For the jump table of switch cases, a GCC patch "LoongArch: Add support
to annotate tablejump" and a Clang patch "[LoongArch] Add options for
annotate tablejump" have been merged into the upstream mainline, it can
parse the additional section ".discard.tablejump_annotate" which stores
the jump info as pairs of addresses, each pair contains the address of
jump instruction and the address of jump table.
For the jump table of computed gotos, it is indeed not easy to implement
in the compiler, especially if there is more than one computed goto in a
function such as ___bpf_prog_run(). objdump kernel/bpf/core.o shows that
there are many table jump instructions in ___bpf_prog_run(), but there are
no relocations on the table jump instructions and to the table directly on
LoongArch.
Without the help of compiler, in order to figure out the address of goto
table for the special case of ___bpf_prog_run(), since the instruction
sequence is relatively single and stable, it makes sense to add a helper
find_reloc_of_rodata_c_jump_table() to find the relocation which points
to the section ".rodata..c_jump_table".
If find_reloc_by_table_annotate() failed, it means there is no relocation
info of switch table address in ".rela.discard.tablejump_annotate", then
objtool may find the relocation info of goto table ".rodata..c_jump_table"
with find_reloc_of_rodata_c_jump_table().
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250211115016.26913-6-yangtiezhu@loongson.cn
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
The objtool program need to analysis the control flow of each object file
generated by compiler toolchain, it needs to know all the locations that
a branch instruction may jump into, if a jump table is used, objtool has
to correlate the jump instruction with the table.
On x86 (which is the only port supported by objtool before LoongArch),
there is a relocation type on the jump instruction and directly points
to the table. But on LoongArch, the relocation is on another kind of
instruction prior to the jump instruction, and also with scheduling it
is not very easy to tell the offset of that instruction from the jump
instruction. Furthermore, because LoongArch has -fsection-anchors (often
enabled at -O1 or above) the relocation may actually points to a section
anchor instead of the table itself.
The good news is that after continuous analysis and discussion, at last
a GCC patch "LoongArch: Add support to annotate tablejump" and a Clang
patch "[LoongArch] Add options for annotate tablejump" have been merged
into the upstream mainline, the compiler changes make life much easier
for switch table support of objtool on LoongArch.
By now, there is an additional section ".discard.tablejump_annotate" to
store the jump info as pairs of addresses, each pair contains the address
of jump instruction and the address of jump table.
In order to find switch table, it is easy to parse the relocation section
".rela.discard.tablejump_annotate" to get table_sec and table_offset, the
rest process is somehow like x86.
Additionally, it needs to get each table size. When compiling on LoongArch,
there are unsorted table offsets of rodata if there exist many jump tables,
it will get the wrong table end and find the wrong table jump destination
instructions in add_jump_table().
Sort the rodata table offset by parsing ".rela.discard.tablejump_annotate"
and then get each table size of rodata corresponded with each table jump
instruction, it is used to check the table end and will break the process
when parsing ".rela.rodata" to avoid getting the wrong jump destination
instructions.
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0ee028f55640
Link: https://github.com/llvm/llvm-project/commit/4c2c17756739
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250211115016.26913-5-yangtiezhu@loongson.cn
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
For the most part, an absolute relocation type is used for rodata.
In the case of STT_SECTION, reloc->sym->offset is always zero, for
the other symbol types, reloc_addend(reloc) is always zero, thus it
can use a simple statement "reloc->sym->offset + reloc_addend(reloc)"
to obtain the symbol offset for various symbol types.
When compiling on LoongArch, there exist PC relative relocation types
for rodata, it needs to calculate the symbol offset with "S + A - PC"
according to the spec of "ELF for the LoongArch Architecture".
If there is only one jump table in the rodata, the "PC" is the entry
address which is equal with the value of reloc_offset(reloc), at this
time, reloc_offset(table) is 0.
If there are many jump tables in the rodata, the "PC" is the offset
of the jump table's base address which is equal with the value of
reloc_offset(reloc) - reloc_offset(table).
So for LoongArch, if the relocation type is PC relative, it can use a
statement "reloc_offset(reloc) - reloc_offset(table)" to get the "PC"
value when calculating the symbol offset with "S + A - PC" for one or
many jump tables in the rodata.
Add an arch-specific function arch_jump_table_sym_offset() to assign
the symbol offset, for the most part that is an absolute relocation,
the default value is "reloc->sym->offset + reloc_addend(reloc)" in
the weak definition, it can be overridden by each architecture that
has different requirements.
Link: https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250211115016.26913-4-yangtiezhu@loongson.cn
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
In the most cases, the entry size of rodata is 8 bytes because the
relocation type is 64 bit. There are also 32 bit relocation types,
the entry size of rodata should be 4 bytes in this case.
Add an arch-specific function arch_reloc_size() to assign the entry
size of rodata for x86, powerpc and LoongArch.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250211115016.26913-3-yangtiezhu@loongson.cn
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
In the relocation section ".rela.rodata" of each .o file compiled with
LoongArch toolchain, there are various symbol types such as STT_NOTYPE,
STT_OBJECT, STT_FUNC in addition to the usual STT_SECTION, it needs to
use reloc symbol offset instead of reloc addend to find the destination
instruction in find_jump_table() and add_jump_table().
For the most part, an absolute relocation type is used for rodata. In the
case of STT_SECTION, reloc->sym->offset is always zero, and for the other
symbol types, reloc_addend(reloc) is always zero, thus it can use a simple
statement "reloc->sym->offset + reloc_addend(reloc)" to obtain the symbol
offset for various symbol types.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250211115016.26913-2-yangtiezhu@loongson.cn
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
The check for using old libelf prints an error message when libelf.h is
not available but does not abort. This may confuse so hide the compiler
error message.
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Link: https://lore.kernel.org/r/20250203073610.206000-1-david.engraf@sysgo.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
Pull KVM fixes from Paolo Bonzini:
"arm64:
- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode
x86:
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow
- Ensure DEBUGCTL is context switched on AMD to avoid running the
guest with the host's value, which can lead to unexpected bus lock
#DBs
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
properly emulate BTF. KVM's lack of context switching has meant BTF
has always been broken to some extent
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
the guest can enable DebugSwap without KVM's knowledge
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
to RO memory" phase without actually generating a write-protection
fault
- Fix a printf() goof in the SEV smoke test that causes build
failures with -Werror
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
PERFMON_V2 isn't supported by KVM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
KVM: selftests: Fix printf() format goof in SEV smoke test
KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
KVM: SVM: Save host DR masks on CPUs with DebugSwap
KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
KVM: arm64: Initialize HCR_EL2.E2H early
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: SVM: Suppress DEBUGCTL.BTF on AMD
KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
KVM: selftests: Assert that STI blocking isn't set after event injection
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
|
|
into HEAD
KVM x86 fixes for 6.14-rcN #2
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.
- Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
the host's value, which can lead to unexpected bus lock #DBs.
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
emulate BTF. KVM's lack of context switching has meant BTF has always been
broken to some extent.
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
can enable DebugSwap without KVM's knowledge.
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
memory" phase without actually generating a write-protection fault.
- Fix a printf() goof in the SEV smoke test that causes build failures with
-Werror.
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
isn't supported by KVM.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"33 hotfixes. 24 are cc:stable and the remainder address post-6.13
issues or aren't considered necessary for -stable kernels.
26 are for MM and 7 are for non-MM.
- "mm: memory_failure: unmap poisoned folio during migrate properly"
from Ma Wupeng fixes a couple of two year old bugs involving the
migration of hwpoisoned folios.
- "selftests/damon: three fixes for false results" from SeongJae Park
fixes three one year old bugs in the SAMON selftest code.
The remainder are singletons and doubletons. Please see the individual
changelogs for details"
* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
mm/page_alloc: fix uninitialized variable
rapidio: add check for rio_add_net() in rio_scan_alloc_net()
rapidio: fix an API misues when rio_add_net() fails
MAINTAINERS: .mailmap: update Sumit Garg's email address
Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
mm: fix finish_fault() handling for large folios
mm: don't skip arch_sync_kernel_mappings() in error paths
mm: shmem: remove unnecessary warning in shmem_writepage()
userfaultfd: fix PTE unmapping stack-allocated PTE copies
userfaultfd: do not block on locking a large folio with raised refcount
mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
mm: shmem: fix potential data corruption during shmem swapin
mm: fix kernel BUG when userfaultfd_move encounters swapcache
selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
selftests/damon/damos_quota: make real expectation of quota exceeds
include/linux/log2.h: mark is_power_of_2() with __always_inline
NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
mm, swap: avoid BUG_ON in relocate_cluster()
mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix return address recovery of traced function in ftrace to ensure
reliable stack unwinding
- Fix compiler warnings and runtime crashes of vDSO selftests on s390
by introducing a dedicated GNU hash bucket pointer with correct
32-bit entry size
- Fix test_monitor_call() inline asm, which misses CC clobber, by
switching to an instruction that doesn't modify CC
* tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ftrace: Fix return address recovery of traced function
selftests/vDSO: Fix GNU hash table entry size for s390x
s390/traps: Fix test_monitor_call() inline assembly
|
|
with min/max boundaries
damon_nr_regions.py starts DAMON, periodically collect number of regions
in snapshots, and see if it is in the requested range. The check code
assumes the numbers are sorted on the collection list, but there is no
such guarantee. Hence this can result in false positive test success.
Sort the list before doing the check.
Link: https://lkml.kernel.org/r/20250225222333.505646-4-sj@kernel.org
Fixes: 781497347d1b ("selftests/damon: implement test for min/max_nr_regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
100ms
damon_nr_regions.py updates max_nr_regions to a number smaller than
expected number of real regions and confirms DAMON respect the harsh
limit. To give time for DAMON to make changes for the regions, 3
aggregation intervals (300 milliseconds) are given.
The internal mechanism works with not only the max_nr_regions, but also
sz_limit, though. It avoids merging region if that casn make region of
size larger than sz_limit. In the test, sz_limit is set too small to
achive the new max_nr_regions, unless it is updated for the new
min_nr_regions. But the update is done only once per operations set
update interval, which is one second by default.
Hence, the test randomly incurs false positive failures. Fix it by
setting the ops interval same to aggregation interval, to make sure
sz_limit is updated by the time of the check.
Link: https://lkml.kernel.org/r/20250225222333.505646-3-sj@kernel.org
Fixes: 8bf890c81612 ("selftests/damon/damon_nr_regions: test online-tuned max_nr_regions")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/damon: three fixes for false results".
Fix three DAMON selftest bugs that cause two and one false positive
failures and successes.
This patch (of 3):
damos_quota.py assumes the quota will always exceeded. But whether quota
will be exceeded or not depend on the monitoring results. Actually the
monitored workload has chaning access pattern and hence sometimes the
quota may not really be exceeded. As a result, false positive test
failures happen. Expect how much time the quota will be exceeded by
checking the monitoring results, and use it instead of the naive
assumption.
Link: https://lkml.kernel.org/r/20250225222333.505646-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250225222333.505646-2-sj@kernel.org
Fixes: 51f58c9da14b ("selftests/damon: add a test for DAMOS quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
further reduced
damos_quota_goal.py selftest see if DAMOS quota goals tuning feature
increases or reduces the effective size quota for given score as expected.
The tuning feature sets the minimum quota size as one byte, so if the
effective size quota is already one, we cannot expect it further be
reduced. However the test is not aware of the edge case, and fails since
it shown no expected change of the effective quota. Handle the case by
updating the failure logic for no change to see if it was the case, and
simply skips to next test input.
Link: https://lkml.kernel.org/r/20250217182304.45215-1-sj@kernel.org
Fixes: f1c07c0a1662 ("selftests/damon: add a test for DAMOS quota goal")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171423.b28a918d-lkp@intel.com
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org> [6.10.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit a5c6bc590094a1a73cf6fa3f505e1945d2bf2461.
The general approach described in commit e076eaca5906 ("selftests: break
the dependency upon local header files") was taken one step too far here:
it should not have been extended to include the syscall numbers. This is
because doing so would require per-arch support in tools/include/uapi, and
no such support exists.
This revert fixes two separate reports of test failures, from Dave
Hansen[1], and Li Wang[2]. An excerpt of Dave's report:
Before this commit (a5c6bc590094a1a73cf6fa3f505e1945d2bf2461) things are
fine. But after, I get:
running PKEY tests for unsupported CPU/OS
An excerpt of Li's report:
I just found that mlock2_() return a wrong value in mlock2-test
[1] https://lore.kernel.org/dc585017-6740-4cab-a536-b12b37a7582d@intel.com
[2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA@mail.gmail.com
Link: https://lkml.kernel.org/r/20250214033850.235171-1-jhubbard@nvidia.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Li Wang <liwang@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 14be4e6f3522 ("selftests: vDSO: fix ELF hash table entry size for s390x")
changed the type of the ELF hash table entries to 64bit on s390x.
However the *GNU* hash tables entries are always 32bit.
The "bucket" pointer is shared between both hash algorithms.
On s390, this caused the GNU hash algorithm to access its 32-bit entries as if they
were 64-bit, triggering compiler warnings (assignment between "Elf64_Xword *" and
"Elf64_Word *") and runtime crashes.
Introduce a new dedicated "gnu_bucket" pointer which is used by the GNU hash.
Fixes: e0746bde6f82 ("selftests/vDSO: support DT_GNU_HASH")
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250217-selftests-vdso-s390-gnu-hash-v2-1-f6c2532ffe2a@linutronix.de
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Print out the index of mismatching XSAVE bytes using unsigned decimal
format. Some versions of clang complain about trying to print an integer
as an unsigned char.
x86/sev_smoke_test.c:55:51: error: format specifies type 'unsigned char'
but the argument has type 'int' [-Werror,-Wformat]
Fixes: 8c53183dbaa2 ("selftests: kvm: add test for transferring FPU state into VMSA")
Link: https://lore.kernel.org/r/20250228233852.3855676-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
During the initial mprotect(RO) stage of mmu_stress_test, keep vCPUs
spinning until all vCPUs have hit -EFAULT, i.e. until all vCPUs have tried
to write to a read-only page. If a vCPU manages to complete an entire
iteration of the loop without hitting a read-only page, *and* the vCPU
observes mprotect_ro_done before starting a second iteration, then the
vCPU will prematurely fall through to GUEST_SYNC(3) (on x86 and arm64) and
get out of sequence.
Replace the "do-while (!r)" loop around the associated _vcpu_run() with
a single invocation, as barring a KVM bug, the vCPU is guaranteed to hit
-EFAULT, and retrying on success is super confusion, hides KVM bugs, and
complicates this fix. The do-while loop was semi-unintentionally added
specifically to fudge around a KVM x86 bug, and said bug is unhittable
without modifying the test to force x86 down the !(x86||arm64) path.
On x86, if forced emulation is enabled, vcpu_arch_put_guest() may trigger
emulation of the store to memory. Due a (very, very) longstanding bug in
KVM x86's emulator, emulate writes to guest memory that fail during
__kvm_write_guest_page() unconditionally return KVM_EXIT_MMIO. While that
is desirable in the !memslot case, it's wrong in this case as the failure
happens due to __copy_to_user() hitting a read-only page, not an emulated
MMIO region.
But as above, x86 only uses vcpu_arch_put_guest() if the __x86_64__ guards
are clobbered to force x86 down the common path, and of course the
unexpected MMIO is a KVM bug, i.e. *should* cause a test failure.
Fixes: b6c304aec648 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/20250208105318.16861-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20250228230804.3845860-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Fix an objtool false positive, and objtool related build warnings that
happens on PIE-enabled architectures such as LoongArch"
* tag 'objtool-urgent-2025-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add bch2_trans_unlocked_or_in_restart_error() to bcachefs noreturns
objtool: Fix C jump table annotations for Clang
vmlinux.lds: Ensure that const vars with relocations are mapped R/O
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix crash from bad histogram entry
An error path in the histogram creation could leave an entry in a
link list that gets freed. Then when a new entry is added it can
cause a u-a-f bug. This is fixed by restructuring the code so that
the histogram is consistent on failure and everything is cleaned up
appropriately.
- Fix fprobe self test
The fprobe self test relies on no function being attached by ftrace.
BPF programs can attach to functions via ftrace and systemd now does
so. This causes those functions to appear in the enabled_functions
list which holds all functions attached by ftrace. The selftest also
uses that file to see if functions are being connected correctly. It
counts the functions in the file, but if there's already functions in
the file, it fails. Instead, add the number of functions in the file
at the start of the test to all the calculations during the test.
- Fix potential division by zero of the function profiler stddev
The calculated divisor that calculates the standard deviation of the
function times can overflow. If the overflow happens to land on zero,
that can cause a division by zero. Check for zero from the
calculation before doing the division.
TODO: Catch when it ever overflows and report it accordingly. For
now, just prevent the system from crashing.
* tag 'trace-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Avoid potential division by zero in function_stat_show()
selftests/ftrace: Let fprobe test consider already enabled functions
tracing: Fix bad hist from corrupting named_triggers list
|
|
Add an L1 (guest) assert to the nested exceptions test to verify that KVM
doesn't put VMRUN in an STI shadow (AMD CPUs bleed the shadow into the
guest's int_state if a #VMEXIT occurs before VMRUN fully completes).
Add a similar assert to the VMX side as well, because why not.
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The fprobe test fails on Fedora 41 since the fprobe test assumption that
the number of enabled_functions is zero before the test starts is not
necessarily true. Some user space tools, like systemd, add BPF programs
that attach to functions. Those will show up in the enabled_functions table
and must be taken into account by the fprobe test.
Therefore count the number of lines of enabled_functions before tests
start, and use that as base when comparing expected results.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250226142703.910860-1-hca@linux.ibm.com
Fixes: e85c5e9792b9 ("selftests/ftrace: Update fprobe test to check enabled_functions file")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth.
We didn't get netfilter or wireless PRs this week, so next week's PR
is probably going to be bigger. A healthy dose of fixes for bugs
introduced in the current release nonetheless.
Current release - regressions:
- Bluetooth: always allow SCO packets for user channel
- af_unix: fix memory leak in unix_dgram_sendmsg()
- rxrpc:
- remove redundant peer->mtu_lock causing lockdep splats
- fix spinlock flavor issues with the peer record hash
- eth: iavf: fix circular lock dependency with netdev_lock
- net: use rtnl_net_dev_lock() in
register_netdevice_notifier_dev_net() RDMA driver register notifier
after the device
Current release - new code bugs:
- ethtool: fix ioctl confusing drivers about desired HDS user config
- eth: ixgbe: fix media cage present detection for E610 device
Previous releases - regressions:
- loopback: avoid sending IP packets without an Ethernet header
- mptcp: reset connection when MPTCP opts are dropped after join
Previous releases - always broken:
- net: better track kernel sockets lifetime
- ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels
- phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT
- eth: enetc: number of error handling fixes
- dsa: rtl8366rb: reshuffle the code to fix config / build issue with
LED support"
* tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
net: ti: icss-iep: Reject perout generation request
idpf: fix checksums set in idpf_rx_rsc()
selftests: drv-net: Check if combined-count exists
net: ipv6: fix dst ref loop on input in rpl lwt
net: ipv6: fix dst ref loop on input in seg6 lwt
usbnet: gl620a: fix endpoint checking in genelink_bind()
net/mlx5: IRQ, Fix null string in debug print
net/mlx5: Restore missing trace event when enabling vport QoS
net/mlx5: Fix vport QoS cleanup on error
net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
af_unix: Fix memory leak in unix_dgram_sendmsg()
net: Handle napi_schedule() calls from non-interrupt
net: Clear old fragment checksum value in napi_reuse_skb
gve: unlink old napi when stopping a queue using queue API
net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net().
tcp: Defer ts_recent changes until req is owned
net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
net: enetc: remove the mm_lock from the ENETC v4 driver
net: enetc: add missing enetc4_link_deinit()
net: enetc: update UDP checksum when updating originTimestamp field
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes. The only slightly large change is for ASoC
Cirrus codec, but that's still in a normal range. All the rest are
small device-specific fixes and should be fairly safe to take"
* tag 'sound-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix microphone regression on ASUS N705UD
ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15
ASoC: cs35l56: Prevent races when soft-resetting using SPI control
firmware: cs_dsp: Remove async regmap writes
ASoC: Intel: sof_sdw: warn both sdw and pch dmic are used
ASoC: SOF: Intel: don't check number of sdw links when set dmic_fixup
ASoC: dapm-graph: set fill colour of turned on nodes
ASoC: fsl: Rename stream name of SAI DAI driver
ASoC: es8328: fix route from DAC to output
ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2
ASoC: tas2764: Set the SDOUT polarity correctly
ASoC: tas2764: Fix power control mask
ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports
ASoC: tas2770: Fix volume scale
|
|
Some drivers, like tg3, do not set combined-count:
$ ethtool -l enp4s0f1
Channel parameters for enp4s0f1:
Pre-set maximums:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Current hardware settings:
RX: 4
TX: 1
Other: n/a
Combined: n/a
In the case where combined-count is not set, the ethtool netlink code
in the kernel elides the value and the code in the test:
netnl.channels_get(...)
With a tg3 device, the returned dictionary looks like:
{'header': {'dev-index': 3, 'dev-name': 'enp4s0f1'},
'rx-max': 4,
'rx-count': 4,
'tx-max': 4,
'tx-count': 1}
Note that the key 'combined-count' is missing. As a result of this
missing key the test raises an exception:
# Exception| if channels['combined-count'] == 0:
# Exception| ~~~~~~~~^^^^^^^^^^^^^^^^^^
# Exception| KeyError: 'combined-count'
Change the test to check if 'combined-count' is a key in the dictionary
first and if not assume that this means the driver has separate RX and
TX queues.
With this change, the test now passes successfully on tg3 and mlx5
(which does have a 'combined-count').
Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250226181957.212189-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"Fixes to TCP socket identification, documentation, and tests"
* tag 'landlock-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add binaries to .gitignore
selftests/landlock: Test that MPTCP actions are not restricted
selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
landlock: Fix non-TCP sockets restriction
landlock: Minor typo and grammar fixes in IPC scoping documentation
landlock: Fix grammar error
selftests/landlock: Enable the new CONFIG_AF_UNIX_OOB
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix tools/ quiet build Makefile infrastructure that was broken when
working on tools/perf/ without testing on other tools/ living
utilities.
* tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools: Remove redundant quiet setup
tools: Unify top-level quiet infrastructure
|
|
Fix the following objtool warning during build time:
fs/bcachefs/btree_cache.o: warning: objtool: btree_node_lock.constprop.0() falls through to next function bch2_recalc_btree_reserve()
fs/bcachefs/btree_update.o: warning: objtool: bch2_trans_update_get_key_cache() falls through to next function need_whiteout_for_snapshot()
bch2_trans_unlocked_or_in_restart_error() is an Obviously Correct (tm)
panic() wrapper, add it to the list of known noreturns.
Fixes: b318882022a8 ("bcachefs: bch2_trans_verify_not_unlocked_or_in_restart()")
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20250218064230.219997-1-youling.tang@linux.dev
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
A C jump table (such as the one used by the BPF interpreter) is a const
global array of absolute code addresses, and this means that the actual
values in the table may not be known until the kernel is booted (e.g.,
when using KASLR or when the kernel VA space is sized dynamically).
When using PIE codegen, the compiler will default to placing such const
global objects in .data.rel.ro (which is annotated as writable), rather
than .rodata (which is annotated as read-only). As C jump tables are
explicitly emitted into .rodata, this used to result in warnings for
LoongArch builds (which uses PIE codegen for the entire kernel) like
Warning: setting incorrect section attributes for .rodata..c_jump_table
due to the fact that the explicitly specified .rodata section inherited
the read-write annotation that the compiler uses for such objects when
using PIE codegen.
This warning was suppressed by explicitly adding the read-only
annotation to the __attribute__((section(""))) string, by commit
c5b1184decc8 ("compiler.h: specify correct attribute for .rodata..c_jump_table")
Unfortunately, this hack does not work on Clang's integrated assembler,
which happily interprets the appended section type and permission
specifiers as part of the section name, which therefore no longer
matches the hard-coded pattern '.rodata..c_jump_table' that objtool
expects, causing it to emit a warning
kernel/bpf/core.o: warning: objtool: ___bpf_prog_run+0x20: sibling call from callable instruction with modified stack frame
Work around this, by emitting C jump tables into .data.rel.ro instead,
which is treated as .rodata by the linker script for all builds, not
just PIE based ones.
Fixes: c5b1184decc8 ("compiler.h: specify correct attribute for .rodata..c_jump_table")
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> # on LoongArch
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250221135704.431269-6-ardb+git@google.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for cacheinfo DT probing to avoid reading non-boolean
properties as booleans.
- A fix for cpufeature to use bitmap_equal() instead of memcmp(), so
unused bits are ignored.
- Fixes for cmpxchg and futex cmpxchg that properly encode the sign
extension requirements on inline asm, which results in spurious
successes. This manifests in at least inode_set_ctime_current, but is
likely just a disaster waiting to happen.
- A fix for the rseq selftests, which was using an invalid constraint.
- A pair of fixes for signal frame size handling:
- We were reserving space for an extra empty extension context
header on systems with extended signal context, thus resulting in
unnecessarily large allocations.
- We weren't properly checking for available extensions before
calculating the signal stack size, which resulted in undersized
stack allocations on some systems (at least those with T-Head
custom vectors).
Also, we've added Alex as a reviewer. He's been helping out a ton
lately, thanks!
* tag 'riscv-for-linus-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Add myself as a riscv reviewer
riscv: signal: fix signal_minsigstksz
riscv: signal: fix signal frame size
rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm
riscv/futex: sign extend compare value in atomic cmpxchg
riscv/atomic: Do proper sign extension also for unsigned in arch_cmpxchg
riscv: cpufeature: use bitmap_equal() instead of memcmp()
riscv: cacheinfo: Use of_property_present() for non-boolean properties
|
|
Test XDP and HDS interaction. While at it add a test for using the IOCTL,
as that turned out to be the real culprit.
Testing bnxt:
# NETIF=eth0 ./ksft-net-drv/drivers/net/hds.py
KTAP version 1
1..12
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by the device
ok 4 hds.set_hds_enable
ok 5 hds.set_hds_thresh_zero
ok 6 hds.set_hds_thresh_max
ok 7 hds.set_hds_thresh_gt
ok 8 hds.set_xdp
ok 9 hds.enabled_set_xdp
ok 10 hds.ioctl
ok 11 hds.ioctl_set_xdp
ok 12 hds.ioctl_enabled_set_xdp
# Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0
and netdevsim:
# ./ksft-net-drv/drivers/net/hds.py
KTAP version 1
1..12
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable
ok 4 hds.set_hds_enable
ok 5 hds.set_hds_thresh_zero
ok 6 hds.set_hds_thresh_max
ok 7 hds.set_hds_thresh_gt
ok 8 hds.set_xdp
ok 9 hds.enabled_set_xdp
ok 10 hds.ioctl
ok 11 hds.ioctl_set_xdp
ok 12 hds.ioctl_enabled_set_xdp
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
Netdevsim needs a sane default for tx/rx ring size.
ethtool 6.11 is needed for the --disable-netlink option.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250221025141.1132944-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some tools like KGraphViewer interpret the "ON" nodes not having an
explicitly set fill colour as them being entirely black, which obscures
the text on them and looks funny. In fact, I thought they were off for
the longest time. Comparing to the output of the `dot` tool, I assume
they are supposed to be white.
Instead of speclawyering over who's in the wrong and must immediately
atone for their wickedness at the altar of RFC2119, just be explicit
about it, set the fillcolor to white, and nobody gets confused.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://patch.msgid.link/20250221-dapm-graph-node-colour-v1-1-514ed0aa7069@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Function graph accounting fixes:
- Fix the manage ops hashes
The function graph registers a "manager ops" and "sub-ops" to
ftrace. The manager ops does not have any callback but calls the
sub-ops callbacks. The manage ops hashes (what is used to tell
ftrace what functions to attach to) is built on the sub-ops it
manages.
There was an error in the way it built the hash. An empty hash
means to attach to all functions. When the manager ops had one
sub-ops it properly copied its hash. But when the manager ops had
more than one sub-ops, it went into a loop to make a set of all
functions it needed to add to the hash. If any of the subops hashes
was empty, that would mean to attach to all functions. The error
was that the first iteration of the loop passed in an empty hash to
start with in order to add the other hashes. That starting hash was
mistaken as to attach to all functions. This made the manage ops
attach to all functions whenever it had two or more sub-ops, even
if each sub-op was attached to only a single function.
- Do not add duplicate entries to the manager ops hash
If two or more subops hashes trace the same function, an entry for
that function will be added to the manager ops for each subops.
This causes waste and extra overhead.
Fprobe accounting fixes:
- Remove last function from fprobe hash
Fprobes has a ftrace hash to manage which functions an fprobe is
attached to. It also has a counter of how many fprobes are
attached. When the last fprobe is removed, it unregisters the
fprobe from ftrace but does not remove the functions the last
fprobe was attached to from the hash. This leaves the old functions
attached. When a new fprobe is added, the fprobe infrastructure
attaches to not only the functions of the new fprobe, but also to
the functions of the last fprobe.
- Fix accounting of the fprobe counter
When a fprobe is added, it updates a counter. If the counter goes
from zero to one, it attaches its ops to ftrace. When an fprobe is
removed, the counter is decremented. If the counter goes from 1 to
zero, it removes the fprobes ops from ftrace.
There was an issue where if two fprobes trace the same function,
the addition of each fprobe would increment the counter. But when
removing the first of the fprobes, it would notice that another
fprobe is still attached to one of its functions no it does not
remove the functions from the ftrace ops.
But it also did not decrement the counter, so when the last fprobe
is removed, the counter is still one. This leaves the fprobes
callback still registered with ftrace and it being called by the
functions defined by the fprobes ops hash. Worse yet, because all
the functions from the fprobe ops hash have been removed, that
tells ftrace that it wants to trace all functions.
Thus, this puts the state of the system where every function is
calling the fprobe callback handler (which does nothing as there
are no registered fprobes), but this causes a good 13% slow down of
the entire system.
Other updates:
- Add a selftest to test the above issues to prevent regressions.
- Fix preempt count accounting in function tracing
Better recursion protection was added to function tracing which
added another layer of preempt disable. As the preempt_count gets
traced in the event, it needs to subtract the amount of preempt
disabling the tracer does to record what the preempt_count was when
the trace was triggered.
- Fix memory leak in output of set_event
A variable is passed by the seq_file functions in the location that
is set by the return of the next() function. The start() function
allocates it and the stop() function frees it. But when the last
item is found, the next() returns NULL which leaks the data that
was allocated in start(). The m->private is used for something
else, so have next() free the data when it returns NULL, as stop()
will then just receive NULL in that case"
* tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix memory leak when reading set_event file
ftrace: Correct preemption accounting for function tracing.
selftests/ftrace: Update fprobe test to check enabled_functions file
fprobe: Fix accounting of when to unregister from function graph
fprobe: Always unregister fgraph function from ops
ftrace: Do not add duplicate entries in subops manager ops
ftrace: Fix accounting of adding subops to a manager ops
|
|
A few bugs were found in the fprobe accounting logic along with it using
the function graph infrastructure. Update the fprobe selftest to catch
those bugs in case they or something similar shows up in the future.
The test now checks the enabled_functions file which shows all the
functions attached to ftrace or fgraph. When enabling a fprobe, make sure
that its corresponding function is also added to that file. Also add two
more fprobes to enable to make sure that the fprobe logic works properly
with multiple probes.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250220202055.733001756@goodmis.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pull BPF fixes from Daniel Borkmann:
- Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
(Alan Maguire)
- Fix a missing allocation failure check in BPF verifier's
acquire_lock_state (Kumar Kartikeya Dwivedi)
- Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
to the raw_tp_null_args set (Kuniyuki Iwashima)
- Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
- Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
(Andrii Nakryiko)
- Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
accessing skb data not containing an Ethernet header (Shigeru
Yoshida)
- Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)
- Several BPF sockmap fixes to address incorrect TCP copied_seq
calculations, which prevented correct data reads from recv(2) in user
space (Jiayuan Chen)
- Two fixes for BPF map lookup nullness elision (Daniel Xu)
- Fix a NULL-pointer dereference from vmlinux BTF lookup in
bpf_sk_storage_tracing_allowed (Jared Kangas)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests: bpf: test batch lookup on array of maps with holes
bpf: skip non exist keys in generic_map_lookup_batch
bpf: Handle allocation failure in acquire_lock_state
bpf: verifier: Disambiguate get_constant_map_key() errors
bpf: selftests: Test constant key extraction on irrelevant maps
bpf: verifier: Do not extract constant map keys for irrelevant maps
bpf: Fix softlockup in arena_map_free on 64k page kernel
net: Add rx_skb of kfree_skb to raw_tp_null_args[].
bpf: Fix deadlock when freeing cgroup storage
selftests/bpf: Add strparser test for bpf
selftests/bpf: Fix invalid flag of recv()
bpf: Disable non stream socket for strparser
bpf: Fix wrong copied_seq calculation
strparser: Add read_sock callback
bpf: avoid holding freeze_mutex during mmap operation
bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
selftests/bpf: Adjust data size to have ETH_HLEN
bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Smaller than usual with no fixes from any subtree.
Current release - regressions:
- core: fix race of rtnl_net_lock(dev_net(dev))
Previous releases - regressions:
- core: remove the single page frag cache for good
- flow_dissector: fix handling of mixed port and port-range keys
- sched: cls_api: fix error handling causing NULL dereference
- tcp:
- adjust rcvq_space after updating scaling ratio
- drop secpath at the same time as we currently drop dst
- eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl().
Previous releases - always broken:
- vsock:
- fix variables initialization during resuming
- for connectible sockets allow only connected
- eth:
- geneve: fix use-after-free in geneve_find_dev()
- ibmvnic: don't reference skb after sending to VIOS"
* tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
Revert "net: skb: introduce and use a single page frag cache"
net: allow small head cache usage with large MAX_SKB_FRAGS values
nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
tcp: drop secpath at the same time as we currently drop dst
net: axienet: Set mac_managed_pm
arp: switch to dev_getbyhwaddr() in arp_req_set_public()
net: Add non-RCU dev_getbyhwaddr() helper
sctp: Fix undefined behavior in left shift operation
selftests/bpf: Add a specific dst port matching
flow_dissector: Fix port range key handling in BPF conversion
selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range
flow_dissector: Fix handling of mixed port and port-range keys
geneve: Suppress list corruption splat in geneve_destroy_tunnels().
gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
dev: Use rtnl_net_dev_lock() in unregister_netdev().
net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
net: Add net_passive_inc() and net_passive_dec().
net: pse-pd: pd692x0: Fix power limit retrieval
MAINTAINERS: trim the GVE entry
gve: set xdp redirect target only when it is available
...
|
|
After this patch:
#102/1 flow_dissector_classification/ipv4:OK
#102/2 flow_dissector_classification/ipv4_continue_dissect:OK
#102/3 flow_dissector_classification/ipip:OK
#102/4 flow_dissector_classification/gre:OK
#102/5 flow_dissector_classification/port_range:OK
#102/6 flow_dissector_classification/ipv6:OK
#102 flow_dissector_classification:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250218043210.732959-5-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
port-range
After this patch:
# ./tc_flower_port_range.sh
TEST: Port range matching - IPv4 UDP [ OK ]
TEST: Port range matching - IPv4 TCP [ OK ]
TEST: Port range matching - IPv6 UDP [ OK ]
TEST: Port range matching - IPv6 TCP [ OK ]
TEST: Port range matching - IPv4 UDP Drop [ OK ]
Cc: Qiang Zhang <dtzq01@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250218043210.732959-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|