Age | Commit message (Collapse) | Author |
|
This commit adds support for BPF dynamic code modification on the
LoongArch architecture:
1. Add bpf_arch_text_copy() for instruction block copying.
2. Add bpf_arch_text_poke() for runtime instruction patching.
3. Add bpf_arch_text_invalidate() for code invalidation.
On LoongArch, since symbol addresses in the direct mapping region can't
be reached via relative jump instructions from the paged mapping region,
we use the move_imm+jirl instruction pair as absolute jump instructions.
These require 2-5 instructions, so we reserve 5 NOP instructions in the
program as placeholders for function jumps.
The larch_insn_text_copy() function is solely used for BPF. And the use
of larch_insn_text_copy() requires PAGE_SIZE alignment. Currently, only
the size of the BPF trampoline is page-aligned.
Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
1. Rename the existing validate_code() to validate_ctx()
2. Factor out the code validation handling into a new helper
validate_code()
Then:
* validate_code() is used to check the validity of code.
* validate_ctx() is used to check both code validity and table entry
correctness.
The new validate_code() will be used in subsequent changes.
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add larch_insn_gen_beq() and larch_insn_gen_bne() helpers which will be
used in BPF trampoline implementation.
Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com>
Co-developed-by: George Guo <guodongtai@kylinos.cn>
Signed-off-by: George Guo <guodongtai@kylinos.cn>
Co-developed-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.
Switch to the regular pointer formatting which is safer and easier to
reason about.
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
In init_cpu_fullname(), a constant pointer to "model" property is
retrieved. It's later modified by the strsep() function, which is
illegal and corrupts kernel's FDT copy. This is shown by dmesg,
OF: fdt: not creating '/sys/firmware/fdt': CRC check failed
Create a mutable copy of the model property and do in-place operations
on the mutable copy instead. loongson_sysconf.cpuname lives across the
kernel lifetime, thus manually releasing isn't necessary.
Also move the of_node_put() call for the root node after the usage of
its property, since of_node_put() decreases the reference counter thus
usage after the call is unsafe.
Cc: stable@vger.kernel.org
Fixes: 44a01f1f726a ("LoongArch: Parsing CPU-related information from DTS")
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Yao Zi <ziyao@disroot.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The LoongArch mem= parameter parser was previously limited to the
mem=<size>@<start> format. This was inconvenient for the common use
case of simply capping the total system memory, as it forced users to
manually specify a start address. It was also inconsistent with the
behavior on other architectures.
This patch enhances the parser in early_parse_mem() to also support the
more user-friendly mem=<size> format. The implementation now checks for
the presence of the '@' symbol to determine the user's intent:
- If mem=<size> is provided (no '@'), the kernel now calls
memblock_enforce_memory_limit(). This trims memory from the top down
to the specified size.
- If mem=<size>@<start> is provided, the original behavior is retained
for backward compatibility. This allows for defining specific memory
banks.
This change introduces an important usage rule reflected in the code's
comments: the mem=<size> format should only be specified once on the
kernel command line. It acts as a single, global cap on total memory. In
contrast, the mem=<size>@<start> format can be specified multiple times
to define several distinct memory regions.
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Now relocate_new_kernel_size is a .long value, which means 32bit, so its
high 32bit is undefined. This causes memcpy((void *)reboot_code_buffer,
relocate_new_kernel, relocate_new_kernel_size) in machine_kexec_prepare()
access out of range memories in some cases, and then end up with an ADE
exception.
So make relocate_new_kernel_size be a .quad value, which means 64bit, to
avoid such errors.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
According to the "LoongArch Reference Manual Volume 1: Basic
Architecture", the KSave registers (SAVE0-SAVE15) are defined in
Section 7.4.16 "Data Save (SAVE)" and listed in Table 7-1 "Control
and Status Registers Overview". These registers occupy the CSR
addresses from 0x30 to 0x3F, with 16 registers in total.
This patch completes the definitions of KS9 to KS15, so as to match
the architecture specification.
Reviewed-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
LoongArch architecture changes for 6.17 have many bpf features such as
trampoline, so merge 'bpf-next-6.17' to create a base to make bpf work
well.
|
|
'bpf-show-precise-rejected-function-when-attaching-to-__noreturn-and-deny-list-functions'
KaFai Wan says:
====================
bpf: Show precise rejected function when attaching to __noreturn and deny list functions
Show precise rejected function when attaching fexit/fmod_ret to __noreturn
functions.
Add log for attaching tracing programs to functions in deny list.
Add selftest for attaching tracing programs to functions in deny list.
Migrate fexit_noreturns case into tracing_failure test suite.
changes:
v4:
- change tracing_deny case attaching function (Yonghong Song)
- add Acked-by: Yafang Shao and Yonghong Song
v3:
- add tracing_deny case into existing files (Alexei)
- migrate fexit_noreturns into tracing_failure
- change SOB
https://lore.kernel.org/bpf/20250722153434.20571-1-kafai.wan@linux.dev/
v2:
- change verifier log message (Alexei)
- add missing Suggested-by
https://lore.kernel.org/bpf/20250714120408.1627128-1-mannkafai@gmail.com/
v1:
https://lore.kernel.org/all/20250710162717.3808020-1-mannkafai@gmail.com/
---
====================
Link: https://patch.msgid.link/20250724151454.499040-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Delete fexit_noreturns.c files and migrate the cases into
tracing_failure.c files.
The result:
$ tools/testing/selftests/bpf/test_progs -t tracing_failure/fexit_noreturns
#467/4 tracing_failure/fexit_noreturns:OK
#467 tracing_failure:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-5-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
deny list
The result:
$ tools/testing/selftests/bpf/test_progs -t tracing_failure/tracing_deny
#468/3 tracing_failure/tracing_deny:OK
#468 tracing_failure:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-4-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Show the rejected function name when attaching tracing programs to
functions in deny list.
With this change, we know why tracing programs can't attach to functions
like __rcu_read_lock() from log.
$ ./fentry
libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL
libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG --
Attaching tracing programs to function '__rcu_read_lock' is rejected.
Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
__noreturn functions
With this change, we know the precise rejected function name when
attaching fexit/fmod_ret to __noreturn functions from log.
$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.
Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch fixes several minor typos in comments within the BPF verifier.
No changes in functionality.
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Paul Chaignon says:
====================
bpf: Improve 64bits bounds refinement
This patchset improves the 64bits bounds refinement when the s64 ranges
crosses the sign boundary. The first patch explains the small addition
to __reg64_deduce_bounds. The last one explains why we need a third
round of __reg_deduce_bounds. The third patch adds a selftest with a
more complete example of the impact on verification. The second and
fourth patches update the existing selftests to take the new refinement
into account.
This patchset should reduce the number of kernel warnings hit by
syzkaller due to invariant violations [1]. It was also tested with
Agni [2] (and Cilium's CI for good measure).
Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Link: https://github.com/bpfverif/agni [2]
Changes in v4:
- Fixed outdated test comment, noticed by Eduard.
- Rebased.
Changes in v3:
- Added a 5th patch to call __reg_deduce_bounds a third time in
reg_bounds_sync following tests from Eduard.
- Fixed broken indentations in the first patch.
Changes in v2 (all on Eduard's suggestions):
- Added two tests to ensure we cover all cases of u64/s64 overlap.
- Improved tests to check deduced ranges with __msg.
- Improved code comments.
====================
Link: https://patch.msgid.link/cover.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit d7f008738171 ("bpf: try harder to deduce register bounds from
different numeric domains") added a second call to __reg_deduce_bounds
in reg_bounds_sync because a single call wasn't enough to converge to a
fixed point in terms of register bounds.
With patch "bpf: Improve bounds when s64 crosses sign boundary" from
this series, Eduard noticed that calling __reg_deduce_bounds twice isn't
enough anymore to converge. The first selftest added in "selftests/bpf:
Test cross-sign 64bits range refinement" highlights the need for a third
call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync
performs the following bounds deduction:
reg_bounds_sync entry: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__update_reg_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
__reg_deduce_bounds:
__reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg64_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_deduce_mixed_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg_deduce_bounds:
__reg32_deduce_bounds: scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
__reg64_deduce_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_deduce_mixed_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
__reg_bound_offset: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
__update_reg_bounds: scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
In particular, notice how:
1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds
learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds
improves the smax and umin bounds thanks to patch "bpf: Improve
bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only
tnums). Yet, a better smin32 bound could be learned from the smin
bound.
__reg32_deduce_bounds is able to improve smin32 from smin, but for that
we need a third call to __reg_deduce_bounds.
As discussed in [1], there may be a better way to organize the deduction
rules to learn the same information with less calls to the same
functions. Such an optimization requires further analysis and is
orthogonal to the present patchset.
Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The improvement of the u64/s64 range refinement fixed the invariant
violation that was happening on this test for BPF_JSLT when crossing the
sign boundary.
After this patch, we have one test remaining with a known invariant
violation. It's the same test as fixed here but for 32 bits ranges.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds coverage for the new cross-sign 64bits range refinement
logic. The three tests cover the cases when the u64 and s64 ranges
overlap (1) in the negative portion of s64, (2) in the positive portion
of s64, and (3) in both portions.
The first test is a simplified version of a BPF program generated by
syzkaller that caused an invariant violation [1]. It looks like
syzkaller could not extract the reproducer itself (and therefore didn't
report it to the mailing list), but I was able to extract it from the
console logs of a crash.
The principle is similar to the invariant violation described in
commit 6279846b9b25 ("bpf: Forget ranges when refining tnum after
JSET"): the verifier walks a dead branch, uses the condition to refine
ranges, and ends up with inconsistent ranges. In this case, the dead
branch is when we fallthrough on both jumps. The new refinement logic
improves the bounds such that the second jump is properly detected as
always-taken and the verifier doesn't end up walking a dead branch.
The second and third tests are inspired by the first, but rely on
condition jumps to prepare the bounds instead of ALU instructions. An
R10 write is used to trigger a verifier error when the bounds can't be
refined.
Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch updates the range refinement logic in the reg_bound test to
match the new logic from the previous commit. Without this change, tests
would fail because we end with more precise ranges than the tests
expect.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/b7f6b1fbe03373cca4e1bb6a113035a6cd2b3ff7.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
__reg64_deduce_bounds currently improves the s64 range using the u64
range and vice versa, but only if it doesn't cross the sign boundary.
This patch improves __reg64_deduce_bounds to cover the case where the
s64 range crosses the sign boundary but overlaps with the u64 range on
only one end. In that case, we can improve both ranges. Consider the
following example, with the s64 range crossing the sign boundary:
0 U64_MAX
| [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxx s64 range xxxxxxxxx] [xxxxxxx|
0 S64_MAX S64_MIN -1
The u64 range overlaps only with positive portion of the s64 range. We
can thus derive the following new s64 and u64 ranges.
0 U64_MAX
| [xxxxxx u64 range xxxxx] |
|----------------------------|----------------------------|
| [xxxxxx s64 range xxxxx] |
0 S64_MAX S64_MIN -1
The same logic can probably apply to the s32/u32 ranges, but this patch
doesn't implement that change.
In addition to the selftests, the __reg64_deduce_bounds change was
also tested with Agni, the formal verification tool for the range
analysis [1].
Link: https://github.com/bpfverif/agni [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
During the bounds refinement, we improve the precision of various ranges
by looking at other ranges. Among others, we improve the following in
this order (other things happen between 1 and 2):
1. Improve u32 from s32 in __reg32_deduce_bounds.
2. Improve s/u64 from u32 in __reg_deduce_mixed_bounds.
3. Improve s/u64 from s32 in __reg_deduce_mixed_bounds.
In particular, if the s32 range forms a valid u32 range, we will use it
to improve the u32 range in __reg32_deduce_bounds. In
__reg_deduce_mixed_bounds, under the same condition, we will use the s32
range to improve the s/u64 ranges.
If at (1) we were able to learn from s32 to improve u32, we'll then be
able to use that in (2) to improve s/u64. Hence, as (3) happens under
the same precondition as (1), it won't improve s/u64 ranges further than
(1)+(2) did. Thus, we can get rid of (3).
In addition to the extensive suite of selftests for bounds refinement,
this patch was also tested with the Agni formal verification tool [1].
Additionally, Eduard mentioned:
The argument appears to be as follows:
Under precondition `(u32)reg->s32_min <= (u32)reg->s32_max`
__reg32_deduce_bounds produces:
reg->u32_min = max_t(u32, reg->s32_min, reg->u32_min);
reg->u32_max = min_t(u32, reg->s32_max, reg->u32_max);
And then first part of __reg_deduce_mixed_bounds assigns:
a. reg->umin umax= (reg->umin & ~0xffffffffULL) | max_t(u32, reg->s32_min, reg->u32_min);
b. reg->umax umin= (reg->umax & ~0xffffffffULL) | min_t(u32, reg->s32_max, reg->u32_max);
And then second part of __reg_deduce_mixed_bounds assigns:
c. reg->umin umax= (reg->umin & ~0xffffffffULL) | (u32)reg->s32_min;
d. reg->umax umin= (reg->umax & ~0xffffffffULL) | (u32)reg->s32_max;
But assignment (c) is a noop because:
max_t(u32, reg->s32_min, reg->u32_min) >= (u32)reg->s32_min
Hence RHS(a) >= RHS(c) and umin= does nothing.
Also assignment (d) is a noop because:
min_t(u32, reg->s32_max, reg->u32_max) <= (u32)reg->s32_max
Hence RHS(b) <= RHS(d) and umin= does nothing.
Plus the same reasoning for the part dealing with reg->s{min,max}_value:
e. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | max_t(u32, reg->s32_min_value, reg->u32_min_value);
f. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | min_t(u32, reg->s32_max_value, reg->u32_max_value);
vs
g. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | (u32)reg->s32_min_value;
h. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | (u32)reg->s32_max_value;
RHS(e) >= RHS(g) and RHS(f) <= RHS(h), hence smax=,smin= do nothing.
This appears to be correct.
Also, Shung-Hsi:
Beside going through the reasoning, I also played with CBMC a bit to
double check that as far as a single run of __reg_deduce_bounds() is
concerned (and that the register state matches certain handwavy
expectations), the change indeed still preserve the original behavior.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://github.com/bpfverif/agni [1]
Link: https://lore.kernel.org/bpf/aIJwnFnFyUjNsCNa@mail.gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for the PTP systemcounter mechanism:
The rework of this mechanism added a 'use_nsec' member to struct
system_counterval. get_device_system_crosststamp() instantiates that
struct on the stack and hands a pointer to the driver callback.
Only the drivers which set use_nsec to true, initialize that field,
but all others ignore it. As get_device_system_crosststamp() does not
initialize the struct, the use_nsec field contains random stack
content in those cases. That causes a miscalulation usually resulting
in a failing range check in the best case.
Initialize the structure before handing it to the drivers to cure
that"
* tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Zero initialize system_counterval when querying time from phc drivers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One last fix for v6.16, removing some hard coding to avoid data
corruption on some NAND devices in the QPIC driver"
* tag 'spi-fix-v6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-qpic-snand: don't hardcode ECC steps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- qup: avoid potential hang when waiting for bus idle
- tegra: improve ACPI reset error handling
- virtio: use interruptible wait to prevent hang during transfer
* tag 'i2c-for-6.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: qup: jump out of the loop in case of timeout
i2c: virtio: Avoid hang by using interruptible completion wait
i2c: tegra: Fix reset error handling with ACPI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A few Allwinner clk driver fixes:
- Mark Allwinner A523 MBUS clock as critical to avoid
system stalls
- Fix names of CSI related clocks on Allwinner V3s. This
includes changes to the driver, DT bindings and DT files.
- Fix parents of TCON clock on Allwinner V3s"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: v3s: Fix TCON clock parents
clk: sunxi-ng: v3s: Fix CSI1 MCLK clock name
clk: sunxi-ng: v3s: Fix CSI SCLK clock name
clk: sunxi-ng: a523: Mark MBUS clock as critical
|
|
As arm64 JIT now supports private stack, make sure all relevant tests
run on arm64 architecture.
Relevant tests:
#415/1 struct_ops_private_stack/private_stack:OK
#415/2 struct_ops_private_stack/private_stack_fail:OK
#415/3 struct_ops_private_stack/private_stack_recur:OK
#415 struct_ops_private_stack:OK
#549/1 verifier_private_stack/Private stack, single prog:OK
#549/2 verifier_private_stack/Private stack, subtree > MAX_BPF_STACK:OK
#549/3 verifier_private_stack/No private stack:OK
#549/4 verifier_private_stack/Private stack, callback:OK
#549/5 verifier_private_stack/Private stack, exception in mainprog:OK
#549/6 verifier_private_stack/Private stack, exception in subprog:OK
#549/7 verifier_private_stack/Private stack, async callback, not nested:OK
#549/8 verifier_private_stack/Private stack, async callback, potential nesting:OK
#549 verifier_private_stack:OK
Summary: 2/11 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-4-puranjay@kernel.org
|
|
The private stack is allocated in bpf_int_jit_compile() with 16-byte
alignment. It includes additional guard regions to detect stack
overflows and underflows at runtime.
Memory layout:
+------------------------------------------------------+
| |
| 16 bytes padding (overflow guard - stack top) |
| [ detects writes beyond top of stack ] |
BPF FP ->+------------------------------------------------------+
| |
| BPF private stack (sized by verifier) |
| [ 16-byte aligned ] |
| |
BPF PRIV SP ->+------------------------------------------------------+
| |
| 16 bytes padding (underflow guard - stack bottom) |
| [ detects accesses before start of stack ] |
| |
+------------------------------------------------------+
On detection of an overflow or underflow, the kernel emits messages
like:
BPF private stack overflow/underflow detected for prog <prog_name>
After commit bd737fcb6485 ("bpf, arm64: Get rid of fpb"), Jited BPF
programs use the stack in two ways:
1. Via the BPF frame pointer (top of stack), using negative offsets.
2. Via the stack pointer (bottom of stack), using positive offsets in
LDR/STR instructions.
When a private stack is used, ARM64 callee-saved register x27 replaces
the stack pointer. The BPF frame pointer usage remains unchanged; but
it now points to the top of the private stack.
Relevant tests (Enabled in following patch):
#415/1 struct_ops_private_stack/private_stack:OK
#415/2 struct_ops_private_stack/private_stack_fail:OK
#415/3 struct_ops_private_stack/private_stack_recur:OK
#415 struct_ops_private_stack:OK
#549/1 verifier_private_stack/Private stack, single prog:OK
#549/2 verifier_private_stack/Private stack, subtree > MAX_BPF_STACK:OK
#549/3 verifier_private_stack/No private stack:OK
#549/4 verifier_private_stack/Private stack, callback:OK
#549/5 verifier_private_stack/Private stack, exception in main prog:OK
#549/6 verifier_private_stack/Private stack, exception in subprog:OK
#549/7 verifier_private_stack/Private stack, async callback, not nested:OK
#549/8 verifier_private_stack/Private stack, async callback, potential nesting:OK
#549 verifier_private_stack:OK
Summary: 2/11 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-3-puranjay@kernel.org
|
|
bpf_jit_get_prog_name() will be used by all JITs when enabling support
for private stack. This function is currently implemented in the x86
JIT.
Move the function to core.c so that other JITs can easily use it in
their implementation of private stack.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-2-puranjay@kernel.org
|
|
In the ARM64 BPF JIT when prog->aux->exception_boundary is set for a BPF
program, find_used_callee_regs() is not called because for a program
acting as exception boundary, all callee saved registers are saved.
find_used_callee_regs() sets `ctx->fp_used = true;` when it sees FP
being used in any of the instructions.
For programs acting as exception boundary, ctx->fp_used remains false
even if frame pointer is used by the program and therefore, FP is not
set-up for such programs in the prologue. This can cause the kernel to
crash due to a pagefault.
Fix it by setting ctx->fp_used = true for exception boundary programs as
fp is always saved in such programs.
Fixes: 5d4fa9ec5643 ("bpf, arm64: Avoid blindly saving/restoring all callee-saved registers")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/20250722133410.54161-2-puranjay@kernel.org
|
|
The code is unused since 98e20e5e13d2 ("bpfilter: remove bpfilter"),
therefore remove it.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-2-0d0083334382@linutronix.de
|
|
The usermode driver framework is not used anymore by the BPF
preload code.
Fixes: cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20250721-remove-usermode-driver-v1-1-0d0083334382@linutronix.de
|
|
Pull ARM fixes from Russell King:
- use an absolute path for asm/unified.h in KBUILD_AFLAGS to solve a
regression caused by commit d5c8d6e0fa61 ("kbuild: Update assembler
calls to use proper flags and language target")
- fix dead code elimination binutils version check again
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"These are two fixes that came in late, one addresses a regression on a
rockchips based board, the other is for ensuring a consistent dt
binding for a device added in 6.16 before the incorrect one makes it
into a release"
* tag 'soc-fixes-6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: rockchip: Drop netdev led-triggers on NanoPi R5S
arm64: dts: allwinner: a523: Rename emac0 to gmac0
|
|
Yonghong Song says:
====================
selftests/bpf: Fix a few dynptr test failures with 64K page size
There are a few dynptr test failures with arm64 64K page size.
They are fixed in this patch set and please see individual patches
for details.
====================
Link: https://patch.msgid.link/20250725043425.208128-1-yonghong.song@linux.dev
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
For arm64 64K page size, the xdp data size was set to be more than 64K
in one of previous patches. This will cause failure for bpf_dynptr_memset().
Since the failure of bpf_dynptr_memset() is expected with 64K page size,
return success.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250725043440.209266-1-yonghong.song@linux.dev
|
|
For arm64 64K page size, the bpf_dynptr_copy() in test dynptr/test_dynptr_copy_xdp
will succeed, but the test will failure with 4K page size. This patch made a change
so the test will fail expectedly for both 4K and 64K page sizes.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://patch.msgid.link/20250725043435.208974-1-yonghong.song@linux.dev
|
|
With arm64 64K page size, the following 4 subtests failed:
#97/25 dynptr/test_probe_read_user_dynptr:FAIL
#97/26 dynptr/test_probe_read_kernel_dynptr:FAIL
#97/27 dynptr/test_probe_read_user_str_dynptr:FAIL
#97/28 dynptr/test_probe_read_kernel_str_dynptr:FAIL
These failures are due to function bpf_dynptr_check_off_len() in
include/linux/bpf.h where there is a test
if (len > size || offset > size - len)
return -E2BIG;
With 64K page size, the 'offset' is greater than 'size - len',
which caused the test failure.
For 64KB page size, this patch increased the xdp buffer size from 5000 to
90000. The above 4 test failures are fixed as 'size' value is increased.
But it introduced two new failures:
#97/4 dynptr/test_dynptr_copy_xdp:FAIL
#97/12 dynptr/test_dynptr_memset_xdp_chunks:FAIL
These two failures will be addressed in subsequent patches.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://patch.msgid.link/20250725043430.208469-1-yonghong.song@linux.dev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.16-rc8
qup: avoid potential hang when waiting for bus idle
tegra: improve ACPI reset error handling
virtio: use interruptible wait to prevent hang during transfer
|
|
Pull drm fixes (part 2) from Dave Airlie:
"Just the follow up fixes for i915 and xe, all pretty minor.
i915:
- Fix DP 2.7 Gbps DP_LINK_BW value on g4x
- Fix return value on intel_atomic_commit_fence_wait
xe:
- Fix build without debugfs"
* tag 'drm-fixes-2025-07-26' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe: Fix build without debugfs
drm/i915/display: Fix dma_fence_wait_timeout() return value handling
drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x
|
|
Pull block fix from Jens Axboe:
"Just a single fix for regression in this release, where a module
reference could be leaked"
* tag 'block-6.16-20250725' of git://git.kernel.dk/linux:
block: fix module reference leak in mq-deadline I/O scheduler
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Two last-minute fixes for this cycle:
- Set afs vllist to NULL if addr parsing fails
- Add a missing check for reaching the end of the string in afs"
* tag 'vfs-6.16-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Set vllist to NULL if addr parsing fails
afs: Fix check for NULL terminator
|
|
Pull bcachefs fixes from Kent Overstreet:
"User reported fixes:
- Fix btree node scan on encrypted filesystems by not using btree
node header fields encrypted
- Fix a race in btree write buffer flush; this caused EROs primarily
during fsck for some people"
* tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs:
bcachefs: Add missing snapshots_seen_add_inorder()
bcachefs: Fix write buffer flushing from open journal entry
bcachefs: btree_node_scan: don't re-read before initializing found_btree_node
|
|
Commit e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within
OVERLAY for DCE") accidentally broke the binutils version restriction
that was added in commit 0d437918fb64 ("ARM: 9414/1: Fix build issue
with LD_DEAD_CODE_DATA_ELIMINATION"), reintroducing the segmentation
fault addressed by that workaround.
Restore the binutils version dependency by using
CONFIG_LD_CAN_USE_KEEP_IN_OVERLAY as an additional condition to ensure
that CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION is only enabled with
binutils >= 2.36 and ld.lld >= 21.0.0.
Closes: https://lore.kernel.org/6739da7d-e555-407a-b5cb-e5681da71056@landley.net/
Closes: https://lore.kernel.org/CAFERDQ0zPoya5ZQfpbeuKVZEo_fKsonLf6tJbp32QnSGAtbi+Q@mail.gmail.com/
Cc: stable@vger.kernel.org
Fixes: e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE")
Reported-by: Rob Landley <rob@landley.net>
Tested-by: Rob Landley <rob@landley.net>
Reported-by: Martin Wetterwald <martin@wetterwald.eu>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper
flags and language target"), which updated as-instr to use the
'assembler-with-cpp' language option, the Kbuild version of as-instr
always fails internally for arch/arm with
<command-line>: fatal error: asm/unified.h: No such file or directory
compilation terminated.
because '-include' flags are now taken into account by the compiler
driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not
found.
This went unnoticed at the time of the Kbuild change because the last
use of as-instr in Kbuild that arch/arm could reach was removed in 5.7
by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a
stable backport of the Kbuild change to before that point exposed this
potential issue if one were to be reintroduced.
Follow the general pattern of '-include' paths throughout the tree and
make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can
be used independently.
Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg@mail.gmail.com/
Cc: stable@vger.kernel.org
Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target")
Reported-by: KernelCI bot <bot@kernelci.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
This fixes an infinite loop when repairing "extent past end of inode",
when the extent is an older snapshot than the inode that needs repair.
Without the snaphsots_seen_add_inorder() we keep trying to delete the
same extent, even though it's no longer visible in the inode's snapshot.
Fixes: 63d6e9311999 ("bcachefs: bch2_fpunch_snapshot()")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When flushing the btree write buffer, we pull write buffer keys directly
from the journal instead of letting the journal write path copy them to
the write buffer.
When flushing from the currently open journal buffer, we have to block
new reservations and wait for outstanding reservations to complete.
Recheck the reservation state after blocking new reservations:
previously, we were checking the reservation count from before calling
__journal_block().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 9 are cc:stable and the remainder address post-6.15
issues or aren't considered necessary for -stable kernels.
7 are for MM"
* tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
sprintf.h requires stdarg.h
resource: fix false warning in __request_region()
mm/damon/core: commit damos_quota_goal->nid
kasan: use vmalloc_dump_obj() for vmalloc error reports
mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()
mm: update MAINTAINERS entry for HMM
nilfs2: reject invalid file types when reading inodes
selftests/mm: fix split_huge_page_test for folio_split() tests
mailmap: add entry for Senozhatsky
mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix build without debugfs (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aIKWC2RPlbRxZc5o@fedora
|