This changes the semantics of BPF_NOSPEC (previously a v4-only barrier)
to always emit a speculation barrier that works against both Spectre v1
AND v4. If mitigation is not needed on an architecture, the backend
should implement bpf_jit_bypass_spec_v1/v4() to return true.
As of now, this commit only has the user-visible implication that unpriv
BPF's performance on PowerPC is reduced. This is the case because we
have to emit additional v1 barrier instructions for BPF_NOSPEC now.
This commit is required for a future commit to allow us to rely on
BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature
that nospec acts as a v1 barrier is unused.
Commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for
mitigating Spectre v4") noted that mitigation instructions for v1 and v4
might be different on some archs. While this would potentially offer
improved performance on PowerPC, it was dismissed after the following
considerations:
* Only having one barrier simplifies the verifier and allows us to
easily rely on v4-induced barriers for reducing the complexity of
v1-induced speculative path verification.
* For the architectures that implemented BPF_NOSPEC, only PowerPC has
distinct instructions for v1 and v4. Even there, some insns may be
shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and
'sync'). If this is still found to impact performance in an
unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and
BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4
insns from being emitted for PowerPC with this setup if
bypass_spec_v1/v4 is set.
Vulnerability status for BPF_NOSPEC-based Spectre mitigations (v4 as of
this commit, v1 in the future) is therefore:
* x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This
patch implements BPF_NOSPEC for these architectures. The previous
v4-only version was supported since commit f5e81d111750 ("bpf:
Introduce BPF nospec instruction for mitigating Spectre v4") and
commit b7540d625094 ("powerpc/bpf: Emit stf barrier instruction
sequences for BPF_NOSPEC").
* LoongArch: Not Vulnerable - Commit a6f6a95f2580 ("LoongArch, bpf: Fix
jit to skip speculation barrier opcode") is the only other past commit
related to BPF_NOSPEC and indicates that the insn is not required
there.
* MIPS: Vulnerable (if unprivileged BPF is enabled) -
Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
barrier opcode") indicates that it is not vulnerable, but this
contradicts the kernel and Debian documentation. Therefore, I assume
that there exist vulnerable MIPS CPUs (but maybe not from Loongson?).
In the future, BPF_NOSPEC could be implemented for MIPS based on the
GCC speculation_barrier [1]. For now, we rely on unprivileged BPF
being disabled by default.
* Other: Unknown - To the best of my knowledge there is no definitive
information available that indicates that any other arch is
vulnerable. They are therefore left untouched (BPF_NOSPEC is not
implemented, but bypass_spec_v1/v4 is also not set).
I did the following testing to ensure the insn encoding is correct:
* ARM64:
* 'dsb nsh; isb' was successfully tested with the BPF CI in [2]
* 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is
executed for example with './test_progs -t verifier_array_access')
* PowerPC: The following configs were tested locally with ppc64le QEMU
v8.2 '-machine pseries -cpu POWER9':
* STF_BARRIER_EIEIO + CONFIG_PPC_BOOK3S_64
* STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK3S_64
* STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK3S_64
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on)
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on)
* CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on)
Most of those combinations should not occur in practice, but I was not
able to get a PPC e6500 rootfs (for testing PPC_E500 without forcing
it on). In any case, this should ensure that there are no unexpected
conflicts between the insns when combined like this. Individual v1/v4
barriers were already emitted elsewhere.
Hari's ack is for the PowerPC changes only.
[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=29b74545531f6afbee9fc38c267524326dbfbedf
("MIPS: Add speculation_barrier support")
[2] https://github.com/kernel-patches/bpf/pull/8576
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211703.337860-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
JITs can implement bpf_jit_bypass_spec_v1/v4() if they want the verifier to
skip analysis/patching for the respective vulnerability. For v4, this
will reduce the number of barriers the verifier inserts. For v1, it
allows more programs to be accepted.
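A minimal sketch of the core-side defaults, assuming weak symbols that
arch JITs override (returning false conservatively treats the arch as
vulnerable):

  /* Hedged sketch: defaults assume the arch needs mitigation. */
  bool __weak bpf_jit_bypass_spec_v1(void)
  {
	return false;
  }

  bool __weak bpf_jit_bypass_spec_v4(void)
  {
	return false;
  }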
The primary motivation for this is to not regress unpriv BPF's
performance on ARM64 in a future commit where BPF_NOSPEC is also used
against Spectre v1.
This has the user-visible change that v1-induced rejections on
non-vulnerable PowerPC CPUs are avoided.
For now, this does not change the semantics of BPF_NOSPEC. It is still a
v4-only barrier and must not be implemented if bypass_spec_v4 is always
true for the arch. Changing it to a v1 AND v4-barrier is done in a
future commit.
As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1
and NOSPEC_V4 instructions and allow backends to skip their lowering as
suggested by commit f5e81d111750 ("bpf: Introduce BPF nospec instruction
for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was
found to be preferable for the following reason:
* bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the
same analysis (not taking into account whether the current CPU is
vulnerable) needlessly restricts users of CPUs that are not
vulnerable. The only use case for this would be portability-testing,
but this can later be added easily when needed by allowing users to
force bypass_spec_v1/v4 to false.
* Portability is still acceptable: Directly disabling the analysis
instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow
programs on non-vulnerable CPUs to be accepted while the program will
be rejected on vulnerable CPUs. With the fallback to speculation
barriers for Spectre v1 implemented in a future commit, this will only
affect programs that do variable stack-accesses or are very complex.
For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based
on the check that was previously located in the BPF_NOSPEC case.
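A hedged sketch of the PowerPC override, assuming it mirrors the
stf_barrier check formerly guarding the BPF_NOSPEC case (the exact
upstream condition may differ):

  bool bpf_jit_bypass_spec_v4(void)
  {
	return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
		 security_ftr_enabled(SEC_FTR_STF_BARRIER) &&
		 stf_barrier_type_get() != STF_BARRIER_NONE);
  }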
For LoongArch, it would likely be safe to set both
bpf_jit_bypass_spec_v1() and _v4() according to
commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation
barrier opcode"). This is omitted here as I am unable to do any testing
for LoongArch.
Hari's ack concerns the PowerPC part only.
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211318.337474-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This prevents us from trying to recover from these on speculative paths
in the future.
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-4-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Mark these cases as non-recoverable to later prevent them from being
caught when they occur during speculative path verification.
Eduard writes [1]:
The only place I'm aware of that might act upon a specific error code
from verifier syscall is libbpf. Looking through libbpf code, it seems
that this change does not interfere with libbpf.
[1] https://lore.kernel.org/all/785b4531ce3b44a84059a4feb4ba458c68fce719.camel@gmail.com/
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-3-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This is required to catch the errors later and fall back to a nospec if
on a speculative path.
Eliminate the regs variable as it is only used once and insn_idx is not
modified in between the definition and usage.
Do not pass insn but compute it in the function itself. As Eduard points
out [1], insn is assumed to correspond to env->insn_idx in many places
(e.g., __check_reg_arg()).
Move code into do_check_insn(), replace
* "continue" with "return 0" after modifying insn_idx
* "goto process_bpf_exit" with "return PROCESS_BPF_EXIT"
* "goto process_bpf_exit_full" with "return process_bpf_exit_full()"
* "do_print_state = " with "*do_print_state = "
[1] https://lore.kernel.org/all/293dbe3950a782b8eb3b87b71d7a967e120191fd.camel@gmail.com/
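A hedged sketch of the resulting shape (only the sentinel and helper
names from the list above are taken from the patch; the rest is
illustrative):

  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
  {
	/* insn is derived from env->insn_idx here instead of being
	 * passed in, so the two can never disagree */
	struct bpf_insn *insn = &env->prog->insnsi[env->insn_idx];

	if (is_exit_insn(insn))	/* illustrative check */
		return process_bpf_exit_full(env, do_print_state);

	env->insn_idx++;
	return 0;		/* was "continue" in the caller's loop */
  }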
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-2-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- a couple of fixes for out of bounds issues in memtrace and vas
Thanks to Ritesh Harjani (IBM), Haren Myneni, and Jonathan Greental
* tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/vas: Return -EINVAL if the offset is non-zero in mmap()
powerpc/powernv/memtrace: Fix out of bounds issue in memtrace mmap
|
|
User space calls mmap() to map the VAS window paste address, and the
kernel returns the complete mapped page for each window. So return
-EINVAL if a non-zero offset is passed to mmap().
See Documentation/arch/powerpc/vas-api.rst for mmap()
restrictions.
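A hedged sketch of the check (the handler context is assumed):

  /* The whole paste address page is mapped per window, so any
   * non-zero offset is meaningless. */
  if (vma->vm_pgoff)
	return -EINVAL;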
Co-developed-by: Jonathan Greental <yonatan02greental@gmail.com>
Signed-off-by: Jonathan Greental <yonatan02greental@gmail.com>
Reported-by: Jonathan Greental <yonatan02greental@gmail.com>
Fixes: dda44eb29c23 ("powerpc/vas: Add VAS user space API")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250610021227.361980-2-maddy@linux.ibm.com
|
|
The memtrace mmap handler has an out of bounds issue. Fix it by
checking that the requested mapping region stays within the allocated
region.
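A hedged sketch of the bounds check ('ent' is assumed to be the
memtrace entry backing the file; exact upstream code may differ):

  unsigned long buf_pages = ent->size >> PAGE_SHIFT;

  /* Both the requested offset and size must stay in the buffer. */
  if (vma->vm_pgoff >= buf_pages ||
      vma_pages(vma) > buf_pages - vma->vm_pgoff)
	return -EINVAL;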
Reported-by: Jonathan Greental <yonatan02greental@gmail.com>
Fixes: 08a022ad3dfa ("powerpc/powernv/memtrace: Allow mmaping trace buffers")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250610021227.361980-1-maddy@linux.ibm.com
|
|
When a host is configured with a few LUNs and I/O is running, injecting
FC faults repeatedly leads to path recovery problems. The LUNs have 4
paths each and 3 of them come back active after say an FC fault which
makes 2 of the paths go down, instead of all 4. This happens after
several iterations of continuous FC faults.
The reason is that we return an I/O error whenever we encounter sense
code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE
TRANSITION) instead of retrying.
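A hedged sketch of the kind of change described (the exact hook in the
midlayer may differ):

  /* ASYMMETRIC ACCESS STATE TRANSITION: the path is transitioning,
   * not dead, so retry instead of returning an I/O error. */
  if (sshdr.asc == 0x04 && sshdr.ascq == 0x0a)
	return NEEDS_RETRY;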
Signed-off-by: Rajashekhar M A <rajs@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20250606135924.27397-1-hare@kernel.org
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently storvsc_timeout is only used in storvsc_sdev_configure(), and
5s and 10s are used elsewhere. It turns out that, in rare cases, 5s is
not enough on Azure, so let's use storvsc_timeout everywhere.
In case a timeout happens and storvsc_channel_init() returns an error,
close the VMBus channel so that any host-to-guest messages in the
channel's ringbuffer, which might come late, can be safely ignored.
Add a "const" to storvsc_timeout.
Cc: stable@kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1749243459-10419-1-git-send-email-decui@microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add the cookie to fdinfo for raw_tp links. The info looks as follows:
link_type: raw_tracepoint
link_id: 31
prog_tag: 9dfdf8ef453843bf
prog_id: 32
tp_name: sys_enter
cookie: 23925373020405760
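A hedged sketch of the fdinfo callback addition (struct and field
names assumed from the existing raw_tp cookie support):

  static void bpf_raw_tp_link_show_fdinfo(const struct bpf_link *link,
					  struct seq_file *seq)
  {
	struct bpf_raw_tp_link *raw_tp =
		container_of(link, struct bpf_raw_tp_link, link);

	seq_printf(seq, "tp_name:\t%s\n", raw_tp->btp->tp->name);
	seq_printf(seq, "cookie:\t%llu\n", raw_tp->cookie);
  }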
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-5-chen.dylane@linux.dev
|
|
Add the cookie to fdinfo for tracing links. The info looks as follows:
link_type: tracing
link_id: 6
prog_tag: 9dfdf8ef453843bf
prog_id: 35
attach_type: 25
target_obj_id: 1
target_btf_id: 60355
cookie: 9007199254740992
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-4-chen.dylane@linux.dev
|
|
Display the cookie for tracing links. In plain mode:
#bpftool link
5: tracing prog 34
prog_type tracing attach_type trace_fentry
target_obj_id 1 target_btf_id 60355
cookie 4503599627370496
pids test_progs(176)
And in json mode:
#bpftool link -j | jq
{
  "id": 5,
  "type": "tracing",
  "prog_id": 34,
  "prog_type": "tracing",
  "attach_type": "trace_fentry",
  "target_obj_id": 1,
  "target_btf_id": 60355,
  "cookie": 4503599627370496,
  "pids": [
    {
      "pid": 176,
      "comm": "test_progs"
    }
  ]
}
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-3-chen.dylane@linux.dev
|
|
Add tests for getting the cookie with fill_link_info for tracing.
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev
|
|
bpf_tramp_link includes the cookie info, so we can expose it in bpf_link_info.
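A hedged sketch of the fill_link_info change (the exact struct layout
is assumed):

  /* In the tracing link's fill_link_info callback: */
  info->tracing.cookie = tr_link->link.cookie;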
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-1-chen.dylane@linux.dev
|
|
If we have a newer dtb than kernel, we could end up in a situation where
the GPU device is present in the dtb, but not in the driver's device
table. We don't want this to prevent the display from probing. So
check that we recognize the GPU before adding the GPU component.
v2: use %pOF
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/657701/
|
|
Ihor Solodrai says:
====================
bpf: make reg_not_null() true for CONST_PTR_TO_MAP
Handle CONST_PTR_TO_MAP null checks in the BPF verifier. Add
appropriate test cases.
v3->v4: more test cases
v2->v3: change constant in unpriv test
v1->v2: add a test case with ringbufs
v3: https://lore.kernel.org/bpf/20250604222729.3351946-1-isolodrai@meta.com/
v2: https://lore.kernel.org/bpf/20250604003759.1020745-1-isolodrai@meta.com/
v1: https://lore.kernel.org/bpf/20250523232503.1086319-1-isolodrai@meta.com/
====================
Link: https://patch.msgid.link/20250609183024.359974-1-isolodrai@meta.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
A test requires the following to happen:
* CONST_PTR_TO_MAP value is checked for null
* the code in the null branch fails verification
Add test cases:
* direct global map_ptr comparison to null
* lookup inner map, then two checks (the first transforms
map_value_or_null into map_ptr)
* lookup inner map, spill-fill it, then check for null
* use an array of ringbufs to recreate a common coding pattern [1]
[1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com
|
|
Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A
BPF program with this code must not pass verification in unpriv.
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com
|
|
When reg->type is CONST_PTR_TO_MAP, it cannot be null. However the
verifier explores the branches under rX == 0 in check_cond_jmp_op()
even if reg->type is CONST_PTR_TO_MAP, because it was not checked for
in reg_not_null().
Fix this by adding CONST_PTR_TO_MAP to the set of types that are
considered non-nullable in reg_not_null().
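A hedged sketch of the fix (other non-nullable cases abbreviated):

  static bool reg_not_null(const struct bpf_reg_state *reg)
  {
	enum bpf_reg_type type = base_type(reg->type);

	return type == PTR_TO_SOCKET ||
	       type == PTR_TO_MAP_VALUE ||
	       /* ... other non-nullable types ... */
	       type == CONST_PTR_TO_MAP;	/* new */
  }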
An old "unpriv: cmp map pointer with zero" selftest fails with this
change, because now early out correctly triggers in
check_cond_jmp_op(), making the verification to pass.
In practice verifier may allow pointer to null comparison in unpriv,
since in many cases the relevant branch and comparison op are removed
as dead code. So change the expected test result to __success_unpriv.
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com
|
|
Commit 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event")
added perf_event info, so we can also show the info via
cat /proc/[fd]/fdinfo.
kprobe fdinfo:
link_type: perf
link_id: 10
prog_tag: bcf7977d3b93787c
prog_id: 20
name: bpf_fentry_test1
offset: 0x0
missed: 0
addr: 0xffffffffa28a2904
event_type: kprobe
cookie: 3735928559
uprobe fdinfo:
link_type: perf
link_id: 13
prog_tag: bcf7977d3b93787c
prog_id: 21
name: /proc/self/exe
offset: 0x63dce4
ref_ctr_offset: 0x33eee2a
event_type: uprobe
cookie: 3735928559
tracepoint fdinfo:
link_type: perf
link_id: 11
prog_tag: bcf7977d3b93787c
prog_id: 22
tp_name: sched_switch
event_type: tracepoint
cookie: 3735928559
perf_event fdinfo:
link_type: perf
link_id: 12
prog_tag: bcf7977d3b93787c
prog_id: 23
type: 1
config: 2
event_type: event
cookie: 3735928559
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606150258.3385166-1-chen.dylane@linux.dev
|
|
Yonghong Song says:
====================
bpf: Implement mprog API on top of existing cgroup progs
Currently, cgroup progs are simply appended in attachment order. This is not
ideal. In some cases, users want specific ordering at a particular cgroup
level. For example, in Meta, we have a case where three different
applications all have cgroup/setsockopt progs and they require specific
ordering. The current approach is to use a bpfchainer where one bpf prog
contains multiple global functions and each global function can be
freplaced by a prog for a specific application. The ordering of global
functions decides the ordering of those application specific bpf progs.
Using a bpfchainer is a centralized approach and is not desirable as
one of the applications has to act as a daemon. The decentralized attachment
approach is more favorable for those applications.
To address this, the existing mprog API ([2]) seems an ideal solution,
supporting the BPF_F_BEFORE and BPF_F_AFTER flags on top of the existing
cgroup bpf implementation. More specifically, support is added for prog/link
attachment with BPF_F_BEFORE and BPF_F_AFTER. The kernel mprog
interface ([2]) is not used and the implementation is done directly in
the cgroup bpf code base. The mprog 'revision' is also implemented in
attach/detach/replace, so users can query revision number to check the
change of cgroup prog list.
The patch set contains 5 patches. Patch 1 adds revision support for
cgroup bpf progs. Patch 2 implements mprog API implementation for
prog/link attach and revision update. Patch 3 adds a new libbpf
API to do cgroup link attach with flags like BPF_F_BEFORE/BPF_F_AFTER.
Patches 4 and 5 add two tests to validate the implementation.
[1] https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
[2] https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net
Changelogs:
v4 -> v5:
- v4: https://lore.kernel.org/bpf/20250530173812.1823479-1-yonghong.song@linux.dev/
- Remove early prog/link checking based flags and id_or_fd as later code
will do checking as well.
- Do proper cgroup flag checking for bpf_prog_attach().
v3 -> v4:
- v3: https://lore.kernel.org/bpf/20250517162720.4077882-1-yonghong.song@linux.dev/
- Refactor some to make BPF_F_BEFORE/BPF_F_AFTER handling easier to understand.
- Previously, I degraded 'link' to 'prog' for later mprog handling. This is
not correct. Similar to mprog.c, we should check 'link' instead of
link->prog, since it is possible that two different links have the same
underlying prog and we do not want to miss supporting such a use case.
v2 -> v3:
- v2: https://lore.kernel.org/bpf/20250508223524.487875-1-yonghong.song@linux.dev/
- Big change to replace get_anchor_prog() to get_prog_list() so the
'struct bpf_prog_list *' is returned directly.
- Support 'BPF_F_BEFORE | BPF_F_AFTER' attachment if the prog list is empty
and flags do not have 'BPF_F_LINK | BPF_F_ID' and id_or_fd is 0.
- Add BPF_F_LINK support.
- Patch 4 is added to reuse id_from_prog_fd() and id_from_link_fd().
v1 -> v2:
- v1: https://lore.kernel.org/bpf/20250411011523.1838771-1-yonghong.song@linux.dev/
- Change cgroup_bpf.revisions from atomic64_t to u64.
- Added missing bpf_prog_put in various places.
- Rename get_cmp_prog() to get_anchor_prog(). The implementation tries to
find the anchor prog regardless of whether id_or_fd is non-NULL or not.
- Rename bpf_cgroup_prog_attached() to is_cgroup_prog_type() and handle
BPF_PROG_TYPE_LSM properly (with BPF_LSM_CGROUP attach type).
- I kept 'id || id_or_fd' condition as the condition 'id' is also used
in mprog.c so I assume it is okay in cgroup.c as well.
====================
Link: https://patch.msgid.link/20250606163131.2428225-1-yonghong.song@linux.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Two tests are added:
- cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link
attach are tested. Some negative tests are also included.
- cgroup_mprog_ordering, which actually runs the program with some mprog
API flags.
[1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev
|
|
Move static inline functions id_from_prog_fd() and id_from_link_fd()
from prog_tests/tc_helpers.h to test_progs.h so these two functions
can be reused for later cgroup mprog selftests.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev
|
|
Currently libbpf supports bpf_program__attach_cgroup() with signature:
LIBBPF_API struct bpf_link *
bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);
To support mprog style attachment, additional fields like flags,
relative_{fd,id} and expected_revision are needed.
Add a new API:
LIBBPF_API struct bpf_link *
bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
const struct bpf_cgroup_opts *opts);
where bpf_cgroup_opts contains all the needed fields mentioned above.
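A hedged usage sketch (field and flag names follow the mprog
conventions described above):

  LIBBPF_OPTS(bpf_cgroup_opts, opts,
	.flags = BPF_F_BEFORE,
	.relative_fd = other_prog_fd,
  );

  link = bpf_program__attach_cgroup_opts(prog, cgroup_fd, &opts);
  if (!link)
	/* errno is set by libbpf */
	err = -errno;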
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev
|
|
Currently, cgroup progs are simply appended in attachment order. This is not
ideal. In some cases, users want specific ordering at a particular cgroup
level. To address this, the existing mprog API seems an ideal solution,
supporting the BPF_F_BEFORE and BPF_F_AFTER flags.
But there are a few obstacles to directly use kernel mprog interface.
Currently cgroup bpf progs already support prog attach/detach/replace
and link-based attach/detach/replace. For example, in struct
bpf_prog_array_item, the cgroup_storage field needs to be together
with the bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
as the member, which makes it difficult to use the kernel mprog interface.
In another case, the current cgroup prog detach tries to use the
same flag as in attach. This is different from mprog kernel interface
which uses flags passed from user space.
So to avoid modifying existing behavior, I made the following changes to
support mprog API for cgroup progs:
- The support is for prog list at cgroup level. Cross-level prog list
(a.k.a. effective prog list) is not supported.
- Previously, BPF_F_PREORDER was supported only for prog attach; now
BPF_F_PREORDER is also supported by link-based attach.
- For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported
similar to kernel mprog but with different implementation.
- For detach and replace, use the existing implementation.
- For attach, detach and replace, the revision for a particular prog
list, associated with a particular attach type, will be updated
by increasing the count by 1.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163141.2428937-1-yonghong.song@linux.dev
|
|
One of the key items in the mprog API is the revision for the prog
list. The revision number will be increased if the prog list changes,
e.g., on attach, detach or replace.
Add a 'revisions' field to struct cgroup_bpf, representing revisions for
all cgroup related attachment types. The initial revision value is
set to 1, the same as the kernel mprog implementation.
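A hedged sketch of the new field (the array bound name is assumed):

  struct cgroup_bpf {
	/* ... existing fields ... */
	u64 revisions[MAX_CGROUP_BPF_ATTACH_TYPE];
  };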
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163136.2428732-1-yonghong.song@linux.dev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
- MGMT: Protect mgmt_pending list with its own lock
- hci_core: fix list_for_each_entry_rcu usage
- btintel_pcie: Increase the tx and rx descriptor count
- btintel_pcie: Reduce driver buffer posting to prevent race condition
- btintel_pcie: Fix driver not posting maximum rx buffers
* tag 'for-net-2025-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Protect mgmt_pending list with its own lock
Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
Bluetooth: btintel_pcie: Reduce driver buffer posting to prevent race condition
Bluetooth: btintel_pcie: Increase the tx and rx descriptor count
Bluetooth: btintel_pcie: Fix driver not posting maximum rx buffers
Bluetooth: hci_core: fix list_for_each_entry_rcu usage
====================
Link: https://patch.msgid.link/20250605191136.904411-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SFQ has an assumption of always being able to queue at least one packet.
However, after the blamed commit, sch->q.qlen can be inflated by packets
in sch->gso_skb, and an enqueue() on an empty SFQ qdisc can be followed
by an immediate drop.
Fix sfq_drop() to properly clear q->tail in this situation.
Tested:
ip netns add lb
ip link add dev to-lb type veth peer name in-lb netns lb
ethtool -K to-lb tso off # force qdisc to requeue gso_skb
ip netns exec lb ethtool -K in-lb gro on # enable NAPI
ip link set dev to-lb up
ip -netns lb link set dev in-lb up
ip addr add dev to-lb 192.168.20.1/24
ip -netns lb addr add dev in-lb 192.168.20.2/24
tc qdisc replace dev to-lb root sfq limit 100
ip netns exec lb netserver
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
netperf -H 192.168.20.2 -l 100 &
Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/netdev/9da42688-bfaa-4364-8797-e9271f3bdaef@hetzner-cloud.de/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250606165127.3629486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If SMB 3.1.1 POSIX Extensions are available and negotiated, the client
should be able to use all characters and not remap anything. Currently, the
user has to explicitly request this behavior by specifying the "nomapposix"
mount option.
Link: https://lore.kernel.org/4195bb677b33d680e77549890a4f4dd3b474ceaf.camel@rx2.rx-server.de
Signed-off-by: Philipp Kerling <pkerling@casix.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We are going to want to re-use this before the component is bound, when
we don't yet have the device pointer (but we do have the of node).
v2: use %pOF
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/657705/
|
|
To better match add_gpu_components().
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/657700/
|
|
The files generated by gen_header.py capture the source path to the
input files and the date. While that can be informative, it varies
based on where and when the kernel was built as the full path is
captured.
Since all of the files that this tool is run on are under the drivers
directory, modify the application to strip all of the path before
drivers/. Additionally, print <stripped> instead of the date.
Signed-off-by: Ryan Eatmon <reatmon@ti.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
Signed-off-by: Viswanath Kraleti <viswanath.kraleti@oss.qualcomm.com>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/655599/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Emitting this packet is necessary when we switch contexts because there
are various pieces of state used by userspace to synchronize between BR
and BV that are persistent across submits and we need to make sure that
they are in a "safe" state when switching contexts. Otherwise a
userspace submission in one context could cause another context to
function incorrectly and hang, effectively a denial of service (although
without leaking data). This was missed during the initial a7xx bringup.
Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654924/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Based on kgsl.
Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654922/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Pull in remaining fixes from queue branch.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Improve the usability of the unit_add sysfs attribute by ensuring that
the associated FCP LUN scan processing is completed synchronously. This
enables configuration tooling to consistently determine the end of the
scan process to allow for serialization of follow-on actions.
While the scan process associated with unit_add typically completes
synchronously, it is deferred to an asynchronous background process if
unit_add is used before initial remote port scanning has completed. This
occurs when unit_add is used immediately after setting the associated FCP
device online.
To ensure synchronous unit_add processing, wait for remote port scanning
to complete before initiating the FCP LUN scan.
Cc: stable@vger.kernel.org
Reviewed-by: M Nikhil <nikh1092@linux.ibm.com>
Reviewed-by: Nihar Panda <niharp@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Nihar Panda <niharp@linux.ibm.com>
Link: https://lore.kernel.org/r/20250603182252.2287285-2-niharp@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Correct the error handling goto labels used when host lookup fails in
various flashnode-related event handlers:
- iscsi_new_flashnode()
- iscsi_del_flashnode()
- iscsi_login_flashnode()
- iscsi_logout_flashnode()
- iscsi_logout_flashnode_sid()
scsi_host_put() is not required when shost is NULL, so jumping to the
correct label avoids unnecessary operations. These functions previously
jumped to the wrong goto label (put_host), which did not match the
intended cleanup logic.
Use the correct exit labels (exit_new_fnode, exit_del_fnode, etc.) to
ensure proper error handling. Also remove the unused put_host label
under iscsi_new_flashnode() as it is no longer needed.
No functional changes beyond accurate error path correction.
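A hedged sketch of the corrected pattern in iscsi_new_flashnode()
(based on the label names above; exact context may differ):

  shost = scsi_host_lookup(ev->u.new_flashnode.host_no);
  if (!shost) {
	err = -ENODEV;
	goto exit_new_fnode;	/* was: goto put_host; scsi_host_put()
				 * is not required when shost is NULL */
  }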
Fixes: c6a4bb2ef596 ("[SCSI] scsi_transport_iscsi: Add flash node mgmt support")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://lore.kernel.org/r/20250530193012.3312911-1-alok.a.tiwari@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Spelling fixes:
Deocder --> Decoder
Memroy --> Memory
This is a non-functional change aimed at improving code clarity.
Signed-off-by: Ankit Chauhan <ankitchauhan2065@gmail.com>
Link: https://lore.kernel.org/r/20250528110604.59528-1-ankitchauhan2065@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When things go wrong, the GPU is capable of quickly generating millions
of faulting translation requests per second. When that happens, in the
stall-on-fault model each access will stall until it wins the race to
signal the fault and then the RESUME register is written. This slows
processing page faults to a crawl as the GPU can generate faults much
faster than the CPU can acknowledge them. It also means that all
available resources in the SMMU are saturated waiting for the stalled
transactions, so that other transactions such as transactions generated
by the GMU, which shares translation resources with the GPU, cannot
proceed. This causes a GMU watchdog timeout, which leads to a failed
reset (GX cannot collapse while a transaction is pending) and a
permanently hung GPU.
On older platforms with qcom,smmu-v2, it seems that when one transaction
is stalled subsequent faulting transactions are terminated, which avoids
this problem, but the MMU-500 follows the spec here.
To work around these problems, disable stall-on-fault as soon as we get
a page fault, and re-enable it only after a cooldown period once page
faults stop. This allows the GMU some guaranteed time to continue
working. We only use
stall-on-fault to halt the GPU while we collect a devcoredump and we
always terminate the transaction afterward, so it's fine to miss some
subsequent page faults. We also keep it disabled so long as the current
devcoredump hasn't been deleted, because in that case we likely won't
capture another one if there's a fault.
After this commit HFI messages still occasionally time out, because the
crashdump handler doesn't run fast enough to let the GMU resume, but the
driver seems to recover from it. This will probably go away after the
HFI timeout is increased.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654891/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Unused since the previous commit.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654890/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Now that we use a threaded IRQ, it should be safe to do this in the
fault handler.
We can also remove fault_info from struct msm_gpu and just pass it
directly.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654889/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
put_unused_fd() doesn't free the installed file, if we've already done
fd_install(). So we need to also free the sync_file.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/653583/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
In error paths, we could unref the submit without calling
drm_sched_entity_push_job(), so msm_job_free() will never get
called. Since drm_sched_job_cleanup() will NULL out the
s_fence, we can use that to detect this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/653584/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Merge series from Félix Piédallu <felix.piedallu@non.se.com>:
These patches fix the behaviour of the SPI Chip Select of the OMAP2 MCSPI
driver used on TI SoCs.
The omap2-mcspi driver supports the use of multi mode (multichannel in TI
documentation). In this mode, the CS is asserted and deasserted by the
hardware.
The multi mode is disabled for messages when cs_change=0 for all transfers
(e.g when CS is kept asserted between transfers of a same message).
The multi mode also needs to be disabled for messages when cs_change=1 on the
last transfer (e.g when CS is kept asserted after the WHOLE message), and the
message right after.
Currently, that is not the case and it CS is deasserted by hardware when it
shouldn't.
This breaks peripheral drivers that send multiple messages with the CS asserted
in between.
Patch 1 ensures that multi mode is disabled when cs_change=1 on the last
transfer of the message.
Patch 2 ensures that multi mode is disable on a message following one with
cs_change=1 on the last transfer.
This is the case for the TPM TIS SPI driver that uses this logic for flow
control purposes.
Tested on an AM6442 platform with a TPM ST33HTPH2X32AHE4.
|
|
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.
This is a completely mechanical patch (done with a simple "sed -i"
statement).
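For illustration, a typical guard in a header after the conversion
(the struct is illustrative):

  /* __ASSEMBLER__ is set by the compiler itself when assembling. */
  #ifndef __ASSEMBLER__		/* was: #ifndef __ASSEMBLY__ */
  struct example_regs { unsigned long r0; };	/* C-only declarations */
  #endif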
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
__ASSEMBLY__ is only defined by the Makefile of the kernel, so
this is not really useful for uapi headers (unless the userspace
Makefile defines it, too). Let's switch to __ASSEMBLER__ which
gets set automatically by the compiler when compiling assembly
code.
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
The custom swap function used in sort() was identical to the default
built-in sort swap. Remove the custom swap function and pass NULL to
sort(), allowing it to use the default swap function.
This change reduces code size and improves performance, particularly when
CONFIG_MITIGATION_RETPOLINE is enabled. With RETPOLINE mitigation, indirect
function calls incur significant overhead, and using the default swap
function avoids this cost.
$ ./scripts/bloat-o-meter ./unwind.o.old ./unwind.o.new
add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-22 (-22)
Function old new delta
init_unwind_hdr.constprop 544 540 -4
swap_eh_frame_hdr_table_entries 18 - -18
Total: Before=4410, After=4388, chg -0.50%
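A hedged sketch of the resulting call site (the cmp callback name is
assumed; passing NULL selects sort()'s built-in swap):

  sort(table, n, sizeof(*table),
       cmp_eh_frame_hdr_table_entries, NULL);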
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
The core atomic code has a number of macros where it elaborates
architecture primitives into more functions. ARC uses
arch_atomic64_cmpxchg() as its architecture primitive, which disables a
lot of the additional functions.
Instead provide arch_cmpxchg64_relaxed() as the primitive and rely on the
core macros to create arch_cmpxchg64().
The macros will also provide other functions, for instance,
try_cmpxchg64_release(), giving a more complete implementation.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/Z0747n5bSep4_1VX@J2N7QTR9R3
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
Improve the installation procedure for the systemd service unit
'cpupower.service' to make it more flexible. Some distros install libraries
to /usr/lib64/, but systemd service units have to be installed to
/usr/lib/systemd/system: as a consequence, the installation procedure
should not assume that systemd service units can be installed to
${libdir}/systemd/system ...
Define a dedicated variable ("unitdir") in the Makefile.
Link: https://lore.kernel.org/linux-pm/260b6d79-ab61-43b7-a0eb-813e257bc028@leemhuis.info/T/#m0601940ab439d5cbd288819d2af190ce59e810e6
Fixes: 9c70b779ad91 ("cpupower: add a systemd service to run cpupower")
Link: https://lore.kernel.org/r/20250521211656.65646-1-invernomuto@paranoici.org
Signed-off-by: Francesco Poli (wintermute) <invernomuto@paranoici.org>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|