2025-06-09bpf, arm64, powerpc: Change nospec to include v1 barrierLuis Gerhorst
This changes the semantics of BPF_NOSPEC (previously a v4-only barrier) to always emit a speculation barrier that works against both Spectre v1 AND v4. If mitigation is not needed on an architecture, the backend should set bpf_jit_bypass_spec_v4/v1().

As of now, this commit only has the user-visible implication that unpriv BPF's performance on PowerPC is reduced. This is the case because we have to emit additional v1 barrier instructions for BPF_NOSPEC now.

This commit is required for a future commit to allow us to rely on BPF_NOSPEC for Spectre v1 mitigation. As of this commit, the feature that nospec acts as a v1 barrier is unused.

Commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4") noted that mitigation instructions for v1 and v4 might be different on some archs. While this would potentially offer improved performance on PowerPC, it was dismissed after the following considerations:

* Only having one barrier simplifies the verifier and allows us to easily rely on v4-induced barriers for reducing the complexity of v1-induced speculative path verification.

* For the architectures that implemented BPF_NOSPEC, only PowerPC has distinct instructions for v1 and v4. Even there, some insns may be shared between the barriers for v1 and v4 (e.g., 'ori 31,31,0' and 'sync'). If this is still found to impact performance in an unacceptable way, BPF_NOSPEC can be split into BPF_NOSPEC_V1 and BPF_NOSPEC_V4 later. As an optimization, we can already skip v1/v4 insns from being emitted for PowerPC with this setup if bypass_spec_v1/v4 is set.

Vulnerability status for BPF_NOSPEC-based Spectre mitigations (v4 as of this commit, v1 in the future) is therefore:

* x86 (32-bit and 64-bit), ARM64, and PowerPC (64-bit): Mitigated - This patch implements BPF_NOSPEC for these architectures. The previous v4-only version was supported since commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4") and commit b7540d625094 ("powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC").

* LoongArch: Not Vulnerable - Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode") is the only other past commit related to BPF_NOSPEC and indicates that the insn is not required there.

* MIPS: Vulnerable (if unprivileged BPF is enabled) - Commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode") indicates that it is not vulnerable, but this contradicts the kernel and Debian documentation. Therefore, I assume that there exist vulnerable MIPS CPUs (but maybe not from Loongson?). In the future, BPF_NOSPEC could be implemented for MIPS based on the GCC speculation_barrier [1]. For now, we rely on unprivileged BPF being disabled by default.

* Other: Unknown - To the best of my knowledge there is no definitive information available that indicates that any other arch is vulnerable. They are therefore left untouched (BPF_NOSPEC is not implemented, but bypass_spec_v1/v4 is also not set).

I did the following testing to ensure the insn encoding is correct:

* ARM64:
  * 'dsb nsh; isb' was successfully tested with the BPF CI in [2]
  * 'sb' locally using QEMU v7.2.15 -cpu max (emitted sb insn is executed for example with './test_progs -t verifier_array_access')

* PowerPC: The following configs were tested locally with ppc64le QEMU v8.2 '-machine pseries -cpu POWER9':
  * STF_BARRIER_EIEIO + CONFIG_PPC_BOOK3S_64
  * STF_BARRIER_SYNC_ORI (forced on) + CONFIG_PPC_BOOK3S_64
  * STF_BARRIER_FALLBACK (forced on) + CONFIG_PPC_BOOK3S_64
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_EIEIO
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_SYNC_ORI (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_FALLBACK (forced on)
  * CONFIG_PPC_E500 (forced on) + STF_BARRIER_NONE (forced on)

Most of those combinations should not occur in practice, but I was not able to get a PPC e6500 rootfs (for testing PPC_E500 without forcing it on). In any case, this should ensure that there are no unexpected conflicts between the insns when combined like this. Individual v1/v4 barriers were already emitted elsewhere.

Hari's ack is for the PowerPC changes only.

[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=29b74545531f6afbee9fc38c267524326dbfbedf ("MIPS: Add speculation_barrier support")
[2] https://github.com/kernel-patches/bpf/pull/8576

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211703.337860-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
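For illustration, a minimal sketch of how a JIT backend might lower BPF_NOSPEC under the new semantics, assuming arm64; emit() and the A64_* constants are stand-ins for the real JIT emit machinery, and only the bypass helper names come from this commit:

  case BPF_ST | BPF_NOSPEC:
          /* Nothing to emit if this arch/CPU needs neither mitigation. */
          if (bpf_jit_bypass_spec_v1() && bpf_jit_bypass_spec_v4())
                  break;
          if (cpus_have_cap(ARM64_HAS_SB)) {
                  emit(A64_SB, ctx);       /* single speculation barrier insn */
          } else {
                  emit(A64_DSB_NSH, ctx);  /* 'dsb nsh' */
                  emit(A64_ISB, ctx);      /* 'isb' */
          }
          break;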
2025-06-09bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()Luis Gerhorst
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to skip analysis/patching for the respective vulnerability. For v4, this will reduce the number of barriers the verifier inserts. For v1, it allows more programs to be accepted.

The primary motivation for this is to not regress unpriv BPF's performance on ARM64 in a future commit where BPF_NOSPEC is also used against Spectre v1. This has the user-visible change that v1-induced rejections on non-vulnerable PowerPC CPUs are avoided.

For now, this does not change the semantics of BPF_NOSPEC. It is still a v4-only barrier and must not be implemented if bypass_spec_v4 is always true for the arch. Changing it to a v1 AND v4 barrier is done in a future commit.

As an alternative to bypass_spec_v1/v4, one could introduce NOSPEC_V1 AND NOSPEC_V4 instructions and allow backends to skip their lowering as suggested by commit f5e81d111750 ("bpf: Introduce BPF nospec instruction for mitigating Spectre v4"). Adding bpf_jit_bypass_spec_v1/v4() was found to be preferable for the following reasons:

* bypass_spec_v1/v4 benefits non-vulnerable CPUs: Always performing the same analysis (not taking into account whether the current CPU is vulnerable) needlessly restricts users of CPUs that are not vulnerable. The only use case for this would be portability testing, but this can later be added easily when needed by allowing users to force bypass_spec_v1/v4 to false.

* Portability is still acceptable: Directly disabling the analysis instead of skipping the lowering of BPF_NOSPEC(_V1/V4) might allow programs on non-vulnerable CPUs to be accepted while the program will be rejected on vulnerable CPUs. With the fallback to speculation barriers for Spectre v1 implemented in a future commit, this will only affect programs that do variable stack-accesses or are very complex.

For PowerPC, the SEC_FTR checking in bpf_jit_bypass_spec_v4() is based on the check that was previously located in the BPF_NOSPEC case. For LoongArch, it would likely be safe to set both bpf_jit_bypass_spec_v1() and _v4() according to commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip speculation barrier opcode"). This is omitted here as I am unable to do any testing for LoongArch.

Hari's ack concerns the PowerPC part only.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603211318.337474-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
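A sketch of what the PowerPC override could look like, based on the commit's note that the SEC_FTR check moved out of the BPF_NOSPEC JIT case; the exact condition is an assumption here, only the helper names come from the commit:

  /* Weak default: assume the mitigation is needed. */
  bool __weak bpf_jit_bypass_spec_v4(void)
  {
          return false;
  }

  /* PowerPC override (sketch, in the arch JIT): no v4 barrier is
   * needed when no STF barrier is in effect on this CPU.
   */
  bool bpf_jit_bypass_spec_v4(void)
  {
          return !(security_ftr_enabled(SEC_FTR_FAVOUR_SECURITY) &&
                   security_ftr_enabled(SEC_FTR_STF_BARRIER) &&
                   stf_barrier_type_get() != STF_BARRIER_NONE);
  }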
2025-06-09bpf: Return -EFAULT on internal errorsLuis Gerhorst
This prevents us from trying to recover from these on speculative paths in the future. Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Henriette Herzog <henriette.herzog@rub.de> Cc: Maximilian Ott <ott@cs.fau.de> Cc: Milan Stephan <milan.stephan@fau.de> Link: https://lore.kernel.org/r/20250603205800.334980-4-luis.gerhorst@fau.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09bpf: Return -EFAULT on misconfigurationsLuis Gerhorst
Mark these cases as non-recoverable to later prevent them from being caught when they occur during speculative path verification.

Eduard writes [1]: The only place I'm aware of that might act upon specific error code from verifier syscall is libbpf. Looking through libbpf code, it seems that this change does not interfere with libbpf.

[1] https://lore.kernel.org/all/785b4531ce3b44a84059a4feb4ba458c68fce719.camel@gmail.com/

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-3-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
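The pattern is roughly the following (illustrative only, not a literal hunk from the patch; 'broken_invariant' is a placeholder):

  /* A misconfiguration the verifier should never see on a valid
   * program: report it as an internal error rather than a
   * recoverable -EINVAL, so speculative-path verification will
   * never attempt to continue past it.
   */
  if (unlikely(broken_invariant)) {
          verbose(env, "verifier bug: unexpected insn configuration\n");
          return -EFAULT;   /* was: -EINVAL */
  }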
2025-06-09bpf: Move insn if/else into do_check_insn()Luis Gerhorst
This is required to catch the errors later and fall back to a nospec if on a speculative path.

Eliminate the regs variable as it is only used once and insn_idx is not modified in-between the definition and usage. Do not pass insn but compute it in the function itself. As Eduard points out [1], insn is assumed to correspond to env->insn_idx in many places (e.g, __check_reg_arg()).

Move code into do_check_insn(), replace:
* "continue" with "return 0" after modifying insn_idx
* "goto process_bpf_exit" with "return PROCESS_BPF_EXIT"
* "goto process_bpf_exit_full" with "return process_bpf_exit_full()"
* "do_print_state = " with "*do_print_state = "

[1] https://lore.kernel.org/all/293dbe3950a782b8eb3b87b71d7a967e120191fd.camel@gmail.com/

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603205800.334980-2-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
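A skeleton of the refactor, assuming the names given above (sketch, not verbatim; PROCESS_BPF_EXIT is a sentinel return value distinct from any -errno):

  static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
  {
          /* insn is computed here rather than passed in, so it always
           * corresponds to env->insn_idx.
           */
          struct bpf_insn *insn = &env->prog->insnsi[env->insn_idx];
          u8 class = BPF_CLASS(insn->code);

          if (class == BPF_ALU || class == BPF_ALU64) {
                  /* ... formerly the big if/else body in do_check() ... */
                  return 0;                  /* was: continue */
          } else if (class == BPF_JMP && BPF_OP(insn->code) == BPF_EXIT) {
                  /* ... */
                  return PROCESS_BPF_EXIT;   /* was: goto process_bpf_exit */
          }
          /* ... remaining instruction classes ... */
          return 0;
  }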
2025-06-09Merge tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds
Pull powerpc fixes from Madhavan Srinivasan:

- a couple of fixes for out of bounds issues in memtrace and vas

Thanks to Ritesh Harjani (IBM), Haren Myneni, and Jonathan Greental

* tag 'powerpc-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vas: Return -EINVAL if the offset is non-zero in mmap()
  powerpc/powernv/memtrace: Fix out of bounds issue in memtrace mmap
2025-06-10powerpc/vas: Return -EINVAL if the offset is non-zero in mmap()Haren Myneni
The user space calls mmap() to map the VAS window paste address, and the kernel returns the complete mapped page for each window. So return -EINVAL if a non-zero value is passed for the offset parameter to mmap(). See Documentation/arch/powerpc/vas-api.rst for mmap() restrictions. Co-developed-by: Jonathan Greental <yonatan02greental@gmail.com> Signed-off-by: Jonathan Greental <yonatan02greental@gmail.com> Reported-by: Jonathan Greental <yonatan02greental@gmail.com> Fixes: dda44eb29c23 ("powerpc/vas: Add VAS user space API") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250610021227.361980-2-maddy@linux.ibm.com
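The check itself is a one-liner in the mmap handler, along these lines (a sketch; the handler name is an assumption based on the vas-api code):

  static int coproc_mmap(struct file *fp, struct vm_area_struct *vma)
  {
          /* Each window maps exactly one page, at offset 0. */
          if (vma->vm_pgoff)
                  return -EINVAL;
          /* ... validate window state and map the paste address ... */
  }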
2025-06-10powerpc/powernv/memtrace: Fix out of bounds issue in memtrace mmapRitesh Harjani (IBM)
The memtrace mmap handler has an out of bounds issue. Fix it by checking that the requested mapping region size stays within the allocated region size. Reported-by: Jonathan Greental <yonatan02greental@gmail.com> Fixes: 08a022ad3dfa ("powerpc/powernv/memtrace: Allow mmaping trace buffers") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250610021227.361980-1-maddy@linux.ibm.com
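A sketch of the bounds check (struct and field names are illustrative): the requested range, including the mmap offset, must fit within the allocated trace buffer.

  static int memtrace_mmap(struct file *filp, struct vm_area_struct *vma)
  {
          struct memtrace_entry *ent = filp->private_data;
          unsigned long usize = vma->vm_end - vma->vm_start;
          unsigned long off = vma->vm_pgoff << PAGE_SHIFT;

          /* Reject mappings that start or end past the buffer. */
          if (off >= ent->size || usize > ent->size - off)
                  return -EINVAL;
          /* ... remap_pfn_range() as before ... */
  }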
2025-06-09scsi: error: alua: I/O errors for ALUA state transitionsRajashekhar M A
When a host is configured with a few LUNs and I/O is running, injecting FC faults repeatedly leads to path recovery problems. The LUNs have 4 paths each, and only 3 of them come back active after, say, an FC fault that takes 2 of the paths down, instead of all 4. This happens after several iterations of continuous FC faults.

The reason is that we return an I/O error whenever we encounter sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.

Signed-off-by: Rajashekhar M A <rajs@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20250606135924.27397-1-hare@kernel.org
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
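Conceptually the change is of this shape (a sketch modeled on the existing scsi_dh_alua sense handling; the actual patch location and surrounding code may differ):

  /* LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION:
   * the transition is temporary, so requeue rather than fail the I/O.
   */
  if (sshdr->asc == 0x04 && sshdr->ascq == 0x0a)
          return ADD_TO_MLQUEUE;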
2025-06-09scsi: storvsc: Increase the timeouts to storvsc_timeoutDexuan Cui
Currently storvsc_timeout is only used in storvsc_sdev_configure(), and 5s and 10s are used elsewhere. It turns out that, in rare cases, the 5s is not enough on Azure, so let's use storvsc_timeout everywhere. In case a timeout happens and storvsc_channel_init() returns an error, close the VMBus channel so that any host-to-guest messages in the channel's ringbuffer, which might come late, can be safely ignored. Add a "const" to storvsc_timeout. Cc: stable@kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1749243459-10419-1-git-send-email-decui@microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09bpf: Add cookie in fdinfo for raw_tpTao Chen
Add cookie in fdinfo for raw_tp, the info as follows:

  link_type:      raw_tracepoint
  link_id:        31
  prog_tag:       9dfdf8ef453843bf
  prog_id:        32
  tp_name:        sys_enter
  cookie:         23925373020405760

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-5-chen.dylane@linux.dev
2025-06-09bpf: Add cookie in fdinfo for tracingTao Chen
Add cookie in fdinfo for tracing, the info as follows:

  link_type:      tracing
  link_id:        6
  prog_tag:       9dfdf8ef453843bf
  prog_id:        35
  attach_type:    25
  target_obj_id:  1
  target_btf_id:  60355
  cookie:         9007199254740992

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-4-chen.dylane@linux.dev
2025-06-09bpftool: Display cookie for tracing link probeTao Chen
Display cookie for tracing link probe, in plain mode:

  # bpftool link
  5: tracing  prog 34
          prog_type tracing  attach_type trace_fentry
          target_obj_id 1  target_btf_id 60355
          cookie 4503599627370496
          pids test_progs(176)

And in json mode:

  # bpftool link -j | jq
  {
    "id": 5,
    "type": "tracing",
    "prog_id": 34,
    "prog_type": "tracing",
    "attach_type": "trace_fentry",
    "target_obj_id": 1,
    "target_btf_id": 60355,
    "cookie": 4503599627370496,
    "pids": [
      {
        "pid": 176,
        "comm": "test_progs"
      }
    ]
  }

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-3-chen.dylane@linux.dev
2025-06-09selftests/bpf: Add cookies check for tracing fill_link_info testTao Chen
Adding tests for getting cookie with fill_link_info for tracing. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev
2025-06-09bpf: Add cookie to tracing bpf_link_infoTao Chen
bpf_tramp_link includes cookie info, so we can add it to bpf_link_info. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606165818.3394397-1-chen.dylane@linux.dev
2025-06-09drm/msm/adreno: Check for recognized GPU before bindRob Clark
If we have a newer dtb than kernel, we could end up in a situation where the GPU device is present in the dtb, but not in the driver's device table. We don't want this to prevent the display from probing. So check that we recognize the GPU before adding the GPU component. v2: use %pOF Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657701/
2025-06-09Merge branch 'bpf-make-reg_not_null-true-for-const_ptr_to_map'Andrii Nakryiko
Ihor Solodrai says:
====================
bpf: make reg_not_null() true for CONST_PTR_TO_MAP

Handle CONST_PTR_TO_MAP null checks in the BPF verifier. Add appropriate test cases.

v3->v4: more test cases
v2->v3: change constant in unpriv test
v1->v2: add a test case with ringbufs

v3: https://lore.kernel.org/bpf/20250604222729.3351946-1-isolodrai@meta.com/
v2: https://lore.kernel.org/bpf/20250604003759.1020745-1-isolodrai@meta.com/
v1: https://lore.kernel.org/bpf/20250523232503.1086319-1-isolodrai@meta.com/
====================

Link: https://patch.msgid.link/20250609183024.359974-1-isolodrai@meta.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-09selftests/bpf: Add test cases with CONST_PTR_TO_MAP null checksIhor Solodrai
A test requires the following to happen:
* CONST_PTR_TO_MAP value is checked for null
* the code in the null branch fails verification

Add test cases:
* direct global map_ptr comparison to null
* lookup inner map, then two checks (the first transforms map_value_or_null into map_ptr)
* lookup inner map, spill-fill it, then check for null
* use an array of ringbufs to recreate a common coding pattern [1]

[1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com
2025-06-09selftests/bpf: Add cmp_map_pointer_with_const testIhor Solodrai
Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A BPF program with this code must not pass verification in unpriv. Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com
2025-06-09bpf: Make reg_not_null() true for CONST_PTR_TO_MAPIhor Solodrai
When reg->type is CONST_PTR_TO_MAP, it cannot be null. However, the verifier explores the branches under rX == 0 in check_cond_jmp_op() even if reg->type is CONST_PTR_TO_MAP, because it was not checked for in reg_not_null().

Fix this by adding CONST_PTR_TO_MAP to the set of types that are considered non-nullable in reg_not_null().

An old "unpriv: cmp map pointer with zero" selftest fails with this change, because now the early out correctly triggers in check_cond_jmp_op(), making the verification pass. In practice the verifier may allow pointer-to-null comparison in unpriv, since in many cases the relevant branch and comparison op are removed as dead code. So change the expected test result to __success_unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com
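An abridged sketch of reg_not_null() after the change (several existing cases are omitted for brevity):

  static bool reg_not_null(const struct bpf_reg_state *reg)
  {
          enum bpf_reg_type type = base_type(reg->type);

          if (type_may_be_null(reg->type))
                  return false;

          return type == PTR_TO_SOCKET ||
                 type == PTR_TO_TCP_SOCK ||
                 type == PTR_TO_MAP_VALUE ||
                 type == PTR_TO_MAP_KEY ||
                 type == PTR_TO_SOCK_COMMON ||
                 type == PTR_TO_MEM ||
                 type == CONST_PTR_TO_MAP;   /* new: map pointers are never null */
  }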
2025-06-09bpf: Add show_fdinfo for perf_eventTao Chen
After commit 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") added perf_event info, we can also show the info via cat /proc/[fd]/fdinfo.

kprobe fdinfo:
  link_type:      perf
  link_id:        10
  prog_tag:       bcf7977d3b93787c
  prog_id:        20
  name:           bpf_fentry_test1
  offset:         0x0
  missed:         0
  addr:           0xffffffffa28a2904
  event_type:     kprobe
  cookie:         3735928559

uprobe fdinfo:
  link_type:      perf
  link_id:        13
  prog_tag:       bcf7977d3b93787c
  prog_id:        21
  name:           /proc/self/exe
  offset:         0x63dce4
  ref_ctr_offset: 0x33eee2a
  event_type:     uprobe
  cookie:         3735928559

tracepoint fdinfo:
  link_type:      perf
  link_id:        11
  prog_tag:       bcf7977d3b93787c
  prog_id:        22
  tp_name:        sched_switch
  event_type:     tracepoint
  cookie:         3735928559

perf_event fdinfo:
  link_type:      perf
  link_id:        12
  prog_tag:       bcf7977d3b93787c
  prog_id:        23
  type:           1
  config:         2
  event_type:     event
  cookie:         3735928559

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606150258.3385166-1-chen.dylane@linux.dev
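Under the hood this is a show_fdinfo hook on bpf_link_ops; a minimal sketch (how the cookie is fetched from the link's perf event context is elided):

  static void bpf_perf_link_show_fdinfo(const struct bpf_link *link,
                                        struct seq_file *seq)
  {
          /* ... existing event-specific lines (name, offset, ...) ... */

          /* u64 cookie, obtained from the attached perf event */
          seq_printf(seq, "cookie:\t%llu\n", cookie);
  }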
2025-06-09Merge branch 'bpf-implement-mprog-api-on-top-of-existing-cgroup-progs'Andrii Nakryiko
Yonghong Song says:
====================
bpf: Implement mprog API on top of existing cgroup progs

Current cgroup prog ordering is appending at attachment time. This is not ideal. In some cases, users want specific ordering at a particular cgroup level. For example, in Meta, we have a case where three different applications all have cgroup/setsockopt progs and they require specific ordering. Current approach is to use a bpfchainer where one bpf prog contains multiple global functions and each global function can be freplaced by a prog for a specific application. The ordering of global functions decides the ordering of those application specific bpf progs. Using bpfchainer is a centralized approach and is not desirable as one of the applications acts as a daemon. The decentralized attachment approach is more favorable for those applications.

To address this, the existing mprog API ([2]) seems an ideal solution with supporting BPF_F_BEFORE and BPF_F_AFTER flags on top of the existing cgroup bpf implementation. More specifically, the support is added for prog/link attachment with BPF_F_BEFORE and BPF_F_AFTER. The kernel mprog interface ([2]) is not used and the implementation is done directly in the cgroup bpf code base. The mprog 'revision' is also implemented in attach/detach/replace, so users can query the revision number to check for changes to a cgroup prog list.

The patch set contains 5 patches. Patch 1 adds revision support for cgroup bpf progs. Patch 2 implements the mprog API for prog/link attach and revision update. Patch 3 adds a new libbpf API to do cgroup link attach with flags like BPF_F_BEFORE/BPF_F_AFTER. Patches 4 and 5 add two tests to validate the implementation.

[1] https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
[2] https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net

Changelogs:
v4 -> v5:
- v4: https://lore.kernel.org/bpf/20250530173812.1823479-1-yonghong.song@linux.dev/
- Remove early prog/link checking based on flags and id_or_fd as later code will do the checking as well.
- Do proper cgroup flag checking for bpf_prog_attach().
v3 -> v4:
- v3: https://lore.kernel.org/bpf/20250517162720.4077882-1-yonghong.song@linux.dev/
- Refactor some code to make BPF_F_BEFORE/BPF_F_AFTER handling easier to understand.
- Previously, I degraded 'link' to 'prog' for later mprog handling. This is not correct. Similar to mprog.c, we should check 'link' instead of link->prog since it is possible two different links may have the same underlying prog and we do not want to miss supporting such a use case.
v2 -> v3:
- v2: https://lore.kernel.org/bpf/20250508223524.487875-1-yonghong.song@linux.dev/
- Big change to replace get_anchor_prog() with get_prog_list() so the 'struct bpf_prog_list *' is returned directly.
- Support 'BPF_F_BEFORE | BPF_F_AFTER' attachment if the prog list is empty and flags do not have 'BPF_F_LINK | BPF_F_ID' and id_or_fd is 0.
- Add BPF_F_LINK support.
- Patch 4 is added to reuse id_from_prog_fd() and id_from_link_fd().
v1 -> v2:
- v1: https://lore.kernel.org/bpf/20250411011523.1838771-1-yonghong.song@linux.dev/
- Change cgroup_bpf.revisions from atomic64_t to u64.
- Added missing bpf_prog_put in various places.
- Rename get_cmp_prog() to get_anchor_prog(). The implementation tries to find the anchor prog regardless of whether id_or_fd is non-NULL or not.
- Rename bpf_cgroup_prog_attached() to is_cgroup_prog_type() and handle BPF_PROG_TYPE_LSM properly (with BPF_LSM_CGROUP attach type).
- I kept the 'id || id_or_fd' condition as the condition 'id' is also used in mprog.c, so I assume it is okay in cgroup.c as well.
====================

Link: https://patch.msgid.link/20250606163131.2428225-1-yonghong.song@linux.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-06-09selftests/bpf: Add two selftests for mprog API based cgroup progsYonghong Song
Two tests are added: - cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link attach are tested. Some negative tests are also included. - cgroup_mprog_ordering, which actually runs the program with some mprog API flags. [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev
2025-06-09selftests/bpf: Move some tc_helpers.h functions to test_progs.hYonghong Song
Move static inline functions id_from_prog_fd() and id_from_link_fd() from prog_tests/tc_helpers.h to test_progs.h so these two functions can be reused for later cgroup mprog selftests. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev
2025-06-09libbpf: Support link-based cgroup attach with optionsYonghong Song
Currently libbpf supports bpf_program__attach_cgroup() with signature:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd);

To support mprog style attachment, additional fields like flags, relative_{fd,id} and expected_revision are needed. Add a new API:

  LIBBPF_API struct bpf_link *
  bpf_program__attach_cgroup_opts(const struct bpf_program *prog, int cgroup_fd,
                                  const struct bpf_cgroup_opts *opts);

where bpf_cgroup_opts contains all the fields needed above.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163146.2429212-1-yonghong.song@linux.dev
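A usage sketch of the new API (a code fragment; the opts field names are taken from the commit message, and error handling follows libbpf's NULL-plus-errno convention):

  LIBBPF_OPTS(bpf_cgroup_opts, opts,
          .flags = BPF_F_BEFORE,
          .relative_fd = anchor_prog_fd,   /* attach before this prog */
          .expected_revision = revision,   /* fail if the list changed meanwhile */
  );
  struct bpf_link *link;

  link = bpf_program__attach_cgroup_opts(prog, cgroup_fd, &opts);
  if (!link)
          fprintf(stderr, "attach failed: %d\n", -errno);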
2025-06-09bpf: Implement mprog API on top of existing cgroup progsYonghong Song
Current cgroup prog ordering is appending at attachment time. This is not ideal. In some cases, users want specific ordering at a particular cgroup level. To address this, the existing mprog API seems an ideal solution with supporting BPF_F_BEFORE and BPF_F_AFTER flags.

But there are a few obstacles to directly using the kernel mprog interface. Currently cgroup bpf progs already support prog attach/detach/replace and link-based attach/detach/replace. For example, in struct bpf_prog_array_item, the cgroup_storage field needs to be together with the bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog as the member, which makes it difficult to use the kernel mprog interface. In another case, the current cgroup prog detach tries to use the same flag as in attach. This is different from the mprog kernel interface which uses flags passed from user space.

So to avoid modifying existing behavior, I made the following changes to support the mprog API for cgroup progs:
- The support is for prog list at cgroup level. Cross-level prog list (a.k.a. effective prog list) is not supported.
- Previously, BPF_F_PREORDER was supported only for prog attach; now BPF_F_PREORDER is also supported by link-based attach.
- For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported similar to kernel mprog but with a different implementation.
- For detach and replace, use the existing implementation.
- For attach, detach and replace, the revision for a particular prog list, associated with a particular attach type, will be updated by increasing the count by 1.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163141.2428937-1-yonghong.song@linux.dev
2025-06-09cgroup: Add bpf prog revisions to struct cgroup_bpfYonghong Song
One of the key items in the mprog API is the revision for the prog list. The revision number will be increased if the prog list changes, e.g. on attach, detach or replace. Add a 'revisions' field to struct cgroup_bpf, representing revisions for all cgroup related attachment types. The initial revision value is set to 1, the same as in the kernel mprog implementation. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250606163136.2428732-1-yonghong.song@linux.dev
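The field itself is small; a sketch of the addition (the array bound is an assumption based on the existing cgroup-bpf attach-type enum):

  struct cgroup_bpf {
          /* ... existing members ... */

          /* One revision per attach type, bumped on every
           * attach/detach/replace; starts at 1 like kernel mprog.
           */
          u64 revisions[MAX_CGROUP_BPF_ATTACH_TYPE];
  };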
2025-06-09Merge tag 'for-net-2025-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothJakub Kicinski
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:

- MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
- MGMT: Protect mgmt_pending list with its own lock
- hci_core: fix list_for_each_entry_rcu usage
- btintel_pcie: Increase the tx and rx descriptor count
- btintel_pcie: Reduce driver buffer posting to prevent race condition
- btintel_pcie: Fix driver not posting maximum rx buffers

* tag 'for-net-2025-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: MGMT: Protect mgmt_pending list with its own lock
  Bluetooth: MGMT: Fix UAF on mgmt_remove_adv_monitor_complete
  Bluetooth: btintel_pcie: Reduce driver buffer posting to prevent race condition
  Bluetooth: btintel_pcie: Increase the tx and rx descriptor count
  Bluetooth: btintel_pcie: Fix driver not posting maximum rx buffers
  Bluetooth: hci_core: fix list_for_each_entry_rcu usage
====================

Link: https://patch.msgid.link/20250605191136.904411-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-09net_sched: sch_sfq: fix a potential crash on gso_skb handlingEric Dumazet
SFQ has an assumption of always being able to queue at least one packet. However, after the blamed commit, sch->q.len can be inflated by packets in sch->gso_skb, and an enqueue() on an empty SFQ qdisc can be followed by an immediate drop.

Fix sfq_drop() to properly clear q->tail in this situation.

Tested:

  ip netns add lb
  ip link add dev to-lb type veth peer name in-lb netns lb
  ethtool -K to-lb tso off                   # force qdisc to requeue gso_skb
  ip netns exec lb ethtool -K in-lb gro on   # enable NAPI
  ip link set dev to-lb up
  ip -netns lb link set dev in-lb up
  ip addr add dev to-lb 192.168.20.1/24
  ip -netns lb addr add dev in-lb 192.168.20.2/24
  tc qdisc replace dev to-lb root sfq limit 100
  ip netns exec lb netserver
  netperf -H 192.168.20.2 -l 100 &
  netperf -H 192.168.20.2 -l 100 &
  netperf -H 192.168.20.2 -l 100 &
  netperf -H 192.168.20.2 -l 100 &

Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Closes: https://lore.kernel.org/netdev/9da42688-bfaa-4364-8797-e9271f3bdaef@hetzner-cloud.de/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250606165127.3629486-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-09smb: client: disable path remapping with POSIX extensionsPhilipp Kerling
If SMB 3.1.1 POSIX Extensions are available and negotiated, the client should be able to use all characters and not remap anything. Currently, the user has to explicitly request this behavior by specifying the "nomapposix" mount option. Link: https://lore.kernel.org/4195bb677b33d680e77549890a4f4dd3b474ceaf.camel@rx2.rx-server.de Signed-off-by: Philipp Kerling <pkerling@casix.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-09drm/msm/adreno: Pass device_node to find_chipid()Rob Clark
We are going to want to re-use this before the component is bound, when we don't yet have the device pointer (but we do have the of node). v2: use %pOF Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657705/
2025-06-09drm/msm: Rename add_components_mdp()Rob Clark
To better match add_gpu_components(). Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/657700/
2025-06-09drivers: gpu: drm: msm: registers: improve reproducibilityRyan Eatmon
The files generated by gen_header.py capture the source path to the input files and the date. While that can be informative, it varies based on where and when the kernel was built, as the full path is captured. Since all of the files that this tool is run on are under the drivers directory, modify the script to strip everything in the path before 'drivers'. Additionally, print <stripped> instead of the date. Signed-off-by: Ryan Eatmon <reatmon@ti.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com> Signed-off-by: Viswanath Kraleti <viswanath.kraleti@oss.qualcomm.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/655599/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm/a7xx: Call CP_RESET_CONTEXT_STATEConnor Abbott
Calling this packet is necessary when we switch contexts because there are various pieces of state used by userspace to synchronize between BR and BV that are persistent across submits and we need to make sure that they are in a "safe" state when switching contexts. Otherwise a userspace submission in one context could cause another context to function incorrectly and hang, effectively a denial of service (although without leaking data). This was missed during initial a7xx bringup. Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654924/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Fix CP_RESET_CONTEXT_STATE bitfield namesConnor Abbott
Based on kgsl. Fixes: af66706accdf ("drm/msm/a6xx: Add skeleton A7xx support") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654922/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09Merge branch '6.16/scsi-queue' into 6.16/scsi-fixesMartin K. Petersen
Pull in remaining fixes from queue branch. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09scsi: s390: zfcp: Ensure synchronous unit_addPeter Oberparleiter
Improve the usability of the unit_add sysfs attribute by ensuring that the associated FCP LUN scan processing is completed synchronously. This enables configuration tooling to consistently determine the end of the scan process to allow for serialization of follow-on actions. While the scan process associated with unit_add typically completes synchronously, it is deferred to an asynchronous background process if unit_add is used before initial remote port scanning has completed. This occurs when unit_add is used immediately after setting the associated FCP device online. To ensure synchronous unit_add processing, wait for remote port scanning to complete before initiating the FCP LUN scan. Cc: stable@vger.kernel.org Reviewed-by: M Nikhil <nikh1092@linux.ibm.com> Reviewed-by: Nihar Panda <niharp@linux.ibm.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Nihar Panda <niharp@linux.ibm.com> Link: https://lore.kernel.org/r/20250603182252.2287285-2-niharp@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09scsi: iscsi: Fix incorrect error path labels for flashnode operationsAlok Tiwari
Correct the error handling goto labels used when host lookup fails in various flashnode-related event handlers: - iscsi_new_flashnode() - iscsi_del_flashnode() - iscsi_login_flashnode() - iscsi_logout_flashnode() - iscsi_logout_flashnode_sid() scsi_host_put() is not required when shost is NULL, so jumping to the correct label avoids unnecessary operations. These functions previously jumped to the wrong goto label (put_host), which did not match the intended cleanup logic. Use the correct exit labels (exit_new_fnode, exit_del_fnode, etc.) to ensure proper error handling. Also remove the unused put_host label under iscsi_new_flashnode() as it is no longer needed. No functional changes beyond accurate error path correction. Fixes: c6a4bb2ef596 ("[SCSI] scsi_transport_iscsi: Add flash node mgmt support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://lore.kernel.org/r/20250530193012.3312911-1-alok.a.tiwari@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
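The corrected pattern looks like this (a sketch; the labels are as named above, the surrounding lookup code is approximate):

  shost = scsi_host_lookup(ev->u.new_flashnode.host_no);
  if (!shost) {
          err = -ENODEV;
          goto exit_new_fnode;    /* was: goto put_host, which would call
                                   * scsi_host_put() on a NULL shost */
  }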
2025-06-09scsi: mvsas: Fix typos in per-phy comments and SAS cmd port registersAnkit Chauhan
Spelling fixes: Deocder --> Decoder Memroy --> Memory This is a non-functional change aimed at improving code clarity. Signed-off-by: Ankit Chauhan <ankitchauhan2065@gmail.com> Link: https://lore.kernel.org/r/20250528110604.59528-1-ankitchauhan2065@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-06-09drm/msm: Temporarily disable stall-on-fault after a page faultConnor Abbott
When things go wrong, the GPU is capable of quickly generating millions of faulting translation requests per second. When that happens, in the stall-on-fault model each access will stall until it wins the race to signal the fault and then the RESUME register is written. This slows processing page faults to a crawl as the GPU can generate faults much faster than the CPU can acknowledge them. It also means that all available resources in the SMMU are saturated waiting for the stalled transactions, so that other transactions such as transactions generated by the GMU, which shares translation resources with the GPU, cannot proceed. This causes a GMU watchdog timeout, which leads to a failed reset because GX cannot collapse when there is a transaction pending and a permanently hung GPU.

On older platforms with qcom,smmu-v2, it seems that when one transaction is stalled subsequent faulting transactions are terminated, which avoids this problem, but the MMU-500 follows the spec here.

To work around these problems, disable stall-on-fault as soon as we get a page fault until a cooldown period after pagefaults stop. This allows the GMU some guaranteed time to continue working. We only use stall-on-fault to halt the GPU while we collect a devcoredump and we always terminate the transaction afterward, so it's fine to miss some subsequent page faults. We also keep it disabled so long as the current devcoredump hasn't been deleted, because in that case we likely won't capture another one if there's a fault.

After this commit HFI messages still occasionally time out, because the crashdump handler doesn't run fast enough to let the GMU resume, but the driver seems to recover from it. This will probably go away after the HFI timeout is increased.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654891/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
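In rough pseudocode, the workaround amounts to the following (field names and the cooldown interval are illustrative, not the driver's actual values; set_stall() is the callback provided via struct adreno_smmu_priv):

  /* in the fault handler: stop stalling further faults */
  adreno_smmu->set_stall(adreno_smmu->cookie, false);

  /* remember when stalling may be re-enabled */
  gpu->stall_reenable_time = ktime_add_ms(ktime_get(), 500);

  /* later, once faults have quiesced and the devcoredump
   * has been deleted, re-enable stall-on-fault:
   */
  if (ktime_after(ktime_get(), gpu->stall_reenable_time) && !gpu->crashstate)
          adreno_smmu->set_stall(adreno_smmu->cookie, true);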
2025-06-09drm/msm: Delete resume_translation()Connor Abbott
Unused since the previous commit. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654890/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Don't use a worker to capture fault devcoredumpConnor Abbott
Now that we use a threaded IRQ, it should be safe to do this in the fault handler. We can also remove fault_info from struct msm_gpu and just pass it directly. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/654889/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Fix another leak in the submit error pathRob Clark
put_unused_fd() doesn't free the installed file if we've already done fd_install(). So we need to also free the sync_file. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/653583/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09drm/msm: Fix a fence leak in submit error pathRob Clark
In error paths, we could unref the submit without calling drm_sched_entity_push_job(), so msm_job_free() will never get called. Since drm_sched_job_cleanup() will NULL out the s_fence, we can use that to detect this case. Signed-off-by: Rob Clark <robdclark@chromium.org> Patchwork: https://patchwork.freedesktop.org/patch/653584/ Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-09SPI: omap2-mcspi: Fix SPI CS behaviour aroundMark Brown
Merge series from Félix Piédallu <felix.piedallu@non.se.com>:

These patches fix the behaviour of the SPI Chip Select of the OMAP2 MCSPI driver used on TI SoCs.

The omap2-mcspi driver supports the use of multi mode (multichannel in TI documentation). In this mode, the CS is asserted and deasserted by the hardware.

The multi mode is disabled for messages when cs_change=0 for all transfers (e.g. when CS is kept asserted between transfers of a same message). The multi mode also needs to be disabled for messages when cs_change=1 on the last transfer (e.g. when CS is kept asserted after the WHOLE message), and for the message right after. Currently, that is not the case, and the CS is deasserted by hardware when it shouldn't be. This breaks peripheral drivers that send multiple messages with the CS asserted in between.

Patch 1 ensures that multi mode is disabled when cs_change=1 on the last transfer of the message. Patch 2 ensures that multi mode is disabled on a message following one with cs_change=1 on the last transfer. This is the case for the TPM TIS SPI driver that uses this logic for flow control purposes.

Tested on an AM6442 platform with a TPM ST33HTPH2X32AHE4.
2025-06-09ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headersThomas Huth
While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's standardize on the __ASSEMBLER__ macro that is provided by the compilers now. This is a completely mechanical patch (done with a simple "sed -i" statement). Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
2025-06-09ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headersThomas Huth
__ASSEMBLY__ is only defined by the Makefile of the kernel, so this is not really useful for uapi headers (unless the userspace Makefile defines it, too). Let's switch to __ASSEMBLER__ which gets set automatically by the compiler when compiling assembly code. Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
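The resulting idiom in the headers is the familiar guard, just with the compiler-provided macro (the compiler defines __ASSEMBLER__ by itself when preprocessing assembly, so it works for kernel and userspace builds alike):

  #ifndef __ASSEMBLER__           /* was: #ifndef __ASSEMBLY__ */
  /* C-only declarations, invisible to .S files */
  struct pt_regs;
  #endif /* !__ASSEMBLER__ */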
2025-06-09ARC: unwind: Use built-in sort swap to reduce code size and improve performanceYu-Chun Lin
The custom swap function used in sort() was identical to the default built-in sort swap. Remove the custom swap function and pass NULL to sort(), allowing it to use the default swap function.

This change reduces code size and improves performance, particularly when CONFIG_MITIGATION_RETPOLINE is enabled. With RETPOLINE mitigation, indirect function calls incur significant overhead, and using the default swap function avoids this cost.

  $ ./scripts/bloat-o-meter ./unwind.o.old ./unwind.o.new
  add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-22 (-22)
  Function                                     old     new   delta
  init_unwind_hdr.constprop                    544     540      -4
  swap_eh_frame_hdr_table_entries               18       -     -18
  Total: Before=4410, After=4388, chg -0.50%

Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
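The call-site change is simply the following (a sketch; the comparator name is assumed to match the swap function's naming in the bloat-o-meter output above):

  #include <linux/sort.h>

  sort(table, nr_entries, sizeof(*table),
       cmp_eh_frame_hdr_table_entries,
       NULL);   /* was: swap_eh_frame_hdr_table_entries;
                 * NULL selects sort()'s built-in swap */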
2025-06-09ARC: atomics: Implement arch_atomic64_cmpxchg using _relaxedJason Gunthorpe
The core atomic code has a number of macros where it elaborates architecture primitives into more functions. ARC uses arch_atomic64_cmpxchg() as its architecture primitive, which disables a lot of the additional functions. Instead provide arch_cmpxchg64_relaxed() as the primitive and rely on the core macros to create arch_cmpxchg64(). The macros will also provide other functions, for instance try_cmpxchg64_release(), giving a more complete implementation. Suggested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/Z0747n5bSep4_1VX@J2N7QTR9R3 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
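Conceptually, the core macros wrap the relaxed primitive in fences, roughly like this (a sketch of the idea, not the literal atomic-arch-fallback code):

  #define arch_cmpxchg64(ptr, old, new)                          \
  ({                                                             \
          __typeof__(*(ptr)) __ret;                              \
          __atomic_pre_full_fence();                             \
          __ret = arch_cmpxchg64_relaxed((ptr), (old), (new));   \
          __atomic_post_full_fence();                            \
          __ret;                                                 \
  })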
2025-06-09cpupower: split unitdir from libdir in MakefileFrancesco Poli (wintermute)
Improve the installation procedure for the systemd service unit 'cpupower.service' to make it more flexible.

Some distros install libraries to /usr/lib64/, but systemd service units have to be installed to /usr/lib/systemd/system: as a consequence, the installation procedure should not assume that systemd service units can be installed to ${libdir}/systemd/system ...

Define a dedicated variable ("unitdir") in the Makefile.

Link: https://lore.kernel.org/linux-pm/260b6d79-ab61-43b7-a0eb-813e257bc028@leemhuis.info/T/#m0601940ab439d5cbd288819d2af190ce59e810e6
Fixes: 9c70b779ad91 ("cpupower: add a systemd service to run cpupower")
Link: https://lore.kernel.org/r/20250521211656.65646-1-invernomuto@paranoici.org
Signed-off-by: Francesco Poli (wintermute) <invernomuto@paranoici.org>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
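The Makefile change boils down to something like this (a sketch with illustrative paths and target name):

  # libraries may land in lib64, but systemd units do not follow libdir
  libdir  ?= /usr/lib64
  unitdir ?= /usr/lib/systemd/system

  install-systemd:
  	$(INSTALL) -D -m 644 cpupower.service \
  		'$(DESTDIR)$(unitdir)/cpupower.service'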