linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-01-23	bpf, docs: Fix bpf_redirect_peer header doc	Victor Stewart
	Amend the bpf_redirect_peer() header documentation to also mention support for the netkit device type. Signed-off-by: Victor Stewart <v@nametag.social> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240116202952.241009-1-v@nametag.social Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	Merge branch 'bpf: tcp: Support arbitrary SYN Cookie at TC.'	Martin KaFai Lau
	Kuniyuki Iwashima says: ==================== Under SYN Flood, the TCP stack generates SYN Cookie to remain stateless for the connection request until a valid ACK is responded to the SYN+ACK. The cookie contains two kinds of host-specific bits, a timestamp and secrets, so only can it be validated by the generator. It means SYN Cookie consumes network resources between the client and the server; intermediate nodes must remember which nodes to route ACK for the cookie. SYN Proxy reduces such unwanted resource allocation by handling 3WHS at the edge network. After SYN Proxy completes 3WHS, it forwards SYN to the backend server and completes another 3WHS. However, since the server's ISN differs from the cookie, the proxy must manage the ISN mappings and fix up SEQ/ACK numbers in every packet for each connection. If a proxy node goes down, all the connections through it are terminated. Keeping a state at proxy is painful from that perspective. At AWS, we use a dirty hack to build truly stateless SYN Proxy at scale. Our SYN Proxy consists of the front proxy layer and the backend kernel module. (See slides of LPC2023 [0], p37 - p48) The cookie that SYN Proxy generates differs from the kernel's cookie in that it contains a secret (called rolling salt) (i) shared by all the proxy nodes so that any node can validate ACK and (ii) updated periodically so that old cookies cannot be validated and we need not encode a timestamp for the cookie. Also, ISN contains WScale, SACK, and ECN, not in TS val. This is not to sacrifice any connection quality, where some customers turn off TCP timestamps option due to retro CVE. After 3WHS, the proxy restores SYN, encapsulates ACK into SYN, and forward the TCP-in-TCP packet to the backend server. Our kernel module works at Netfilter input/output hooks and first feeds SYN to the TCP stack to initiate 3WHS. When the module is triggered for SYN+ACK, it looks up the corresponding request socket and overwrites tcp_rsk(req)->snt_isn with the proxy's cookie. Then, the module can complete 3WHS with the original ACK as is. This way, our SYN Proxy does not manage the ISN mappings nor wait for SYN+ACK from the backend thus can remain stateless. It's working very well for high-bandwidth services like multiple Tbps, but we are looking for a way to drop the dirty hack and further optimise the sequences. If we could validate an arbitrary SYN Cookie on the backend server with BPF, the proxy would need not restore SYN nor pass it. After validating ACK, the proxy node just needs to forward it, and then the server can do the lightweight validation (e.g. check if ACK came from proxy nodes, etc) and create a connection from the ACK. This series allows us to create a full sk from an arbitrary SYN Cookie, which is done in 3 steps. 1) At tc, BPF prog calls a new kfunc to create a reqsk and configure it based on the argument populated from SYN Cookie. The reqsk has its listener as req->rsk_listener and is passed to the TCP stack as skb->sk. 2) During TCP socket lookup for the skb, skb_steal_sock() returns a listener in the reuseport group that inet_reqsk(skb->sk)->rsk_listener belongs to. 3) In cookie_v[46]_check(), the reqsk (skb->sk) is fully initialised and a full sk is created. The kfunc usage is as follows: struct bpf_tcp_req_attrs attrs = { .mss = mss, .wscale_ok = wscale_ok, .rcv_wscale = rcv_wscale, /* Server's WScale < 15 / .snd_wscale = snd_wscale, / Client's WScale < 15 / .tstamp_ok = tstamp_ok, .rcv_tsval = tsval, .rcv_tsecr = tsecr, / Server's Initial TSval / .usec_ts_ok = usec_ts_ok, .sack_ok = sack_ok, .ecn_ok = ecn_ok, } skc = bpf_skc_lookup_tcp(...); sk = (struct sock )bpf_skc_to_tcp_sock(skc); bpf_sk_assign_tcp_reqsk(skb, sk, attrs, sizeof(attrs)); bpf_sk_release(skc); [0]: https://lpc.events/event/17/contributions/1645/attachments/1350/2701/SYN_Proxy_at_Scale_with_BPF.pdf Changes: v8 * Rebase on Yonghong's cpuv4 fix * Patch 5 * Fill the trailing 3-bytes padding in struct bpf_tcp_req_attrs and test it as null * Patch 6 * Remove unused IPPROTP_MPTCP definition v7: https://lore.kernel.org/bpf/20231221012806.37137-1-kuniyu@amazon.com/ * Patch 5 & 6 * Drop MPTCP support v6: https://lore.kernel.org/bpf/20231214155424.67136-1-kuniyu@amazon.com/ * Patch 5 & 6 * /struct /s/tcp_cookie_attributes/bpf_tcp_req_attrs/ * Don't reuse struct tcp_options_received and use u8 for each attrs * Patch 6 * Check retval of test__start_subtest() v5: https://lore.kernel.org/netdev/20231211073650.90819-1-kuniyu@amazon.com/ * Split patch 1-3 * Patch 3 * Clear req->rsk_listener in skb_steal_sock() * Patch 4 & 5 * Move sysctl validation and tsoff init from cookie_bpf_check() to kfunc * Patch 5 * Do not increment LINUX_MIB_SYNCOOKIES(RECV\|FAILED) * Patch 6 * Remove __always_inline * Test if tcp_handle_{syn,ack}() is executed * Move some definition to bpf_tracing_net.h * s/BPF_F_CURRENT_NETNS/-1/ v4: https://lore.kernel.org/bpf/20231205013420.88067-1-kuniyu@amazon.com/ * Patch 1 & 2 * s/CONFIG_SYN_COOKIE/CONFIG_SYN_COOKIES/ * Patch 1 * Don't set rcv_wscale for BPF SYN Cookie case. * Patch 2 * Add test for tcp_opt.{unused,rcv_wscale} in kfunc * Modify skb_steal_sock() to avoid resetting skb-sk * Support SO_REUSEPORT lookup * Patch 3 * Add CONFIG_SYN_COOKIES to Kconfig for CI * Define BPF_F_CURRENT_NETNS v3: https://lore.kernel.org/netdev/20231121184245.69569-1-kuniyu@amazon.com/ * Guard kfunc and req->syncookie part in inet6?_steal_sock() with CONFIG_SYN_COOKIE v2: https://lore.kernel.org/netdev/20231120222341.54776-1-kuniyu@amazon.com/ * Drop SOCK_OPS and move SYN Cookie validation logic to TC with kfunc. * Add cleanup patches to reduce discrepancy between cookie_v[46]_check() v1: https://lore.kernel.org/bpf/20231013220433.70792-1-kuniyu@amazon.com/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftest: bpf: Test bpf_sk_assign_tcp_reqsk().	Kuniyuki Iwashima
	This commit adds a sample selftest to demonstrate how we can use bpf_sk_assign_tcp_reqsk() as the backend of SYN Proxy. The test creates IPv4/IPv6 x TCP connections and transfer messages over them on lo with BPF tc prog attached. The tc prog will process SYN and returns SYN+ACK with the following ISN and TS. In a real use case, this part will be done by other hosts. MSB LSB ISN: \| 31 ... 8 \| 7 6 \| 5 \| 4 \| 3 2 1 0 \| \| Hash_1 \| MSS \| ECN \| SACK \| WScale \| TS: \| 31 ... 8 \| 7 ... 0 \| \| Random \| Hash_2 \| WScale in SYN is reused in SYN+ACK. The client returns ACK, and tc prog will recalculate ISN and TS from ACK and validate SYN Cookie. If it's valid, the prog calls kfunc to allocate a reqsk for skb and configure the reqsk based on the argument created from SYN Cookie. Later, the reqsk will be processed in cookie_v[46]_check() to create a connection. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-7-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: tcp: Support arbitrary SYN Cookie.	Kuniyuki Iwashima
	This patch adds a new kfunc available at TC hook to support arbitrary SYN Cookie. The basic usage is as follows: struct bpf_tcp_req_attrs attrs = { .mss = mss, .wscale_ok = wscale_ok, .rcv_wscale = rcv_wscale, /* Server's WScale < 15 / .snd_wscale = snd_wscale, / Client's WScale < 15 / .tstamp_ok = tstamp_ok, .rcv_tsval = tsval, .rcv_tsecr = tsecr, / Server's Initial TSval / .usec_ts_ok = usec_ts_ok, .sack_ok = sack_ok, .ecn_ok = ecn_ok, } skc = bpf_skc_lookup_tcp(...); sk = (struct sock )bpf_skc_to_tcp_sock(skc); bpf_sk_assign_tcp_reqsk(skb, sk, attrs, sizeof(attrs)); bpf_sk_release(skc); bpf_sk_assign_tcp_reqsk() takes skb, a listener sk, and struct bpf_tcp_req_attrs and allocates reqsk and configures it. Then, bpf_sk_assign_tcp_reqsk() links reqsk with skb and the listener. The notable thing here is that we do not hold refcnt for both reqsk and listener. To differentiate that, we mark reqsk->syncookie, which is only used in TX for now. So, if reqsk->syncookie is 1 in RX, it means that the reqsk is allocated by kfunc. When skb is freed, sock_pfree() checks if reqsk->syncookie is 1, and in that case, we set NULL to reqsk->rsk_listener before calling reqsk_free() as reqsk does not hold a refcnt of the listener. When the TCP stack looks up a socket from the skb, we steal the listener from the reqsk in skb_steal_sock() and create a full sk in cookie_v[46]_check(). The refcnt of reqsk will finally be set to 1 in tcp_get_cookie_sock() after creating a full sk. Note that we can extend struct bpf_tcp_req_attrs in the future when we add a new attribute that is determined in 3WHS. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-6-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().	Kuniyuki Iwashima
	We will support arbitrary SYN Cookie with BPF in the following patch. If BPF prog validates ACK and kfunc allocates a reqsk, it will be carried to cookie_[46]_check() as skb->sk. If skb->sk is not NULL, we call cookie_bpf_check(). Then, we clear skb->sk and skb->destructor, which are needed not to hold refcnt for reqsk and the listener. See the following patch for details. After that, we finish initialisation for the remaining fields with cookie_tcp_reqsk_init(). Note that the server side WScale is set only for non-BPF SYN Cookie. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-5-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: tcp: Handle BPF SYN Cookie in skb_steal_sock().	Kuniyuki Iwashima
	We will support arbitrary SYN Cookie with BPF. If BPF prog validates ACK and kfunc allocates a reqsk, it will be carried to TCP stack as skb->sk with req->syncookie 1. Also, the reqsk has its listener as req->rsk_listener with no refcnt taken. When the TCP stack looks up a socket from the skb, we steal inet_reqsk(skb->sk)->rsk_listener in skb_steal_sock() so that the skb will be processed in cookie_v[46]_check() with the listener. Note that we do not clear skb->sk and skb->destructor so that we can carry the reqsk to cookie_v[46]_check(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240115205514.68364-4-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Fix potential premature unload in bpf_testmod	Artem Savkov
	It is possible for bpf_kfunc_call_test_release() to be called from bpf_map_free_deferred() when bpf_testmod is already unloaded and perf_test_stuct.cnt which it tries to decrease is no longer in memory. This patch tries to fix the issue by waiting for all references to be dropped in bpf_testmod_exit(). The issue can be triggered by running 'test_progs -t map_kptr' in 6.5, but is obscured in 6.6 by d119357d07435 ("rcu-tasks: Treat only synchronous grace periods urgently"). Fixes: 65eb006d85a2 ("bpf: Move kernel test kfuncs to bpf_testmod") Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/82f55c0e-0ec8-4fe1-8d8c-b1de07558ad9@linux.dev Link: https://lore.kernel.org/bpf/20240110085737.8895-1-asavkov@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	tcp: Move skb_steal_sock() to request_sock.h	Kuniyuki Iwashima
	We will support arbitrary SYN Cookie with BPF. If BPF prog validates ACK and kfunc allocates a reqsk, it will be carried to TCP stack as skb->sk with req->syncookie 1. In skb_steal_sock(), we need to check inet_reqsk(sk)->syncookie to see if the reqsk is created by kfunc. However, inet_reqsk() is not available in sock.h. Let's move skb_steal_sock() to request_sock.h. While at it, we refactor skb_steal_sock() so it returns early if skb->sk is NULL to minimise the following patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240115205514.68364-3-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpftool: Silence build warning about calloc()	Tiezhu Yang
	There exists the following warning when building bpftool: CC prog.o prog.c: In function ‘profile_open_perf_events’: prog.c:2301:24: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args] 2301 \| sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric); \| ^~~ prog.c:2301:24: note: earlier argument should specify number of elements, later size of each element Tested with the latest upstream GCC which contains a new warning option -Wcalloc-transposed-args. The first argument to calloc is documented to be number of elements in array, while the second argument is size of each element, just switch the first and second arguments of calloc() to silence the build warning, compile tested only. Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240116061920.31172-1-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	tcp: Move tcp_ns_to_ts() to tcp.h	Kuniyuki Iwashima
	We will support arbitrary SYN Cookie with BPF. When BPF prog validates ACK and kfunc allocates a reqsk, we need to call tcp_ns_to_ts() to calculate an offset of TSval for later use: time t0 : Send SYN+ACK -> tsval = Initial TSval (Random Number) t1 : Recv ACK of 3WHS -> tsoff = TSecr - tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns()) = Initial TSval - t1 t2 : Send ACK -> tsval = t2 + tsoff = Initial TSval + (t2 - t1) = Initial TSval + Time Delta (x) (x) Note that the time delta does not include the initial RTT from t0 to t1. Let's move tcp_ns_to_ts() to tcp.h. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240115205514.68364-2-kuniyu@amazon.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Minor improvements for bpf_cmp.	Alexei Starovoitov
	Few minor improvements for bpf_cmp() macro: . reduce number of args in __bpf_cmp() . rename NOFLIP to UNLIKELY . add a comment about 64-bit truncation in "i" constraint . use "ri" constraint for sizeof(rhs) <= 4 . improve error message for bpf_cmp_likely() Before: progs/iters_task_vma.c:31:7: error: variable 'ret' is uninitialized when used here [-Werror,-Wuninitialized] 31 \| if (bpf_cmp_likely(seen, <==, 1000)) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../bpf/bpf_experimental.h:325:3: note: expanded from macro 'bpf_cmp_likely' 325 \| ret; \| ^~~ progs/iters_task_vma.c:31:7: note: variable 'ret' is declared here ../bpf/bpf_experimental.h:310:3: note: expanded from macro 'bpf_cmp_likely' 310 \| bool ret; \| ^ After: progs/iters_task_vma.c:31:7: error: invalid operand for instruction 31 \| if (bpf_cmp_likely(seen, <==, 1000)) \| ^ ../bpf/bpf_experimental.h:324:17: note: expanded from macro 'bpf_cmp_likely' 324 \| asm volatile("r0 " #OP " invalid compare"); \| ^ <inline asm>:1:5: note: instantiated into assembly here 1 \| r0 <== invalid compare \| ^ Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240112220134.71209-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	docs/bpf: Fix an incorrect statement in verifier.rst	Yonghong Song
	In verifier.rst, I found an incorrect statement (maybe a typo) in section 'Liveness marks tracking'. Basically, the wrong register is attributed to have a read mark. This may confuse the user. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240111052136.3440417-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Add a selftest with not-8-byte aligned BPF_ST	Yonghong Song
	Add a selftest with a 4 bytes BPF_ST of 0 where the store is not 8-byte aligned. The goal is to ensure that STACK_ZERO is properly marked in stack slots and the STACK_ZERO value can propagate properly during the load. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240110051355.2737232-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Track aligned st store as imprecise spilled registers	Yonghong Song
	With patch set [1], precision backtracing supports register spill/fill to/from the stack. The patch [2] allows initial imprecise register spill with content 0. This is a common case for cpuv3 and lower for initializing the stack variables with pattern r1 = 0 (u64 )(r10 - 8) = r1 and the [2] has demonstrated good verification improvement. For cpuv4, the initialization could be (u64 )(r10 - 8) = 0 The current verifier marks the r10-8 contents with STACK_ZERO. Similar to [2], let us permit the above insn to behave like imprecise register spill which can reduce number of verified states. The change is in function check_stack_write_fixed_off(). Before this patch, spilled zero will be marked as STACK_ZERO which can provide precise values. In check_stack_write_var_off(), STACK_ZERO will be maintained if writing a const zero so later it can provide precise values if needed. The above handling of '(u64 )(r10 - 8) = 0' as a spill will have issues in check_stack_write_var_off() as the spill will be converted to STACK_MISC and the precise value 0 is lost. To fix this issue, if the spill slots with const zero and the BPF_ST write also with const zero, the spill slots are preserved, which can later provide precise values if needed. Without the change in check_stack_write_var_off(), the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset' will fail. I checked cpuv3 and cpuv4 with and without this patch with veristat. There is no state change for cpuv3 since '(u64 )(r10 - 8) = 0' is only generated with cpuv4. For cpuv4: $ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0' File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) ------------------------------------------ ------------------- --------- --------- --------------- ---------- ---------- ------------- local_storage_bench.bpf.linked3.o get_local 228 168 -60 (-26.32%) 17 14 -3 (-17.65%) pyperf600_bpf_loop.bpf.linked3.o on_event 6066 4889 -1177 (-19.40%) 403 321 -82 (-20.35%) test_cls_redirect.bpf.linked3.o cls_redirect 35483 35387 -96 (-0.27%) 2179 2177 -2 (-0.09%) test_l4lb_noinline.bpf.linked3.o balancer_ingress 4494 4522 +28 (+0.62%) 217 219 +2 (+0.92%) test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 1432 1455 +23 (+1.61%) 92 94 +2 (+2.17%) test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3462 3458 -4 (-0.12%) 216 216 +0 (+0.00%) verifier_iterating_callbacks.bpf.linked3.o widening 52 41 -11 (-21.15%) 4 3 -1 (-25.00%) xdp_synproxy_kern.bpf.linked3.o syncookie_tc 12412 11719 -693 (-5.58%) 345 330 -15 (-4.35%) xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 12478 11794 -684 (-5.48%) 346 331 -15 (-4.34%) test_l4lb_noinline and test_l4lb_noinline_dynptr has minor regression, but pyperf600_bpf_loop and local_storage_bench gets pretty good improvement. [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@kernel.org/ [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@kernel.org/ Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240110051348.2737007-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Test assigning ID to scalars on spill	Maxim Mikityanskiy
	The previous commit implemented assigning IDs to registers holding scalars before spill. Add the test cases to check the new functionality. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-10-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Assign ID to scalars on spill	Maxim Mikityanskiy
	Currently, when a scalar bounded register is spilled to the stack, its ID is preserved, but only if was already assigned, i.e. if this register was MOVed before. Assign an ID on spill if none is set, so that equal scalars could be tracked if a register is spilled to the stack and filled into another register. One test is adjusted to reflect the change in register IDs. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-9-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Add the get_reg_width function	Maxim Mikityanskiy
	Put calculation of the register value width into a dedicated function. This function will also be used in a following commit. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Link: https://lore.kernel.org/r/20240108205209.838365-8-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Add the assign_scalar_id_before_mov function	Maxim Mikityanskiy
	Extract the common code that generates a register ID for src_reg before MOV if needed into a new function. This function will also be used in a following commit. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-7-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Add a test case for 32-bit spill tracking	Maxim Mikityanskiy
	When a range check is performed on a register that was 32-bit spilled to the stack, the IDs of the two instances of the register are the same, so the range should also be the same. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-6-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Make bpf_for_each_spilled_reg consider narrow spills	Maxim Mikityanskiy
	Adjust the check in bpf_get_spilled_reg to take into account spilled registers narrower than 64 bits. That allows find_equal_scalars to properly adjust the range of all spilled registers that have the same ID. Before this change, it was possible for a register and a spilled register to have the same IDs but different ranges if the spill was narrower than 64 bits and a range check was performed on the register. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-5-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: check if imprecise stack spills confuse infinite loop detection	Eduard Zingerman
	Verify that infinite loop detection logic separates states with identical register states but different imprecise scalars spilled to stack. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-4-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: make infinite loop detection in is_state_visited() exact	Eduard Zingerman
	Current infinite loops detection mechanism is speculative: - first, states_maybe_looping() check is done which simply does memcmp for R1-R10 in current frame; - second, states_equal(..., exact=false) is called. With exact=false states_equal() would compare scalars for equality only if in old state scalar has precision mark. Such logic might be problematic if compiler makes some unlucky stack spill/fill decisions. An artificial example of a false positive looks as follows: r0 = ... unknown scalar ... r0 &= 0xff; (u64 )(r10 - 8) = r0; r0 = 0; loop: r0 = (u64 )(r10 - 8); if r0 > 10 goto exit_; r0 += 1; (u64 )(r10 - 8) = r0; r0 = 0; goto loop; This commit updates call to states_equal to use exact=true, forcing all scalar comparisons to be exact. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-3-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Fix the u64_offset_to_skb_data test	Maxim Mikityanskiy
	The u64_offset_to_skb_data test is supposed to make a 64-bit fill, but instead makes a 16-bit one. Fix the test according to its intention and update the comments accordingly (umax is no longer 0xffff). The 16-bit fill is covered by u16_offset_to_skb_data. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240108205209.838365-2-maxtram95@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Update LLVM Phabricator links	Nathan Chancellor
	reviews.llvm.org was LLVM's Phabricator instances for code review. It has been abandoned in favor of GitHub pull requests. While the majority of links in the kernel sources still work because of the work Fangrui has done turning the dynamic Phabricator instance into a static archive, there are some issues with that work, so preemptively convert all the links in the kernel sources to point to the commit on GitHub. Most of the commits have the corresponding differential review link in the commit message itself so there should not be any loss of fidelity in the relevant information. Additionally, fix a typo in the xdpwall.c print ("LLMV" -> "LLVM") while in the area. Link: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/172 Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240111-bpf-update-llvm-phabricator-links-v2-1-9a7ae976bd64@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: detect testing prog flags support	Andrii Nakryiko
	Various tests specify extra testing prog_flags when loading BPF programs, like BPF_F_TEST_RND_HI32, and more recently also BPF_F_TEST_REG_INVARIANTS. While BPF_F_TEST_RND_HI32 is old enough to not cause much problem on older kernels, BPF_F_TEST_REG_INVARIANTS is very fresh and unconditionally specifying it causes selftests to fail on even slightly outdated kernels. This breaks libbpf CI test against 4.9 and 5.15 kernels, it can break some local development (done outside of VM), etc. To prevent this, and guard against similar problems in the future, do runtime detection of supported "testing flags", and only provide those that host kernel recognizes. Acked-by: Song Liu <song@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240109231738.575844-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	Introduce concept of conformance groups	Dave Thaler
	The discussion of what the actual conformance groups should be is still in progress, so this is just part 1 which only uses "legacy" for deprecated instructions and "basic" for everything else. Subsequent patches will add more groups as discussion continues. Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240108214231.5280-1-dthaler1968@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	net: filter: fix spelling mistakes	Randy Dunlap
	Fix spelling errors as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240106065545.16855-1-rdunlap@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: support multiple tags per argument	Andrii Nakryiko
	Add ability to iterate multiple decl_tag types pointed to the same function argument. Use this to support multiple __arg_xxx tags per global subprog argument. We leave btf_find_decl_tag_value() intact, but change its implementation to use a new btf_find_next_decl_tag() which can be straightforwardly used to find next BTF type ID of a matching btf_decl_tag type. btf_prepare_func_args() is switched from btf_find_decl_tag_value() to btf_find_next_decl_tag() to gain multiple tags per argument support. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: prepare btf_prepare_func_args() for multiple tags per argument	Andrii Nakryiko
	Add btf_arg_tag flags enum to be able to record multiple tags per argument. Also streamline pointer argument processing some more. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: make sure scalar args don't accept __arg_nonnull tag	Andrii Nakryiko
	Move scalar arg processing in btf_prepare_func_args() after all pointer arg processing is done. This makes it easier to do validation. One example of unintended behavior right now is ability to specify __arg_nonnull for integer/enum arguments. This patch fixes this. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: fix test_loader check message	Andrii Nakryiko
	Seeing: process_subtest:PASS:Can't alloc specs array 0 nsec ... in verbose successful test log is very confusing. Use smaller identifier-like test tag to denote that we are asserting specs array allocation success. Now it's much less distracting: process_subtest:PASS:specs_alloc 0 nsec Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	Merge branch 'bpf-inline-bpf_kptr_xchg'	Alexei Starovoitov
	Hou Tao says: ==================== The motivation of inlining bpf_kptr_xchg() comes from the performance profiling of bpf memory allocator benchmark [1]. The benchmark uses bpf_kptr_xchg() to stash the allocated objects and to pop the stashed objects for free. After inling bpf_kptr_xchg(), the performance for object free on 8-CPUs VM increases about 2%~10%. However the performance gain comes with costs: both the kasan and kcsan checks on the pointer will be unavailable. Initially the inline is implemented in do_jit() for x86-64 directly, but I think it will more portable to implement the inline in verifier. Patch #1 supports inlining bpf_kptr_xchg() helper and enables it on x86-4. Patch #2 factors out a helper for newly-added test in patch #3. Patch #3 tests whether the inlining of bpf_kptr_xchg() is expected. Please see individual patches for more details. And comments are always welcome. Change Log: v3: * rebased on bpf-next tree * patch 1 & 2: Add Rvb-by and Ack-by tags from Eduard * patch 3: use inline assembly and naked function instead of c code (suggested by Eduard) v2: https://lore.kernel.org/bpf/20231223104042.1432300-1-houtao@huaweicloud.com/ * rebased on bpf-next tree * drop patch #1 in v1 due to discussion in [2] * patch #1: add the motivation in the commit message, merge patch #1 and #3 into the new patch in v2. (Daniel) * patch #2/#3: newly-added patch to test the inlining of bpf_kptr_xchg() (Eduard) v1: https://lore.kernel.org/bpf/95b8c2cd-44d5-5fe1-60b5-7e8218779566@huaweicloud.com/ [1]: https://lore.kernel.org/bpf/20231221141501.3588586-1-houtao@huaweicloud.com/ [2]: https://lore.kernel.org/bpf/fd94efb9-4a56-c982-dc2e-c66be5202cb7@huaweicloud.com/ ==================== Link: https://lore.kernel.org/r/20240105104819.3916743-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Test the inlining of bpf_kptr_xchg()	Hou Tao
	The test uses bpf_prog_get_info_by_fd() to obtain the xlated instructions of the program first. Since these instructions have already been rewritten by the verifier, the tests then checks whether the rewritten instructions are as expected. And to ensure LLVM generates code exactly as expected, use inline assembly and a naked function. Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105104819.3916743-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	selftests/bpf: Factor out get_xlated_program() helper	Hou Tao
	Both test_verifier and test_progs use get_xlated_program(), so moving the helper into testing_helpers.h to reuse it. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20240105104819.3916743-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	bpf: Support inlining bpf_kptr_xchg() helper	Hou Tao
	The motivation of inlining bpf_kptr_xchg() comes from the performance profiling of bpf memory allocator benchmark. The benchmark uses bpf_kptr_xchg() to stash the allocated objects and to pop the stashed objects for free. After inling bpf_kptr_xchg(), the performance for object free on 8-CPUs VM increases about 2%~10%. The inline also has downside: both the kasan and kcsan checks on the pointer will be unavailable. bpf_kptr_xchg() can be inlined by converting the calling of bpf_kptr_xchg() into an atomic_xchg() instruction. But the conversion depends on two conditions: 1) JIT backend supports atomic_xchg() on pointer-sized word 2) For the specific arch, the implementation of xchg is the same as atomic_xchg() on pointer-sized words. It seems most 64-bit JIT backends satisfies these two conditions. But as a precaution, defining a weak function bpf_jit_supports_ptr_xchg() to state whether such conversion is safe and only supporting inline for 64-bit host. For x86-64, it supports BPF_XCHG atomic operation and both xchg() and atomic_xchg() use arch_xchg() to implement the exchange, so enabling the inline of bpf_kptr_xchg() on x86-64 first. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20240105104819.3916743-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23	io_uring: enable audit and restrict cred override for IORING_OP_FIXED_FD_INSTALL	Paul Moore
	We need to correct some aspects of the IORING_OP_FIXED_FD_INSTALL command to take into account the security implications of making an io_uring-private file descriptor generally accessible to a userspace task. The first change in this patch is to enable auditing of the FD_INSTALL operation as installing a file descriptor into a task's file descriptor table is a security relevant operation and something that admins/users may want to audit. The second change is to disable the io_uring credential override functionality, also known as io_uring "personalities", in the FD_INSTALL command. The credential override in FD_INSTALL is particularly problematic as it affects the credentials used in the security_file_receive() LSM hook. If a task were to request a credential override via REQ_F_CREDS on a FD_INSTALL operation, the LSM would incorrectly check to see if the overridden credentials of the io_uring were able to "receive" the file as opposed to the task's credentials. After discussions upstream, it's difficult to imagine a use case where we would want to allow a credential override on a FD_INSTALL operation so we are simply going to block REQ_F_CREDS on IORING_OP_FIXED_FD_INSTALL operations. Fixes: dc18b89ab113 ("io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL") Signed-off-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20240123215501.289566-2-paul@paul-moore.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-01-23	riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops	Pu Lehui
	We encountered a kernel crash triggered by the bpf_tcp_ca testcase as show below: Unable to handle kernel paging request at virtual address ff60000088554500 Oops [#1] ... CPU: 3 PID: 458 Comm: test_progs Tainted: G OE 6.8.0-rc1-kselftest_plain #1 Hardware name: riscv-virtio,qemu (DT) epc : 0xff60000088554500 ra : tcp_ack+0x288/0x1232 epc : ff60000088554500 ra : ffffffff80cc7166 sp : ff2000000117ba50 gp : ffffffff82587b60 tp : ff60000087be0040 t0 : ff60000088554500 t1 : ffffffff801ed24e t2 : 0000000000000000 s0 : ff2000000117bbc0 s1 : 0000000000000500 a0 : ff20000000691000 a1 : 0000000000000018 a2 : 0000000000000001 a3 : ff60000087be03a0 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000021 a7 : ffffffff8263f880 s2 : 000000004ac3c13b s3 : 000000004ac3c13a s4 : 0000000000008200 s5 : 0000000000000001 s6 : 0000000000000104 s7 : ff2000000117bb00 s8 : ff600000885544c0 s9 : 0000000000000000 s10: ff60000086ff0b80 s11: 000055557983a9c0 t3 : 0000000000000000 t4 : 000000000000ffc4 t5 : ffffffff8154f170 t6 : 0000000000000030 status: 0000000200000120 badaddr: ff60000088554500 cause: 000000000000000c Code: c796 67d7 0000 0000 0052 0002 c13b 4ac3 0000 0000 (0001) 0000 ---[ end trace 0000000000000000 ]--- The reason is that commit 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") changes the func_addr of arch_prepare_bpf_trampoline in struct_ops from NULL to non-NULL, while we use func_addr on RV64 to differentiate between struct_ops and regular trampoline. When the struct_ops testcase is triggered, it emits wrong prologue and epilogue, and lead to unpredictable issues. After commit 2cd3e3772e41, we can use BPF_TRAMP_F_INDIRECT to distinguish them as it always be set in struct_ops. Fixes: 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240123023207.1917284-1-pulehui@huaweicloud.com
2024-01-23	docs: admin-guide: remove obsolete advice related to SLAB allocator	Lukas Bulwahn
	Commit 1db9d06aaa55 ("mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile") removes the config SLAB and makes the SLUB allocator the only default allocator in the kernel. Hence, the advice on reducing OS jitter due to kworker kernel threads to build with CONFIG_SLUB instead of CONFIG_SLAB is obsolete. Remove the obsolete advice to build with SLUB instead of SLAB. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20231130095515.21586-1-lukas.bulwahn@gmail.com
2024-01-23	doc: admin-guide/kernel-parameters: remove useless comment	Vegard Nossum
	This comment about DRM drivers has been there since the first git commit. It simply doesn't belong in kernel-parameters; remove it. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240111085220.3693059-1-vegard.nossum@oracle.com
2024-01-23	docs/accel: correct links to mailing list archives	Hu Haowen
	Since the mailing archive list lkml.org is obsolete, change the links into lore.kernel.org's ones. Signed-off-by: Hu Haowen <2023002089@link.tyut.edu.cn> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240118090140.4868-1-2023002089@link.tyut.edu.cn
2024-01-23	docs/sphinx: Fix TOC scroll hack for the home page	Gustavo Sousa
	When on the documentation home page, there won't be any ".current" element since no entry from the TOC was selected yet. That results in a javascript error. Fix that by only trying to set the scrollTop if we have matches for current entries. Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240123162157.61819-2-gustavo.sousa@intel.com
2024-01-23	Merge tag 'linux-cpupower-6.8-rc2' of ↵	Rafael J. Wysocki
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Merge cpupower utility update for 6.8-rc2 from Shuah Khan: "This cpupower fixes update for Linux 6.8-rc2 consists of one single fix to an issue where CFLAGS is passed as a make argument for cpupower, but bench makefile does not inherit and append to them." This fixes the problem so the user specified CFLAGS are honored. * tag 'linux-cpupower-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: tools cpupower bench: Override CFLAGS assignments
2024-01-23	smb: client: delete "true", "false" defines	Alexey Dobriyan
	Kernel has its own official true/false definitions. The defines aren't even used in this file. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-23	Merge tag 'wireless-2024-01-22' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.8-rc2 The most visible fix here is the ath11k crash fix which was introduced in v6.7. We also have a fix for iwlwifi memory corruption and few smaller fixes in the stack. * tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: fix race condition on enabling fast-xmit wifi: iwlwifi: fix a memory corruption wifi: mac80211: fix potential sta-link leak wifi: cfg80211/mac80211: remove dependency on non-existing option wifi: cfg80211: fix missing interfaces when dumping wifi: ath11k: rely on mac80211 debugfs handling for vif wifi: p54: fix GCC format truncation warning with wiphy->fw_version ==================== Link: https://lore.kernel.org/r/20240122153434.E0254C433C7@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23	nvmet: unify aer type enum	Guixin Liu
	The host and target use two definition of aer type, unify them into a single one. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-23	block: Fix WARNING in _copy_from_iter	Christian A. Ehrhardt
	Syzkaller reports a warning in _copy_from_iter because an iov_iter is supposedly used in the wrong direction. The reason is that syzcaller managed to generate a request with a transfer direction of SG_DXFER_TO_FROM_DEV. This instructs the kernel to copy user buffers into the kernel, read into the copied buffers and then copy the data back to user space. Thus the iovec is used in both directions. Detect this situation in the block layer and construct a new iterator with the correct direction for the copy-in. Reported-by: syzbot+a532b03fdfee2c137666@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000009b92c10604d7a5e9@google.com/t/ Reported-by: syzbot+63dec323ac56c28e644f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000003faaa105f6e7c658@google.com/T/ Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240121202634.275068-1-lk@c--e.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-01-23	spi: hisi-sfc-v3xx: Return IRQ_NONE if no interrupts were detected	Devyn Liu
	Return IRQ_NONE from the interrupt handler when no interrupt was detected. Because an empty interrupt will cause a null pointer error: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: complete+0x54/0x100 hisi_sfc_v3xx_isr+0x2c/0x40 [spi_hisi_sfc_v3xx] __handle_irq_event_percpu+0x64/0x1e0 handle_irq_event+0x7c/0x1cc Signed-off-by: Devyn Liu <liudingyuan@huawei.com> Link: https://msgid.link/r/20240123071149.917678-1-liudingyuan@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-23	regulator: ti-abb: don't use devm_platform_ioremap_resource_byname for ↵	Romain Naour
	shared interrupt register We can't use devm_platform_ioremap_resource_byname() to remap the interrupt register that can be shared between regulator-abb-{ivahd,dspeve,gpu} drivers instances. The combined helper introduce a call to devm_request_mem_region() that creates a new busy resource region on PRM_IRQSTATUS_MPU register (0x4ae06010). The first devm_request_mem_region() call succeeds for regulator-abb-ivahd but fails for the two other regulator-abb-dspeve and regulator-abb-gpu. # cat /proc/iomem \| grep -i 4ae06 4ae06010-4ae06013 : 4ae07e34.regulator-abb-ivahd int-address 4ae06014-4ae06017 : 4ae07ddc.regulator-abb-mpu int-address regulator-abb-dspeve and regulator-abb-gpu are missing due to devm_request_mem_region() failure (EBUSY): [ 1.326660] ti_abb 4ae07e30.regulator-abb-dspeve: can't request region for resource [mem 0x4ae06010-0x4ae06013] [ 1.326660] ti_abb: probe of 4ae07e30.regulator-abb-dspeve failed with error -16 [ 1.327239] ti_abb 4ae07de4.regulator-abb-gpu: can't request region for resource [mem 0x4ae06010-0x4ae06013] [ 1.327270] ti_abb: probe of 4ae07de4.regulator-abb-gpu failed with error -16 >From arm/boot/dts/dra7.dtsi: The abb_mpu is the only instance using its own interrupt register: (0x4ae06014) PRM_IRQSTATUS_MPU_2, ABB_MPU_DONE_ST (bit 7) The other tree instances (abb_ivahd, abb_dspeve, abb_gpu) share PRM_IRQSTATUS_MPU register (0x4ae06010) but use different bits ABB_IVA_DONE_ST (bit 30), ABB_DSPEVE_DONE_ST( bit 29) and ABB_GPU_DONE_ST (but 28). The commit b36c6b1887ff ("regulator: ti-abb: Make use of the helper function devm_ioremap related") overlooked the following comment implicitly explaining why devm_ioremap() is used in this case: /* * We may have shared interrupt register offsets which are * write-1-to-clear between domains ensuring exclusivity. */ Fixes and partially reverts commit b36c6b1887ff ("regulator: ti-abb: Make use of the helper function devm_ioremap related"). Improve the existing comment to avoid further conversion to devm_platform_ioremap_resource_byname(). Fixes: b36c6b1887ff ("regulator: ti-abb: Make use of the helper function devm_ioremap related") Signed-off-by: Romain Naour <romain.naour@skf.com> Reviewed-by: Yoann Congal <yoann.congal@smile.fr> Link: https://msgid.link/r/20240123111456.739381-1-romain.naour@smile.fr Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-23	Merge branch 'netfs-fixes' of ↵	Christian Brauner
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs fixes from David Howells: * 'netfs-fixes' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix missing/incorrect unlocking of RCU read lock afs: Remove afs_dynroot_d_revalidate() as it is redundant afs: Fix error handling with lookup via FS.InlineBulkStatus afs: Hide silly-rename files from userspace cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write() netfs, fscache: Prevent Oops in fscache_put_cache() cifs: Don't use certain unnecessary folio_() functions afs: Don't use certain unnecessary folio_() functions netfs: Don't use certain unnecessary folio_*() functions Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23	Merge branch 'inet_diag-remove-three-mutexes-in-diag-dumps'	Paolo Abeni
	Eric Dumazet says: ==================== inet_diag: remove three mutexes in diag dumps Surprisingly, inet_diag operations are serialized over a stack of three mutexes, giving legacy /proc based files an unfair advantage on modern hosts. This series removes all of them, making inet_diag operations (eg iproute2/ss) fully parallel. 1-2) Two first patches are adding data-race annotations and can be backported to stable kernels. 3-4) inet_diag_table_mutex can be replaced with RCU protection, if we add corresponding protection against module unload. 5-7) sock_diag_table_mutex can be replaced with RCU protection, if we add corresponding protection against module unload. 8) sock_diag_mutex is removed, as the old bug it was working around has been fixed more elegantly. 9) inet_diag_dump_icsk() can skip over empty buckets to reduce spinlock contention. ==================== Link: https://lore.kernel.org/r/20240122112603.3270097-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>