author		Aditi Ghag <aditi.ghag@isovalent.com>		2023-05-19 22:51:55 +0000
committer	Martin KaFai Lau <martin.lau@kernel.org>	2023-05-19 22:44:28 -0700
commit		4ddbcb886268af8d12a23e6640b39d1d9c652b1b (patch)
tree		8a03b5879c4d97a1faba7444d00153caa0e84f15 /net/ipv4/udp.c
parent		e924e80ee6a39bc28d2ef8f51e19d336a98e3be0 (diff)
bpf: Add bpf_sock_destroy kfunc
The socket destroy kfunc is used to forcefully terminate sockets from certain BPF contexts. We plan to use the capability in Cilium load-balancing to terminate client sockets that continue to connect to deleted backends. The other use case is on-the-fly policy enforcement, where existing socket connections disallowed by policy need to be forcefully terminated. The kfunc also allows terminating sockets that may or may not be actively sending traffic.

The kfunc can currently be called only from BPF TCP and UDP iterators, where users can filter and terminate selected sockets. More specifically, it can only be called from BPF contexts that ensure socket locking, so that the protocol-specific `diag_destroy` handlers can execute synchronously. The previous commit, which batches UDP sockets during iteration, facilitated a synchronous invocation of the UDP destroy callback from BPF context by skipping socket locks in `udp_abort`; the TCP iterator already supported batching of the sockets being iterated. To that end, a `tracing_iter_filter` callback is added so that the verifier can restrict the kfunc to programs with the `BPF_TRACE_ITER` attach type and reject all other programs.

The kfunc takes a `sock_common` argument, even though it expects, and casts it to, a `sock` pointer. This lets the verifier allow the sock_destroy kfunc to be called for TCP with `sock_common` and for UDP with `sock` structs. Furthermore, as `sock_common` only has a subset of the fields of `sock`, casting a pointer to the latter type is not always safe for certain sockets, such as request sockets, but these get special handling in the diag_destroy handlers.

Additionally, the kfunc is defined with the `KF_TRUSTED_ARGS` flag to rule out cases where a `PTR_TO_BTF_ID` sk is obtained by following another pointer, e.g. obtaining an sk pointer (which may even be NULL) by dereferencing another sk pointer. The socket pointer argument passed in the TCP and UDP iterators is tagged as `PTR_TRUSTED` in {tcp,udp}_reg_info.

The TRUSTED arg changes are contributed by Martin KaFai Lau <martin.lau@kernel.org>.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-8-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
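As a usage illustration (not part of this commit), a BPF UDP iterator program can filter sockets and invoke the kfunc on the ones it selects. The following is a minimal sketch modeled on the selftests added alongside this series; the program name, the target_port filter, and the field accesses are illustrative assumptions rather than code from this patch:

// SPDX-License-Identifier: GPL-2.0
/* Sketch only: destroy UDP sockets bound to an illustrative local port. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* The kfunc added by this patch, resolved against kernel BTF. */
extern int bpf_sock_destroy(struct sock_common *sk) __ksym;

/* Illustrative filter value, set by the loader before attach. */
const volatile __u16 target_port = 0;

SEC("iter/udp")
int destroy_udp_by_port(struct bpf_iter__udp *ctx)
{
	struct udp_sock *udp_sk = ctx->udp_sk;
	struct sock *sk = (struct sock *)udp_sk;

	/* udp_sk is PTR_TO_BTF_ID_OR_NULL, so it must be checked. */
	if (!sk)
		return 0;

	/* skc_num holds the local port in host byte order. */
	if (sk->__sk_common.skc_num != target_port)
		return 0;

	/* The iterator passes the socket as PTR_TRUSTED, which satisfies
	 * the KF_TRUSTED_ARGS requirement of the kfunc.
	 */
	bpf_sock_destroy((struct sock_common *)sk);
	return 0;
}

char _license[] SEC("license") = "GPL";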
Diffstat (limited to 'net/ipv4/udp.c')
-rw-r--r--	net/ipv4/udp.c	8
1 file changed, 5 insertions, 3 deletions
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0c999aa5ab30..6893fb867529 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2930,7 +2930,8 @@ EXPORT_SYMBOL(udp_poll);
 
 int udp_abort(struct sock *sk, int err)
 {
-	lock_sock(sk);
+	if (!has_current_bpf_ctx())
+		lock_sock(sk);
 
 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
 	 * with close()
@@ -2943,7 +2944,8 @@ int udp_abort(struct sock *sk, int err)
 	__udp_disconnect(sk, 0);
 
 out:
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 
 	return 0;
 }
@@ -3646,7 +3648,7 @@ static struct bpf_iter_reg udp_reg_info = {
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__udp, udp_sk),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
 	},
 	.seq_info		= &udp_seq_info,
 };
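The kfunc itself and its registration land in net/core/filter.c in the same patch, so they do not appear in this diff, which is filtered to net/ipv4/udp.c. Below is a hedged sketch of that side: the helper dispatches to the protocol's `diag_destroy` handler, and `tracing_iter_filter` rejects callers that are not `BPF_TRACE_ITER` programs. The BTF id-set names are assumptions, and the exact code may differ from the tree:

/* Sketch of the kfunc and its registration (net/core/filter.c, same patch,
 * not part of this filtered diff); details may differ from the actual tree.
 */
__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
{
	struct sock *sk = (struct sock *)sock;

	/* Only TCP and UDP provide the locking semantics that make a
	 * synchronous diag_destroy call safe from this BPF context.
	 */
	if (!sk->sk_prot->diag_destroy || (sk->sk_protocol != IPPROTO_TCP &&
					   sk->sk_protocol != IPPROTO_UDP))
		return -EOPNOTSUPP;

	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
}

BTF_SET8_START(bpf_sk_iter_kfunc_ids)
BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS)
BTF_SET8_END(bpf_sk_iter_kfunc_ids)

/* Reject any caller that is not a BPF_TRACE_ITER program. */
static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
{
	if (btf_id_set8_contains(&bpf_sk_iter_kfunc_ids, kfunc_id) &&
	    prog->expected_attach_type != BPF_TRACE_ITER)
		return -EACCES;
	return 0;
}

static const struct btf_kfunc_id_set bpf_sk_iter_kfunc_set = {
	.owner	= THIS_MODULE,
	.set	= &bpf_sk_iter_kfunc_ids,
	.filter	= tracing_iter_filter,
};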