summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)Author
2022-12-12Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-12-11 We've added 74 non-merge commits during the last 11 day(s) which contain a total of 88 files changed, 3362 insertions(+), 789 deletions(-). The main changes are: 1) Decouple prune and jump points handling in the verifier, from Andrii. 2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin. Merged from hid tree. 3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs, from Björn. 4) Don't use rcu_users to refcount in task kfuncs, from David. 5) Three reg_state->id fixes in the verifier, from Eduard. 6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou. 7) Refactor dynptr handling in the verifier, from Kumar. 8) Remove the "/sys" mount and umount dance in {open,close}_netns in bpf selftests, from Martin. 9) Enable sleepable support for cgrp local storage, from Yonghong. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits) selftests/bpf: test case for relaxed prunning of active_lock.id selftests/bpf: Add pruning test case for bpf_spin_lock bpf: use check_ids() for active_lock comparison selftests/bpf: verify states_equal() maintains idmap across all frames bpf: states_equal() must build idmap for all function frames selftests/bpf: test cases for regsafe() bug skipping check_id() bpf: regsafe() must not skip check_ids() docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE selftests/bpf: Add test for dynptr reinit in user_ringbuf callback bpf: Use memmove for bpf_dynptr_{read,write} bpf: Move PTR_TO_STACK alignment check to process_dynptr_func bpf: Rework check_func_arg_reg_off bpf: Rework process_dynptr_func bpf: Propagate errors from process_* checks in check_func_arg bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true bpf: Reuse freed element in free_by_rcu during allocation selftests/bpf: Bring test_offload.py back to life bpf: Fix comment error in fixup_kfunc_call function bpf: Do not zero-extend kfunc return values ... ==================== Link: https://lore.kernel.org/r/20221212024701.73809-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09xfrm: Fix spelling mistake "oflload" -> "offload"Colin Ian King
There is a spelling mistake in a NL_SET_ERR_MSG message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from ↵Eyal Birger
TC-BPF This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. This object is built based on the availability of BTF debug info. When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet. In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload. The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-3-eyal.birger@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-05xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.cEyal Birger
This change allows adding additional files to the xfrm_interface module. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-2-eyal.birger@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-05xfrm: add support to HW update soft and hard limitsLeon Romanovsky
Both in RX and TX, the traffic that performs IPsec packet offload transformation is accounted by HW. It is needed to properly handle hard limits that require to drop the packet. It means that XFRM core needs to update internal counters with the one that accounted by the HW, so new callbacks are introduced in this patch. In case of soft or hard limit is occurred, the driver should call to xfrm_state_check_expire() that will perform key rekeying exactly as done by XFRM core. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05xfrm: speed-up lookup of HW policiesLeon Romanovsky
Devices that implement IPsec packet offload mode should offload SA and policies too. In RX path, it causes to the situation that HW will always have higher priority over any SW policies. It means that we don't need to perform any search of inexact policies and/or priority checks if HW policy was discovered. In such situation, the HW will catch the packets anyway and HW can still implement inexact lookups. In case specific policy is not found, we will continue with packet lookup and check for existence of HW policies in inexact list. HW policies are added to the head of SPD to ensure fast lookup, as XFRM iterates over all policies in the loop. The same solution of adding HW SAs at the begging of the list is applied to SA database too. However, we don't need to change lookups as they are sorted by insertion order and not priority. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05xfrm: add TX datapath support for IPsec packet offload modeLeon Romanovsky
In IPsec packet mode, the device is going to encrypt and encapsulate packets that are associated with offloaded policy. After successful policy lookup to indicate if packets should be offloaded or not, the stack forwards packets to the device to do the magic. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Huy Nguyen <huyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05xfrm: add an interface to offload policyLeon Romanovsky
Extend netlink interface to add and delete XFRM policy from the device. This functionality is a first step to implement packet IPsec offload solution. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05xfrm: allow state packet offload modeLeon Romanovsky
Allow users to configure xfrm states with packet offload mode. The packet mode must be requested both for policy and state, and such requires us to do not implement fallback. We explicitly return an error if requested packet mode can't be configured. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-12-05xfrm: add new packet offload flagLeon Romanovsky
In the next patches, the xfrm core code will be extended to support new type of offload - packet offload. In that mode, both policy and state should be specially configured in order to perform whole offloaded data path. Full offload takes care of encryption, decryption, encapsulation and other operations with headers. As this mode is new for XFRM policy flow, we can "start fresh" with flag bits and release first and second bit for future use. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-29Merge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2022-11-26 1) Remove redundant variable in esp6. From Colin Ian King. 2) Update x->lastused for every packet. It was used only for outgoing mobile IPv6 packets, but showed to be usefull to check if the a SA is still in use in general. From Antony Antony. 3) Remove unused variable in xfrm_byidx_resize. From Leon Romanovsky. 4) Finalize extack support for xfrm. From Sabrina Dubroca. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: add extack to xfrm_set_spdinfo xfrm: add extack to xfrm_alloc_userspi xfrm: add extack to xfrm_do_migrate xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_len xfrm: add extack to xfrm_del_sa xfrm: add extack to xfrm_add_sa_expire xfrm: a few coding style clean ups xfrm: Remove not-used total variable xfrm: update x->lastused for every packet esp6: remove redundant variable err ==================== Link: https://lore.kernel.org/r/20221126110303.1859238-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-25xfrm: add extack to xfrm_set_spdinfoSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25xfrm: add extack to xfrm_alloc_userspiSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25xfrm: add extack to xfrm_do_migrateSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_lenSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25xfrm: add extack to xfrm_del_saSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25xfrm: add extack to xfrm_add_sa_expireSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-25xfrm: a few coding style clean upsSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-11-23Merge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2022-11-23 1) Fix "disable_policy" on ipv4 early demuxP Packets after the initial packet in a flow might be incorectly dropped on early demux if there are no matching policies. From Eyal Birger. 2) Fix a kernel warning in case XFRM encap type is not available. From Eyal Birger. 3) Fix ESN wrap around for GSO to avoid a double usage of a sequence number. From Christian Langrock. 4) Fix a send_acquire race with pfkey_register. From Herbert Xu. 5) Fix a list corruption panic in __xfrm_state_delete(). Thomas Jarosch. 6) Fix an unchecked return value in xfrm6_init(). Chen Zhongjin. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Fix ignored return value in xfrm6_init() xfrm: Fix oops in __xfrm_state_delete() af_key: Fix send_acquire race with pfkey_register xfrm: replay: Fix ESN wrap around for GSO xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available xfrm: fix "disable_policy" on ipv4 early demux ==================== Link: https://lore.kernel.org/r/20221123093117.434274-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25xfrm: Remove not-used total variableLeon Romanovsky
Total variable is not used in xfrm_byidx_resize() and can be safely removed. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-25xfrm: update x->lastused for every packetAntony Antony
x->lastused was only updated for outgoing mobile IPv6 packet. With this fix update it for every, in and out, packet. This is useful to check if the a SA is still in use, or when was the last time an SA was used. lastused time of in SA can used to check IPsec path is functional. Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-19xfrm: replay: Fix ESN wrap around for GSOChristian Langrock
When using GSO it can happen that the wrong seq_hi is used for the last packets before the wrap around. This can lead to double usage of a sequence number. To avoid this, we should serialize this last GSO packet. Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading") Co-developed-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Christian Langrock <christian.langrock@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in the left-over fixes before the net-next pull-request. Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c ae3ed15da588 ("net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear") 9d8cb4c096ab ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") https://lore.kernel.org/all/6cb6893b-4921-a068-4c30-1109795110bb@tessares.net/ kernel/bpf/helpers.c 8addbfc7b308 ("bpf: Gate dynptr API behind CAP_BPF") 5679ff2f138f ("bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF") 8a67f2de9b1d ("bpf: expose bpf_strtol and bpf_strtoul to all program types") https://lore.kernel.org/all/20221003201957.13149-1-daniel@iogearbox.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Refactor selftests to use an array of structs in xfrm_fill_key(). From Gautam Menghani. 2) Drop an unused argument from xfrm_policy_match. From Hongbin Wang. 3) Support collect metadata mode for xfrm interfaces. From Eyal Birger. 4) Add netlink extack support to xfrm. From Sabrina Dubroca. Please note, there is a merge conflict in: include/net/dst_metadata.h between commit: 0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support") from the net-next tree and commit: 5182a5d48c3d ("net: allow storing xfrm interface metadata in metadata_dst") from the ipsec-next tree. Can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-30net-next: skbuff: refactor pskb_pullRichard Gobert
pskb_may_pull already contains all of the checks performed by pskb_pull. Use pskb_may_pull for validation in pskb_pull, eliminating the duplication and making __pskb_pull obsolete. Replace __pskb_pull with pskb_pull where applicable. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-29xfrm: ipcomp: add extack to ipcomp{4,6}_init_stateSabrina Dubroca
And the shared helper ipcomp_init_state. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-29xfrm: pass extack down to xfrm_type ->init_stateSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-28xfrm: Reinject transport-mode packets through workqueueLiu Jian
The following warning is displayed when the tcp6-multi-diffip11 stress test case of the LTP test suite is tested: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ns-tcpserver:48198] CPU: 0 PID: 48198 Comm: ns-tcpserver Kdump: loaded Not tainted 6.0.0-rc6+ #39 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : des3_ede_encrypt+0x27c/0x460 [libdes] lr : 0x3f sp : ffff80000ceaa1b0 x29: ffff80000ceaa1b0 x28: ffff0000df056100 x27: ffff0000e51e5280 x26: ffff80004df75030 x25: ffff0000e51e4600 x24: 000000000000003b x23: 0000000000802080 x22: 000000000000003d x21: 0000000000000038 x20: 0000000080000020 x19: 000000000000000a x18: 0000000000000033 x17: ffff0000e51e4780 x16: ffff80004e2d1448 x15: ffff80004e2d1248 x14: ffff0000e51e4680 x13: ffff80004e2d1348 x12: ffff80004e2d1548 x11: ffff80004e2d1848 x10: ffff80004e2d1648 x9 : ffff80004e2d1748 x8 : ffff80004e2d1948 x7 : 000000000bcaf83d x6 : 000000000000001b x5 : ffff80004e2d1048 x4 : 00000000761bf3bf x3 : 000000007f1dd0a3 x2 : ffff0000e51e4780 x1 : ffff0000e3b9a2f8 x0 : 00000000db44e872 Call trace: des3_ede_encrypt+0x27c/0x460 [libdes] crypto_des3_ede_encrypt+0x1c/0x30 [des_generic] crypto_cbc_encrypt+0x148/0x190 crypto_skcipher_encrypt+0x2c/0x40 crypto_authenc_encrypt+0xc8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_rcv_established+0x3c8/0x720 tcp_v6_do_rcv+0xdc/0x4a0 tcp_v6_rcv+0xc24/0xcb0 ip6_protocol_deliver_rcu+0xf0/0x574 ip6_input_finish+0x48/0x7c ip6_input+0x48/0xc0 ip6_rcv_finish+0x80/0x9c xfrm_trans_reinject+0xb0/0xf4 tasklet_action_common.constprop.0+0xf8/0x134 tasklet_action+0x30/0x3c __do_softirq+0x128/0x368 do_softirq+0xb4/0xc0 __local_bh_enable_ip+0xb0/0xb4 put_cpu_fpsimd_context+0x40/0x70 kernel_neon_end+0x20/0x40 sha1_base_do_update.constprop.0.isra.0+0x11c/0x140 [sha1_ce] sha1_ce_finup+0x94/0x110 [sha1_ce] crypto_shash_finup+0x34/0xc0 hmac_finup+0x48/0xe0 crypto_shash_finup+0x34/0xc0 shash_digest_unaligned+0x74/0x90 crypto_shash_digest+0x4c/0x9c shash_ahash_digest+0xc8/0xf0 shash_async_digest+0x28/0x34 crypto_ahash_digest+0x48/0xcc crypto_authenc_genicv+0x88/0xcc [authenc] crypto_authenc_encrypt+0xd8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_push+0xb4/0x14c tcp_sendmsg_locked+0x71c/0xb64 tcp_sendmsg+0x40/0x6c inet6_sendmsg+0x4c/0x80 sock_sendmsg+0x5c/0x6c __sys_sendto+0x128/0x15c __arm64_sys_sendto+0x30/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x170/0x194 do_el0_svc+0x38/0x4c el0_svc+0x28/0xe0 el0t_64_sync_handler+0xbc/0x13c el0t_64_sync+0x180/0x184 Get softirq info by bcc tool: ./softirqs -NT 10 Tracing soft irq event time... Hit Ctrl-C to end. 15:34:34 SOFTIRQ TOTAL_nsecs block 158990 timer 20030920 sched 46577080 net_rx 676746820 tasklet 9906067650 15:34:45 SOFTIRQ TOTAL_nsecs block 86100 sched 38849790 net_rx 676532470 timer 1163848790 tasklet 9409019620 15:34:55 SOFTIRQ TOTAL_nsecs sched 58078450 net_rx 475156720 timer 533832410 tasklet 9431333300 The tasklet software interrupt takes too much time. Therefore, the xfrm_trans_reinject executor is changed from tasklet to workqueue. Add add spin lock to protect the queue. This reduces the processing flow of the tcp_sendmsg function in this scenario. Fixes: acf568ee859f0 ("xfrm: Reinject transport-mode packets through tasklet") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack support to xfrm_init_replaySabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to __xfrm_init_stateSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to attach_*Sabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack support to xfrm_dev_state_addSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to verify_one_alg, verify_auth_trunc, verify_aeadSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack to verify_replaySabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-22xfrm: add extack support to verify_newsa_infoSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-09-01xfrm: Update ipcomp_scratches with NULL when freedKhalid Masum
Currently if ipcomp_alloc_scratches() fails to allocate memory ipcomp_scratches holds obsolete address. So when we try to free the percpu scratches using ipcomp_free_scratches() it tries to vfree non existent vm area. Described below: static void * __percpu *ipcomp_alloc_scratches(void) { ... scratches = alloc_percpu(void *); if (!scratches) return NULL; ipcomp_scratches does not know about this allocation failure. Therefore holding the old obsolete address. ... } So when we free, static void ipcomp_free_scratches(void) { ... scratches = ipcomp_scratches; Assigning obsolete address from ipcomp_scratches if (!scratches) return; for_each_possible_cpu(i) vfree(*per_cpu_ptr(scratches, i)); Trying to free non existent page, causing warning: trying to vfree existent vm area. ... } Fix this breakage by updating ipcomp_scrtches with NULL when scratches is freed Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com Tested-by: syzbot+5ec9bb042ddfe9644773@syzkaller.appspotmail.com Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-31xfrm: add extack to verify_sec_ctx_lenSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-31xfrm: add extack to validate_tmplSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-31xfrm: add extack to verify_policy_typeSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-31xfrm: add extack to verify_policy_dirSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-31xfrm: add extack support to verify_newpolicy_infoSabrina Dubroca
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-31xfrm: propagate extack to all netlink doit handlersSabrina Dubroca
xfrm_user_rcv_msg() already handles extack, we just need to pass it down. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-29xfrm: lwtunnel: add lwtunnel support for xfrm interfaces in collect_md modeEyal Birger
Allow specifying the xfrm interface if_id and link as part of a route metadata using the lwtunnel infrastructure. This allows for example using a single xfrm interface in collect_md mode as the target of multiple routes each specifying a different if_id. With the appropriate changes to iproute2, considering an xfrm device ipsec1 in collect_md mode one can for example add a route specifying an if_id like so: ip route add <SUBNET> dev ipsec1 encap xfrm if_id 1 In which case traffic routed to the device via this route would use if_id in the xfrm interface policy lookup. Or in the context of vrf, one can also specify the "link" property: ip route add <SUBNET> dev ipsec1 encap xfrm if_id 1 link_dev eth15 Note: LWT_XFRM_LINK uses NLA_U32 similar to IFLA_XFRM_LINK even though internally "link" is signed. This is consistent with other _LINK attributes in other devices as well as in bpf and should not have an effect as device indexes can't be negative. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-29xfrm: interface: support collect metadata modeEyal Birger
This commit adds support for 'collect_md' mode on xfrm interfaces. Each net can have one collect_md device, created by providing the IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be altered and has no if_id or link device attributes. On transmit to this device, the if_id is fetched from the attached dst metadata on the skb. If exists, the link property is also fetched from the metadata. The dst metadata type used is METADATA_XFRM which holds these properties. On the receive side, xfrmi_rcv_cb() populates a dst metadata for each packet received and attaches it to the skb. The if_id used in this case is fetched from the xfrm state, and the link is fetched from the incoming device. This information can later be used by upper layers such as tc, ebpf, and ip rules. Because the skb is scrubed in xfrmi_rcv_cb(), the attachment of the dst metadata is postponed until after scrubing. Similarly, xfrm_input() is adapted to avoid dropping metadata dsts by only dropping 'valid' (skb_valid_dst(skb) == true) dsts. Policy matching on packets arriving from collect_md xfrmi devices is done by using the xfrm state existing in the skb's sec_path. The xfrm_if_cb.decode_cb() interface implemented by xfrmi_decode_session() is changed to keep the details of the if_id extraction tucked away in xfrm_interface.c. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-24net: Fix data-races around netdev_max_backlog.Kuniyuki Iwashima
While reading netdev_max_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. While at it, we remove the unnecessary spaces in the doc. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-08-24 1) Fix a refcount leak in __xfrm_policy_check. From Xin Xiong. 2) Revert "xfrm: update SA curlft.use_time". This violates RFC 2367. From Antony Antony. 3) Fix a comment on XFRMA_LASTUSED. From Antony Antony. 4) x->lastused is not cloned in xfrm_do_migrate. Fix from Antony Antony. 5) Serialize the calls to xfrm_probe_algs. From Herbert Xu. 6) Fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-23xfrm: Drop unused argumentHongbin Wang
Drop unused argument from xfrm_policy_match, __xfrm_policy_eval_candidates and xfrm_policy_eval_candidates. No functional changes intended. Signed-off-by: Hongbin Wang <wh_bin@126.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-17xfrm: policy: fix metadata dst->dev xmit null pointer dereferenceNikolay Aleksandrov
When we try to transmit an skb with metadata_dst attached (i.e. dst->dev == NULL) through xfrm interface we can hit a null pointer dereference[1] in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a loopback skb device when there's no policy which dereferences dst->dev unconditionally. Not having dst->dev can be interepreted as it not being a loopback device, so just add a check for a null dst_orig->dev. With this fix xfrm interface's Tx error counters go up as usual. [1] net-next calltrace captured via netconsole: BUG: kernel NULL pointer dereference, address: 00000000000000c0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60 Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02 RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80 RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000 R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880 R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000 FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0 Call Trace: <TASK> xfrmi_xmit+0xde/0x460 ? tcf_bpf_act+0x13d/0x2a0 dev_hard_start_xmit+0x72/0x1e0 __dev_queue_xmit+0x251/0xd30 ip_finish_output2+0x140/0x550 ip_push_pending_frames+0x56/0x80 raw_sendmsg+0x663/0x10a0 ? try_charge_memcg+0x3fd/0x7a0 ? __mod_memcg_lruvec_state+0x93/0x110 ? sock_sendmsg+0x30/0x40 sock_sendmsg+0x30/0x40 __sys_sendto+0xeb/0x130 ? handle_mm_fault+0xae/0x280 ? do_user_addr_fault+0x1e7/0x680 ? kvm_read_and_reset_apf_flags+0x3b/0x50 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7ff35cac1366 Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89 RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366 RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003 RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001 </TASK> Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole] CR2: 00000000000000c0 CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Daniel Borkmann <daniel@iogearbox.net> Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-03xfrm: clone missing x->lastused in xfrm_do_migrateAntony Antony
x->lastused was not cloned in xfrm_do_migrate. Add it to clone during migrate. Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>