summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)Author
2018-03-23xfrm: Fix transport mode skb control buffer usage.Steffen Klassert
A recent commit introduced a new struct xfrm_trans_cb that is used with the sk_buff control buffer. Unfortunately it placed the structure in front of the control buffer and overlooked that the IPv4/IPv6 control buffer is still needed for some layer 4 protocols. As a result the IPv4/IPv6 control buffer is overwritten with this structure. Fix this by setting a apropriate header in front of the structure. Fixes acf568ee859f ("xfrm: Reinject transport-mode packets ...") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-16xfrm: fix rcu_read_unlock usage in xfrm_local_errorTaehee Yoo
In the xfrm_local_error, rcu_read_unlock should be called when afinfo is not NULL. because xfrm_state_get_afinfo calls rcu_read_unlock if afinfo is NULL. Fixes: af5d27c4e12b ("xfrm: remove xfrm_state_put_afinfo") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-13Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-03-13 1) Refuse to insert 32 bit userspace socket policies on 64 bit systems like we do it for standard policies. We don't have a compat layer, so inserting socket policies from 32 bit userspace will lead to a broken configuration. 2) Make the policy hold queue work without the flowcache. Dummy bundles are not chached anymore, so we need to generate a new one on each lookup as long as the SAs are not yet in place. 3) Fix the validation of the esn replay attribute. The The sanity check in verify_replay() is bypassed if the XFRM_STATE_ESN flag is not set. Fix this by doing the sanity check uncoditionally. From Florian Westphal. 4) After most of the dst_entry garbage collection code is removed, we may leak xfrm_dst entries as they are neither cached nor tracked somewhere. Fix this by reusing the 'uncached_list' to track xfrm_dst entries too. From Xin Long. 5) Fix a rcu_read_lock/rcu_read_unlock imbalance in xfrm_get_tos() From Xin Long. 6) Fix an infinite loop in xfrm_get_dst_nexthop. On transport mode we fetch the child dst_entry after we continue, so this pointer is never updated. Fix this by fetching it before we continue. 7) Fix ESN sequence number gap after IPsec GSO packets. We accidentally increment the sequence number counter on the xfrm_state by one packet too much in the ESN case. Fix this by setting the sequence number to the correct value. 8) Reset the ethernet protocol after decapsulation only if a mac header was set. Otherwise it breaks configurations with TUN devices. From Yossi Kuperman. 9) Fix __this_cpu_read() usage in preemptible code. Use this_cpu_read() instead in ipcomp_alloc_tfms(). From Greg Hackmann. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-13net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()Greg Hackmann
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a __this_cpu_read() call inside ipcomp_alloc_tfms(). At the time, __this_cpu_read() required the caller to either not care about races or to handle preemption/interrupt issues. 3.15 tightened the rules around some per-cpu operations, and now __this_cpu_read() should never be used in a preemptible context. On 3.15 and later, we need to use this_cpu_read() instead. syzkaller reported this leading to the following kernel BUG while fuzzing sendmsg: BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101 caller is ipcomp_init_state+0x185/0x990 CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0xb9/0x115 check_preemption_disabled+0x1cb/0x1f0 ipcomp_init_state+0x185/0x990 ? __xfrm_init_state+0x876/0xc20 ? lock_downgrade+0x5e0/0x5e0 ipcomp4_init_state+0xaa/0x7c0 __xfrm_init_state+0x3eb/0xc20 xfrm_init_state+0x19/0x60 pfkey_add+0x20df/0x36f0 ? pfkey_broadcast+0x3dd/0x600 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_seq_stop+0x80/0x80 ? __skb_clone+0x236/0x750 ? kmem_cache_alloc+0x1f6/0x260 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_process+0x62a/0x6f0 pfkey_process+0x62a/0x6f0 ? pfkey_send_new_mapping+0x11c0/0x11c0 ? mutex_lock_io_nested+0x1390/0x1390 pfkey_sendmsg+0x383/0x750 ? dump_sp+0x430/0x430 sock_sendmsg+0xc0/0x100 ___sys_sendmsg+0x6c8/0x8b0 ? copy_msghdr_from_user+0x3b0/0x3b0 ? pagevec_lru_move_fn+0x144/0x1f0 ? find_held_lock+0x32/0x1c0 ? do_huge_pmd_anonymous_page+0xc43/0x11e0 ? lock_downgrade+0x5e0/0x5e0 ? get_kernel_page+0xb0/0xb0 ? _raw_spin_unlock+0x29/0x40 ? do_huge_pmd_anonymous_page+0x400/0x11e0 ? __handle_mm_fault+0x553/0x2460 ? __fget_light+0x163/0x1f0 ? __sys_sendmsg+0xc7/0x170 __sys_sendmsg+0xc7/0x170 ? SyS_shutdown+0x1a0/0x1a0 ? __do_page_fault+0x5a0/0xca0 ? lock_downgrade+0x5e0/0x5e0 SyS_sendmsg+0x27/0x40 ? __sys_sendmsg+0x170/0x170 do_syscall_64+0x19f/0x640 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x7f0ee73dfb79 RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79 RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004 RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0 R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440 R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-04net: rename skb_gso_validate_mtu -> skb_gso_validate_network_lenDaniel Axtens
If you take a GSO skb, and split it into packets, will the network length (L3 headers + L4 headers + payload) of those packets be small enough to fit within a given MTU? skb_gso_validate_mtu gives you the answer to that question. However, we recently added to add a way to validate the MAC length of a split GSO skb (L2+L3+L4+payload), and the names get confusing, so rename skb_gso_validate_mtu to skb_gso_validate_network_len Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01xfrm: Fix ESN sequence number handling for IPsec GSO packets.Steffen Klassert
When IPsec offloading was introduced, we accidentally incremented the sequence number counter on the xfrm_state by one packet too much in the ESN case. This leads to a sequence number gap of one packet after each GSO packet. Fix this by setting the sequence number to the correct value. Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-20xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode.Steffen Klassert
On transport mode we forget to fetch the child dst_entry before we continue the while loop, this leads to an infinite loop. Fix this by fetching the child dst_entry before we continue the while loop. Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst") Reported-by: syzbot+7d03c810e50aaedef98a@syzkaller.appspotmail.com Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-19xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tosXin Long
When xfrm_policy_get_afinfo returns NULL, it will not hold rcu read lock. In this case, rcu_read_unlock should not be called in xfrm_get_tos, just like other places where it's calling xfrm_policy_get_afinfo. Fixes: f5e2bb4f5b22 ("xfrm: policy: xfrm_get_tos cannot fail") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-13xfrm_user: uncoditionally validate esn replay attribute structFlorian Westphal
The sanity test added in ecd7918745234 can be bypassed, validation only occurs if XFRM_STATE_ESN flag is set, but rest of code doesn't care and just checks if the attribute itself is present. So always validate. Alternative is to reject if we have the attribute without the flag but that would change abi. Reported-by: syzbot+0ab777c27d2bb7588f73@syzkaller.appspotmail.com Cc: Mathias Krause <minipli@googlemail.com> Fixes: ecd7918745234 ("xfrm_user: ensure user supplied esn replay window is valid") Fixes: d8647b79c3b7e ("xfrm: Add user interface for esn and big anti-replay windows") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-13xfrm: Fix policy hold queue after flowcache removal.Steffen Klassert
Now that the flowcache is removed we need to generate a new dummy bundle every time we check if the needed SAs are in place because the dummy bundle is not cached anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag to xfrm_lookup(). This makes sure that we get a dummy bundle in case the SAs are not yet in place. Fixes: 3ca28286ea80 ("xfrm_policy: bypass flow_cache_lookup") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-02xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systemsSteffen Klassert
We don't have a compat layer for xfrm, so userspace and kernel structures have different sizes in this case. This results in a broken configuration, so refuse to configure socket policies when trying to insert from 32 bit userspace as we do it already with policies inserted via netlink. Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-26Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-01-26 One last patch for this development cycle: 1) Add ESN support for IPSec HW offload. From Yossef Efraim. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23xfrm: fix boolean assignment in xfrm_get_type_offloadGustavo A. R. Silva
Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Fixes: ffdb5211da1c ("xfrm: Auto-load xfrm offload modules") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-19xfrm: fix error flow in case of add state failsAviad Yehezkel
If add state fails in case of device offload, netdev refcount will be negative since gc task is attempting to dev_free this state. This is fixed by putting NULL in state dev field. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Boris Pismeny <borisp@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-18xfrm: Add SA to hardware at the end of xfrm_state_construct()Yossi Kuperman
Current code configures the hardware with a new SA before the state has been fully initialized. During this time interval, an incoming ESP packet can cause a crash due to a NULL dereference. More specifically, xfrm_input() considers the packet as valid, and yet, anti-replay mechanism is not initialized. Move hardware configuration to the end of xfrm_state_construct(), and mark the state as valid once the SA is fully initialized. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Aviad Yehezkel <aviadye@mellnaox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-18xfrm: Add ESN support for IPSec HW offloadYossef Efraim
This patch adds ESN support to IPsec device offload. Adding new xfrm device operation to synchronize device ESN. Signed-off-by: Yossef Efraim <yossefe@mellanox.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16net: delete /proc THIS_MODULE referencesAlexey Dobriyan
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10xfrm: Fix a race in the xdst pcpu cache.Steffen Klassert
We need to run xfrm_resolve_and_create_bundle() with bottom halves off. Otherwise we may reuse an already released dst_enty when the xfrm lookup functions are called from process context. Fixes: c30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache") Reported-by: Darius Ski <darius.ski@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-08xfrm: don't call xfrm_policy_cache_flush while holding spinlockFlorian Westphal
xfrm_policy_cache_flush can sleep, so it cannot be called while holding a spinlock. We could release the lock first, but I don't see why we need to invoke this function here in first place, the packet path won't reuse an xdst entry unless its still valid. While at it, add an annotation to xfrm_policy_cache_flush, it would have probably caught this bug sooner. Fixes: ec30d78c14a813 ("xfrm: add xdst pcpu cache") Reported-by: syzbot+e149f7d1328c26f9c12f@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-05xfrm: Use __skb_queue_tail in xfrm_trans_queueHerbert Xu
We do not need locking in xfrm_trans_queue because it is designed to use per-CPU buffers. However, the original code incorrectly used skb_queue_tail which takes the lock. This patch switches it to __skb_queue_tail instead. Reported-and-tested-by: Artem Savkov <asavkov@redhat.com> Fixes: acf568ee859f ("xfrm: Reinject transport-mode packets...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-31xfrm: fix rcu usage in xfrm_get_type_offloadSabrina Dubroca
request_module can sleep, thus we cannot hold rcu_read_lock() while calling it. The function also jumps back and takes rcu_read_lock() again (in xfrm_state_get_afinfo()), resulting in an imbalance. This codepath is triggered whenever a new offloaded state is created. Fixes: ffdb5211da1c ("xfrm: Auto-load xfrm offload modules") Reported-by: syzbot+ca425f44816d749e8eb49755567a75ee48cf4a30@syzkaller.appspotmail.com Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30xfrm: skip policies marked as dead while rehashingFlorian Westphal
syzkaller triggered following KASAN splat: BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618 read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..] Workqueue: events xfrm_hash_rebuild [..] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428 xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618 process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112 worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..] The reproducer triggers: 1016 if (error) { 1017 list_move_tail(&walk->walk.all, &x->all); 1018 goto out; 1019 } in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump callback returns -ENOBUFS). In this case, *walk is located the pfkey socket struct, so this socket becomes visible in the global policy list. It looks like this is intentional -- phony walker has walk.dead set to 1 and all other places skip such "policies". Ccing original authors of the two commits that seem to expose this issue (first patch missed ->dead check, second patch adds pfkey sockets to policies dumper list). Fixes: 880a6fab8f6ba5b ("xfrm: configure policy hash table thresholds by netlink") Fixes: 12a169e7d8f4b1c ("ipsec: Put dumpers on the dump list") Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Timo Teras <timo.teras@iki.fi> Cc: Christophe Gouault <christophe.gouault@6wind.com> Reported-by: syzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30xfrm: Forbid state updates from changing encap typeHerbert Xu
Currently we allow state updates to competely replace the contents of x->encap. This is bad because on the user side ESP only sets up header lengths depending on encap_type once when the state is first created. This could result in the header lengths getting out of sync with the actual state configuration. In practice key managers will never do a state update to change the encapsulation type. Only the port numbers need to be changed as the peer NAT entry is updated. Therefore this patch adds a check in xfrm_state_update to forbid any changes to the encap_type. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
net/ipv6/ip6_gre.c is a case of parallel adds. include/trace/events/tcp.h is a little bit more tricky. The removal of in-trace-macro ifdefs in 'net' paralleled with moving show_tcp_state_name and friends over to include/trace/events/sock.h in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-12-22 1) Check for valid id proto in validate_tmpl(), otherwise we may trigger a warning in xfrm_state_fini(). From Cong Wang. 2) Fix a typo on XFRMA_OUTPUT_MARK policy attribute. From Michal Kubecek. 3) Verify the state is valid when encap_type < 0, otherwise we may crash on IPsec GRO . From Aviv Heller. 4) Fix stack-out-of-bounds read on socket policy lookup. We access the flowi of the wrong address family in the IPv4 mapped IPv6 case, fix this by catching address family missmatches before we do the lookup. 5) fix xfrm_do_migrate() with AEAD to copy the geniv field too. Otherwise the state is not fully initialized and migration fails. From Antony Antony. 6) Fix stack-out-of-bounds with misconfigured transport mode policies. Our policy template validation is not strict enough. It is possible to configure policies with transport mode template where the address family of the template does not match the selectors address family. Fix this by refusing such a configuration, address family can not change on transport mode. 7) Fix a policy reference leak when reusing pcpu xdst entry. From Florian Westphal. 8) Reinject transport-mode packets through tasklet, otherwise it is possible to reate a recursion loop. From Herbert Xu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21xfrm: check for xdo_dev_ops add and deleteShannon Nelson
This adds a check for the required add and delete functions up front at registration time to be sure both are defined. Since both the features check and the registration check are looking at the same things, break out the check for both to call. Lastly, for some reason the feature check was setting xfrmdev_ops to NULL if the NETIF_F_HW_ESP bit was missing, which would probably surprise the driver later if the driver turned its NETIF_F_HW_ESP bit back on. We shouldn't be messing with the driver's callback list, so we stop doing that with this patch. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20xfrm: Allow to use the layer2 IPsec GSO codepath for software crypto.Steffen Klassert
We now have support for asynchronous crypto operations in the layer 2 TX path. This was the missing part to allow the GSO codepath for software crypto, so allow this codepath now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20net: Add asynchronous callbacks for xfrm on layer 2.Steffen Klassert
This patch implements asynchronous crypto callbacks and a backlog handler that can be used when IPsec is done at layer 2 in the TX path. It also extends the skb validate functions so that we can update the driver transmit return codes based on async crypto operation or to indicate that we queued the packet in a backlog queue. Joint work with: Aviv Heller <avivh@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-20xfrm: Separate ESP handling from segmentation for GRO packets.Steffen Klassert
We change the ESP GSO handlers to only segment the packets. The ESP handling and encryption is defered to validate_xmit_xfrm() where this is done for non GRO packets too. This makes the code more robust and prepares for asynchronous crypto handling. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-19xfrm: Reinject transport-mode packets through taskletHerbert Xu
This is an old bugbear of mine: https://www.mail-archive.com/netdev@vger.kernel.org/msg03894.html By crafting special packets, it is possible to cause recursion in our kernel when processing transport-mode packets at levels that are only limited by packet size. The easiest one is with DNAT, but an even worse one is where UDP encapsulation is used in which case you just have to insert an UDP encapsulation header in between each level of recursion. This patch avoids this problem by reinjecting tranport-mode packets through a tasklet. Fixes: b05e106698d9 ("[IPV4/6]: Netfilter IPsec input hooks") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-15Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-12-15 1) Currently we can add or update socket policies, but not clear them. Support clearing of socket policies too. From Lorenzo Colitti. 2) Add documentation for the xfrm device offload api. From Shannon Nelson. 3) Fix IPsec extended sequence numbers (ESN) for IPsec offloading. From Yossef Efraim. 4) xfrm_dev_state_add function returns success even for unsupported options, fix this to fail in such cases. From Yossef Efraim. 5) Remove a redundant xfrm_state assignment. From Aviv Heller. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-12xfrm: put policies when reusing pcpu xdst entryFlorian Westphal
We need to put the policies when re-using the pcpu xdst entry, else this leaks the reference. Fixes: ec30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-08xfrm: Fix stack-out-of-bounds with misconfigured transport mode policies.Steffen Klassert
On policies with a transport mode template, we pass the addresses from the flowi to xfrm_state_find(), assuming that the IP addresses (and address family) don't change during transformation. Unfortunately our policy template validation is not strict enough. It is possible to configure policies with transport mode template where the address family of the template does not match the selectors address family. This lead to stack-out-of-bound reads because we compare arddesses of the wrong family. Fix this by refusing such a configuration, address family can not change on transport mode. We use the assumption that, on transport mode, the first templates address family must match the address family of the policy selector. Subsequent transport mode templates must mach the address family of the previous template. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-08xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM)Antony Antony
copy geniv when cloning the xfrm state. x->geniv was not copied to the new state and migration would fail. xfrm_do_migrate .. xfrm_state_clone() .. .. esp_init_aead() crypto_alloc_aead() crypto_alloc_tfm() crypto_find_alg() return EAGAIN and failed Signed-off-by: Antony Antony <antony@phenome.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-01xfrm: Fix stack-out-of-bounds read on socket policy lookup.Steffen Klassert
When we do tunnel or beet mode, we pass saddr and daddr from the template to xfrm_state_find(), this is ok. On transport mode, we pass the addresses from the flowi, assuming that the IP addresses (and address family) don't change during transformation. This assumption is wrong in the IPv4 mapped IPv6 case, packet is IPv4 and template is IPv6. Fix this by catching address family missmatches of the policy and the flow already before we do the lookup. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-01xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)Aviv Heller
Code path when (encap_type < 0) does not verify the state is valid before progressing. This will result in a crash if, for instance, x->km.state == XFRM_STATE_ACQ. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-01xfrm: Remove redundant state assignment in xfrm_input()Aviv Heller
x is already initialized to the same value, above. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-01xfrm: Fix xfrm_dev_state_add to fail for unsupported HW SA optionYossef Efraim
xfrm_dev_state_add function returns success for unsupported HW SA options. Resulting the calling function to create SW SA without corrlating HW SA. Desipte IPSec device offloading option was chosen. These not supported HW SA options are hard coded within xfrm_dev_state_add function. SW backward compatibility will break if we add any of these option as old HW will fail with new SW. This patch changes the behaviour to return -EINVAL in case unsupported option is chosen. Notifying user application regarding failure and not breaking backward compatibility for newly added HW SA options. Signed-off-by: Yossef Efraim <yossefe@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-01xfrm: Fix xfrm_replay_overflow_offload_esnYossef Efraim
In case of wrap around, replay_esn->oseq_hi is not updated before it is tested for it's actual value, leading function to fail with overflow indication and packets being dropped. This patch updates replay_esn->oseq_hi in the right place. Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading") Signed-off-by: Yossef Efraim <yossefe@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-01xfrm: fix XFRMA_OUTPUT_MARK policy entryMichal Kubecek
This seems to be an obvious typo, NLA_U32 is type of the attribute, not its (minimal) length. Fixes: 077fbac405bf ("net: xfrm: support setting an output mark.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-11-30xfrm: Stop using dst->next in bundle construction.David Miller
While building ipsec bundles, blocks of xfrm dsts are linked together using dst->next from bottom to the top. The only thing this is used for is initializing the pmtu values of the xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time. The bundle pmtu entries must be processed in this order so that pmtu values lower in the stack of routes can propagate up to the higher ones. Avoid using dst->next by simply maintaining an array of dst pointers as we already do for the xfrm_state objects when building the bundle. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
2017-11-30xfrm: Move dst->path into struct xfrm_dstDavid Miller
The first member of an IPSEC route bundle chain sets it's dst->path to the underlying ipv4/ipv6 route that carries the bundle. Stated another way, if one were to follow the xfrm_dst->child chain of the bundle, the final non-NULL pointer would be the path and point to either an ipv4 or an ipv6 route. This is largely used to make sure that PMTU events propagate down to the correct ipv4 or ipv6 route. When we don't have the top of an IPSEC bundle 'dst->path == dst'. Move it down into xfrm_dst and key off of dst->xfrm. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
2017-11-30xfrm: Move child route linkage into xfrm_dst.David Miller
XFRM bundle child chains look like this: xdst1 --> xdst2 --> xdst3 --> path_dst All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL. The final child pointer in the chain, here called 'path_dst', is some other kind of route such as an ipv4 or ipv6 one. The xfrm output path pops routes, one at a time, via the child pointer, until we hit one which has a dst->xfrm pointer which is NULL. We can easily preserve the above mechanisms with child sitting only in the xfrm_dst structure. All children in the chain before we break out of the xfrm_output() loop have dst->xfrm non-NULL and are therefore xfrm_dst objects. Since we break out of the loop when we find dst->xfrm NULL, we will not try to dereference 'dst' as if it were an xfrm_dst. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30ipsec: Create and use new helpers for dst child access.David Miller
This will make a future change moving the dst->child pointer less invasive. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
2017-11-30net: Create and use new helper xfrm_dst_child().David Miller
Only IPSEC routes have a non-NULL dst->child pointer. And IPSEC routes are identified by a non-NULL dst->xfrm pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30net: xfrm: allow clearing socket xfrm policies.Lorenzo Colitti
Currently it is possible to add or update socket policies, but not clear them. Therefore, once a socket policy has been applied, the socket cannot be used for unencrypted traffic. This patch allows (privileged) users to clear socket policies by passing in a NULL pointer and zero length argument to the {IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both the incoming and outgoing policies being cleared. The simple approach taken in this patch cannot clear socket policies in only one direction. If desired this could be added in the future, for example by continuing to pass in a length of zero (which currently is guaranteed to return EMSGSIZE) and making the policy be a pointer to an integer that contains one of the XFRM_POLICY_{IN,OUT} enum values. An alternative would have been to interpret the length as a signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output policy. Tested: https://android-review.googlesource.com/539816 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-11-29xfrm: check id proto in validate_tmpl()Cong Wang
syzbot reported a kernel warning in xfrm_state_fini(), which indicates that we have entries left in the list net->xfrm.state_all whose proto is zero. And xfrm_id_proto_match() doesn't consider them as a match with IPSEC_PROTO_ANY in this case. Proto with value 0 is probably not a valid value, at least verify_newsa_info() doesn't consider it valid either. This patch fixes it by checking the proto value in validate_tmpl() and rejecting invalid ones, like what iproute2 does in xfrm_xfrmproto_getbyname(). Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-11-21treewide: setup_timer() -> timer_setup()Kees Cook
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>