summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-04-19esp6: fix incorrect null pointer check on xoColin Ian King
The check for xo being null is incorrect, currently it is checking for non-null, it should be checking for null. Detected with CoverityScan, CID#1429349 ("Dereference after null check") Fixes: 7862b4058b9f ("esp: Add gso handlers for esp4 and esp6") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-18mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCUPaul E. McKenney
A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <rientjes@google.com>
2017-04-18sctp: process duplicated strreset asoc request correctlyXin Long
This patch is to fix the replay attack issue for strreset asoc requests. When a duplicated strreset asoc request is received, reply it with bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the result saved in asoc if it's seqno >= asoc->strreset_inseq - 2. But note that if the result saved in asoc is performed, the sender's next tsn and receiver's next tsn for the response chunk should be set. It's safe to get them from asoc. Because if it's changed, which means the peer has received the response already, the new response with wrong tsn won't be accepted by peer. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-18sctp: process duplicated strreset in and addstrm in requests correctlyXin Long
This patch is to fix the replay attack issue for strreset and addstrm in requests. When a duplicated strreset in or addstrm in request is received, reply it with bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the result saved in asoc if it's seqno >= asoc->strreset_inseq - 2. For strreset in or addstrm in request, if the receiver side processes it successfully, a strreset out or addstrm out request(as a response for that request) will be sent back to peer. reconf_time will retransmit the out request even if it's lost. So when receiving a duplicated strreset in or addstrm in request and it's result was performed, it shouldn't reply this request, but drop it instead. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-18sctp: process duplicated strreset out and addstrm out requests correctlyXin Long
Now sctp stream reconf will process a request again even if it's seqno is less than asoc->strreset_inseq. If one request has been done successfully and some data chunks have been accepted and then a duplicated strreset out request comes, the streamin's ssn will be cleared. It will cause that stream will never receive chunks any more because of unsynchronized ssn. It allows a replay attack. A similar issue also exists when processing addstrm out requests. It will cause more extra streams being added. This patch is to fix it by saving the last 2 results into asoc. When a duplicated strreset out or addstrm out request is received, reply it with bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the result saved in asoc if it's seqno >= asoc->strreset_inseq - 2. Note that it saves last 2 results instead of only last 1 result, because two requests can be sent together in one chunk. And note that when receiving a duplicated request, the receiver side will still reply it even if the peer has received the response. It's safe, As the response will be dropped by the peer. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-18nl80211: Fix enum type of variable in nl80211_put_sta_rate()Matthias Kaehlcke
rate_flg is of type 'enum nl80211_attrs', however it is assigned with 'enum nl80211_rate_info' values. Change the type of rate_flg accordingly. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-18mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()Matthias Kaehlcke
cfg80211_chandef_create() expects an 'enum nl80211_channel_type' as channel type however in ieee80211_sta_join_ibss() NL80211_CHAN_WIDTH_20_NOHT is passed in two occasions, which is of the enum type 'nl80211_chan_width'. Change the value to NL80211_CHAN_NO_HT (20 MHz, non-HT channel) of the channel type enum. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-18cfg80211: Fix array-bounds warning in fragment copyMatthias Kaehlcke
__ieee80211_amsdu_copy_frag intentionally initializes a pointer to array[-1] to increment it later to valid values. clang rightfully generates an array-bounds warning on the initialization statement. Initialize the pointer to array[0] and change the algorithm from increment before to increment after consume. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-18mac80211: keep a separate list of monitor interfaces that are upJohannes Berg
In addition to keeping monitor interfaces on the regular list of interfaces, keep those that are up and not in cooked mode on a separate list. This saves having to iterate all interfaces when delivering to monitor interfaces. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-18nl80211: add request id in scheduled scan event messagesArend Van Spriel
For multi-scheduled scan support in subsequent patch a request id will be added. This patch add this request id to the scheduled scan event messages. For now the request id will always be zero. With multi-scheduled scan its value will inform user-space to which scan the event relates. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-18af_key: Fix sadb_x_ipsecrequest parsingHerbert Xu
The parsing of sadb_x_ipsecrequest is broken in a number of ways. First of all we're not verifying sadb_x_ipsecrequest_len. This is needed when the structure carries addresses at the end. Worse we don't even look at the length when we parse those optional addresses. The migration code had similar parsing code that's better but it also has some deficiencies. The length is overcounted first of all as it includes the header itself. It also fails to check the length before dereferencing the sa_family field. This patch fixes those problems in parse_sockaddr_pair and then uses it in parse_ipsecrequest. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-17net: rtnetlink: plumb extended ack to doit functionDavid Ahern
Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. >From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv6: sr: fix BUG due to headroom too small after SRH pushDavid Lebrun
When a locally generated packet receives an SRH with two or more segments, the remaining headroom is too small to push an ethernet header. This patch ensures that the headroom is large enough after SRH push. The BUG generated the following trace. [ 192.950285] skbuff: skb_under_panic: text:ffffffff81809675 len:198 put:14 head:ffff88006f306400 data:ffff88006f3063fa tail:0xc0 end:0x2c0 dev:A-1 [ 192.952456] ------------[ cut here ]------------ [ 192.953218] kernel BUG at net/core/skbuff.c:105! [ 192.953411] invalid opcode: 0000 [#1] PREEMPT SMP [ 192.953411] Modules linked in: [ 192.953411] CPU: 5 PID: 3433 Comm: ping6 Not tainted 4.11.0-rc3+ #237 [ 192.953411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 [ 192.953411] task: ffff88007c2d42c0 task.stack: ffffc90000ef4000 [ 192.953411] RIP: 0010:skb_panic+0x61/0x70 [ 192.953411] RSP: 0018:ffffc90000ef7900 EFLAGS: 00010286 [ 192.953411] RAX: 0000000000000085 RBX: 00000000000086dd RCX: 0000000000000201 [ 192.953411] RDX: 0000000080000201 RSI: ffffffff81d104c5 RDI: 00000000ffffffff [ 192.953411] RBP: ffffc90000ef7920 R08: 0000000000000001 R09: 0000000000000000 [ 192.953411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 192.953411] R13: ffff88007c5a4000 R14: ffff88007b363d80 R15: 00000000000000b8 [ 192.953411] FS: 00007f94b558b700(0000) GS:ffff88007fd40000(0000) knlGS:0000000000000000 [ 192.953411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 192.953411] CR2: 00007fff5ecd5080 CR3: 0000000074141000 CR4: 00000000001406e0 [ 192.953411] Call Trace: [ 192.953411] skb_push+0x3b/0x40 [ 192.953411] eth_header+0x25/0xc0 [ 192.953411] neigh_resolve_output+0x168/0x230 [ 192.953411] ? ip6_finish_output2+0x242/0x8f0 [ 192.953411] ip6_finish_output2+0x242/0x8f0 [ 192.953411] ? ip6_finish_output2+0x76/0x8f0 [ 192.953411] ip6_finish_output+0xa8/0x1d0 [ 192.953411] ip6_output+0x64/0x2d0 [ 192.953411] ? ip6_output+0x73/0x2d0 [ 192.953411] ? ip6_dst_check+0xb5/0xc0 [ 192.953411] ? dst_cache_per_cpu_get.isra.2+0x40/0x80 [ 192.953411] seg6_output+0xb0/0x220 [ 192.953411] lwtunnel_output+0xcf/0x210 [ 192.953411] ? lwtunnel_output+0x59/0x210 [ 192.953411] ip6_local_out+0x38/0x70 [ 192.953411] ip6_send_skb+0x2a/0xb0 [ 192.953411] ip6_push_pending_frames+0x48/0x50 [ 192.953411] rawv6_sendmsg+0xa39/0xf10 [ 192.953411] ? __lock_acquire+0x489/0x890 [ 192.953411] ? __mutex_lock+0x1fc/0x970 [ 192.953411] ? __lock_acquire+0x489/0x890 [ 192.953411] ? __mutex_lock+0x1fc/0x970 [ 192.953411] ? tty_ioctl+0x283/0xec0 [ 192.953411] inet_sendmsg+0x45/0x1d0 [ 192.953411] ? _copy_from_user+0x54/0x80 [ 192.953411] sock_sendmsg+0x33/0x40 [ 192.953411] SYSC_sendto+0xef/0x170 [ 192.953411] ? entry_SYSCALL_64_fastpath+0x5/0xc2 [ 192.953411] ? trace_hardirqs_on_caller+0x12b/0x1b0 [ 192.953411] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 192.953411] SyS_sendto+0x9/0x10 [ 192.953411] entry_SYSCALL_64_fastpath+0x1f/0xc2 [ 192.953411] RIP: 0033:0x7f94b453db33 [ 192.953411] RSP: 002b:00007fff5ecd0578 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 192.953411] RAX: ffffffffffffffda RBX: 00007fff5ecd16e0 RCX: 00007f94b453db33 [ 192.953411] RDX: 0000000000000040 RSI: 000055a78352e9c0 RDI: 0000000000000003 [ 192.953411] RBP: 00007fff5ecd1690 R08: 000055a78352c940 R09: 000000000000001c [ 192.953411] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a783321e10 [ 192.953411] R13: 000055a7839890c0 R14: 0000000000000004 R15: 0000000000000000 [ 192.953411] Code: 00 00 48 89 44 24 10 8b 87 c4 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 90 58 d2 81 48 89 04 24 31 c0 e8 4f 70 9a ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 97 d8 00 00 [ 192.953411] RIP: skb_panic+0x61/0x70 RSP: ffffc90000ef7900 [ 193.000186] ---[ end trace bd0b89fabdf2f92c ]--- [ 193.000951] Kernel panic - not syncing: Fatal exception in interrupt [ 193.001137] Kernel Offset: disabled [ 193.001169] ---[ end Kernel panic - not syncing: Fatal exception in interrupt Fixes: 19d5a26f5ef8de5dcb78799feaf404d717b1aac3 ("ipv6: sr: expand skb head only if necessary") Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17gso: Validate assumption of frag_list segementationIlan Tayari
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") assumes that all SKBs in a frag_list (except maybe the last one) contain the same amount of GSO payload. This assumption is not always correct, resulting in the following warning message in the log: skb_segment: too many frags For example, mlx5 driver in Striding RQ mode creates some RX SKBs with one frag, and some with 2 frags. After GRO, the frag_list SKBs end up having different amounts of payload. If this frag_list SKB is then forwarded, the aforementioned assumption is violated. Validate the assumption, and fall back to software GSO if it not true. Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17sctp: get list_of_streams of strreset outreq earlierXin Long
Now when processing strreset out responses, it gets outreq->list_of_streams only when result is performed. But if result is not performed, str_p will be NULL. It will cause panic in sctp_ulpevent_make_stream_reset_event if nums is not 0. This patch is to fix it by getting outreq->list_of_streams earlier, and also to improve some codes for the strreset inreq process. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17Add uid and cookie bpf helper to cg_skb_func_protoChenbo Feng
BPF helper functions get_socket_cookie and get_socket_uid can be used for network traffic classifications, among others. Expose them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of commit 8f917bba0042 ("bpf: pass sk to helper functions") the required skb->sk function is available at both cgroup bpf ingress and egress hooks. With these two new helper, cg_skb_func_proto is effectively the same as sk_filter_func_proto. Change since V1: Instead of add the helper to cg_skb_func_proto, redirect the cg_skb_func_proto to sk_filter_func_proto since all helper function in sk_filter_func_proto are applicable to cg_skb_func_proto now. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv6: drop non loopback packets claiming to originate from ::1Florian Westphal
We lack a saddr check for ::1. This causes security issues e.g. with acls permitting connections from ::1 because of assumption that these originate from local machine. Assuming a source address of ::1 is local seems reasonable. RFC4291 doesn't allow such a source address either, so drop such packets. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-04-14 Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12 kernel. - Many fixes to 6LoWPAN, in particular for BLE - New CA8210 IEEE 802.15.4 device driver (accounting for most of the lines of code added in this pull request) - Added Nokia Bluetooth (UART) HCI driver - Some serdev & TTY changes that are dependencies for the Nokia driver (with acks from relevant maintainers and an agreement that these come through the bluetooth tree) - Support for new Intel Bluetooth device - Various other minor cleanups/fixes here and there Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17p9_client_readdir() fixAl Viro
Don't assume that server is sane and won't return more data than asked for. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-17net: bridge: notify on hw fdb takeoverNikolay Aleksandrov
Recently we added support for SW fdbs to take over HW ones, but that results in changing a user-visible fdb flag thus we need to send a notification, also it's consistent with how HW takes over SW entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17kcm: remove a useless copy_from_user()WANG Cong
struct kcm_clone only contains fd, and kcm_clone() only writes this struct, so there is no need to copy it from user. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17Subject: net: allow configuring default qdiscstephen hemminger
Since 3.12 it has been possible to configure the default queuing discipline via sysctl. This patch adds ability to configure the default queue discipline in kernel configuration. This is useful for environments where configuring the value from userspace is difficult to manage. The default is still the same as before (pfifo_fast) and it is possible to change after kernel init with sysctl. This is similar to how TCP congestion control works. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17l2tp: device MTU setup, tunnel socket needs a lockR. Parameswaran
The MTU overhead calculation in L2TP device set-up merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff needs to be adjusted to lock the tunnel socket while referencing the sub-data structures to derive the socket's IP overhead. Reported-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: R. Parameswaran <rparames@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net-timestamp: avoid use-after-free in ip_recv_errorWillem de Bruijn
Syzkaller reported a use-after-free in ip_recv_error at line info->ipi_ifindex = skb->dev->ifindex; This function is called on dequeue from the error queue, at which point the device pointer may no longer be valid. Save ifindex on enqueue in __skb_complete_tx_timestamp, when the pointer is valid or NULL. Store it in temporary storage skb->cb. It is safe to reference skb->dev here, as called from device drivers or dev_queue_xmit. The exception is when called from tcp_ack_tstamp; in that case it is NULL and ifindex is set to 0 (invalid). Do not return a pktinfo cmsg if ifindex is 0. This maintains the current behavior of not returning a cmsg if skb->dev was NULL. On dequeue, the ipv4 path will cast from sock_exterr_skb to in_pktinfo. Both have ifindex as their first element, so no explicit conversion is needed. This is by design, introduced in commit 0b922b7a829c ("net: original ingress device index in PKTINFO"). For ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo. Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17ipv4: fix a deadlock in ip_ra_controlWANG Cong
Similar to commit 87e9f0315952 ("ipv4: fix a potential deadlock in mcast getsockopt() path"), there is a deadlock scenario for IP_ROUTER_ALERT too: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_INET); lock(rtnl_mutex); lock(sk_lock-AF_INET); Fix this by always locking RTNL first on all setsockopt() paths. Note, after this patch ip_ra_lock is no longer needed either. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: ipv6: send unsolicited NA on admin upDavid Ahern
ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is set to 1, gratuitous arp requests are sent when the device is brought up. The same is expected when ndisc_notify is set to 1 (per ndisc_notify in Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP event; add it. Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17net: dsa: isolate legacy codeVivien Didelot
This patch moves as is the legacy DSA code from dsa.c to legacy.c, except the few shared symbols which remain in dsa.c. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-16sysctl: Remove dead register_sysctl_rootEric W. Biederman
The function no longer does anything. The is only a single caller of register_sysctl_root when semantically there should be two. Remove this function so that if someone decides this functionality is needed again it will be obvious all of the callers of setup_sysctl_set need to be audited and modified appropriately. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15netfilter: remove nf_ct_is_untrackedFlorian Westphal
This function is now obsolete and always returns false. This change has no effect on generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15netfilter: kill the fake untracked conntrack objectsFlorian Westphal
resurrect an old patch from Pablo Neira to remove the untracked objects. Currently, there are four possible states of an skb wrt. conntrack. 1. No conntrack attached, ct is NULL. 2. Normal (kmem cache allocated) ct attached. 3. a template (kmalloc'd), not in any hash tables at any point in time 4. the 'untracked' conntrack, a percpu nf_conn object, tagged via IPS_UNTRACKED_BIT in ct->status. Untracked is supposed to be identical to case 1. It exists only so users can check -m conntrack --ctstate UNTRACKED vs. -m conntrack --ctstate INVALID e.g. attempts to set connmark on INVALID or UNTRACKED conntracks is supposed to be a no-op. Thus currently we need to check ct == NULL || nf_ct_is_untracked(ct) in a lot of places in order to avoid altering untracked objects. The other consequence of the percpu untracked object is that all -j NOTRACK (and, later, kfree_skb of such skbs) result in an atomic op (inc/dec the untracked conntracks refcount). This adds a new kernel-private ctinfo state, IP_CT_UNTRACKED, to make the distinction instead. The (few) places that care about packet invalid (ct is NULL) vs. packet untracked now need to test ct == NULL vs. ctinfo == IP_CT_UNTRACKED, but all other places can omit the nf_ct_is_untracked() check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15netfilter: ecache: Refine the nf_ct_deliver_cached_eventsGao Feng
1. Remove single !events condition check to deliver the missed event even though there is no new event happened. Consider this case: 1) nf_ct_deliver_cached_events is invoked at the first time, the event is failed to deliver, then the missed is set. 2) nf_ct_deliver_cached_events is invoked again, but there is no any new event happened. The missed event is lost really. It would try to send the missed event again after remove this check. And it is ok if there is no missed event because the latter check !((events | missed) & e->ctmask) could avoid it. 2. Correct the return value check of notify->fcn. When send the event successfully, it returns 0, not postive value. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15netfilter: nf_nat: Fix return NF_DROP in nfnetlink_parse_nat_setupGao Feng
The __nf_nat_alloc_null_binding invokes nf_nat_setup_info which may return NF_DROP when memory is exhausted, so convert NF_DROP to -ENOMEM to make ctnetlink happy. Or ctnetlink_setup_nat treats it as a success when one error NF_DROP happens actully. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-15Merge tag 'ipvs2-for-v4.12' of ↵Pablo Neira Ayuso
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Second Round of IPVS Updates for v4.12 please consider these clean-ups and enhancements to IPVS for v4.12. * Removal unused variable * Use kzalloc where appropriate * More efficient detection of presence of NAT extension ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/netfilter/ipvs/ip_vs_ftp.c
2017-04-15ipset: remove unused function __ip_set_get_netlinkAaron Conole
There are no in-tree callers. Signed-off-by: Aaron Conole <aconole@bytheb.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-14net: off by one in inet6_pton()Dan Carpenter
If "scope_len" is sizeof(scope_id) then we would put the NUL terminator one space beyond the end of the buffer. Fixes: b1a951fe469e ("net/utils: generic inet_pton_with_scope helper") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Missing TCP header sanity check in TCPMSS target, from Eric Dumazet. 2) Incorrect event message type for related conntracks created via ctnetlink, from Liping Zhang. 3) Fix incorrect rcu locking when handling helpers from ctnetlink, from Gao feng. 4) Fix missing rcu locking when updating helper, from Liping Zhang. 5) Fix missing read_lock_bh when iterating over list of device addresses from TPROXY and redirect, also from Liping. 6) Fix crash when trying to dump expectations from conntrack with no helper via ctnetlink, from Liping. 7) Missing RCU protection to expecation list update given ctnetlink iterates over the list under rcu read lock side, from Liping too. 8) Don't dump autogenerated seed in nft_hash to userspace, this is very confusing to the user, again from Liping. 9) Fix wrong conntrack netns module refcount in ipt_CLUSTERIP, from Gao feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-14xfrm: Prepare the GRO codepath for hardware offloading.Steffen Klassert
On IPsec hardware offloading, we already get a secpath with valid state attached when the packet enters the GRO handlers. So check for hardware offload and skip the state lookup in this case. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add encapsulation header offsets while SKB is not encryptedIlan Tayari
Both esp4 and esp6 used to assume that the SKB payload is encrypted and therefore the inner_network and inner_transport offsets are not relevant. When doing crypto offload in the NIC, this is no longer the case and the NIC driver needs these offsets so it can do TX TCP checksum offloading. This patch sets the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14net: Add a xfrm validate function to validate_xmit_skbSteffen Klassert
When we do IPsec offloading, we need a fallback for packets that were targeted to be IPsec offloaded but rerouted to a device that does not support IPsec offload. For that we add a function that checks the offloading features of the sending device and and flags the requirement of a fallback before it calls the IPsec output function. The IPsec output function adds the IPsec trailer and does encryption if needed. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp: Use a synchronous crypto algorithm on offloading.Steffen Klassert
We need a fallback algorithm for crypto offloading to a NIC. This is because packets can be rerouted to other NICs that don't support crypto offloading. The fallback is going to be implemented at layer2 where we know the final output device but can't handle asynchronous returns fron the crypto layer. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add xfrm_replay_overflow functions for offloadingSteffen Klassert
This patch adds functions that handles IPsec sequence numbers for GSO segments and TSO offloading. We need to calculate and update the sequence numbers based on the segments that GSO/TSO will generate. We need this to keep software and hardware sequence number counter in sync. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp: Add gso handlers for esp4 and esp6Steffen Klassert
This patch extends the xfrm_type by an encap function pointer and implements esp4_gso_encap and esp6_gso_encap. These functions doing the basic esp encapsulation for a GSO packet. In case the GSO packet needs to be segmented in software, we add gso_segment functions. This codepath is going to be used on esp hardware offloads. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp6: Reorganize esp_outputSteffen Klassert
We need a fallback for ESP at layer 2, so split esp6_output into generic functions that can be used at layer 3 and layer 2 and use them in esp_output. We also add esp6_xmit which is used for the layer 2 fallback. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp4: Reorganize esp_outputSteffen Klassert
We need a fallback for ESP at layer 2, so split esp_output into generic functions that can be used at layer 3 and layer 2 and use them in esp_output. We also add esp_xmit which is used for the layer 2 fallback. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14esp6: Remame esp_input_done2Steffen Klassert
We are going to export the ipv4 and the ipv6 version of esp_input_done2. They are not static anymore and can't have the same name. So rename the ipv6 version to esp6_input_done2. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add an IPsec hardware offloading APISteffen Klassert
This patch adds all the bits that are needed to do IPsec hardware offload for IPsec states and ESP packets. We add xfrmdev_ops to the net_device. xfrmdev_ops has function pointers that are needed to manage the xfrm states in the hardware and to do a per packet offloading decision. Joint work with: Ilan Tayari <ilant@mellanox.com> Guy Shapiro <guysh@mellanox.com> Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add mode handlers for IPsec on layer 2Steffen Klassert
This patch adds a gso_segment and xmit callback for the xfrm_mode and implement these functions for tunnel and transport mode. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Move device notifications to a sepatate fileSteffen Klassert
This is needed for the upcomming IPsec device offloading. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-14xfrm: Add a xfrm type offload.Steffen Klassert
We add a struct xfrm_type_offload so that we have the offloaded codepath separated to the non offloaded codepath. With this the non offloade and the offloaded codepath can coexist. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>