path: root/net
Age | Commit message | Author
2017-07-19 | netfilter: ipt_CLUSTERIP: fix use-after-free of proc entry | Sabrina Dubroca
When we delete a netns with a CLUSTERIP rule, clusterip_net_exit() is called first, removing /proc/net/ipt_CLUSTERIP. Then clusterip_config_entry_put() is called from clusterip_tg_destroy(), and tries to remove its entry under /proc/net/ipt_CLUSTERIP/. Fix this by checking that the parent directory of the entry to remove hasn't already been deleted. The following triggers a KASAN splat (stealing the reproducer from 202f59afd441, thanks to Jianlin Shi and Xin Long):
    ip netns add test
    ip link add veth0_in type veth peer name veth0_out
    ip link set veth0_in netns test
    ip netns exec test ip link set lo up
    ip netns exec test ip link set veth0_in up
    ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \
        CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \
        --local-node 1 --hashmode sourceip-sourceport
    ip netns del test
Fixes: ce4ff76c15a8 ("netfilter: ipt_CLUSTERIP: make proc directory per net namespace") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
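A minimal userspace sketch of the ordering problem and the fix described above. Everything here is hypothetical: the `demo_*` types stand in for the proc directory and the per-netns state, and the sketch only models clearing the parent pointer on netns exit and checking it before removing a child entry. Real code also has to serialize the two paths, which this single-threaded model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-netns proc directory with child entries. */
struct demo_dir { bool removed; int children; };

struct demo_net {
    struct demo_dir *procdir;   /* NULL once the netns exit path removed it */
};

/* netns exit: the whole directory goes away (children with it), and the
 * pointer is cleared so later callers can tell it is gone. */
static void demo_net_exit(struct demo_net *cn)
{
    cn->procdir->removed = true;
    cn->procdir->children = 0;
    cn->procdir = NULL;
}

/* Rule destroy: only remove our entry if the parent still exists.
 * Returns true if an entry was actually removed. */
static bool demo_entry_put(struct demo_net *cn)
{
    if (!cn->procdir)
        return false;           /* parent already gone: nothing left to free */
    cn->procdir->children--;
    return true;
}
```

With the check in place, running the teardown in the "wrong" order (exit first, then destroy) is harmless instead of touching freed state.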
2017-07-18 | netfilter: fix netfilter_net_init() return | Dan Carpenter
We accidentally return an uninitialized variable. Fixes: cf56c2f892a8 ("netfilter: remove old pre-netns era hook api") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
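The general shape of this class of bug, sketched in userspace C. `demo_net_init()` and `demo_init_one()` are hypothetical and are not the actual netfilter_net_init() code; the point is only that a result variable must hold a defined value on every path, including the one where a loop body never runs.

```c
#include <assert.h>

/* Hypothetical per-item init step; fails for index 5 to show error flow. */
static int demo_init_one(int i)
{
    return i == 5 ? -1 : 0;
}

static int demo_net_init(int nitems)
{
    int ret = 0;   /* without this, nitems == 0 returns an indeterminate value */

    for (int i = 0; i < nitems; i++) {
        ret = demo_init_one(i);
        if (ret)
            break;
    }
    return ret;
}
```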
2017-07-18 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf | David S. Miller
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree, they are:
1) Missing netlink message sanity check in nfnetlink, patch from Mateusz Jurczyk.
2) We now have netfilter per-netns hooks, so let's kill the global hook infrastructure; this infrastructure is known to be racy with netns. We don't care about out-of-tree modules. Patch from Florian Westphal.
3) find_appropriate_src() is buggy when collisions happen after the conversion of the nat bysource to rhashtable. Also from Florian.
4) Remove forward chain in nf_tables arp family, it's useless and it is causing quite a bit of confusion, from Florian Westphal.
5) nf_ct_remove_expect() is called with the wrong parameter, causing kernel oops, patch from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | udp: preserve skb->dst if required for IP options processing | Paolo Abeni
Eric noticed that in udp_recvmsg() we still need to access skb->dst while processing the IP options. Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue") skb->dst is no longer available at recvmsg() time, and bad things will happen if we enter the relevant code path. This commit addresses the issue: avoid clearing skb->dst if any IP options are present in the relevant skb. Since the IP CB is contained in the first skb cacheline, we can test it to decide whether to leverage the consume_stateless_skb() optimization, without measurable additional cost in the faster path. v1 -> v2: updated commit message tags Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue") Reported-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
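A simplified model of the dequeue-time decision described above, assuming the only input is the IP options length from the control block. The `demo_*` structs are hypothetical stand-ins for the skb fields, not the kernel's layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 'optlen' models the IP options length kept in
 * the IP control block, 'has_dst' models skb->dst being attached. */
struct demo_ip_cb { unsigned char optlen; };
struct demo_skb { struct demo_ip_cb cb; bool has_dst; };

/* The dst may only be dropped early when no IP options will need it. */
static bool demo_can_drop_dst(const struct demo_skb *skb)
{
    return skb->cb.optlen == 0;
}

static void demo_udp_dequeue(struct demo_skb *skb)
{
    if (demo_can_drop_dst(skb))
        skb->has_dst = false;   /* models the stateless fast path */
    /* otherwise keep the dst so option processing in recvmsg() can use it */
}
```

Because the options length sits in the first cacheline, the extra test is cheap, which is the trade-off the commit message relies on.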
2017-07-18 | ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() | Alexander Potapenko
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check():
    ==================================================================
    BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
    CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
    Call Trace:
     <IRQ>
     __dump_stack lib/dump_stack.c:16
     dump_stack+0x172/0x1c0 lib/dump_stack.c:52
     kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
     __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
     skb_set_hash_from_sk ./include/net/sock.h:2011
     tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
     tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
     tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
     tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
     call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
     expire_timers kernel/time/timer.c:1307
     __run_timers+0xc13/0xf10 kernel/time/timer.c:1601
     run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
     __do_softirq+0x485/0x942 kernel/softirq.c:284
     invoke_softirq kernel/softirq.c:364
     irq_exit+0x1fa/0x230 kernel/softirq.c:405
     exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
     smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
     apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
    RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
    RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
    RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
    RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
    RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
    RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
    RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
    R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
    R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
     </IRQ>
     poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
     SYSC_select+0x4b4/0x4e0 fs/select.c:653
     SyS_select+0x76/0xa0 fs/select.c:634
     entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
    RIP: 0033:0x4597e7
    RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
    R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
    chained origin:
     save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
     kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
     kmsan_save_stack mm/kmsan/kmsan.c:317
     kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
     __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
     tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
     tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
     tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
     cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
     tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
     tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
     tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
     ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
     NF_HOOK ./include/linux/netfilter.h:257
     ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
     dst_input ./include/net/dst.h:492
     ip6_rcv_finish net/ipv6/ip6_input.c:69
     NF_HOOK ./include/linux/netfilter.h:257
     ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
     __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
     __netif_receive_skb net/core/dev.c:4246
     process_backlog+0x667/0xba0 net/core/dev.c:4866
     napi_poll net/core/dev.c:5268
     net_rx_action+0xc95/0x1590 net/core/dev.c:5333
     __do_softirq+0x485/0x942 kernel/softirq.c:284
    origin:
     save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
     kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
     kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
     kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
     kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
     reqsk_alloc ./include/net/request_sock.h:87
     inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
     cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
     tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
     tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
     tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
     ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
     NF_HOOK ./include/linux/netfilter.h:257
     ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
     dst_input ./include/net/dst.h:492
     ip6_rcv_finish net/ipv6/ip6_input.c:69
     NF_HOOK ./include/linux/netfilter.h:257
     ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
     __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
     __netif_receive_skb net/core/dev.c:4246
     process_backlog+0x667/0xba0 net/core/dev.c:4866
     napi_poll net/core/dev.c:5268
     net_rx_action+0xc95/0x1590 net/core/dev.c:5333
     __do_softirq+0x485/0x942 kernel/softirq.c:284
    ==================================================================
Similar error is reported for cookie_v4_check(). Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
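A userspace sketch of why the fix is needed: memory from a non-zeroing allocator (here `malloc`, standing in for a slab allocation) holds indeterminate bytes, so a field that is read later, like the request sock's txhash, must be set explicitly. The `demo_*` names and the `rand()`-based hash source are assumptions for illustration, not the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of a TCP request sock. */
struct demo_reqsk { uint32_t txhash; };

/* Hypothetical random hash source; this sketch reserves 0 for "unset". */
static uint32_t demo_rndhash(void)
{
    uint32_t v;

    do {
        v = (uint32_t)rand();
    } while (v == 0);
    return v;
}

/* Allocation does not zero the object, so every field that is read later
 * has to be initialized here. */
static struct demo_reqsk *demo_cookie_check(void)
{
    struct demo_reqsk *treq = malloc(sizeof(*treq));

    if (!treq)
        return NULL;
    treq->txhash = demo_rndhash();   /* the fix: initialize before first use */
    return treq;
}
```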
2017-07-18 | xfrm: add xdst pcpu cache | Florian Westphal
Retain the last used xfrm_dst in a pcpu cache. On the next request, reuse this dst if the policies are the same. The cache will not help with strict RR workloads as there is no hit. The cache packet-path part is reasonably small; the notifier part is needed so we do not add long hangs when a device is dismantled but some pcpu xdst still holds a reference. There are also calls to the flush operation when userspace deletes SAs so modules can be removed (there is no hit. We need to run the dst_release on the correct cpu to avoid races with the packet path. This is done by adding a work_struct for each cpu and then doing the actual test/release on each affected cpu via schedule_work_on(). Test results using 4 network namespaces and null encryption:
    ns1           ns2 ->            ns3 ->            ns4
    netperf ->    xfrm/null enc ->  xfrm/null dec ->  netserver

    what            TCP_STREAM   UDP_STREAM   UDP_RR
    Flow cache:     14644.61     294.35       327231.64
    No flow cache:  14349.81     242.64       202301.72
    Pcpu cache:     14629.70     292.21       205595.22
UDP tests used 64-byte packets; tests ran for one minute each, and each value is the average over ten iterations. 'Flow cache' is 'net-next', 'No flow cache' is net-next plus this series but without this patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
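A simplified, single-threaded model of the per-CPU reuse rule described above: each CPU slot remembers the last bundle and the policies it was built for, and a lookup hits only when those policies match. The `demo_*` names, the fixed CPU count and the plain reference counter are assumptions; the notifier, the flush path and the per-CPU work that releases entries on the right CPU are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4   /* hypothetical fixed CPU count */

/* Hypothetical bundle: remembers the policies it was built for. */
struct demo_xdst {
    const void *pols[2];
    int num_pols;
    int refcnt;
};

/* One cached bundle per CPU (the kernel uses real per-CPU variables). */
static struct demo_xdst *demo_cache[DEMO_NR_CPUS];

static bool demo_same_policies(const struct demo_xdst *x,
                               const void *const *pols, int n)
{
    if (x->num_pols != n)
        return false;
    for (int i = 0; i < n; i++)
        if (x->pols[i] != pols[i])
            return false;
    return true;
}

/* Hit: return the cached bundle with an extra reference. Miss: NULL, and
 * the caller builds a new bundle and stores it with demo_cache_store(). */
static struct demo_xdst *demo_cache_lookup(int cpu, const void *const *pols, int n)
{
    struct demo_xdst *x = demo_cache[cpu];

    if (x && demo_same_policies(x, pols, n)) {
        x->refcnt++;
        return x;
    }
    return NULL;
}

/* Replace this CPU's entry; the cache holds its own reference. */
static void demo_cache_store(int cpu, struct demo_xdst *x)
{
    if (demo_cache[cpu])
        demo_cache[cpu]->refcnt--;
    x->refcnt++;
    demo_cache[cpu] = x;
}
```

A strict request/response workload that alternates flows would keep missing and replacing the single slot, which matches the note that the cache does not help strict RR workloads.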
2017-07-18 | xfrm: remove flow cache | Florian Westphal
After the RCU conversions, performance degradation in forward tests isn't that noticeable anymore. See next patch for some numbers. A follow-up patch could then also remove genid from the policies as we do not cache bundles anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | xfrm_policy: make xfrm_bundle_lookup return xfrm dst object | Florian Westphal
This allows removing the flow cache object embedded in struct xfrm_dst. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | xfrm_policy: remove xfrm_policy_lookup | Florian Westphal
This removes the wrapper and renames the __xfrm_policy_lookup variant to get rid of another place that used flow cache objects. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | xfrm_policy: kill flow to policy dir conversion | Florian Westphal
XFRM_POLICY_IN/OUT/FWD are identical to FLOW_DIR_*, so gcc already removed this function as it just returns the argument. Again, no code change. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | xfrm_policy: remove always true/false branches | Florian Westphal
After the previous change, oldflo and xdst are always NULL. These branches were already removed by gcc; this doesn't change code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | xfrm_policy: bypass flow_cache_lookup | Florian Westphal
Instead of consulting the flow cache, call the xfrm bundle/policy lookup functions directly. This pretends the flow cache had no entry. This helps to gradually remove flow cache integration; a follow-up commit will remove the dead code that this change adds. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | net: xfrm: revert to lower xfrm dst gc limit | Florian Westphal
Revert c386578f1cdb4dac230395 ("xfrm: Let the flowcache handle its size by default."). Once we remove the flow cache, we don't have a flow cache limit anymore. We must not allow (virtually) unlimited allocations of xfrm dst entries. Revert back to the old xfrm dst gc limits. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 | vti: revert flush x-netns xfrm cache when vti interface is removed | Florian Westphal
The flow cache is removed in the next commit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | net: Revert "net: add function to allocate sk_buff head without data area" | Florian Westphal
It was added for netlink mmap tx, there are no callers in the tree. The commit also added a check for skb->head != NULL in kfree_skb path, remove that too -- all skbs ought to have skb->head set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | inet: Remove software UFO fragmenting code. | David S. Miller
Rename udp{4,6}_ufo_fragment() to udp{4,6}_tunnel_segment() and only handle tunnel segmentation. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | net: Remove all references to SKB_GSO_UDP. | David S. Miller
Such packets are no longer possible. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | inet: Stop generating UFO packets. | David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | net: Remove references to NETIF_F_UFO from ethtool. | David S. Miller
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | net: Remove references to NETIF_F_UFO in netdev_fix_features(). | David S. Miller
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | xdp: Add batching support to redirect map | John Fastabend
For performance reasons we want to avoid updating the tail pointer in the driver tx ring as much as possible. To accomplish this we add batching support to the redirect path in XDP. This adds another ndo op "xdp_flush" that is used to inform the driver that it should bump the tail pointer on the TX ring. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
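A toy model of the batching idea: transmitting only writes descriptors, and a separate flush step publishes all of them with one tail-pointer (doorbell) write. The `demo_*` ring and the doorbell counter are hypothetical; ring-full handling and completions are omitted, and this is not the ndo_xdp_xmit/xdp_flush driver code.

```c
#include <assert.h>

/* Hypothetical TX ring: 'next' is where software writes descriptors,
 * 'tail' is the value last written to the device's tail register. */
struct demo_ring {
    unsigned int next;
    unsigned int tail;
    unsigned int doorbells;   /* number of (expensive) tail-pointer writes */
};

/* Queue one frame without notifying the hardware. */
static void demo_xdp_xmit(struct demo_ring *r)
{
    r->next++;
}

/* Publish everything queued so far with a single tail-pointer write. */
static void demo_xdp_flush(struct demo_ring *r)
{
    if (r->tail != r->next) {
        r->tail = r->next;
        r->doorbells++;
    }
}
```

Sending 16 frames and flushing once costs one doorbell instead of sixteen, which is the saving the commit is after.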
2017-07-17 | bpf: add bpf_redirect_map helper routine | John Fastabend
BPF programs can use the devmap with a bpf_redirect_map() helper routine to forward packets to a netdevice in the map. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | xdp: add trace event for xdp redirect | John Fastabend
This adds a trace event for xdp redirect which may help when debugging XDP programs that use redirect bpf commands. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | net: implement XDP_REDIRECT for xdp generic | John Fastabend
Add support for redirect to generic XDP, creating a fallback for devices that do not yet have support and allowing test infrastructure using veth pairs to be built. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | xdp: add bpf_redirect helper function | John Fastabend
This adds support for a bpf_redirect helper function to the XDP infrastructure. For now this only supports redirecting to the egress path of a port. In order to support drivers handling an xdp_buff natively, this patch uses a new ndo operation, ndo_xdp_xmit(), that pushes an xdp_buff to the specified device. If the program specifies either (a) an unknown device or (b) a device that does not support the operation, a BPF warning is thrown and the XDP_ABORTED error code is returned. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
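A userspace model of the redirect decision described above: look up the target device, and return an aborted verdict with a warning if the device is unknown or lacks the transmit operation. The `demo_*` device table, the verdict enum (its values are local to this sketch, not the UAPI constants) and the `fprintf` warning are assumptions standing in for the kernel's device lookup and BPF warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Verdicts named after XDP's; numeric values are local to this sketch. */
enum demo_xdp_action { DEMO_XDP_ABORTED, DEMO_XDP_REDIRECT };

struct demo_xdp_buff { int len; };

/* Hypothetical device: ndo_xdp_xmit is NULL when the driver lacks it. */
struct demo_netdev {
    int ifindex;
    int (*ndo_xdp_xmit)(struct demo_netdev *dev, struct demo_xdp_buff *xdp);
    int sent;
};

static struct demo_netdev *demo_dev_by_index(struct demo_netdev *devs, int n,
                                             int ifindex)
{
    for (int i = 0; i < n; i++)
        if (devs[i].ifindex == ifindex)
            return &devs[i];
    return NULL;
}

static enum demo_xdp_action demo_do_redirect(struct demo_netdev *devs, int n,
                                             int ifindex,
                                             struct demo_xdp_buff *xdp)
{
    struct demo_netdev *dev = demo_dev_by_index(devs, n, ifindex);

    if (!dev || !dev->ndo_xdp_xmit) {
        /* stands in for the BPF warning */
        fprintf(stderr, "xdp redirect to ifindex %d failed\n", ifindex);
        return DEMO_XDP_ABORTED;
    }
    if (dev->ndo_xdp_xmit(dev, xdp))
        return DEMO_XDP_ABORTED;
    return DEMO_XDP_REDIRECT;
}

/* Example transmit operation for a device that supports redirect. */
static int demo_xmit(struct demo_netdev *dev, struct demo_xdp_buff *xdp)
{
    (void)xdp;
    dev->sent++;
    return 0;
}
```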
2017-07-17 | net: xdp: support xdp generic on virtual devices | John Fastabend
XDP generic allows users to test XDP programs and/or run them with degraded performance on devices that do not yet support XDP. For testing I typically test eBPF programs using a set of veth devices. This allows testing topologies that would otherwise be difficult to set up, especially in the early stages of development. This patch adds an xdp generic hook to the netif_rx_internal() function, which is called from dev_forward_skb(). With this addition, attaching XDP programs to veth devices works as expected! Also I noticed multiple drivers using netif_rx(). These devices will also benefit and generic XDP will work for them as well. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 | inetpeer: remove AVL implementation in favor of RB tree | Eric Dumazet
As discussed in Faro during Netfilter Workshop 2017, RB trees can be used with RCU, using a seqlock. Note that net/rxrpc/conn_service.c is already using this. This patch converts inetpeer from AVL tree to RB tree, since it allows removing the private AVL implementation in favor of shared RB code.
    $ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
       text    data     bss     dec     hex filename
       3195      40     128    3363     d23 net/ipv4/inetpeer.before
       1562      24       0    1586     632 net/ipv4/inetpeer.after
The same technique can be used to speed up net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
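A minimal sequence-counter sketch in C11 atomics, showing the reader pattern that lets a lockless tree walk detect a concurrent writer. The `demo_*` names are hypothetical and this is not the kernel's seqlock API; the single read of `*shared` stands in for the RB-tree walk, and in real concurrent C the protected data would also need atomic or otherwise race-tolerant accesses. Real users such as inetpeer may fall back to a locked lookup instead of simply retrying.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal sequence counter: the value is odd while a writer is active. */
struct demo_seqcount { atomic_uint seq; };

static unsigned demo_read_begin(struct demo_seqcount *s)
{
    unsigned v;

    while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
        ;   /* writer in progress: wait for an even value */
    return v;
}

/* Nonzero if a writer ran since demo_read_begin() returned 'start'. */
static int demo_read_retry(struct demo_seqcount *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

static void demo_write_begin(struct demo_seqcount *s)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
}

static void demo_write_end(struct demo_seqcount *s)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/* Reader: repeat the (stand-in) tree walk until no writer interfered. */
static int demo_lookup(struct demo_seqcount *s, const int *shared)
{
    unsigned start;
    int val;

    do {
        start = demo_read_begin(s);
        val = *shared;
    } while (demo_read_retry(s, start));
    return val;
}
```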
2017-07-17 | net/unix: drop obsolete fd-recursion limits | David Herrmann
All unix sockets now account inflight FDs to the respective sender. This was introduced in:
    commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
    Author: willy tarreau <w@1wt.eu>
    Date:   Sun Jan 10 07:54:56 2016 +0100
        unix: properly account for FDs passed over unix sockets
and further refined in:
    commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6
    Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Date:   Wed Feb 3 02:11:03 2016 +0100
        unix: correctly track in-flight fds in sending process user_struct
Hence, regardless of the stacking depth of FDs, the total number of inflight FDs is limited, and accounted. There is no known way for a local user to exceed those limits or exploit the accounting. Furthermore, the GC logic is independent of the recursion/stacking depth as well. It solely depends on the total number of inflight FDs, regardless of their layout. Lastly, the current `recursion_level' suffers a TOCTOU race, since it checks and inherits depths only at queue time. If we consider `A<-B' to mean `queue-B-on-A', the following sequence circumvents the recursion level easily:
    A<-B
    B<-C
    C<-D
    ...
    Y<-Z
resulting in:
    A<-B<-C<-...<-Z
With all of this in mind, let's drop the recursion limit. It has no additional security value anymore. On the contrary, it randomly confuses message brokers that try to forward file-descriptors, since any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client maliciously modifies the FD while inflight. Cc: Alban Crequy <alban.crequy@collabora.co.uk> Cc: Simon McVittie <simon.mcvittie@collabora.co.uk> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net>
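A small model of the TOCTOU argument above: if a socket's depth is inherited only at queue time, queuing in the order A<-B, B<-C, ... keeps every level at 1 while the real chain grows to 25 links. The `demo_*` structure and the limit of 4 are hypothetical and only illustrate the reasoning; they are not the af_unix implementation.

```c
#include <assert.h>

#define DEMO_MAX_RECURSION 4   /* hypothetical limit for the sketch */

/* Hypothetical socket: 'level' is inherited only when something is queued. */
struct demo_sock {
    int level;
    struct demo_sock *queued;  /* socket most recently queued on this one */
};

/* Queue b on a (A<-B). The check uses b's level *now*; later growth of
 * b's own chain is never propagated back to a. */
static int demo_queue(struct demo_sock *a, struct demo_sock *b)
{
    if (b->level + 1 > DEMO_MAX_RECURSION)
        return -1;             /* would be ETOOMANYREFS */
    if (b->level + 1 > a->level)
        a->level = b->level + 1;
    a->queued = b;
    return 0;
}

/* Actual number of links reachable from s. */
static int demo_chain_len(const struct demo_sock *s)
{
    int n = 0;

    for (; s->queued; s = s->queued)
        n++;
    return n;
}
```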
2017-07-17 | skbuff: optimize the pull_pages code in __pskb_pull_tail() | linzhang
In the pull_pages code block, if the first frag size > eat, we can end the loop in advance to avoid extra copy. Signed-off-by: Lin Zhang <xiaolou4617@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
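A userspace model of the frag-trimming loop with the early exit described above, using an array of fragment sizes in place of the skb frag array. The function name is hypothetical and offsets and page references are omitted. The key observation: when the very first frag only needs trimming, no later frag has to move, so the loop can stop.

```c
#include <assert.h>

/* Remove 'eat' bytes from the front of the frag list (sizes only) and
 * return the number of frags left. */
static int demo_pull_pages(unsigned int *size, int nr_frags, unsigned int eat)
{
    int i, k = 0;

    for (i = 0; i < nr_frags; i++) {
        if (size[i] <= eat) {
            eat -= size[i];                 /* frag fully consumed: drop it */
            continue;
        }
        size[k] = size[i];                  /* shift the survivor down */
        if (eat) {
            size[k] -= eat;                 /* trim the partially eaten frag */
            if (i == 0)
                return nr_frags;            /* nothing moved: rest already in place */
            eat = 0;
        }
        k++;
    }
    return k;
}
```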
2017-07-17 | netfilter: expect: fix crash when putting uninited expectation | Florian Westphal
We crash in __nf_ct_expect_check, it calls nf_ct_remove_expect on the uninitialised expectation instead of existing one, so del_timer chokes on random memory address. Fixes: ec0e3f01114ad32711243 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()") Reported-by: Sergey Kvachonok <ravenexp@gmail.com> Tested-by: Sergey Kvachonok <ravenexp@gmail.com> Cc: Gao Feng <fgao@ikuai8.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17 | netfilter: nf_tables: only allow in/output for arp packets | Florian Westphal
ARP packets cannot be forwarded. They can be bridged, but then they can be filtered using either ebtables or the nftables bridge family. The bridge netfilter exposes a "call-arptables" switch which pushes packets into arptables, but let's not expose this for nftables, so better close this asap. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17 | netfilter: nat: fix src map lookup | Florian Westphal
When doing the initial conversion to rhashtable I replaced the bucket walk with a single rhashtable_lookup_fast(). When moving to rhlist I failed to properly walk the list of identical tuples, but that is what is needed for this to work correctly. The table contains the original tuples, so the reply tuples are all distinct. We currently decide that the mapping is (not) in range based only on the first entry, but in case it's not, we need to try the reply tuple of the next entry until we either find an in-range mapping or we have checked all the entries. This bug makes the nat core attempt collision resolution while it might be able to use the mapping as-is. Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable") Reported-by: Jaco Kroon <jaco@uls.co.za> Tested-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
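A simplified model of the corrected lookup: walk every entry that shares the original tuple and return the first whose mapping is in range, rather than judging by the first entry alone. Tuples are reduced to single addresses and all `demo_*` names are hypothetical; this is not the find_appropriate_src() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: entries in one list share the original source, but
 * each has a distinct mapping (the translated source from its reply tuple). */
struct demo_nat_entry {
    unsigned int orig_src;
    unsigned int mapped_src;
    struct demo_nat_entry *next;
};

static bool demo_in_range(unsigned int addr, unsigned int lo, unsigned int hi)
{
    return addr >= lo && addr <= hi;
}

/* Return the first matching entry with an in-range mapping; NULL means
 * the caller has to fall back to collision resolution. */
static const struct demo_nat_entry *
demo_find_src(const struct demo_nat_entry *head, unsigned int src,
              unsigned int lo, unsigned int hi)
{
    for (const struct demo_nat_entry *e = head; e; e = e->next)
        if (e->orig_src == src && demo_in_range(e->mapped_src, lo, hi))
            return e;
    return NULL;
}
```

Checking only `head` would reject the usable second entry in the first test case below.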
2017-07-17 | netfilter: remove old pre-netns era hook api | Florian Westphal
No more users in the tree, remove this. The old api is racy wrt. module removal; all users have been converted to the netns-aware api. The old api pretended we still have global hooks but that has not been true for a long time. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17 | libceph: potential NULL dereference in ceph_msg_data_create() | Dan Carpenter
If kmem_cache_zalloc() returns NULL then the INIT_LIST_HEAD(&data->links); will Oops. The callers aren't really prepared for NULL returns so it doesn't make a lot of difference in real life. Fixes: 5240d9f95dfe ("libceph: replace message data pointer with list") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
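A userspace sketch of the fixed pattern: check the allocation before initializing the embedded list head. `calloc` stands in for the zeroing cache allocation, and the `demo_*` types are hypothetical stand-ins for the message-data object and `struct list_head`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the message-data object and its list head. */
struct demo_list { struct demo_list *next, *prev; };
struct demo_msg_data { int type; struct demo_list links; };

static void demo_init_list_head(struct demo_list *l)
{
    l->next = l;
    l->prev = l;
}

static struct demo_msg_data *demo_msg_data_create(int type)
{
    struct demo_msg_data *data = calloc(1, sizeof(*data));

    if (!data)
        return NULL;          /* the fix: bail out before touching data->links */
    data->type = type;
    demo_init_list_head(&data->links);
    return data;
}
```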
2017-07-17 | libceph: don't call encode_request_finish() on MOSDBackoff messages | Ilya Dryomov
encode_request_finish() is for MOSDOp messages. Calling it on MOSDBackoff ack-block messages corrupts them. Fixes: a02a946dfe96 ("libceph: respect RADOS_BACKOFF backoffs") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-17 | libceph: use alloc_pg_mapping() in __decode_pg_upmap_items() | Ilya Dryomov
... otherwise we die in insert_pg_mapping(), which wants pg->node to be empty, i.e. initialized with RB_CLEAR_NODE. Fixes: 6f428df47dae ("libceph: pg_upmap[_items] infrastructure") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-17 | libceph: set -EINVAL in one place in crush_decode() | Ilya Dryomov
No sooner than Dan had fixed this issue in commit 293dffaad8d5 ("libceph: NULL deref on crush_decode() error path"), I brought it back. Add a new label and set -EINVAL once, right before failing. Fixes: 278b1d709c6a ("libceph: ceph_decode_skip_* helpers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
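A sketch of the "set -EINVAL once" pattern: decode macros with a hidden `goto` jump to a dedicated label that assigns the error code, so no individual call site can forget it. `DEMO_DECODE_U8` and `demo_decode()` are hypothetical, written in the spirit of the ceph_decode_* helpers rather than copied from them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical decode macro with a hidden goto to 'bad' on short input. */
#define DEMO_DECODE_U8(p, end, v, bad)      \
    do {                                    \
        if ((p) >= (end))                   \
            goto bad;                       \
        (v) = *(p)++;                       \
    } while (0)

static int demo_decode(const unsigned char *p, const unsigned char *end,
                       int *out)
{
    unsigned char a, b;
    int err;

    DEMO_DECODE_U8(p, end, a, e_inval);
    DEMO_DECODE_U8(p, end, b, e_inval);
    if (a > b) {
        err = -ERANGE;         /* explicit error paths set their own code */
        goto fail;
    }
    *out = a + b;
    return 0;

e_inval:
    err = -EINVAL;             /* set once, for every hidden-goto path */
fail:
    return err;
}
```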
2017-07-17 | libceph: NULL deref on osdmap_apply_incremental() error path | Dan Carpenter
There are hidden gotos in the ceph_decode_* macros. We need to set the "err" variable on these error paths otherwise we end up returning ERR_PTR(0) which is NULL. It causes NULL dereferences in the callers. Fixes: 6f428df47dae ("libceph: pg_upmap[_items] infrastructure") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> [idryomov@gmail.com: similar bug in osdmap_decode(), changelog tweak] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
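A userspace illustration of why `ERR_PTR(0)` is dangerous: it is simply a NULL pointer, which `IS_ERR()`-style checks do not treat as an error. The `demo_err_ptr`/`demo_is_err`/`demo_ptr_err` helpers are simplified re-creations written for this sketch (assuming a 4095 error range), and `demo_apply_incremental()` is hypothetical, with `fail` simulating a hidden goto.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creations of the kernel's ERR_PTR helpers. */
#define DEMO_MAX_ERRNO 4095
static void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}
static long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }

static int demo_map;

/* 'fail' simulates a hidden goto inside a decode macro. */
static void *demo_apply_incremental(int fail, int set_err)
{
    long err = 0;

    if (set_err)
        err = -EINVAL;          /* the fix: a real error code before any hidden goto */
    if (fail)
        goto bad;
    return &demo_map;
bad:
    return demo_err_ptr(err);   /* with err == 0 this is NULL, not an error */
}
```

The first test case shows the bug: the caller sees neither a valid map nor an error, and dereferences NULL.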
2017-07-17 | netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv | Mateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < NLMSG_HDRLEN expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
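A userspace sketch of the corrected check order: confirm the buffer can hold a whole header before reading `nlmsg_len` at all, then validate the length field against the header size and the buffer. The struct below is a local mirror of the `struct nlmsghdr` field layout, kept here so the block is self-contained; `demo_nl_sanity()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the netlink message header fields. */
struct demo_nlmsghdr {
    uint32_t nlmsg_len;
    uint16_t nlmsg_type;
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

/* Returns 0 if the buffer holds a plausible message, -1 otherwise. */
static int demo_nl_sanity(const unsigned char *buf, size_t len)
{
    struct demo_nlmsghdr nlh;

    if (len < sizeof(nlh))
        return -1;             /* the fix: too short to even read nlmsg_len */
    memcpy(&nlh, buf, sizeof(nlh));
    if (nlh.nlmsg_len < sizeof(nlh) || len < nlh.nlmsg_len)
        return -1;
    return 0;
}
```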
2017-07-16 | sctp: remove the typedef sctp_hmac_algo_param_t | Xin Long
This patch is to remove the typedef sctp_hmac_algo_param_t, and replace with struct sctp_hmac_algo_param in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_chunks_param_t | Xin Long
This patch is to remove the typedef sctp_chunks_param_t, and replace with struct sctp_chunks_param in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_random_param_t | Xin Long
This patch is to remove the typedef sctp_random_param_t, and replace with struct sctp_random_param in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_supported_ext_param_t | Xin Long
This patch is to remove the typedef sctp_supported_ext_param_t, and replace with struct sctp_supported_ext_param in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_adaptation_ind_param_t | Xin Long
This patch is to remove the typedef sctp_adaptation_ind_param_t, and replace with struct sctp_adaptation_ind_param in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_supported_addrs_param_t | Xin Long
This patch is to remove the typedef sctp_supported_addrs_param_t, and replace with struct sctp_supported_addrs_param in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_cookie_preserve_param_t | Xin Long
This patch is to remove the typedef sctp_cookie_preserve_param_t, and replace with struct sctp_cookie_preserve_param in the places where it's using this typedef. It is also to fix some indents in sctp_sf_do_5_2_6_stale(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_ipv6addr_param_t | Xin Long
This patch is to remove the typedef sctp_ipv6addr_param_t, and replace with struct sctp_ipv6addr_param in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | sctp: remove the typedef sctp_ipv4addr_param_t | Xin Long
This patch is to remove the typedef sctp_ipv4addr_param_t, and replace with struct sctp_ipv4addr_param in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16 | rds: cancel send/recv work before queuing connection shutdown | Sowmini Varadhan
We could end up executing rds_conn_shutdown before the rds_recv_worker thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a sock_release and set sock->sk to null, which may interleave in bad ways with rds_recv_worker, e.g., it could result in:
    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
    [ffff881769f6fd70] release_sock at ffffffff815f337b
    [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
    [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
    [ffff881769f6fde0] process_one_work at ffffffff810a14c1
    [ffff881769f6fe40] worker_thread at ffffffff810a1940
    [ffff881769f6fec0] kthread at ffffffff810a6b1e
Also, do not enqueue any new shutdown workq items when the connection is shutting down (this may happen for rds-tcp in softirq mode, if a FIN or CLOSE is received while the module is in the middle of an unload). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
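A single-threaded model of the two rules in this commit, following its title: only the caller that wins the UP to DISCONNECTING transition queues a shutdown, and the send/recv workers are cancelled before that shutdown is queued. The states, flags and `demo_*` helpers are hypothetical; `demo_cancel_work()` merely stands in for a synchronous work cancel, and where exactly the kernel performs the cancel is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection states (not the RDS ones). */
enum { DEMO_UP, DEMO_DISCONNECTING, DEMO_DOWN };

struct demo_conn {
    _Atomic int state;
    bool send_work_pending;
    bool recv_work_pending;
    int shutdowns_queued;
};

static void demo_conn_init(struct demo_conn *c)
{
    atomic_init(&c->state, DEMO_UP);
    c->send_work_pending = true;
    c->recv_work_pending = true;
    c->shutdowns_queued = 0;
}

/* Stand-in for a synchronous cancel: afterwards the work is neither
 * queued nor running. */
static void demo_cancel_work(bool *pending)
{
    *pending = false;
}

static bool demo_conn_drop(struct demo_conn *c)
{
    int expected = DEMO_UP;

    if (!atomic_compare_exchange_strong(&c->state, &expected,
                                        DEMO_DISCONNECTING))
        return false;          /* already shutting down: queue nothing new */
    demo_cancel_work(&c->send_work_pending);
    demo_cancel_work(&c->recv_work_pending);
    c->shutdowns_queued++;     /* stands in for queuing the shutdown work */
    return true;
}
```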
2017-07-15 | tcp_bbr: init pacing rate on first RTT sample | Neal Cardwell
Fixes the following behavior: for connections that had no RTT sample at the time of initializing congestion control, BBR was initializing the pacing rate to a high nominal rate (based on a guess of RTT=1ms, in case this is LAN traffic). Then BBR never adjusted the pacing rate downward upon obtaining an actual RTT sample, if the connection never filled the pipe (e.g. all sends were small app-limited writes()). This fix adjusts the pacing rate upon obtaining the first RTT sample. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
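A sketch of the fixed behavior under simplifying assumptions: rate = cwnd * mss / RTT scaled by an integer gain, a 1 ms nominal RTT when no sample exists yet, and a `has_seen_rtt` flag that triggers a one-time re-initialization from the first real sample. All `demo_*` names and the arithmetic are illustrative; the kernel uses fixed-point gains and its own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_USEC_PER_SEC   1000000ULL
#define DEMO_NOMINAL_RTT_US 1000ULL   /* the "RTT = 1 ms" guess */

/* Hypothetical, simplified BBR state. */
struct demo_bbr {
    bool has_seen_rtt;
    uint64_t pacing_rate;   /* bytes per second */
};

/* rate = cwnd * mss / rtt, times an integer gain (sketch only). */
static uint64_t demo_rate(uint32_t cwnd, uint32_t mss, uint64_t rtt_us,
                          uint32_t gain)
{
    return (uint64_t)cwnd * mss * DEMO_USEC_PER_SEC / rtt_us * gain;
}

static void demo_init_pacing_rate(struct demo_bbr *b, uint32_t cwnd,
                                  uint32_t mss, uint64_t srtt_us, uint32_t gain)
{
    uint64_t rtt_us = DEMO_NOMINAL_RTT_US;

    if (srtt_us) {              /* a real RTT sample exists */
        rtt_us = srtt_us;
        b->has_seen_rtt = true;
    }
    b->pacing_rate = demo_rate(cwnd, mss, rtt_us, gain);
}

/* On each update: re-initialize once, from the first real RTT sample. */
static void demo_set_pacing_rate(struct demo_bbr *b, uint32_t cwnd,
                                 uint32_t mss, uint64_t srtt_us, uint32_t gain)
{
    if (!b->has_seen_rtt && srtt_us)
        demo_init_pacing_rate(b, cwnd, mss, srtt_us, gain);
}
```

With cwnd 10, MSS 1000 and gain 2, the nominal guess gives 20,000,000 bytes/s; a first sample of 100 ms brings it down to 200,000 bytes/s, which is the downward correction the commit describes.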