summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2017-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) BPF verifier signed/unsigned value tracking fix, from Daniel Borkmann, Edward Cree, and Josef Bacik. 2) Fix memory allocation length when setting up calls to ->ndo_set_mac_address, from Cong Wang. 3) Add a new cxgb4 device ID, from Ganesh Goudar. 4) Fix FIB refcount handling, we have to set it's initial value before the configure callback (which can bump it). From David Ahern. 5) Fix double-free in qcom/emac driver, from Timur Tabi. 6) A bunch of gcc-7 string format overflow warning fixes from Arnd Bergmann. 7) Fix link level headroom tests in ip_do_fragment(), from Vasily Averin. 8) Fix chunk walking in SCTP when iterating over error and parameter headers. From Alexander Potapenko. 9) TCP BBR congestion control fixes from Neal Cardwell. 10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger. 11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong Wang. 12) xmit_recursion in ppp driver needs to be per-device not per-cpu, from Gao Feng. 13) Cannot release skb->dst in UDP if IP options processing needs it. From Paolo Abeni. 14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander Levin and myself. 15) Revert some rtnetlink notification changes that are causing regressions, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits) net: bonding: Fix transmit load balancing in balance-alb mode rds: Make sure updates to cp_send_gen can be observed net: ethernet: ti: cpsw: Push the request_irq function to the end of probe ipv4: initialize fib_trie prior to register_netdev_notifier call. rtnetlink: allocate more memory for dev_set_mac_address() net: dsa: b53: Add missing ARL entries for BCM53125 bpf: more tests for mixed signed and unsigned bounds checks bpf: add test for mixed signed and unsigned bounds checks bpf: fix up test cases with mixed signed/unsigned bounds bpf: allow to specify log level and reduce it for test_verifier bpf: fix mixed signed/unsigned derived min/max value bounds ipv6: avoid overflow of offset in ip6_find_1stfragopt net: tehuti: don't process data if it has not been copied from userspace Revert "rtnetlink: Do not generate notifications for CHANGEADDR event" net: dsa: mv88e6xxx: Enable CMODE config support for 6390X dt-binding: ptp: Add SoC compatibility strings for dte ptp clock NET: dwmac: Make dwmac reset unconditional net: Zero terminate ifr_name in dev_ifname(). wireless: wext: terminate ifr name coming from userspace netfilter: fix netfilter_net_init() return ...
2017-07-18netfilter: fix netfilter_net_init() returnDan Carpenter
We accidentally return an uninitialized variable. Fixes: cf56c2f892a8 ("netfilter: remove old pre-netns era hook api") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17netfilter: expect: fix crash when putting uninited expectationFlorian Westphal
We crash in __nf_ct_expect_check, it calls nf_ct_remove_expect on the uninitialised expectation instead of existing one, so del_timer chokes on random memory address. Fixes: ec0e3f01114ad32711243 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()") Reported-by: Sergey Kvachonok <ravenexp@gmail.com> Tested-by: Sergey Kvachonok <ravenexp@gmail.com> Cc: Gao Feng <fgao@ikuai8.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17netfilter: nat: fix src map lookupFlorian Westphal
When doing initial conversion to rhashtable I replaced the bucket walk with a single rhashtable_lookup_fast(). When moving to rhlist I failed to properly walk the list of identical tuples, but that is what is needed for this to work correctly. The table contains the original tuples, so the reply tuples are all distinct. We currently decide that mapping is (not) in range only based on the first entry, but in case its not we need to try the reply tuple of the next entry until we either find an in-range mapping or we checked all the entries. This bug makes nat core attempt collision resolution while it might be able to use the mapping as-is. Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable") Reported-by: Jaco Kroon <jaco@uls.co.za> Tested-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17netfilter: remove old pre-netns era hook apiFlorian Westphal
no more users in the tree, remove this. The old api is racy wrt. module removal, all users have been converted to the netns-aware api. The old api pretended we still have global hooks but that has not been true for a long time. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-17netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcvMateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < NLMSG_HDRLEN expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-12net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()Michal Hocko
xt_alloc_table_info() basically opencodes kvmalloc() so use the library function instead. Link: http://lkml.kernel.org/r/20170531155145.17111-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains two Netfilter fixes for your net tree, they are: 1) Fix memleak from netns release path of conntrack protocol trackers, patch from Liping Zhang. 2) Uninitialized flags field in ebt_log, that results in unpredictable logging format in ebtables, also from Liping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_inithdr_tXin Long
This patch is to remove the typedef sctp_inithdr_t, and replace with struct sctp_inithdr in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_chunkhdr_tXin Long
This patch is to remove the typedef sctp_chunkhdr_t, and replace with struct sctp_chunkhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type)., especially in sctp_new. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_sctphdr_tXin Long
This patch is to remove the typedef sctp_sctphdr_t, and replace with struct sctphdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sock.sk_refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. This patch uses refcount_inc_not_zero() instead of atomic_inc_not_zero_hint() due to absense of a _hint() version of refcount API. If the hint() version must be used, we might need to revisit API. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. This batch contains connection tracking updates for the cleanup iteration path, patches from Florian Westphal: X) Skip unconfirmed conntracks in nf_ct_iterate_cleanup_net(), just set dying bit to let the CPU release them. X) Add nf_ct_iterate_destroy() to be used on module removal, to kill conntrack from all namespace. X) Restart iteration on hashtable resizing, since both may occur at the same time. X) Use the new nf_ct_iterate_destroy() to remove conntrack with NAT mapping on module removal. X) Use nf_ct_iterate_destroy() to remove conntrack entries helper module removal, from Liping Zhang. X) Use nf_ct_iterate_cleanup_net() to remove the timeout extension if user requests this, also from Liping. X) Add net_ns_barrier() and use it from FTP helper, so make sure no concurrent namespace removal happens at the same time while the helper module is being removed. X) Use NFPROTO_MAX in layer 3 conntrack protocol array, to reduce module size. Same thing in nf_tables. Updates for the nf_tables infrastructure: X) Prepare usage of the extended ACK reporting infrastructure for nf_tables. X) Remove unnecessary forward declaration in nf_tables hash set. X) Skip set size estimation if number of element is not specified. X) Changes to accomodate a (faster) unresizable hash set implementation, for anonymous sets and dynamic size fixed sets with no timeouts. X) Faster lookup function for unresizable hash table for 2 and 4 bytes key. And, finally, a bunch of asorted small updates and cleanups: X) Do not hold reference to netdev from ipt_CLUSTER, instead subscribe to device events and look up for index from the packet path, this is fixing an issue that is present since the very beginning, patch from Xin Long. X) Use nf_register_net_hook() in ipt_CLUSTER, from Florian Westphal. X) Use ebt_invalid_target() whenever possible in the ebtables tree, from Gao Feng. X) Calm down compilation warning in nf_dup infrastructure, patch from stephen hemminger. X) Statify functions in nftables rt expression, also from stephen. X) Update Makefile to use canonical method to specify nf_tables-objs. From Jike Song. X) Use nf_conntrack_helpers_register() in amanda and H323. X) Space cleanup for ctnetlink, from linzhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29netfilter: nf_ct_dccp/sctp: fix memory leak after netns cleanupLiping Zhang
After running the following commands for a while, kmemleak reported that "1879 new suspected memory leaks" happened: # while : ; do ip netns add test ip netns delete test done unreferenced object 0xffff88006342fa38 (size 1024): comm "ip", pid 15477, jiffies 4295982857 (age 957.836s) hex dump (first 32 bytes): b8 b0 4d a0 ff ff ff ff c0 34 c3 59 00 88 ff ff ..M......4.Y.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8190510a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81284130>] __kmalloc_track_caller+0x150/0x300 [<ffffffff812302d0>] kmemdup+0x20/0x50 [<ffffffffa04d598a>] dccp_init_net+0x8a/0x160 [nf_conntrack] [<ffffffffa04cf9f5>] nf_ct_l4proto_pernet_register_one+0x25/0x90 ... unreferenced object 0xffff88006342da58 (size 1024): comm "ip", pid 15477, jiffies 4295982857 (age 957.836s) hex dump (first 32 bytes): 10 b3 4d a0 ff ff ff ff 04 35 c3 59 00 88 ff ff ..M......5.Y.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8190510a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81284130>] __kmalloc_track_caller+0x150/0x300 [<ffffffff812302d0>] kmemdup+0x20/0x50 [<ffffffffa04d6a9d>] sctp_init_net+0x5d/0x130 [nf_conntrack] [<ffffffffa04cf9f5>] nf_ct_l4proto_pernet_register_one+0x25/0x90 ... This is because we forgot to implement the get_net_proto for sctp and dccp, so we won't invoke the nf_ct_unregister_sysctl to free the ctl_table when do netns cleanup. Also note, we will fail to register the sysctl for dccp/sctp either due to the lack of get_net_proto. Fixes: c51d39010a1b ("netfilter: conntrack: built-in support for DCCP") Fixes: a85406afeb3e ("netfilter: conntrack: built-in support for SCTP") Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: nfnetlink: extended ACK reportingPablo Neira Ayuso
Pass down struct netlink_ext_ack as parameter to all of our nfnetlink subsystem callbacks, so we can work on follow up patches to provide finer grain error reporting using the new infrastructure that 2d4bc93368f5 ("netlink: extended ACK reporting") provides. No functional change, just pass down this new object to callbacks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: nf_tables: reduce chain type table sizeFlorian Westphal
text data bss dec hex filename old: 151590 2240 1152 154982 25d66 net/netfilter/nf_tables_api.o new: 151666 2240 416 154322 25ad2 net/netfilter/nf_tables_api.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: conntrack: use NFPROTO_MAX to size arrayFlorian Westphal
We don't support anything larger than NFPROTO_MAX, so we can shrink this a bit: text data dec hex filename old: 8259 1096 9355 248b net/netfilter/nf_conntrack_proto.o new: 8259 624 8883 22b3 net/netfilter/nf_conntrack_proto.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: use nf_conntrack_helpers_register when possibleLiping Zhang
amanda_helper, nf_conntrack_helper_ras and nf_conntrack_helper_q931 are all arrays, so we can use nf_conntrack_helpers_register to register the ct helper, this will help us to eliminate some "goto errX" statements. Also introduce h323_helper_init/exit helper function to register the ct helpers, this is prepared for the followup patch, which will add net namespace support for ct helper. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter, kbuild: use canonical method to specify objs.Jike Song
Should use ":=" instead of "+=". Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netns: add and use net_ns_barrierFlorian Westphal
Quoting Joe Stringer: If a user loads nf_conntrack_ftp, sends FTP traffic through a network namespace, destroys that namespace then unloads the FTP helper module, then the kernel will crash. Events that lead to the crash: 1. conntrack is created with ftp helper in netns x 2. This netns is destroyed 3. netns destruction is scheduled 4. netns destruction wq starts, removes netns from global list 5. ftp helper is unloaded, which resets all helpers of the conntracks via for_each_net() but because netns is already gone from list the for_each_net() loop doesn't include it, therefore all of these conntracks are unaffected. 6. helper module unload finishes 7. netns wq invokes destructor for rmmod'ed helper CC: "Eric W. Biederman" <ebiederm@xmission.com> Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: move table iteration out of netns exit pathsFlorian Westphal
We only need to iterate & remove in case of module removal; for netns destruction all conntracks will be removed anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-16networking: make skb_put & friends return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just some simple overlapping changes in marvell PHY driver and the DSA core code. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-29netfilter: cttimeout: use nf_ct_iterate_cleanup_net to unlink timeout objsLiping Zhang
Similar to nf_conntrack_helper, we can use nf_ct_iterare_cleanup_net to remove these copy & paste code. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_ct_helper: use nf_ct_iterate_destroy to unlink helper objsLiping Zhang
When we unlink the helper objects, we will iterate the nf_conntrack_hash, iterate the unconfirmed list, handle the hash resize situation, etc. Actually this logic is same as the nf_ct_iterate_destroy, so we can use it to remove these copy & paste code. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add lookup variant for fixed size hashtablePablo Neira Ayuso
This patch provides a faster variant of the lookup function for 2 and 4 byte keys. Optimizing the one byte case is not worth, as the set backend selection will always select the bitmap set type for such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add non-resizable hashtable implementationPablo Neira Ayuso
This patch adds a simple non-resizable hashtable implementation. If the user specifies the set size, then this new faster hashtable flavour is selected. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: allow large allocations for new setsPablo Neira Ayuso
The new fixed size hashtable backend implementation may result in a large array of buckets that would spew splats from mm. Update this code to fall back on vmalloc in case the memory allocation order is too costly. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add nft_hash_buckets()Pablo Neira Ayuso
Add nft_hash_buckets() helper function to calculate the number of hashtable buckets based on the elements. This function can be reused from the follow up patch to add non-resizable hashtables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: pass set description to ->privsizePablo Neira Ayuso
The new non-resizable hashtable variant needs this to calculate the size of the bucket array. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: select set backend flavour depending on descriptionPablo Neira Ayuso
This patch adds the infrastructure to support several implementations of the same set type. This selection will be based on the set description and the features available for this set. This allow us to select set backend implementation that will result in better performance numbers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: use nft_rhash prefix for resizable set backendPablo Neira Ayuso
This patch prepares the introduction of a non-resizable hashtable implementation that is significantly faster. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: no size estimation if number of set elements is unknownPablo Neira Ayuso
This size estimation is ignored by the existing set backend selection logic, since this estimation structure is stack allocated, set this to ~0 to make it easier to catch bugs in future changes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: unnecessary forward declarationPablo Neira Ayuso
Replace struct rhashtable_params forward declaration by the structure definition itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nat: destroy nat mappings on module exit path onlyFlorian Westphal
We don't need pernetns cleanup anymore. If the netns is being destroyed, conntrack netns exit will kill all entries in this namespace, and neither conntrack hash table nor bysource hash are per namespace. For the rmmod case, we have to make sure we remove all entries from the nat bysource table, so call the new nf_ct_iterate_destroy in module exit path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: restart iteration on resizeFlorian Westphal
We could some conntracks when a resize occurs in parallel. Avoid this by sampling generation seqcnt and doing a restart if needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: add nf_ct_iterate_destroyFlorian Westphal
sledgehammer to be used on module unload (to remove affected conntracks from all namespaces). It will also flag all unconfirmed conntracks as dying, i.e. they will not be committed to main table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: don't call iter for non-confirmed conntracksFlorian Westphal
nf_ct_iterate_cleanup_net currently calls iter() callback also for conntracks on the unconfirmed list, but this is unsafe. Acesses to nf_conn are fine, but some users access the extension area in the iter() callback, but that does only work reliably for confirmed conntracks (ct->ext can be reallocated at any time for unconfirmed conntrack). The seond issue is that there is a short window where a conntrack entry is neither on the list nor in the table: To confirm an entry, it is first removed from the unconfirmed list, then insert into the table. Fix this by iterating the unconfirmed list first and marking all entries as dying, then wait for rcu grace period. This makes sure all entries that were about to be confirmed either are in the main table, or will be dropped soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: rename nf_ct_iterate_cleanupFlorian Westphal
There are several places where we needlesly call nf_ct_iterate_cleanup, we should instead iterate the full table at module unload time. This is a leftover from back when the conntrack table got duplicated per net namespace. So rename nf_ct_iterate_cleanup to nf_ct_iterate_cleanup_net. A later patch will then add a non-net variant. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_rt: make local functions staticstephen hemminger
Resolves warnings: net/netfilter/nft_rt.c:26:6: warning: no previous prototype for ‘nft_rt_get_eval’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:75:5: warning: no previous prototype for ‘nft_rt_get_init’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:106:5: warning: no previous prototype for ‘nft_rt_get_dump’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: dup: resolve warnings about missing prototypesstephen hemminger
Missing include file causes: net/netfilter/nf_dup_netdev.c:26:6: warning: no previous prototype for ‘nf_fwd_netdev_egress’ [-Wmissing-prototypes] net/netfilter/nf_dup_netdev.c:40:6: warning: no previous prototype for ‘nf_dup_netdev_egress’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: ctnetlink: delete extra spaceslinzhang
This patch cleans up extra spaces. Signed-off-by: linzhang <xiaolou4617@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-24netfilter: ctnetlink: fix incorrect nf_ct_put during hash resizeLiping Zhang
If nf_conntrack_htable_size was adjusted by the user during the ct dump operation, we may invoke nf_ct_put twice for the same ct, i.e. the "last" ct. This will cause the ct will be freed but still linked in hash buckets. It's very easy to reproduce the problem by the following commands: # while : ; do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done # while : ; do conntrack -L done # iperf -s 127.0.0.1 & # iperf -c 127.0.0.1 -P 60 -t 36000 After a while, the system will hang like this: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382] ... So at last if we find cb->args[1] is equal to "last", this means hash resize happened, then we can set cb->args[1] to 0 to fix the above issue. Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-23netfilter: nat: use atomic bit op to clear the _SRC_NAT_DONE_BITLiping Zhang
We need to clear the IPS_SRC_NAT_DONE_BIT to indicate that the ct has been removed from nat_bysource table. But unfortunately, we use the non-atomic bit operation: "ct->status &= ~IPS_NAT_DONE_MASK". So there's a race condition that we may clear the _DYING_BIT set by another CPU unexpectedly. Since we don't care about the IPS_DST_NAT_DONE_BIT, so just using clear_bit to clear the IPS_SRC_NAT_DONE_BIT is enough. Also note, this is the last user which use the non-atomic bit operation to update the confirmed ct->status. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-23netfilter: nft_set_rbtree: handle element re-addition after deletionPablo Neira Ayuso
The existing code selects no next branch to be inspected when re-inserting an inactive element into the rb-tree, looping endlessly. This patch restricts the check for active elements to the EEXIST case only. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-23netfilter: conntrack: fix false CRC32c mismatch using paged skbDavide Caratti
sctp_compute_cksum() implementation assumes that at least the SCTP header is in the linear part of skb: modify conntrack error callback to avoid false CRC32c mismatch, if the transport header is partially/entirely paged. Fixes: cf6e007eef83 ("netfilter: conntrack: validate SCTP crc32c in PREROUTING") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) When using IPVS in direct-routing mode, normal traffic from the LVS host to a back-end server is sometimes incorrectly NATed on the way back into the LVS host. Patch to fix this from Julian Anastasov. 2) Calm down clang compilation warning in ctnetlink due to type mismatch, from Matthias Kaehlcke. 3) Do not re-setup NAT for conntracks that are already confirmed, this is fixing a problem that was introduced in the previous nf-next batch. Patch from Liping Zhang. 4) Do not allow conntrack helper removal from userspace cthelper infrastructure if already in used. This comes with an initial patch to introduce nf_conntrack_helper_put() that is required by this fix. From Liping Zhang. 5) Zero the pad when copying data to userspace, otherwise iptables fails to remove rules. This is a follow up on the patchset that sorts out the internal match/target structure pointer leak to userspace. Patch from the same author, Willem de Bruijn. This also comes with a build failure when CONFIG_COMPAT is not on, coming in the last patch of this series. 6) SYNPROXY crashes with conntrack entries that are created via ctnetlink, more specifically via conntrackd state sync. Patch from Eric Leblond. 7) RCU safe iteration on set element dumping in nf_tables, from Liping Zhang. 8) Missing sanitization of immediate date for the bitwise and cmp expressions in nf_tables. 9) Refcounting logic for chain and objects from set elements does not integrate into the nf_tables 2-phase commit protocol. 10) Missing sanitization of target verdict in ebtables arpreply target, from Gao Feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPATWillem de Bruijn
The patch in the Fixes references COMPAT_XT_ALIGN in the definition of XT_DATA_TO_USER, outside an #ifdef CONFIG_COMPAT block. Split XT_DATA_TO_USER into separate compat and non compat variants and define the first inside an CONFIG_COMPAT block. This simplifies both variants by removing branches inside the macro. Fixes: 324318f0248c ("netfilter: xtables: zero padding in data_to_user") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-17tcp: switch TCP TS option (RFC 7323) to 1ms clockEric Dumazet
TCP Timestamps option is defined in RFC 7323 Traditionally on linux, it has been tied to the internal 'jiffies' variable, because it had been a cheap and good enough generator. For TCP flows on the Internet, 1 ms resolution would be much better than 4ms or 10ms (HZ=250 or HZ=100 respectively) For TCP flows in the DC, Google has used usec resolution for more than two years with great success [1] Receive size autotuning (DRS) is indeed more precise and converges faster to optimal window size. This patch converts tp->tcp_mstamp to a plain u64 value storing a 1 usec TCP clock. This choice will allow us to upstream the 1 usec TS option as discussed in IETF 97. [1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>