summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2017-07-12net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()Michal Hocko
xt_alloc_table_info() basically opencodes kvmalloc() so use the library function instead. Link: http://lkml.kernel.org/r/20170531155145.17111-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains two Netfilter fixes for your net tree, they are: 1) Fix memleak from netns release path of conntrack protocol trackers, patch from Liping Zhang. 2) Uninitialized flags field in ebt_log, that results in unpredictable logging format in ebtables, also from Liping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_inithdr_tXin Long
This patch is to remove the typedef sctp_inithdr_t, and replace with struct sctp_inithdr in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_chunkhdr_tXin Long
This patch is to remove the typedef sctp_chunkhdr_t, and replace with struct sctp_chunkhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type)., especially in sctp_new. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_sctphdr_tXin Long
This patch is to remove the typedef sctp_sctphdr_t, and replace with struct sctphdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sock.sk_refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. This patch uses refcount_inc_not_zero() instead of atomic_inc_not_zero_hint() due to absense of a _hint() version of refcount API. If the hint() version must be used, we might need to revisit API. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. This batch contains connection tracking updates for the cleanup iteration path, patches from Florian Westphal: X) Skip unconfirmed conntracks in nf_ct_iterate_cleanup_net(), just set dying bit to let the CPU release them. X) Add nf_ct_iterate_destroy() to be used on module removal, to kill conntrack from all namespace. X) Restart iteration on hashtable resizing, since both may occur at the same time. X) Use the new nf_ct_iterate_destroy() to remove conntrack with NAT mapping on module removal. X) Use nf_ct_iterate_destroy() to remove conntrack entries helper module removal, from Liping Zhang. X) Use nf_ct_iterate_cleanup_net() to remove the timeout extension if user requests this, also from Liping. X) Add net_ns_barrier() and use it from FTP helper, so make sure no concurrent namespace removal happens at the same time while the helper module is being removed. X) Use NFPROTO_MAX in layer 3 conntrack protocol array, to reduce module size. Same thing in nf_tables. Updates for the nf_tables infrastructure: X) Prepare usage of the extended ACK reporting infrastructure for nf_tables. X) Remove unnecessary forward declaration in nf_tables hash set. X) Skip set size estimation if number of element is not specified. X) Changes to accomodate a (faster) unresizable hash set implementation, for anonymous sets and dynamic size fixed sets with no timeouts. X) Faster lookup function for unresizable hash table for 2 and 4 bytes key. And, finally, a bunch of asorted small updates and cleanups: X) Do not hold reference to netdev from ipt_CLUSTER, instead subscribe to device events and look up for index from the packet path, this is fixing an issue that is present since the very beginning, patch from Xin Long. X) Use nf_register_net_hook() in ipt_CLUSTER, from Florian Westphal. X) Use ebt_invalid_target() whenever possible in the ebtables tree, from Gao Feng. X) Calm down compilation warning in nf_dup infrastructure, patch from stephen hemminger. X) Statify functions in nftables rt expression, also from stephen. X) Update Makefile to use canonical method to specify nf_tables-objs. From Jike Song. X) Use nf_conntrack_helpers_register() in amanda and H323. X) Space cleanup for ctnetlink, from linzhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29netfilter: nf_ct_dccp/sctp: fix memory leak after netns cleanupLiping Zhang
After running the following commands for a while, kmemleak reported that "1879 new suspected memory leaks" happened: # while : ; do ip netns add test ip netns delete test done unreferenced object 0xffff88006342fa38 (size 1024): comm "ip", pid 15477, jiffies 4295982857 (age 957.836s) hex dump (first 32 bytes): b8 b0 4d a0 ff ff ff ff c0 34 c3 59 00 88 ff ff ..M......4.Y.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8190510a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81284130>] __kmalloc_track_caller+0x150/0x300 [<ffffffff812302d0>] kmemdup+0x20/0x50 [<ffffffffa04d598a>] dccp_init_net+0x8a/0x160 [nf_conntrack] [<ffffffffa04cf9f5>] nf_ct_l4proto_pernet_register_one+0x25/0x90 ... unreferenced object 0xffff88006342da58 (size 1024): comm "ip", pid 15477, jiffies 4295982857 (age 957.836s) hex dump (first 32 bytes): 10 b3 4d a0 ff ff ff ff 04 35 c3 59 00 88 ff ff ..M......5.Y.... 04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8190510a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff81284130>] __kmalloc_track_caller+0x150/0x300 [<ffffffff812302d0>] kmemdup+0x20/0x50 [<ffffffffa04d6a9d>] sctp_init_net+0x5d/0x130 [nf_conntrack] [<ffffffffa04cf9f5>] nf_ct_l4proto_pernet_register_one+0x25/0x90 ... This is because we forgot to implement the get_net_proto for sctp and dccp, so we won't invoke the nf_ct_unregister_sysctl to free the ctl_table when do netns cleanup. Also note, we will fail to register the sysctl for dccp/sctp either due to the lack of get_net_proto. Fixes: c51d39010a1b ("netfilter: conntrack: built-in support for DCCP") Fixes: a85406afeb3e ("netfilter: conntrack: built-in support for SCTP") Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: nfnetlink: extended ACK reportingPablo Neira Ayuso
Pass down struct netlink_ext_ack as parameter to all of our nfnetlink subsystem callbacks, so we can work on follow up patches to provide finer grain error reporting using the new infrastructure that 2d4bc93368f5 ("netlink: extended ACK reporting") provides. No functional change, just pass down this new object to callbacks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: nf_tables: reduce chain type table sizeFlorian Westphal
text data bss dec hex filename old: 151590 2240 1152 154982 25d66 net/netfilter/nf_tables_api.o new: 151666 2240 416 154322 25ad2 net/netfilter/nf_tables_api.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: conntrack: use NFPROTO_MAX to size arrayFlorian Westphal
We don't support anything larger than NFPROTO_MAX, so we can shrink this a bit: text data dec hex filename old: 8259 1096 9355 248b net/netfilter/nf_conntrack_proto.o new: 8259 624 8883 22b3 net/netfilter/nf_conntrack_proto.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: use nf_conntrack_helpers_register when possibleLiping Zhang
amanda_helper, nf_conntrack_helper_ras and nf_conntrack_helper_q931 are all arrays, so we can use nf_conntrack_helpers_register to register the ct helper, this will help us to eliminate some "goto errX" statements. Also introduce h323_helper_init/exit helper function to register the ct helpers, this is prepared for the followup patch, which will add net namespace support for ct helper. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter, kbuild: use canonical method to specify objs.Jike Song
Should use ":=" instead of "+=". Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netns: add and use net_ns_barrierFlorian Westphal
Quoting Joe Stringer: If a user loads nf_conntrack_ftp, sends FTP traffic through a network namespace, destroys that namespace then unloads the FTP helper module, then the kernel will crash. Events that lead to the crash: 1. conntrack is created with ftp helper in netns x 2. This netns is destroyed 3. netns destruction is scheduled 4. netns destruction wq starts, removes netns from global list 5. ftp helper is unloaded, which resets all helpers of the conntracks via for_each_net() but because netns is already gone from list the for_each_net() loop doesn't include it, therefore all of these conntracks are unaffected. 6. helper module unload finishes 7. netns wq invokes destructor for rmmod'ed helper CC: "Eric W. Biederman" <ebiederm@xmission.com> Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-19netfilter: move table iteration out of netns exit pathsFlorian Westphal
We only need to iterate & remove in case of module removal; for netns destruction all conntracks will be removed anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-06-16networking: make skb_put & friends return void pointersJohannes Berg
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Just some simple overlapping changes in marvell PHY driver and the DSA core code. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-29netfilter: cttimeout: use nf_ct_iterate_cleanup_net to unlink timeout objsLiping Zhang
Similar to nf_conntrack_helper, we can use nf_ct_iterare_cleanup_net to remove these copy & paste code. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_ct_helper: use nf_ct_iterate_destroy to unlink helper objsLiping Zhang
When we unlink the helper objects, we will iterate the nf_conntrack_hash, iterate the unconfirmed list, handle the hash resize situation, etc. Actually this logic is same as the nf_ct_iterate_destroy, so we can use it to remove these copy & paste code. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add lookup variant for fixed size hashtablePablo Neira Ayuso
This patch provides a faster variant of the lookup function for 2 and 4 byte keys. Optimizing the one byte case is not worth, as the set backend selection will always select the bitmap set type for such case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add non-resizable hashtable implementationPablo Neira Ayuso
This patch adds a simple non-resizable hashtable implementation. If the user specifies the set size, then this new faster hashtable flavour is selected. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: allow large allocations for new setsPablo Neira Ayuso
The new fixed size hashtable backend implementation may result in a large array of buckets that would spew splats from mm. Update this code to fall back on vmalloc in case the memory allocation order is too costly. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: add nft_hash_buckets()Pablo Neira Ayuso
Add nft_hash_buckets() helper function to calculate the number of hashtable buckets based on the elements. This function can be reused from the follow up patch to add non-resizable hashtables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: pass set description to ->privsizePablo Neira Ayuso
The new non-resizable hashtable variant needs this to calculate the size of the bucket array. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: select set backend flavour depending on descriptionPablo Neira Ayuso
This patch adds the infrastructure to support several implementations of the same set type. This selection will be based on the set description and the features available for this set. This allow us to select set backend implementation that will result in better performance numbers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: use nft_rhash prefix for resizable set backendPablo Neira Ayuso
This patch prepares the introduction of a non-resizable hashtable implementation that is significantly faster. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nf_tables: no size estimation if number of set elements is unknownPablo Neira Ayuso
This size estimation is ignored by the existing set backend selection logic, since this estimation structure is stack allocated, set this to ~0 to make it easier to catch bugs in future changes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_set_hash: unnecessary forward declarationPablo Neira Ayuso
Replace struct rhashtable_params forward declaration by the structure definition itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nat: destroy nat mappings on module exit path onlyFlorian Westphal
We don't need pernetns cleanup anymore. If the netns is being destroyed, conntrack netns exit will kill all entries in this namespace, and neither conntrack hash table nor bysource hash are per namespace. For the rmmod case, we have to make sure we remove all entries from the nat bysource table, so call the new nf_ct_iterate_destroy in module exit path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: restart iteration on resizeFlorian Westphal
We could some conntracks when a resize occurs in parallel. Avoid this by sampling generation seqcnt and doing a restart if needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: add nf_ct_iterate_destroyFlorian Westphal
sledgehammer to be used on module unload (to remove affected conntracks from all namespaces). It will also flag all unconfirmed conntracks as dying, i.e. they will not be committed to main table. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: don't call iter for non-confirmed conntracksFlorian Westphal
nf_ct_iterate_cleanup_net currently calls iter() callback also for conntracks on the unconfirmed list, but this is unsafe. Acesses to nf_conn are fine, but some users access the extension area in the iter() callback, but that does only work reliably for confirmed conntracks (ct->ext can be reallocated at any time for unconfirmed conntrack). The seond issue is that there is a short window where a conntrack entry is neither on the list nor in the table: To confirm an entry, it is first removed from the unconfirmed list, then insert into the table. Fix this by iterating the unconfirmed list first and marking all entries as dying, then wait for rcu grace period. This makes sure all entries that were about to be confirmed either are in the main table, or will be dropped soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: conntrack: rename nf_ct_iterate_cleanupFlorian Westphal
There are several places where we needlesly call nf_ct_iterate_cleanup, we should instead iterate the full table at module unload time. This is a leftover from back when the conntrack table got duplicated per net namespace. So rename nf_ct_iterate_cleanup to nf_ct_iterate_cleanup_net. A later patch will then add a non-net variant. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: nft_rt: make local functions staticstephen hemminger
Resolves warnings: net/netfilter/nft_rt.c:26:6: warning: no previous prototype for ‘nft_rt_get_eval’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:75:5: warning: no previous prototype for ‘nft_rt_get_init’ [-Wmissing-prototypes] net/netfilter/nft_rt.c:106:5: warning: no previous prototype for ‘nft_rt_get_dump’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: dup: resolve warnings about missing prototypesstephen hemminger
Missing include file causes: net/netfilter/nf_dup_netdev.c:26:6: warning: no previous prototype for ‘nf_fwd_netdev_egress’ [-Wmissing-prototypes] net/netfilter/nf_dup_netdev.c:40:6: warning: no previous prototype for ‘nf_dup_netdev_egress’ [-Wmissing-prototypes] Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-29netfilter: ctnetlink: delete extra spaceslinzhang
This patch cleans up extra spaces. Signed-off-by: linzhang <xiaolou4617@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-24netfilter: ctnetlink: fix incorrect nf_ct_put during hash resizeLiping Zhang
If nf_conntrack_htable_size was adjusted by the user during the ct dump operation, we may invoke nf_ct_put twice for the same ct, i.e. the "last" ct. This will cause the ct will be freed but still linked in hash buckets. It's very easy to reproduce the problem by the following commands: # while : ; do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done # while : ; do conntrack -L done # iperf -s 127.0.0.1 & # iperf -c 127.0.0.1 -P 60 -t 36000 After a while, the system will hang like this: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382] ... So at last if we find cb->args[1] is equal to "last", this means hash resize happened, then we can set cb->args[1] to 0 to fix the above issue. Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-23netfilter: nat: use atomic bit op to clear the _SRC_NAT_DONE_BITLiping Zhang
We need to clear the IPS_SRC_NAT_DONE_BIT to indicate that the ct has been removed from nat_bysource table. But unfortunately, we use the non-atomic bit operation: "ct->status &= ~IPS_NAT_DONE_MASK". So there's a race condition that we may clear the _DYING_BIT set by another CPU unexpectedly. Since we don't care about the IPS_DST_NAT_DONE_BIT, so just using clear_bit to clear the IPS_SRC_NAT_DONE_BIT is enough. Also note, this is the last user which use the non-atomic bit operation to update the confirmed ct->status. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-23netfilter: nft_set_rbtree: handle element re-addition after deletionPablo Neira Ayuso
The existing code selects no next branch to be inspected when re-inserting an inactive element into the rb-tree, looping endlessly. This patch restricts the check for active elements to the EEXIST case only. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-23netfilter: conntrack: fix false CRC32c mismatch using paged skbDavide Caratti
sctp_compute_cksum() implementation assumes that at least the SCTP header is in the linear part of skb: modify conntrack error callback to avoid false CRC32c mismatch, if the transport header is partially/entirely paged. Fixes: cf6e007eef83 ("netfilter: conntrack: validate SCTP crc32c in PREROUTING") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) When using IPVS in direct-routing mode, normal traffic from the LVS host to a back-end server is sometimes incorrectly NATed on the way back into the LVS host. Patch to fix this from Julian Anastasov. 2) Calm down clang compilation warning in ctnetlink due to type mismatch, from Matthias Kaehlcke. 3) Do not re-setup NAT for conntracks that are already confirmed, this is fixing a problem that was introduced in the previous nf-next batch. Patch from Liping Zhang. 4) Do not allow conntrack helper removal from userspace cthelper infrastructure if already in used. This comes with an initial patch to introduce nf_conntrack_helper_put() that is required by this fix. From Liping Zhang. 5) Zero the pad when copying data to userspace, otherwise iptables fails to remove rules. This is a follow up on the patchset that sorts out the internal match/target structure pointer leak to userspace. Patch from the same author, Willem de Bruijn. This also comes with a build failure when CONFIG_COMPAT is not on, coming in the last patch of this series. 6) SYNPROXY crashes with conntrack entries that are created via ctnetlink, more specifically via conntrackd state sync. Patch from Eric Leblond. 7) RCU safe iteration on set element dumping in nf_tables, from Liping Zhang. 8) Missing sanitization of immediate date for the bitwise and cmp expressions in nf_tables. 9) Refcounting logic for chain and objects from set elements does not integrate into the nf_tables 2-phase commit protocol. 10) Missing sanitization of target verdict in ebtables arpreply target, from Gao Feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-18netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPATWillem de Bruijn
The patch in the Fixes references COMPAT_XT_ALIGN in the definition of XT_DATA_TO_USER, outside an #ifdef CONFIG_COMPAT block. Split XT_DATA_TO_USER into separate compat and non compat variants and define the first inside an CONFIG_COMPAT block. This simplifies both variants by removing branches inside the macro. Fixes: 324318f0248c ("netfilter: xtables: zero padding in data_to_user") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-17tcp: switch TCP TS option (RFC 7323) to 1ms clockEric Dumazet
TCP Timestamps option is defined in RFC 7323 Traditionally on linux, it has been tied to the internal 'jiffies' variable, because it had been a cheap and good enough generator. For TCP flows on the Internet, 1 ms resolution would be much better than 4ms or 10ms (HZ=250 or HZ=100 respectively) For TCP flows in the DC, Google has used usec resolution for more than two years with great success [1] Receive size autotuning (DRS) is indeed more precise and converges faster to optimal window size. This patch converts tp->tcp_mstamp to a plain u64 value storing a 1 usec TCP clock. This choice will allow us to upstream the 1 usec TS option as discussed in IETF 97. [1] https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15netfilter: nf_tables: revisit chain/object refcounting from elementsPablo Neira Ayuso
Andreas reports that the following incremental update using our commit protocol doesn't work. # nft -f incremental-update.nft delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 } delete chain ip filter CIn_1 ... Error: Could not process rule: Device or resource busy The existing code is not well-integrated into the commit phase protocol, since element deletions do not result in refcount decrement from the preparation phase. This results in bogus EBUSY errors like the one above. Two new functions come with this patch: * nft_set_elem_activate() function is used from the abort path, to restore the set element refcounting on objects that occurred from the preparation phase. * nft_set_elem_deactivate() that is called from nft_del_setelem() to decrement set element refcounting on objects from the preparation phase in the commit protocol. The nft_data_uninit() has been renamed to nft_data_release() since this function does not uninitialize any data store in the data register, instead just releases the references to objects. Moreover, a new function nft_data_hold() has been introduced to be used from nft_set_elem_activate(). Reported-by: Andreas Schultz <aschultz@tpip.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15netfilter: nf_tables: missing sanitization in data from userspacePablo Neira Ayuso
Do not assume userspace always sends us NFT_DATA_VALUE for bitwise and cmp expressions. Although NFT_DATA_VERDICT does not make any sense, it is still possible to handcraft a netlink message using this incorrect data type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15netfilter: nf_tables: can't assume lock is acquired when dumping set elemsLiping Zhang
When dumping the elements related to a specified set, we may invoke the nf_tables_dump_set with the NFNL_SUBSYS_NFTABLES lock not acquired. So we should use the proper rcu operation to avoid race condition, just like other nft dump operations. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15netfilter: synproxy: fix conntrackd interactionEric Leblond
This patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15netfilter: xtables: zero padding in data_to_userWillem de Bruijn
When looking up an iptables rule, the iptables binary compares the aligned match and target data (XT_ALIGN). In some cases this can exceed the actual data size to include padding bytes. Before commit f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers") the malloc()ed bytes were overwritten by the kernel with kzalloced contents, zeroing the padding and making the comparison succeed. After this patch, the kernel copies and clears only data, leaving the padding bytes undefined. Extend the clear operation from data size to aligned data size to include the padding bytes, if any. Padding bytes can be observed in both match and target, and the bug triggered, by issuing a rule with match icmp and target ACCEPT: iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT Fixes: f77bc5b23fb1 ("iptables: use match, target and data copy_to_user helpers") Reported-by: Paul Moore <pmoore@redhat.com> Reported-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-15Merge tag 'ipvs-fixes-for-v4.12' of ↵Pablo Neira Ayuso
http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== IPVS Fixes for v4.12 please consider this fix to IPVS for v4.12. * It is a fix from Julian Anastasov to only SNAT SNAT packet replies only for NATed connections My understanding is that this fix is appropriate for 4.9.25, 4.10.13, 4.11 as well as the nf tree. Julian has separately posted backports for other -stable kernels; please see: * [PATCH 3.2.88,3.4.113 -stable 1/3] ipvs: SNAT packet replies only for NATed connections * [PATCH 3.10.105,3.12.73,3.16.43,4.1.39 -stable 2/3] ipvs: SNAT packet replies only for NATed connections * [PATCH 4.4.65 -stable 3/3] ipvs: SNAT packet replies only for NATed connections ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>