summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-07-20netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior stateFlorian Westphal
When first DCCP packet is SYNC or SYNCACK, we insert a new conntrack that has an un-initialized timeout value, i.e. such entry could be reaped at any time. Mark them as INVALID and only ignore SYNC/SYNCACK when connection had an old state. Reported-by: syzbot+6f18401420df260e37ed@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-20netfilter: nf_tables: don't allow to rename to already-pending nameFlorian Westphal
Its possible to rename two chains to the same name in one transaction: nft add chain t c1 nft add chain t c2 nft 'rename chain t c1 c3;rename chain t c2 c3' This creates two chains named 'c3'. Appears to be harmless, both chains can still be deleted both by name or handle, but, nevertheless, its a bug. Walk transaction log and also compare vs. the pending renames. Both chains can still be deleted, but nevertheless it is a bug as we don't allow to create chains with identical names, so we should prevent this from happening-by-rename too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-20netfilter: nf_tables: fix memory leaks on chain renameFlorian Westphal
The new name is stored in the transaction metadata, on commit, the pointers to the old and new names are swapped. Therefore in abort and commit case we have to free the pointer in the chain_trans container. In commit case, the pointer can be used by another cpu that is currently dumping the renamed chain, thus kfree needs to happen after waiting for rcu readers to complete. Fixes: b7263e071a ("netfilter: nf_tables: Allow chain name of up to 255 chars") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-20netfilter: nf_tables: free flow table struct tooFlorian Westphal
Fixes: 3b49e2e94e6ebb ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-20netfilter: nf_tables: use dev->name directlyFlorian Westphal
no need to store the name in separate area. Furthermore, it uses kmalloc but not kfree and most accesses seem to treat it as char[IFNAMSIZ] not char *. Remove this and use dev->name instead. In case event zeroed dev, just omit the name in the dump. Fixes: d92191aa84e5f1 ("netfilter: nf_tables: cache device name in flowtable object") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-20xfrm: Allow xfrmi if_id to be updated by UPDSANathan Harold
Allow attaching an SA to an xfrm interface id after the creation of the SA, so that tasks such as keying which must be done as the SA is created, can remain separate from the decision on how to route traffic from an SA. This permits SA creation to be decomposed in to three separate steps: 1) allocation of a SPI 2) algorithm and key negotiation 3) insertion into the data path Signed-off-by: Nathan Harold <nharold@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-20xfrm: Remove xfrmi interface ID from flowiBenedict Wong
In order to remove performance impact of having the extra u32 in every single flowi, this change removes the flowi_xfrm struct, prefering to take the if_id as a method parameter where needed. In the inbound direction, if_id is only needed during the __xfrm_check_policy() function, and the if_id can be determined at that point based on the skb. As such, xfrmi_decode_session() is only called with the skb in __xfrm_check_policy(). In the outbound direction, the only place where if_id is needed is the xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is directly passed into the xfrm_lookup_with_ifid() call. All existing callers can still call xfrm_lookup(), which uses a default if_id of 0. This change does not change any behavior of XFRMIs except for improving overall system performance via flowi size reduction. This change has been tested against the Android Kernel Networking Tests: https://android.googlesource.com/kernel/tests/+/master/net/test Signed-off-by: Benedict Wong <benedictwong@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-19net/sched: cls_flower: Support matching on ip tos and ttl for tunnelsOr Gerlitz
Allow users to set rules matching on ipv4 tos and ttl or ipv6 traffic-class and hoplimit of tunnel headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19flow_dissector: Dissect tos and ttl from the tunnel infoOr Gerlitz
Add dissection of the tos and ttl from the ip tunnel headers fields in case a match is needed on them. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19net/sched: tunnel_key: Allow to set tos and ttl for tc based ip tunnelsOr Gerlitz
Allow user-space to provide tos and ttl to be set for the tunnel headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19net/page_pool: Fix inconsistent lock state warningTariq Toukan
Fix the warning below by calling the ptr_ring_consume_bh, which uses spin_[un]lock_bh. [ 179.064300] ================================ [ 179.069073] WARNING: inconsistent lock state [ 179.073846] 4.18.0-rc2+ #18 Not tainted [ 179.078133] -------------------------------- [ 179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at: __page_pool_empty_ring+0x61/0x100 [ 179.105988] {SOFTIRQ-ON-W} state was registered at: [ 179.111443] _raw_spin_lock+0x35/0x50 [ 179.115634] __page_pool_empty_ring+0x61/0x100 [ 179.120699] page_pool_destroy+0x32/0x50 [ 179.125204] mlx5e_free_rq+0x38/0xc0 [mlx5_core] [ 179.130471] mlx5e_close_channel+0x20/0x120 [mlx5_core] [ 179.136418] mlx5e_close_channels+0x26/0x40 [mlx5_core] [ 179.142364] mlx5e_close_locked+0x44/0x50 [mlx5_core] [ 179.148509] mlx5e_close+0x42/0x60 [mlx5_core] [ 179.153936] __dev_close_many+0xb1/0x120 [ 179.158749] dev_close_many+0xa2/0x170 [ 179.163364] rollback_registered_many+0x148/0x460 [ 179.169047] rollback_registered+0x56/0x90 [ 179.174043] unregister_netdevice_queue+0x7e/0x100 [ 179.179816] unregister_netdev+0x18/0x20 [ 179.184623] mlx5e_remove+0x2a/0x50 [mlx5_core] [ 179.190107] mlx5_remove_device+0xe5/0x110 [mlx5_core] [ 179.196274] mlx5_unregister_interface+0x39/0x90 [mlx5_core] [ 179.203028] cleanup+0x5/0xbfc [mlx5_core] [ 179.208031] __x64_sys_delete_module+0x16b/0x240 [ 179.213640] do_syscall_64+0x5a/0x210 [ 179.218151] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 179.224218] irq event stamp: 334398 [ 179.228438] hardirqs last enabled at (334398): [<ffffffffa511d8b7>] rcu_process_callbacks+0x1c7/0x790 [ 179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>] rcu_process_callbacks+0x182/0x790 [ 179.249931] softirqs last enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70 [ 179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0 [ 179.268584] [ 179.268584] other info that might help us debug this: [ 179.276572] Possible unsafe locking scenario: [ 179.276572] [ 179.283877] CPU0 [ 179.286954] ---- [ 179.290033] lock(&(&r->consumer_lock)->rlock); [ 179.295546] <Interrupt> [ 179.298830] lock(&(&r->consumer_lock)->rlock); [ 179.304550] [ 179.304550] *** DEADLOCK *** Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19xfrm: don't check offload_handle for nonzeroShannon Nelson
The offload_handle should be an opaque data cookie for the driver to use, much like the data cookie for a timer or alarm callback. Thus, the XFRM stack should not be checking for non-zero, because the driver might use that to store an array reference, which could be zero, or some other zero but meaningful value. We can remove the checks for non-zero because there are plenty other attributes also being checked to see if there is an offload in place for the SA in question. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of fixes, here goes: 1) NULL deref in qtnfmac, from Gustavo A. R. Silva. 2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih. 3) Lost completion messages in AF_XDP, from Magnus Karlsson. 4) Correct bogus self-assignment in rhashtable, from Rishabh Bhatnagar. 5) Fix regression in ipv6 route append handling, from David Ahern. 6) Fix masking in __set_phy_supported(), from Heiner Kallweit. 7) Missing module owner set in x_tables icmp, from Florian Westphal. 8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire. 9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy. 10) Fix NULL deref when using chains in act_csum, from Davide Caratti. 11) XDP_REDIRECT needs to check if the interface is up and whether the MTU is sufficient. From Toshiaki Makita. 12) Net diag can do a double free when killing TCP_NEW_SYN_RECV connections, from Lorenzo Colitti. 13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a full minute, delaying device unregister. From Eric Dumazet. 14) Update MAC entries in the correct order in ixgbe, from Alexander Duyck. 15) Don't leave partial mangles bpf program in jit_subprogs, from Daniel Borkmann. 16) Fix pfmemalloc SKB state propagation, from Stefano Brivio. 17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng. 18) Use after free in tun XDP_TX, from Toshiaki Makita. 19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole. 20) Don't reuse remainder of RX page when XDP is set in mlx4, from Saeed Mahameed. 21) Fix window probe handling of TCP rapair sockets, from Stefan Baranoff. 22) Missing socket locking in smc_ioctl(), from Ursula Braun. 23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann. 24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva. 25) Two spots in ipv6 do a rol32() on a hash value but ignore the result. Fixes from Colin Ian King" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits) tcp: identify cryptic messages as TCP seq # bugs ptp: fix missing break in switch hv_netvsc: Fix napi reschedule while receive completion is busy MAINTAINERS: Drop inactive Vitaly Bordug's email net: cavium: Add fine-granular dependencies on PCI net: qca_spi: Fix log level if probe fails net: qca_spi: Make sure the QCA7000 reset is triggered net: qca_spi: Avoid packet drop during initial sync ipv6: fix useless rol32 call on hash ipv6: sr: fix useless rol32 call on hash net: sched: Using NULL instead of plain integer net: usb: asix: replace mii_nway_restart in resume path net: cxgb3_main: fix potential Spectre v1 lib/rhashtable: consider param->min_size when setting initial table size net/smc: reset recv timeout after clc handshake net/smc: add error handling for get_user() net/smc: optimize consumer cursor updates net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL. ipv6: ila: select CONFIG_DST_CACHE net: usb: rtl8150: demote allmulti message to dev_dbg() ...
2018-07-18tcp: identify cryptic messages as TCP seq # bugsRandy Dunlap
Attempt to make cryptic TCP seq number error messages clearer by (1) identifying the source of the message as "TCP", (2) identifying the errors as "seq # bug", and (3) grouping the field identifiers and values by separating them with commas. E.g., the following message is changed from: recvmsg bug 2: copied 73BCB6CD seq 70F17CBE rcvnxt 73BCB9AA fl 0 WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:1881 tcp_recvmsg+0x649/0xb90 to: TCP recvmsg seq # bug 2: copied 73BCB6CD, seq 70F17CBE, rcvnxt 73BCB9AA, fl 0 WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:2011 tcp_recvmsg+0x694/0xba0 Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18pktgen: convert safe uses of strncpy() to strcpy() to avoid string ↵Jakub Kicinski
truncation warning GCC 8 complains: net/core/pktgen.c: In function ‘pktgen_if_write’: net/core/pktgen.c:1419:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation] strncpy(pkt_dev->src_max, buf, len); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/pktgen.c:1399:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation] strncpy(pkt_dev->src_min, buf, len); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/pktgen.c:1290:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation] strncpy(pkt_dev->dst_max, buf, len); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/pktgen.c:1268:4: warning: ‘strncpy’ output may be truncated copying between 0 and 31 bytes from a string of length 127 [-Wstringop-truncation] strncpy(pkt_dev->dst_min, buf, len); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is no bug here, but the code is not perfect either. It copies sizeof(pkt_dev->/member/) - 1 from user space into buf, and then does a strcmp(pkt_dev->/member/, buf) hence assuming buf will be null-terminated and shorter than pkt_dev->/member/ (pkt_dev->/member/ is never explicitly null-terminated, and strncpy() doesn't have to null-terminate so the assumption must be on buf). The use of strncpy() without explicit null-termination looks suspicious. Convert to use straight strcpy(). strncpy() would also null-pad the output, but that's clearly unnecessary since the author calls memset(pkt_dev->/member/, 0, sizeof(..)); prior to strncpy(), anyway. While at it format the code for "dst_min", "dst_max", "src_min" and "src_max" in the same way by removing extra new lines in one case. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18ipv6: sr: fix useless rol32 call on hashColin Ian King
The rol32 call is currently rotating hash but the rol'd value is being discarded. I believe the current code is incorrect and hash should be assigned the rotated value returned from rol32. Detected by CoverityScan, CID#1468411 ("Useless call") Fixes: b5facfdba14c ("ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: dlebrun@google.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net: dsa: Remove VLA usageSalvatore Mesoraca
We avoid 2 VLAs by using a pre-allocated field in dsa_switch. We also try to avoid dynamic allocation whenever possible (when using fewer than bits-per-long ports, which is the common case). Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/20180505185145.GB32630@lunn.ch Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> [kees: tweak commit subject and message slightly] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18Merge tag 'batadv-net-for-davem-20180717' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are some batman-adv fixes: - Fix gateway refcounting in BATMAN IV and V, by Sven Eckelmann (2 patches) - Fix debugfs paths when renaming interfaces, by Sven Eckelmann (2 patches) - Fix TT flag issues, by Linus Luessing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18tipc: remove unused tipc_group_sizeYueHaibing
After commit eb929a91b213 ("tipc: improve poll() for group member socket"), it is no longer used. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18tipc: remove unused tipc_link_is_activeYueHaibing
tipc_link_is_active is no longer used and can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net: sched: Using NULL instead of plain integerYueHaibing
Fixes the following sparse warnings: net/sched/cls_api.c:1101:43: warning: Using plain integer as NULL pointer net/sched/cls_api.c:1492:75: warning: Using plain integer as NULL pointer Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net: Move skb decrypted field, avoid explicity copyStefano Brivio
Commit 784abe24c903 ("net: Add decrypted field to skb") introduced a 'decrypted' field that is explicitly copied on skb copy and clone. Move it between headers_start[0] and headers_end[0], so that we don't need to copy it explicitly as it's copied by the memcpy() in __copy_skb_header(). While at it, drop the assignment in __skb_clone(), it was already redundant. This doesn't change the size of sk_buff or cacheline boundaries. The 15-bits hole before tc_index becomes a 14-bits hole, and will be again a 15-bits hole when this change is merged with commit 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()"). v2: as reported by kbuild test robot (oops, I forgot to build with CONFIG_TLS_DEVICE it seems), we can't use CHECK_SKB_FIELD() on a bit-field member. Just drop the check for the moment being, perhaps we could think of some magic to also check bit-field members one day. Fixes: 784abe24c903 ("net: Add decrypted field to skb") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18xdp: fix uninitialized 'err' variableJakub Kicinski
Smatch caught an uninitialized variable error which GCC seems to miss. Fixes: a25717d2b604 ("xdp: support simultaneous driver and hw XDP attachment") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net/smc: reset recv timeout after clc handshakeKarsten Graul
During clc handshake the receive timeout is set to CLC_WAIT_TIME. Remember and reset the original timeout value after the receive calls, and remove a duplicate assignment of CLC_WAIT_TIME. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net/smc: add error handling for get_user()Ursula Braun
For security reasons the return code of get_user() should always be checked. Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net/smc: optimize consumer cursor updatesUrsula Braun
The SMC protocol requires to send a separate consumer cursor update, if it cannot be piggybacked to updates of the producer cursor. Currently the decision to send a separate consumer cursor update just considers the amount of data already received by the socket program. It does not consider the amount of data already arrived, but not yet consumed by the receiver. Basing the decision on the difference between already confirmed and already arrived data (instead of difference between already confirmed and already consumed data), may lead to a somewhat earlier consumer cursor update send in fast unidirectional traffic scenarios, and thus to better throughput. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.Tetsuo Handa
syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is because nfc_llcp_send_ui_frame() is retrying the loop without any delay when nonblocking nfc_alloc_send_skb() returned NULL. Since there is no need to use MSG_DONTWAIT if we retry until sock_alloc_send_pskb() succeeds, let's use blocking call. Also, in case an unexpected error occurred, let's break the loop if blocking nfc_alloc_send_skb() failed. [1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c6 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18ipv6: ila: select CONFIG_DST_CACHEArnd Bergmann
My randconfig builds came across an old missing dependency for ILA: ERROR: "dst_cache_set_ip6" [net/ipv6/ila/ila.ko] undefined! ERROR: "dst_cache_get" [net/ipv6/ila/ila.ko] undefined! ERROR: "dst_cache_init" [net/ipv6/ila/ila.ko] undefined! ERROR: "dst_cache_destroy" [net/ipv6/ila/ila.ko] undefined! We almost never run into this by accident because randconfig builds end up selecting DST_CACHE from some other tunnel protocol, and this one appears to be the only one missing the explicit 'select'. >From all I can tell, this problem first appeared in linux-4.9 when dst_cache support got added to ILA. Fixes: 79ff2fc31e0f ("ila: Cache a route to translated address") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18netfilter: nft_set_rbtree: fix panic when destroying set by GCTaehee Yoo
This patch fixes below. 1. check null pointer of rb_next. rb_next can return null. so null check routine should be added. 2. add rcu_barrier in destroy routine. GC uses call_rcu to remove elements. but all elements should be removed before destroying set and chains. so that rcu_barrier is added. test script: %cat test.nft table inet aa { map map1 { type ipv4_addr : verdict; flags interval, timeout; elements = { 0-1 : jump a0, 3-4 : jump a0, 6-7 : jump a0, 9-10 : jump a0, 12-13 : jump a0, 15-16 : jump a0, 18-19 : jump a0, 21-22 : jump a0, 24-25 : jump a0, 27-28 : jump a0, } timeout 1s; } chain a0 { } } flush ruleset table inet aa { map map1 { type ipv4_addr : verdict; flags interval, timeout; elements = { 0-1 : jump a0, 3-4 : jump a0, 6-7 : jump a0, 9-10 : jump a0, 12-13 : jump a0, 15-16 : jump a0, 18-19 : jump a0, 21-22 : jump a0, 24-25 : jump a0, 27-28 : jump a0, } timeout 1s; } chain a0 { } } flush ruleset splat looks like: [ 2402.419838] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 2402.428433] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 2402.429343] CPU: 1 PID: 1350 Comm: kworker/1:1 Not tainted 4.18.0-rc2+ #1 [ 2402.429343] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 03/23/2017 [ 2402.429343] Workqueue: events_power_efficient nft_rbtree_gc [nft_set_rbtree] [ 2402.429343] RIP: 0010:rb_next+0x1e/0x130 [ 2402.429343] Code: e9 de f2 ff ff 0f 1f 80 00 00 00 00 41 55 48 89 fa 41 54 55 53 48 c1 ea 03 48 b8 00 00 00 0 [ 2402.429343] RSP: 0018:ffff880105f77678 EFLAGS: 00010296 [ 2402.429343] RAX: dffffc0000000000 RBX: ffff8801143e3428 RCX: 1ffff1002287c69c [ 2402.429343] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 [ 2402.429343] RBP: 0000000000000000 R08: ffffed0016aabc24 R09: ffffed0016aabc24 [ 2402.429343] R10: 0000000000000001 R11: ffffed0016aabc23 R12: 0000000000000000 [ 2402.429343] R13: ffff8800b6933388 R14: dffffc0000000000 R15: ffff8801143e3440 [ 2402.534486] kasan: CONFIG_KASAN_INLINE enabled [ 2402.534212] FS: 0000000000000000(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000 [ 2402.534212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2402.534212] CR2: 0000000000863008 CR3: 00000000a3c16000 CR4: 00000000001006e0 [ 2402.534212] Call Trace: [ 2402.534212] nft_rbtree_gc+0x2b5/0x5f0 [nft_set_rbtree] [ 2402.534212] process_one_work+0xc1b/0x1ee0 [ 2402.540329] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 2402.534212] ? _raw_spin_unlock_irq+0x29/0x40 [ 2402.534212] ? pwq_dec_nr_in_flight+0x3e0/0x3e0 [ 2402.534212] ? set_load_weight+0x270/0x270 [ 2402.534212] ? __schedule+0x6ea/0x1fb0 [ 2402.534212] ? __sched_text_start+0x8/0x8 [ 2402.534212] ? save_trace+0x320/0x320 [ 2402.534212] ? sched_clock_local+0xe2/0x150 [ 2402.534212] ? find_held_lock+0x39/0x1c0 [ 2402.534212] ? worker_thread+0x35f/0x1150 [ 2402.534212] ? lock_contended+0xe90/0xe90 [ 2402.534212] ? __lock_acquire+0x4520/0x4520 [ 2402.534212] ? do_raw_spin_unlock+0xb1/0x350 [ 2402.534212] ? do_raw_spin_trylock+0x111/0x1b0 [ 2402.534212] ? do_raw_spin_lock+0x1f0/0x1f0 [ 2402.534212] worker_thread+0x169/0x1150 Fixes: 8d8540c4f5e0("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nft_set_hash: add rcu_barrier() in the nft_rhash_destroy()Taehee Yoo
GC of set uses call_rcu() to destroy elements. So that elements would be destroyed after destroying sets and chains. But, elements should be destroyed before destroying sets and chains. In order to wait calling call_rcu(), a rcu_barrier() is added. In order to test correctly, below patch should be applied. https://patchwork.ozlabs.org/patch/940883/ test scripts: %cat test.nft table ip aa { map map1 { type ipv4_addr : verdict; flags timeout; elements = { 0 : jump a0, 1 : jump a0, 2 : jump a0, 3 : jump a0, 4 : jump a0, 5 : jump a0, 6 : jump a0, 7 : jump a0, 8 : jump a0, 9 : jump a0, } timeout 1s; } chain a0 { } } flush ruleset [ ... ] table ip aa { map map1 { type ipv4_addr : verdict; flags timeout; elements = { 0 : jump a0, 1 : jump a0, 2 : jump a0, 3 : jump a0, 4 : jump a0, 5 : jump a0, 6 : jump a0, 7 : jump a0, 8 : jump a0, 9 : jump a0, } timeout 1s; } chain a0 { } } flush ruleset Splat looks like: [ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363! [ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24 [ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables] [ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff [ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282 [ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000 [ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90 [ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa [ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200 [ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000 [ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000 [ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0 [ 200.930353] Call Trace: [ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables] [ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables] [ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables] [ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables] [ 200.959532] ? nla_parse+0xab/0x230 [ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink] [ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink] [ 200.975525] ? debug_show_all_locks+0x290/0x290 [ 200.980363] ? debug_show_all_locks+0x290/0x290 [ 200.986356] ? sched_clock_cpu+0x132/0x170 [ 200.990352] ? find_held_lock+0x39/0x1b0 [ 200.994355] ? sched_clock_local+0x10d/0x130 [ 200.999531] ? memset+0x1f/0x40 Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18Bluetooth: Use lock_sock_nested in bt_accept_enqueuePhilipp Puschmann
Fixes this warning that was provoked by a pairing: [60258.016221] WARNING: possible recursive locking detected [60258.021558] 4.15.0-RD1812-BSP #1 Tainted: G O [60258.027146] -------------------------------------------- [60258.032464] kworker/u5:0/70 is trying to acquire lock: [60258.037609] (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<87759073>] bt_accept_enqueue+0x3c/0x74 [60258.046863] [60258.046863] but task is already holding lock: [60258.052704] (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<d22d7106>] l2cap_sock_new_connection_cb+0x1c/0x88 [60258.062905] [60258.062905] other info that might help us debug this: [60258.069441] Possible unsafe locking scenario: [60258.069441] [60258.075368] CPU0 [60258.077821] ---- [60258.080272] lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP); [60258.085510] lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP); [60258.090748] [60258.090748] *** DEADLOCK *** [60258.090748] [60258.096676] May be due to missing lock nesting notation [60258.096676] [60258.103472] 5 locks held by kworker/u5:0/70: [60258.107747] #0: ((wq_completion)%shdev->name#2){+.+.}, at: [<9460d092>] process_one_work+0x130/0x4fc [60258.117263] #1: ((work_completion)(&hdev->rx_work)){+.+.}, at: [<9460d092>] process_one_work+0x130/0x4fc [60258.126942] #2: (&conn->chan_lock){+.+.}, at: [<7877c8c3>] l2cap_connect+0x80/0x4f8 [60258.134806] #3: (&chan->lock/2){+.+.}, at: [<2e16c724>] l2cap_connect+0x8c/0x4f8 [60258.142410] #4: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.}, at: [<d22d7106>] l2cap_sock_new_connection_cb+0x1c/0x88 [60258.153043] [60258.153043] stack backtrace: [60258.157413] CPU: 1 PID: 70 Comm: kworker/u5:0 Tainted: G O 4.15.0-RD1812-BSP #1 [60258.165945] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [60258.172485] Workqueue: hci0 hci_rx_work [60258.176331] Backtrace: [60258.178797] [<8010c9fc>] (dump_backtrace) from [<8010ccbc>] (show_stack+0x18/0x1c) [60258.186379] r7:80e55fe4 r6:80e55fe4 r5:20050093 r4:00000000 [60258.192058] [<8010cca4>] (show_stack) from [<809864e8>] (dump_stack+0xb0/0xdc) [60258.199301] [<80986438>] (dump_stack) from [<8016ecc8>] (__lock_acquire+0xffc/0x11d4) [60258.207144] r9:5e2bb019 r8:630f974c r7:ba8a5940 r6:ba8a5ed8 r5:815b5220 r4:80fa081c [60258.214901] [<8016dccc>] (__lock_acquire) from [<8016f620>] (lock_acquire+0x78/0x98) [60258.222655] r10:00000040 r9:00000040 r8:808729f0 r7:00000001 r6:00000000 r5:60050013 [60258.230491] r4:00000000 [60258.233045] [<8016f5a8>] (lock_acquire) from [<806ee974>] (lock_sock_nested+0x64/0x88) [60258.240970] r7:00000000 r6:b796e870 r5:00000001 r4:b796e800 [60258.246643] [<806ee910>] (lock_sock_nested) from [<808729f0>] (bt_accept_enqueue+0x3c/0x74) [60258.255004] r8:00000001 r7:ba7d3c00 r6:ba7d3ea4 r5:ba7d2000 r4:b796e800 [60258.261717] [<808729b4>] (bt_accept_enqueue) from [<808aa39c>] (l2cap_sock_new_connection_cb+0x68/0x88) [60258.271117] r5:b796e800 r4:ba7d2000 [60258.274708] [<808aa334>] (l2cap_sock_new_connection_cb) from [<808a294c>] (l2cap_connect+0x190/0x4f8) [60258.283933] r5:00000001 r4:ba6dce00 [60258.287524] [<808a27bc>] (l2cap_connect) from [<808a4a14>] (l2cap_recv_frame+0x744/0x2cf8) [60258.295800] r10:ba6dcf24 r9:00000004 r8:b78d8014 r7:00000004 r6:bb05d000 r5:00000004 [60258.303635] r4:bb05d008 [60258.306183] [<808a42d0>] (l2cap_recv_frame) from [<808a7808>] (l2cap_recv_acldata+0x210/0x214) [60258.314805] r10:b78e7800 r9:bb05d960 r8:00000001 r7:bb05d000 r6:0000000c r5:b7957a80 [60258.322641] r4:ba6dce00 [60258.325188] [<808a75f8>] (l2cap_recv_acldata) from [<8087630c>] (hci_rx_work+0x35c/0x4e8) [60258.333374] r6:80e5743c r5:bb05d7c8 r4:b7957a80 [60258.338004] [<80875fb0>] (hci_rx_work) from [<8013dc7c>] (process_one_work+0x1a4/0x4fc) [60258.346018] r10:00000001 r9:00000000 r8:baabfef8 r7:ba997500 r6:baaba800 r5:baaa5d00 [60258.353853] r4:bb05d7c8 [60258.356401] [<8013dad8>] (process_one_work) from [<8013e028>] (worker_thread+0x54/0x5cc) [60258.364503] r10:baabe038 r9:baaba834 r8:80e05900 r7:00000088 r6:baaa5d18 r5:baaba800 [60258.372338] r4:baaa5d00 [60258.374888] [<8013dfd4>] (worker_thread) from [<801448f8>] (kthread+0x134/0x160) [60258.382295] r10:ba8310b8 r9:bb07dbfc r8:8013dfd4 r7:baaa5d00 r6:00000000 r5:baaa8ac0 [60258.390130] r4:ba831080 [60258.392682] [<801447c4>] (kthread) from [<801080b4>] (ret_from_fork+0x14/0x20) [60258.399915] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:801447c4 [60258.407751] r4:baaa8ac0 r3:baabe000 Signed-off-by: Philipp Puschmann <pp@emlix.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-07-18ipv6: remove dependency of nf_defrag_ipv6 on ipv6 moduleFlorian Westphal
IPV6=m DEFRAG_IPV6=m CONNTRACK=y yields: net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get': net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable' net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6' Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params ip6_frag_init and ip6_expire_frag_queue so it would be needed to force IPV6=y too. This patch gets rid of the 'followup linker error' by removing the dependency of ipv6.ko symbols from netfilter ipv6 defrag. Shared code is placed into a header, then used from both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nft_socket: Expose socket markMáté Eckl
Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nft_socket: Break evaluation if no socket foundMáté Eckl
Actual implementation stores 0 in the destination register if no socket is found by the lookup, but that is not intentional as it is not really a value of any socket metadata. This patch fixes this and breaks rule evaluation in this case. Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching") Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_osf: add struct nf_osf_hdr_ctxPablo Neira Ayuso
Wrap context that allow us to guess the OS into a structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_osf: add nf_osf_match_one()Pablo Neira Ayuso
This new function allows us to check if there is TCP syn packet matching with a given fingerprint that can be reused from the upcoming new nf_osf_find() function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_tables: use dedicated mutex to guard transactionsFlorian Westphal
Continue to use nftnl subsys mutex to protect (un)registration of hook types, expressions and so on, but force batch operations to do their own locking. This allows distinct net namespaces to perform transactions in parallel. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_tables: avoid global info storageFlorian Westphal
This works because all accesses are currently serialized by nfnl nf_tables subsys mutex. If we want to have per-netns locking, we need to make this scratch area pernetns or allocate it on demand. This does the latter, its ~28kbyte but we can fallback to vmalloc so it should be fine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_tables: take module reference when starting a batchFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_tables: make valid_genid callback mandatoryFlorian Westphal
always call this function, followup patch can use this to aquire a per-netns transaction log to guard the entire batch instead of using the nfnl susbsys mutex (which is shared among all namespaces). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_tables: add and use helper for module autoloadFlorian Westphal
module autoload is problematic, it requires dropping the mutex that protects the transaction. Once the mutex has been dropped, another client can start a new transaction before we had a chance to abort current transaction log. This helper makes sure we first zap the transaction log, then drop mutex for module autoload. In case autload is successful, the caller has to reply entire message anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: Remove useless param helper of nf_ct_helper_ext_addGao Feng
The param helper of nf_ct_helper_ext_add is useless now, then remove it now. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18ipvs: drop conn templates under attackJulian Anastasov
Before now, connection templates were ignored by the random dropentry procedure. But Michal Koutný suggests that we should add exception for connections under SYN attack. He provided patch that implements it for TCP: <quote> IPVS includes protection against filling the ip_vs_conn_tab by dropping 1/32 of feasible entries every second. The template entries (for persistent services) are never directly deleted by this mechanism but when a picked TCP connection entry is being dropped (1), the respective template entry is dropped too (realized by expiring 60 seconds after the connection entry being dropped). There is another mechanism that removes connection entries when they time out (2), in this case the associated template entry is not deleted. Under SYN flood template entries would accumulate (due to their entry longer timeout). The accumulation takes place also with drop_entry being enabled. Roughly 15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism (1) and are removed by the timeout mechanism (2)(defaults to 60 seconds for SYN_RECV), thus template entries would still accumulate. The patch ensures that when a connection entry times out, we also remove the template entry from the table. To prevent breaking persistent services (since the connection may time out in already established state) we add a new entry flag to protect templates what spawned at least one established TCP connection. </quote> We already added ASSURED flag for the templates in previous patch, so that we can use it now to decide which connection templates should be dropped under attack. But we also have some cases that need special handling. We modify the dropentry procedure as follows: - Linux timers currently use LIFO ordering but we can not rely on this to drop controlling connections. So, set cp->timeout to 0 to indicate that connection was dropped and that on expiration we should try to drop our controlling connections. As result, we can now avoid the ip_vs_conn_expire_now call. - move the cp->n_control check above, so that it avoids restarting the timer for controlling connections when not needed. - drop unassured connection templates here if they are not referred by any connections. On connection expiration: if connection was dropped (cp->timeout=0) try to drop our controlling connection except if it is a template in assured state. In ip_vs_conn_flush change order of ip_vs_conn_expire_now calls according to the LIFO timer expiration order. It should work faster for controlling connections with single controlled one. Suggested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18ipvs: add assured state for conn templatesJulian Anastasov
cp->state was not used for templates. Add support for state bits and for the first "assured" bit which indicates that some connection controlled by this template was established or assured by the real server. In a followup patch we will use it to drop templates under SYN attack. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18ipvs: provide just conn to ip_vs_state_nameJulian Anastasov
In preparation for followup patches, provide just the cp ptr to ip_vs_state_name. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_conntrack: resolve clash for matching conntracksMartynas Pumputis
This patch enables the clash resolution for NAT (disabled in "590b52e10d41") if clashing conntracks match (i.e. both tuples are equal) and a protocol allows it. The clash might happen for a connections-less protocol (e.g. UDP) when two threads in parallel writes to the same socket and consequent calls to "get_unique_tuple" return the same tuples (incl. reply tuples). In this case it is safe to perform the resolution, as the losing CT describes the same mangling as the winning CT, so no modifications to the packet are needed, and the result of rules traversal for the loser's packet stays valid. Signed-off-by: Martynas Pumputis <martynas@weave.works> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree ↵Yi-Hung Wei
search This patch is originally from Florian Westphal. This patch does the following 3 main tasks. 1) Add list lock to 'struct nf_conncount_list' so that we can alter the lists containing the individual connections without holding the main tree lock. It would be useful when we only need to add/remove to/from a list without allocate/remove a node in the tree. With this change, we update nft_connlimit accordingly since we longer need to maintain a list lock in nft_connlimit now. 2) Use RCU for the initial tree search to improve tree look up performance. 3) Add a garbage collection worker. This worker is schedule when there are excessive tree node that needed to be recycled. Moreover,the rbnode reclaim logic is moved from search tree to insert tree to avoid race condition. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_conncount: Split insert and traversalYi-Hung Wei
This patch is originally from Florian Westphal. When we have a very coarse grouping, e.g. by large subnets, zone id, etc, it's likely that we do not need to do tree rotation because we'll find a node where we can attach new entry. Based on this observation, we split tree traversal and insertion. Later on, we can make traversal lockless (tree protected by RCU), and add extra lock in the individual nodes to protect list insertion/deletion, thereby allowing parallel insert/delete in different tree nodes. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_conncount: Move locking into count_tree()Yi-Hung Wei
This patch is originally from Florian Westphal. This is a preparation patch to allow lockless traversal of the tree via RCU. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanupYi-Hung Wei
This patch is originally from Florian Westphal. This patch does the following three tasks. It applies the same early exit technique for nf_conncount_lookup(). Since now we keep the number of connections in 'struct nf_conncount_list', we no longer need to return the count in nf_conncount_lookup(). Moreover, we expose the garbage collection function nf_conncount_gc_list() for nft_connlimit. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>