summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2020-01-08netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is presentFlorian Westphal
The set uadt functions assume lineno is never NULL, but it is in case of ip_set_utest(). syzkaller managed to generate a netlink message that calls this with LINENO attr present: general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104 Call Trace: ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867 nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563 pass a dummy lineno storage, its easier than patching all set implementations. This seems to be a day-0 bug. Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com Fixes: a7b4f989a6294 ("netfilter: ipset: IP set core support") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-08netfilter: conntrack: dccp, sctp: handle null timeout argumentFlorian Westphal
The timeout pointer can be NULL which means we should modify the per-nets timeout instead. All do this, except sctp and dccp which instead give: general protection fault: 0000 [#1] PREEMPT SMP KASAN net/netfilter/nf_conntrack_proto_dccp.c:682 ctnl_timeout_parse_policy+0x150/0x1d0 net/netfilter/nfnetlink_cttimeout.c:67 cttimeout_default_set+0x150/0x1c0 net/netfilter/nfnetlink_cttimeout.c:368 nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 Reported-by: syzbot+46a4ad33f345d1dd346e@syzkaller.appspotmail.com Fixes: c779e849608a8 ("netfilter: conntrack: remove get_timeout() indirection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-06netfilter: flowtable: add nf_flowtable_time_stampPablo Neira Ayuso
This patch adds nf_flowtable_time_stamp and updates the existing code to use it. This patch is also implicitly fixing up hardware statistic fetching via nf_flow_offload_stats() where casting to u32 is missing. Use nf_flow_timeout_delta() to fix this. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>
2020-01-05netfilter: nf_tables: unbind callbacks from flowtable destroy pathPablo Neira Ayuso
Callback unbinding needs to be done after nf_flow_table_free(), otherwise entries are not removed from the hardware. Update nft_unregister_flowtable_net_hooks() to call nf_unregister_net_hook() instead since the commit/abort paths do not deal with the callback unbinding anymore. Add a comment to nft_flowtable_event() to clarify that flow_offload_netdev_event() already removes the entries before the callback unbinding. Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane") Fixes ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>
2020-01-05netfilter: nf_flow_table_offload: fix the nat port mangle.wenxu
Shift on 32-bit word to define the port number depends on the flow direction. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Fixes: 7acd9378dc652 ("netfilter: nf_flow_table_offload: Correct memcpy size for flow_overload_mangle()") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-05netfilter: nf_flow_table_offload: check the status of dst_neighwenxu
It is better to get the dst_neigh with neigh->lock and check the nud_state is VALID. If there is not neigh previous, the lookup will Create a non NUD_VALID with 00:00:00:00:00:00 mac. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-05netfilter: nf_flow_table_offload: fix incorrect ethernet dst addresswenxu
Ethernet destination for original traffic takes the source ethernet address in the reply direction. For reply traffic, this takes the source ethernet address of the original direction. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-05netfilter: nft_flow_offload: fix underflow in flowtable reference counterwenxu
The .deactivate and .activate interfaces already deal with the reference counter. Otherwise, this results in spurious "Device is busy" errors. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix endianness issue in flowtable TCP flags dissector, from Arnd Bergmann. 2) Extend flowtable test script with dnat rules, from Florian Westphal. 3) Reject padding in ebtables user entries and validate computed user offset, reported by syzbot, from Florian Westphal. 4) Fix endianness in nft_tproxy, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24net: add bool confirm_neigh parameter for dst_ops.update_pmtuHangbin Liu
The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will failed. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso, including adding a missing ipv6 match description. 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi Bhat. 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet. 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold. 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien. 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul Chaignon. 7) Multicast MAC limit test is off by one in qede, from Manish Chopra. 8) Fix established socket lookup race when socket goes from TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening RCU grace period. From Eric Dumazet. 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet. 10) Fix active backup transition after link failure in bonding, from Mahesh Bandewar. 11) Avoid zero sized hash table in gtp driver, from Taehee Yoo. 12) Fix wrong interface passed to ->mac_link_up(), from Russell King. 13) Fix DSA egress flooding settings in b53, from Florian Fainelli. 14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost. 15) Fix double free in dpaa2-ptp code, from Ioana Ciornei. 16) Reject invalid MTU values in stmmac, from Jose Abreu. 17) Fix refcount leak in error path of u32 classifier, from Davide Caratti. 18) Fix regression causing iwlwifi firmware crashes on boot, from Anders Kaseorg. 19) Fix inverted return value logic in llc2 code, from Chan Shu Tak. 20) Disable hardware GRO when XDP is attached to qede, frm Manish Chopra. 21) Since we encode state in the low pointer bits, dst metrics must be at least 4 byte aligned, which is not necessarily true on m68k. Add annotations to fix this, from Geert Uytterhoeven. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits) sfc: Include XDP packet headroom in buffer step size. sfc: fix channel allocation with brute force net: dst: Force 4-byte alignment of dst_metrics selftests: pmtu: fix init mtu value in description hv_netvsc: Fix unwanted rx_table reset net: phy: ensure that phy IDs are correctly typed mod_devicetable: fix PHY module format qede: Disable hardware gro when xdp prog is installed net: ena: fix issues in setting interrupt moderation params in ethtool net: ena: fix default tx interrupt moderation interval net/smc: unregister ib devices in reboot_event net: stmmac: platform: Fix MDIO init for platforms without PHY llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) net: hisilicon: Fix a BUG trigered by wrong bytes_compl net: dsa: ksz: use common define for tag len s390/qeth: don't return -ENOTSUPP to userspace s390/qeth: fix promiscuous mode after reset s390/qeth: handle error due to unsupported transport mode cxgb4: fix refcount init for TC-MQPRIO offload tc-testing: initial tdc selftests for cls_u32 ...
2019-12-20netfilter: nft_tproxy: Fix port selector on Big EndianPhil Sutter
On Big Endian architectures, u16 port value was extracted from the wrong parts of u32 sreg_port, just like commit 10596608c4d62 ("netfilter: nf_tables: fix mismatch in big-endian system") describes. Fixes: 4ed8eb6570a49 ("netfilter: nf_tables: Add native tproxy support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-20netfilter: nf_flow_table: fix big-endian integer overflowArnd Bergmann
In some configurations, gcc reports an integer overflow: net/netfilter/nf_flow_table_offload.c: In function 'nf_flow_rule_match': net/netfilter/nf_flow_table_offload.c:80:21: error: unsigned conversion from 'int' to '__be16' {aka 'short unsigned int'} changes value from '327680' to '0' [-Werror=overflow] mask->tcp.flags = TCP_FLAG_RST | TCP_FLAG_FIN; ^~~~~~~~~~~~ From what I can tell, we want the upper 16 bits of these constants, so they need to be shifted in cpu-endian mode. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Wait for rcu grace period after releasing netns in ctnetlink, from Florian Westphal. 2) Incorrect command type in flowtable offload ndo invocation, from wenxu. 3) Incorrect callback type in flowtable offload flow tuple updates, also from wenxu. 4) Fix compile warning on flowtable offload infrastructure due to possible reference to uninitialized variable, from Nathan Chancellor. 5) Do not inline nf_ct_resolve_clash(), this is called from slow path / stress situations. From Florian Westphal. 6) Missing IPv6 flow selector description in flowtable offload. 7) Missing check for NETDEV_UNREGISTER in nf_tables offload infrastructure, from wenxu. 8) Update NAT selftest to use randomized netns names, from Florian Westphal. 9) Restore nfqueue bridge support, from Marco Oliverio. 10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header. From Phil Sutter. 11) Fix bogus lookup/get match for non-anonymous rbtree sets. 12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END elements. 13) Missing netlink validation for NFT_DATA_VALUE after nft_data_init(). 14) If rule specifies no actions, offload infrastructure returns EOPNOTSUPP. 15) Module refcount leak in object updates. 16) Missing sanitization for ARP traffic from br_netfilter, from Eric Dumazet. 17) Compilation breakage on big-endian due to incorrect memcpy() size in the flowtable offload infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09netfilter: nf_flow_table_offload: Correct memcpy size for flow_overload_mangle()Pablo Neira Ayuso
In function 'memcpy', inlined from 'flow_offload_mangle' at net/netfilter/nf_flow_table_offload.c:112:2, inlined from 'flow_offload_port_dnat' at net/netfilter/nf_flow_table_offload.c:373:2, inlined from 'nf_flow_rule_route_ipv4' at net/netfilter/nf_flow_table_offload.c:424:3: ./include/linux/string.h:376:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter 376 | __read_overflow2(); | ^~~~~~~~~~~~~~~~~~ The original u8* was done in the hope to make this more adaptable but consensus is to keep this like it is in tc pedit. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09netfilter: nf_tables_offload: return EOPNOTSUPP if rule specifies no actionsPablo Neira Ayuso
If the rule only specifies the matching side, return EOPNOTSUPP. Otherwise, the front-end relies on the drivers to reject this rule. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09netfilter: nf_tables: skip module reference count bump on object updatesPablo Neira Ayuso
Use __nft_obj_type_get() instead, otherwise there is a module reference counter leak. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09netfilter: nf_tables: validate NFT_DATA_VALUE after nft_data_init()Pablo Neira Ayuso
Userspace might bogusly sent NFT_DATA_VERDICT in several netlink attributes that assume NFT_DATA_VALUE. Moreover, make sure that error path invokes nft_data_release() to decrement the reference count on the chain object. Fixes: 96518518cc41 ("netfilter: add nftables") Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_ENDPablo Neira Ayuso
Only NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_FLAGS make sense for elements whose NFT_SET_ELEM_INTERVAL_END flag is set on. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-09netfilter: nft_set_rbtree: bogus lookup/get on consecutive elements in named ↵Pablo Neira Ayuso
sets The existing rbtree implementation might store consecutive elements where the closing element and the opening element might overlap, eg. [ a, a+1) [ a+1, a+2) This patch removes the optimization for non-anonymous sets in the exact matching case, where it is assumed to stop searching in case that the closing element is found. Instead, invalidate candidate interval and keep looking further in the tree. The lookup/get operation might return false, while there is an element in the rbtree. Moreover, the get operation returns true as if a+2 would be in the tree. This happens with named sets after several set updates. The existing lookup optimization (that only works for the anonymous sets) might not reach the opening [ a+1,... element if the closing ...,a+1) is found in first place when walking over the rbtree. Hence, walking the full tree in that case is needed. This patch fixes the lookup and get operations. Fixes: e701001e7cbe ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates") Fixes: ba0e4d9917b4 ("netfilter: nf_tables: get set elements via netlink") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-07netfilter: nf_queue: enqueue skbs with NULL dstMarco Oliverio
Bridge packets that are forwarded have skb->dst == NULL and get dropped by the check introduced by b60a77386b1d4868f72f6353d35dabe5fbe981f2 (net: make skb_dst_force return true when dst is refcounted). To fix this we check skb_dst() before skb_dst_force(), so we don't drop skb packet with dst == NULL. This holds also for skb at the PRE_ROUTING hook so we remove the second check. Fixes: b60a77386b1d ("net: make skb_dst_force return true when dst is refcounted") Signed-off-by: Marco Oliverio <marco.oliverio@tanaza.com> Signed-off-by: Rocco Folino <rocco.folino@tanaza.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-06net: core: rename indirect block ingress cb functionJohn Hurley
With indirect blocks, a driver can register for callbacks from a device that is does not 'own', for example, a tunnel device. When registering to or unregistering from a new device, a callback is triggered to generate a bind/unbind event. This, in turn, allows the driver to receive any existing rules or to properly clean up installed rules. When first added, it was assumed that all indirect block registrations would be for ingress offloads. However, the NFP driver can, in some instances, support clsact qdisc binds for egress offload. Change the name of the indirect block callback command in flow_offload to remove the 'ingress' identifier from it. While this does not change functionality, a follow up patch will implement a more more generic callback than just those currently just supporting ingress offload. Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports") Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-02netfilter: nf_tables_offload: Check for the NETDEV_UNREGISTER eventwenxu
Check for the NETDEV_UNREGISTER event from the nft_offload_netdev_event function, which is the event that actually triggers the clean up. Fixes: 06d392cbe3db ("netfilter: nf_tables_offload: remove rules when the device unregisters") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-30netfilter: nf_flow_table_offload: add IPv6 match descriptionPablo Neira Ayuso
Add missing IPv6 matching description to flow_rule object. Fixes: 5c27d8d76ce8 ("netfilter: nf_flow_table_offload: add IPv6 support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-30netfilter: conntrack: tell compiler to not inline nf_ct_resolve_clashFlorian Westphal
At this time compiler inlines it, but this code will not be executed under normal conditions. Also, no inlining allows to use "nf_ct_resolve_clash%return" perf probe. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-30netfilter: nf_flow_table_offload: Don't use offset uninitialized in ↵Nathan Chancellor
flow_offload_port_{d,s}nat Clang warns (trimmed the second warning for brevity): ../net/netfilter/nf_flow_table_offload.c:342:2: warning: variable 'offset' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] default: ^~~~~~~ ../net/netfilter/nf_flow_table_offload.c:346:57: note: uninitialized use occurs here flow_offload_mangle(entry, flow_offload_l4proto(flow), offset, ^~~~~~ ../net/netfilter/nf_flow_table_offload.c:331:12: note: initialize the variable 'offset' to silence this warning u32 offset; ^ = 0 Match what was done in the flow_offload_ipv{4,6}_{d,s}nat functions and just return in the default case, since port would also be uninitialized. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Link: https://github.com/ClangBuiltLinux/linux/issues/780 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kernelci.org bot <bot@kernelci.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-30netfilter: nf_flow_table_offload: Fix block_cb tc_setup_type as ↵wenxu
TC_SETUP_CLSFLOWER Add/del/stats flows through block_cb call must set the tc_setup_type as TC_SETUP_CLSFLOWER. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-30netfilter: nf_flow_table_offload: Fix block setup as TC_SETUP_FT cmdwenxu
Set up block through TC_SETUP_FT command. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-29netfilter: ctnetlink: netns exit must wait for callbacksFlorian Westphal
Curtis Taylor and Jon Maxwell reported and debugged a crash on 3.10 based kernel. Crash occurs in ctnetlink_conntrack_events because net->nfnl socket is NULL. The nfnl socket was set to NULL by netns destruction running on another cpu. The exiting network namespace calls the relevant destructors in the following order: 1. ctnetlink_net_exit_batch This nulls out the event callback pointer in struct netns. 2. nfnetlink_net_exit_batch This nulls net->nfnl socket and frees it. 3. nf_conntrack_cleanup_net_list This removes all remaining conntrack entries. This is order is correct. The only explanation for the crash so ar is: cpu1: conntrack is dying, eviction occurs: -> nf_ct_delete() -> nf_conntrack_event_report \ -> nf_conntrack_eventmask_report -> notify->fcn() (== ctnetlink_conntrack_events). cpu1: a. fetches rcu protected pointer to obtain ctnetlink event callback. b. gets interrupted. cpu2: runs netns exit handlers: a runs ctnetlink destructor, event cb pointer set to NULL. b runs nfnetlink destructor, nfnl socket is closed and set to NULL. cpu1: c. resumes and trips over NULL net->nfnl. Problem appears to be that ctnetlink_net_exit_batch only prevents future callers of nf_conntrack_eventmask_report() from obtaining the callback. It doesn't wait of other cpus that might have already obtained the callbacks address. I don't see anything in upstream kernels that would prevent similar crash: We need to wait for all cpus to have exited the event callback. Fixes: 9592a5c01e79dbc59eb56fa ("netfilter: ctnetlink: netns support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "This is mostly to fix the iwlwifi regression: 1) Flush GRO state properly in iwlwifi driver, from Alexander Lobakin. 2) Validate TIPC link name with properly length macro, from John Rutherford. 3) Fix completion init and device query timeouts in ibmvnic, from Thomas Falcon. 4) Fix SKB size calculation for netlink messages in psample, from Nikolay Aleksandrov. 5) Similar kind of fix for OVS flow dumps, from Paolo Abeni. 6) Handle queue allocation failure unwind properly in gve driver, we could try to release pages we didn't allocate. From Jeroen de Borst. 7) Serialize TX queue SKB list accesses properly in mscc ocelot driver. From Yangbo Lu" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: net: usb: aqc111: Use the correct style for SPDX License Identifier net: phy: Use the correct style for SPDX License Identifier net: wireless: intel: iwlwifi: fix GRO_NORMAL packet stalling net: mscc: ocelot: use skb queue instead of skbs list net: mscc: ocelot: avoid incorrect consuming in skbs list gve: Fix the queue page list allocated pages count net: inet_is_local_reserved_port() port arg should be unsigned short openvswitch: fix flow command message size net: phy: dp83869: Fix return paths to return proper values net: psample: fix skb_over_panic net: usbnet: Fix -Wcast-function-type net: hso: Fix -Wcast-function-type net: port < inet_prot_sock(net) --> inet_port_requires_bind_service(net, port) ibmvnic: Serialize device queries ibmvnic: Bound waits for device queries ibmvnic: Terminate waiting device threads after loss of service ibmvnic: Fix completion structure initialization net-sctp: replace some sock_net(sk) with just 'net' net: Fix a documentation bug wrt. ip_unprivileged_port_start tipc: fix link name length check
2019-11-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Linux-kernel memory consistency model updates. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer() net/sched: Replace rcu_swap_protected() with rcu_replace_pointer() net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer() net/core: Replace rcu_swap_protected() with rcu_replace_pointer() bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer() fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer() drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer() drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer() x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer() rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() rcu: Suppress levelspread uninitialized messages rcu: Fix uninitialized variable in nocb_gp_wait() rcu: Update descriptions for rcu_future_grace_period tracepoint rcu: Update descriptions for rcu_nocb_wake tracepoint rcu: Remove obsolete descriptions for rcu_barrier tracepoint rcu: Ensure that ->rcu_urgent_qs is set before resched IPI workqueue: Convert for_each_wq to use built-in list check rcu: Several rcu_segcblist functions can be static rcu: Remove unused function hlist_bl_del_init_rcu() Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch() ...
2019-11-26net: port < inet_prot_sock(net) --> inet_port_requires_bind_service(net, port)Maciej Żenczykowski
Note that the sysctl write accessor functions guarantee that: net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0] invariant is maintained, and as such the max() in selinux hooks is actually spurious. ie. even though if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) { per logic is the same as if ((snum < inet_prot_sock(sock_net(sk)) && snum < low) || snum > high) { it is actually functionally equivalent to: if (snum < low || snum > high) { which is equivalent to: if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) { even though the first clause is spurious. But we want to hold on to it in case we ever want to change what what inet_port_requires_bind_service() means (for example by changing it from a, by default, [0..1024) range to some sort of set). Test: builds, git 'grep inet_prot_sock' finds no other references Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20netfilter: nft_payload: add C-VLAN offload supportPablo Neira Ayuso
Match on h_vlan_encapsulated_proto and set up protocol dependency. Check for protocol dependency before accessing the tci field. Allow to match on the encapsulated ethertype too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20netfilter: nft_payload: add VLAN offload supportPablo Neira Ayuso
Match on ethertype and set up protocol dependency. Check for protocol dependency before accessing the tci field. Allow to match on the encapsulated ethertype too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20netfilter: nf_tables_offload: allow ethernet interface type onlyPablo Neira Ayuso
Hardware offload support at this stage assumes an ethernet device in place. The flow dissector provides the intermediate representation to express this selector, so extend it to allow to store the interface type. Flower does not uses this, so skb_flow_dissect_meta() is not extended to match on this new field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15netfilter: nf_tables: add nft_unregister_flowtable_hook()Pablo Neira Ayuso
Unbind flowtable callback if hook is unregistered. This patch is implicitly fixing the error path of nf_tables_newflowtable() and nft_flowtable_event(). Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane") Reported-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables: check if bind callback fails and unbind if hook ↵wenxu
registration fails Undo the callback binding before unregistering the existing hooks. This should also check for error of the bind setup call. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables_offload: undo updates if transaction failsPablo Neira Ayuso
The nft_flow_rule_offload_commit() function might fail after several successful commands, thus, leaving the hardware filtering policy in inconsistent state. This patch adds nft_flow_rule_offload_abort() function which undoes the updates that have been already processed if one command in this transaction fails. Hence, the hardware ruleset is left as it was before this aborted transaction. The deletion path needs to create the flow_rule object too, in case that an existing rule needs to be re-added from the abort path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables_offload: release flow_rule on error from commit pathPablo Neira Ayuso
If hardware offload commit path fails, release all flow_rule objects. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_tables_offload: remove reference to flow rule from deletion pathPablo Neira Ayuso
The cookie is sufficient to delete the rule from the hardware. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table: remove unnecessary parameter in flow_offload_fill_dirwenxu
The ct object is already in the flow_offload structure, remove it. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table_offload: Fix check ndo_setup_tc when setup_blockwenxu
It should check the ndo_setup_tc in the nf_flow_table_offload_setup. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table_offload: add IPv6 supportPablo Neira Ayuso
Add nf_flow_rule_route_ipv6() and use it from the IPv6 and the inet flowtable type definitions. Rename the nf_flow_rule_route() function to nf_flow_rule_route_ipv4(). Adjust maximum number of actions, which now becomes 16 to leave sufficient room for the IPv6 address mangling for NAT. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nf_flow_table_offload: add flow_action_entry_next() and use itPablo Neira Ayuso
This function retrieves a spare action entry from the array of actions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: nft_meta: use 64-bit time arithmeticArnd Bergmann
On 32-bit architectures, get_seconds() returns an unsigned 32-bit time value, which also matches the type used in the nft_meta code. This will not overflow in year 2038 as a time_t would, but it still suffers from the overflow problem later on in year 2106. Change this instance to use the time64_t type consistently and avoid the deprecated get_seconds(). The nft_meta_weekday() calculation potentially gets a little slower on 32-bit architectures, but now it has the same behavior as on 64-bit architectures and does not overflow. Fixes: 63d10e12b00d ("netfilter: nft_meta: support for time matching") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-15netfilter: xt_time: use time64_tArnd Bergmann
The current xt_time driver suffers from the y2038 overflow on 32-bit architectures, when the time of day calculations break. Also, on both 32-bit and 64-bit architectures, there is a problem with info->date_start/stop, which is part of the user ABI and overflows in in 2106. Fix the first issue by using time64_t and explicit calls to div_u64() and div_u64_rem(), and document the seconds issue. The explicit 64-bit division is unfortunately slower on 32-bit architectures, but doing it as unsigned lets us use the optimized division-through-multiplication path in most configurations. This should be fine, as the code already does not allow any negative time of day values. Using u32 seconds values consistently would probably also work and be a little more efficient, but that doesn't feel right as it would propagate the y2106 overflow to more place rather than fewer. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-13Merge branch 'master' of git://blackhole.kfki.hu/nf-nextPablo Neira Ayuso
Jozsef Kadlecsik says: ==================== ipset patches for nf-next - Add wildcard support to hash:net,iface which makes possible to match interface prefixes besides complete interfaces names, from Kristian Evensen. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-13netfilter: nft_payload: add C-VLAN supportPablo Neira Ayuso
If the encapsulated ethertype announces another inner VLAN header and the offset falls within the boundaries of the inner VLAN header, then adjust arithmetics to include the extra VLAN header length and fetch the bytes from the vlan header in the skbuff data area that represents this inner VLAN header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-13netfilter: nf_tables_offload: pass extack to nft_flow_cls_offload_setup()Pablo Neira Ayuso
Otherwise this leads to a stack corruption. Fixes: c5d275276ff4 ("netfilter: nf_tables_offload: add nft_flow_cls_offload_setup()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>