summaryrefslogtreecommitdiff
path: root/net/netfilter/nf_conntrack_netlink.c
AgeCommit message (Collapse)Author
2021-12-16netfilter: ctnetlink: remove expired entries firstFlorian Westphal
When dumping conntrack table to userspace via ctnetlink, check if the ct has already expired before doing any of the 'skip' checks. This expires dead entries faster. /proc handler also removes outdated entries first. Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08netfilter: conntrack: annotate data-races around ct->timeoutEric Dumazet
(struct nf_conn)->timeout can be read/written locklessly, add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing. BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0: __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline] tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864 tcp_push_pending_frames include/net/tcp.h:1897 [inline] tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] __sys_sendto+0x21e/0x2c0 net/socket.c:2036 __do_sys_sendto net/socket.c:2048 [inline] __se_sys_sendto net/socket.c:2044 [inline] __x64_sys_sendto+0x74/0x90 net/socket.c:2044 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1: nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline] ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline] __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline] tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367 rds_send_worker+0x43/0x200 net/rds/threads.c:200 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298 worker_thread+0x616/0xa70 kernel/workqueue.c:2445 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x00027cc2 -> 0x00000000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G W 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: krdsd rds_send_worker Note: I chose an arbitrary commit for the Fixes: tag, because I do not think we need to backport this fix to very old kernels. Fixes: e37542ba111f ("netfilter: conntrack: avoid possible false sharing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-08netfilter: ctnetlink: do not erase error code with EINVALFlorent Fourcot
And be consistent in error management for both orig/reply filtering Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-08netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLYFlorent Fourcot
filter->orig_flags was used for a reply context. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Protect nft_ct template with global mutex, from Pavel Skripkin. 2) Two recent commits switched inet rt and nexthop exception hashes from jhash to siphash. If those two spots are problematic then conntrack is affected as well, so switch voer to siphash too. While at it, add a hard upper limit on chain lengths and reject insertion if this is hit. Patches from Florian Westphal. 3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN, from Benjamin Hesmans. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: socket: icmp6: fix use-after-scope netfilter: refuse insertion if chain has grown too large netfilter: conntrack: switch to siphash netfilter: conntrack: sanitize table size default settings netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex ==================== Link: https://lore.kernel.org/r/20210903163020.13741-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30netfilter: refuse insertion if chain has grown too largeFlorian Westphal
Also add a stat counter for this that gets exported both via old /proc interface and ctnetlink. Assuming the old default size of 16536 buckets and max hash occupancy of 64k, this results in 128k insertions (origin+reply), so ~8 entries per chain on average. The revised settings in this series will result in about two entries per bucket on average. This allows a hard-limit ceiling of 64. This is not tunable at the moment, but its possible to either increase nf_conntrack_buckets or decrease nf_conntrack_max to reduce average lengths. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25netfilter: ctnetlink: missing counters and timestamp in nfnetlink_{log,queue}Pablo Neira Ayuso
Add counters and timestamps (if available) to the conntrack object that is represented in nfnetlink_log and _queue messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25netfilter: ecache: remove nf_exp_event_notifier structureFlorian Westphal
Reuse the conntrack event notofier struct, this allows to remove the extra register/unregister functions and avoids a pointer in struct net. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25netfilter: ecache: prepare for event notifier mergeFlorian Westphal
This prepares for merge for ct and exp notifier structs. The 'fcn' member is renamed to something unique. Second, the register/unregister api is simplified. There is only one implementation so there is no need to do any error checking. Replace the EBUSY logic with WARN_ON_ONCE. This allows to remove error unwinding. The exp notifier register/unregister function is removed in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-25netfilter: ecache: remove one indent levelFlorian Westphal
nf_conntrack_eventmask_report and nf_ct_deliver_cached_events shared most of their code. This unifies the layout by changing if (nf_ct_is_confirmed(ct)) { foo } to if (!nf_ct_is_confirmed(ct))) return foo This removes one level of indentation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-05netfilter: ctnetlink: allow to filter dump by status bitsFlorian Westphal
If CTA_STATUS is present, but CTA_STATUS_MASK is not, then the mask is automatically set to 'status', so that kernel returns those entries that have all of the requested bits set. This makes more sense than using a all-one mask since we'd hardly ever find a match. There are no other checks for status bits, so if e.g. userspace sets impossible combinations it will get an empty dump. If kernel would reject unknown status bits, then a program that works on a future kernel that has IPS_FOO bit fails on old kernels. Same for 'impossible' combinations: Kernel never sets ASSURED without first having set SEEN_REPLY, but its possible that a future kernel could do so. Therefore no sanity tests other than a 0-mask. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-05netfilter: ctnetlink: add and use a helper for mark parsingFlorian Westphal
ctnetlink dumps can be filtered based on the connmark. Prepare for status bit filtering by using a named structure and by moving the mark parsing code to a helper. Else ctnetlink_alloc_filter size grows a bit too big for my taste when status handling is added. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-02netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfoVasily Averin
Two patches listed below removed ctnetlink_dump_helpinfo call from under rcu_read_lock. Now its rcu_dereference generates following warning: ============================= WARNING: suspicious RCU usage 5.13.0+ #5 Not tainted ----------------------------- net/netfilter/nf_conntrack_netlink.c:221 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 stack backtrace: CPU: 1 PID: 2251 Comm: conntrack Not tainted 5.13.0+ #5 Call Trace: dump_stack+0x7f/0xa1 ctnetlink_dump_helpinfo+0x134/0x150 [nf_conntrack_netlink] ctnetlink_fill_info+0x2c2/0x390 [nf_conntrack_netlink] ctnetlink_dump_table+0x13f/0x370 [nf_conntrack_netlink] netlink_dump+0x10c/0x370 __netlink_dump_start+0x1a7/0x260 ctnetlink_get_conntrack+0x1e5/0x250 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x613/0x993 [nfnetlink] netlink_rcv_skb+0x50/0x100 nfnetlink_rcv+0x55/0x120 [nfnetlink] netlink_unicast+0x181/0x260 netlink_sendmsg+0x23f/0x460 sock_sendmsg+0x5b/0x60 __sys_sendto+0xf1/0x160 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 49ca022bccc5 ("netfilter: ctnetlink: don't dump ct extensions of unconfirmed conntracks") Fixes: 0b35f6031a00 ("netfilter: Remove duplicated rcu_read_lock.") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07netfilter: nfnetlink: add struct nfgenmsg to struct nfnl_info and use itPablo Neira Ayuso
Update the nfnl_info structure to add a pointer to the nfnetlink header. This simplifies the existing codebase since this header is usually accessed. Update existing clients to use this new field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: use nfnetlink_unicast()Pablo Neira Ayuso
Replace netlink_unicast() calls by nfnetlink_unicast() which already deals with translating EAGAIN to ENOBUFS as the nfnetlink core expects. nfnetlink_unicast() calls nlmsg_unicast() which returns zero in case of success, otherwise the netlink core function netlink_rcv_skb() turns err > 0 into an acknowlegment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: nfnetlink: consolidate callback typesPablo Neira Ayuso
Add enum nfnl_callback_type to identify the callback type to provide one single callback. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-26netfilter: nfnetlink: add struct nfnl_info and pass it to callbacksPablo Neira Ayuso
Add a new structure to reduce callback footprint and to facilite extensions of the nfnetlink callback interface in the future. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13netfilter: conntrack: move ct counter to net_generic dataFlorian Westphal
Its only needed from slowpath (sysctl, ctnetlink, gc worker) and when a new conntrack object is allocated. Furthermore, each write dirties the otherwise read-mostly pernet data in struct net.ct, which are accessed from packet path. Move it to the net_generic data. This makes struct netns_ct read-mostly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: add helper function to set up the nfnetlink header and use itPablo Neira Ayuso
This patch adds a helper function to set up the netlink and nfnetlink headers. Update existing codebase to use it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-15netfilter: ctnetlink: fix dump of the expect mask attributeFlorian Westphal
Before this change, the mask is never included in the netlink message, so "conntrack -E expect" always prints 0.0.0.0. In older kernels the l3num callback struct was passed as argument, based on tuple->src.l3num. After the l3num indirection got removed, the call chain is based on m.src.l3num, but this value is 0xffff. Init l3num to the correct value. Fixes: f957be9d349a3 ("netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-25netfilter: ctnetlink: remove get_ct indirectionFlorian Westphal
Use nf_ct_get() directly, its a small inline helper without dependencies. Add CONFIG_NF_CONNTRACK guards to elide the relevant part when conntrack isn't available at all. v2: add ifdef guard around nf_ct_get call (kernel test robot) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12netfilter: ctnetlink: add timeout and protoinfo to destroy eventsFlorian Westphal
DESTROY events do not include the remaining timeout. Add the timeout if the entry was removed explicitly. This can happen when a conntrack gets deleted prematurely, e.g. due to a tcp reset, module removal, netdev notifier (nat/masquerade device went down), ctnetlink and so on. Add the protocol state too for the destroy message to check for abnormal state on connection termination. Joint work with Pablo. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-08netfilter: ctnetlink: fix mark based dump filtering regressionMartin Willi
conntrack mark based dump filtering may falsely skip entries if a mask is given: If the mask-based check does not filter out the entry, the else-if check is always true and compares the mark without considering the mask. The if/else-if logic seems wrong. Given that the mask during filter setup is implicitly set to 0xffffffff if not specified explicitly, the mark filtering flags seem to just complicate things. Restore the previously used approach by always matching against a zero mask is no filter mark is given. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-08netfilter: ctnetlink: add a range check for l3/l4 protonumWill McVicker
The indexes to the nf_nat_l[34]protos arrays come from userspace. So check the tuple's family, e.g. l3num, when creating the conntrack in order to prevent an OOB memory access during setup. Here is an example kernel panic on 4.14.180 when userspace passes in an index greater than NFPROTO_NUMPROTO. Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in:... Process poc (pid: 5614, stack limit = 0x00000000a3933121) CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM task: 000000002a3dfffe task.stack: 00000000a3933121 pc : __cfi_check_fail+0x1c/0x24 lr : __cfi_check_fail+0x1c/0x24 ... Call trace: __cfi_check_fail+0x1c/0x24 name_to_dev_t+0x0/0x468 nfnetlink_parse_nat_setup+0x234/0x258 ctnetlink_parse_nat_setup+0x4c/0x228 ctnetlink_new_conntrack+0x590/0xc40 nfnetlink_rcv_msg+0x31c/0x4d4 netlink_rcv_skb+0x100/0x184 nfnetlink_rcv+0xf4/0x180 netlink_unicast+0x360/0x770 netlink_sendmsg+0x5a0/0x6a4 ___sys_sendmsg+0x314/0x46c SyS_sendmsg+0xb4/0x108 el0_svc_naked+0x34/0x38 This crash is not happening since 5.4+, however, ctnetlink still allows for creating entries with unsupported layer 3 protocol number. Fixes: c1d10adb4a521 ("[NETFILTER]: Add ctnetlink port for nf_conntrack") Signed-off-by: Will McVicker <willmcvicker@google.com> [pablo@netfilter.org: rebased original patch on top of nf.git] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: add clash resolution stat counterFlorian Westphal
There is a misconception about what "insert_failed" means. We increment this even when a clash got resolved, so it might not indicate a problem. Add a dedicated counter for clash resolution and only increment insert_failed if a clash cannot be resolved. For the old /proc interface, export this in place of an older stat that got removed a while back. For ctnetlink, export this with a new attribute. Also correct an outdated comment that implies we add a duplicate tuple -- we only add the (unique) reply direction. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: remove ignore statsFlorian Westphal
This counter increments when nf_conntrack_in sees a packet that already has a conntrack attached or when the packet is marked as UNTRACKED. Neither is an error. The former is normal for loopback traffic. The second happens for certain ICMPv6 packets or when nftables/ip(6)tables rules are in place. In case someone needs to count UNTRACKED packets, or packets that are marked as untracked before conntrack_in this can be done with both nftables and ip(6)tables rules. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-10netfilter: ctnetlink: memleak in filter initialization error pathPablo Neira Ayuso
Release the filter object in case of error. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Reported-by: syzbot+38b8b548a851a01793c5@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-27netfilter: ctnetlink: add kernel side filtering for dumpRomain Bellan
Conntrack dump does not support kernel side filtering (only get exists, but it returns only one entry. And user has to give a full valid tuple) It means that userspace has to implement filtering after receiving many irrelevant entries, consuming resources (conntrack table is sometimes very huge, much more than a routing table for example). This patch adds filtering in kernel side. To achieve this goal, we: * Add a new CTA_FILTER netlink attributes, actually a flag list to parametize filtering * Convert some *nlattr_to_tuple() functions, to allow a partial parsing of CTA_TUPLE_ORIG and CTA_TUPLE_REPLY (so nf_conntrack_tuple it not fully set) Filtering is now possible on: * IP SRC/DST values * Ports for TCP and UDP flows * IMCP(v6) codes types and IDs Filtering is done as an "AND" operator. For example, when flags PROTO_SRC_PORT, PROTO_NUM and IP_SRC are sets, only entries matching all values are dumped. Changes since v1: Set NLM_F_DUMP_FILTERED in nlm flags if entries are filtered Changes since v2: Move several constants to nf_internals.h Move a fix on netlink values check in a separate patch Add a check on not-supported flags Return EOPNOTSUPP if CDA_FILTER is set in ctnetlink_flush_conntrack (not yet implemented) Code style issues Changes since v3: Fix compilation warning reported by kbuild test robot Changes since v4: Fix a regression introduced in v3 (returned EINVAL for valid netlink messages without CTA_MARK) Changes since v5: Change definition of CTA_FILTER_F_ALL Fix a regression when CTA_TUPLE_ZONE is not set Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: ctnetlink: be more strict when NF_CONNTRACK_MARK is not setRomain Bellan
When CONFIG_NF_CONNTRACK_MARK is not set, any CTA_MARK or CTA_MARK_MASK in netlink message are not supported. We should return an error when one of them is set, not both Fixes: 9306425b70bf ("netfilter: ctnetlink: must check mark attributes vs NULL") Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27netfilter: ctnetlink: Add missing annotation for ctnetlink_parse_nat_setup()Jules Irenge
Sparse reports a warning at ctnetlink_parse_nat_setup() warning: context imbalance in ctnetlink_parse_nat_setup() - unexpected unlock The root cause is the missing annotation at ctnetlink_parse_nat_setup() Add the missing __must_hold(RCU) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-29netfilter: ctnetlink: netns exit must wait for callbacksFlorian Westphal
Curtis Taylor and Jon Maxwell reported and debugged a crash on 3.10 based kernel. Crash occurs in ctnetlink_conntrack_events because net->nfnl socket is NULL. The nfnl socket was set to NULL by netns destruction running on another cpu. The exiting network namespace calls the relevant destructors in the following order: 1. ctnetlink_net_exit_batch This nulls out the event callback pointer in struct netns. 2. nfnetlink_net_exit_batch This nulls net->nfnl socket and frees it. 3. nf_conntrack_cleanup_net_list This removes all remaining conntrack entries. This is order is correct. The only explanation for the crash so ar is: cpu1: conntrack is dying, eviction occurs: -> nf_ct_delete() -> nf_conntrack_event_report \ -> nf_conntrack_eventmask_report -> notify->fcn() (== ctnetlink_conntrack_events). cpu1: a. fetches rcu protected pointer to obtain ctnetlink event callback. b. gets interrupted. cpu2: runs netns exit handlers: a runs ctnetlink destructor, event cb pointer set to NULL. b runs nfnetlink destructor, nfnl socket is closed and set to NULL. cpu1: c. resumes and trips over NULL net->nfnl. Problem appears to be that ctnetlink_net_exit_batch only prevents future callers of nf_conntrack_eventmask_report() from obtaining the callback. It doesn't wait of other cpus that might have already obtained the callbacks address. I don't see anything in upstream kernels that would prevent similar crash: We need to wait for all cpus to have exited the event callback. Fixes: 9592a5c01e79dbc59eb56fa ("netfilter: ctnetlink: netns support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-17netfilter: ctnetlink: don't dump ct extensions of unconfirmed conntracksFlorian Westphal
When dumping the unconfirmed lists, the cpu that is processing the ct entry can reallocate ct->ext at any time. Right now accessing the extensions from another CPU is ok provided we're holding rcu read lock: extension reallocation does use rcu. Once RCU isn't used anymore this becomes unsafe, so skip extensions for the unconfirmed list. Dumping the extension area for confirmed or dying conntracks is fine: no reallocations are allowed and list iteration holds appropriate locks that prevent ct (and this ct->ext) from getting free'd. v2: fix compiler warnings due to misue of 'const' and missing return statement (kbuild robot). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-03netfilter: ctnetlink: honor IPS_OFFLOAD flagPablo Neira Ayuso
If this flag is set, timeout and state are irrelevant to userspace. Fixes: 90964016e5d3 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-16netfilter: nf_conntrack_sip: fix expectation clashxiao ruizhu
When conntracks change during a dialog, SDP messages may be sent from different conntracks to establish expects with identical tuples. In this case expects conflict may be detected for the 2nd SDP message and end up with a process failure. The fixing here is to reuse an existing expect who has the same tuple for a different conntrack if any. Here are two scenarios for the case. 1) SERVER CPE | INVITE SDP | 5060 |<----------------------|5060 | 100 Trying | 5060 |---------------------->|5060 | 183 SDP | 5060 |---------------------->|5060 ===> Conntrack 1 | PRACK | 50601 |<----------------------|5060 | 200 OK (PRACK) | 50601 |---------------------->|5060 | 200 OK (INVITE) | 5060 |---------------------->|5060 | ACK | 50601 |<----------------------|5060 | | |<--- RTP stream ------>| | | | INVITE SDP (t38) | 50601 |---------------------->|5060 ===> Conntrack 2 With a certain configuration in the CPE, SIP messages "183 with SDP" and "re-INVITE with SDP t38" will go through the sip helper to create expects for RTP and RTCP. It is okay to create RTP and RTCP expects for "183", whose master connection source port is 5060, and destination port is 5060. In the "183" message, port in Contact header changes to 50601 (from the original 5060). So the following requests e.g. PRACK and ACK are sent to port 50601. It is a different conntrack (let call Conntrack 2) from the original INVITE (let call Conntrack 1) due to the port difference. In this example, after the call is established, there is RTP stream but no RTCP stream for Conntrack 1, so the RTP expect created upon "183" is cleared, and RTCP expect created for Conntrack 1 retains. When "re-INVITE with SDP t38" arrives to create RTP&RTCP expects, current ALG implementation will call nf_ct_expect_related() for RTP and RTCP. The expects tuples are identical to those for Conntrack 1. RTP expect for Conntrack 2 succeeds in creation as the one for Conntrack 1 has been removed. RTCP expect for Conntrack 2 fails in creation because it has idential tuples and 'conflict' with the one retained for Conntrack 1. And then result in a failure in processing of the re-INVITE. 2) SERVER A CPE | REGISTER | 5060 |<------------------| 5060 ==> CT1 | 200 | 5060 |------------------>| 5060 | | | INVITE SDP(1) | 5060 |<------------------| 5060 | 300(multi choice) | 5060 |------------------>| 5060 SERVER B | ACK | 5060 |<------------------| 5060 | INVITE SDP(2) | 5060 |-------------------->| 5060 ==> CT2 | 100 | 5060 |<--------------------| 5060 | 200(contact changes)| 5060 |<--------------------| 5060 | ACK | 5060 |-------------------->| 50601 ==> CT3 | | |<--- RTP stream ---->| | | | BYE | 5060 |<--------------------| 50601 | 200 | 5060 |-------------------->| 50601 | INVITE SDP(3) | 5060 |<------------------| 5060 ==> CT1 CPE sends an INVITE request(1) to Server A, and creates a RTP&RTCP expect pair for this Conntrack 1 (CT1). Server A responds 300 to redirect to Server B. The RTP&RTCP expect pairs created on CT1 are removed upon 300 response. CPE sends the INVITE request(2) to Server B, and creates an expect pair for the new conntrack (due to destination address difference), let call CT2. Server B changes the port to 50601 in 200 OK response, and the following requests ACK and BYE from CPE are sent to 50601. The call is established. There is RTP stream and no RTCP stream. So RTP expect is removed and RTCP expect for CT2 retains. As BYE request is sent from port 50601, it is another conntrack, let call CT3, different from CT2 due to the port difference. So the BYE request will not remove the RTCP expect for CT2. Then another outgoing call is made, with the same RTP port being used (not definitely but possibly). CPE firstly sends the INVITE request(3) to Server A, and tries to create a RTP&RTCP expect pairs for this CT1. In current ALG implementation, the RTCP expect for CT1 fails in creation because it 'conflicts' with the residual one for CT2. As a result the INVITE request fails to send. Signed-off-by: xiao ruizhu <katrina.xiaorz@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-26netfilter: ctnetlink: Fix regression in conntrack entry deletionFelix Kaechele
Commit f8e608982022 ("netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression") introduced a regression in which deletion of conntrack entries would fail because the L3 protocol information is replaced by AF_UNSPEC. As a result the search for the entry to be deleted would turn up empty due to the tuple used to perform the search is now different from the tuple used to initially set up the entry. For flushing the conntrack table we do however want to keep the option for nfgenmsg->version to have a non-zero value to allow for newer user-space tools to request treatment under the new behavior. With that it is possible to independently flush tables for a defined L3 protocol. This was introduced with the enhancements in in commit 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush"). Older user-space tools will retain the behavior of flushing all tables regardless of defined L3 protocol. Fixes: f8e608982022 ("netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Felix Kaechele <felix@kaechele.ca> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Postpone chain policy update to drop after transaction is complete, from Florian Westphal. 2) Add entry to flowtable after confirmation to fix UDP flows with packets going in one single direction. 3) Reference count leak in dst object, from Taehee Yoo. 4) Check for TTL field in flowtable datapath, from Taehee Yoo. 5) Fix h323 conntrack helper due to incorrect boundary check, from Jakub Jankowski. 6) Fix incorrect rcu dereference when fetching basechain stats, from Florian Westphal. 7) Missing error check when adding new entries to flowtable, from Taehee Yoo. 8) Use version field in nfnetlink message to honor the nfgen_family field, from Kristian Evensen. 9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6, from Subash Abhinov Kasiviswanathan. 10) Prevent dying entries from being added to the flowtable, from Taehee Yoo. 11) Don't hit WARN_ON() with malformed blob in ebtables with trailing data after last rule, reported by syzbot, patch from Florian Westphal. 12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel code. 13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian Westphal. This batch comes with a conflict that can be fixed with this patch: diff --cc include/uapi/linux/netfilter/nf_tables.h index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address) * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address) * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address) - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack + * @NFT_CT_ID: conntrack id */ enum nft_ct_keys { NFT_CT_STATE, @@@ -991,6 -993,8 +992,7 @@@ NFT_CT_DST_IP, NFT_CT_SRC_IP6, NFT_CT_DST_IP6, - NFT_CT_TIMEOUT, + NFT_CT_ID, __NFT_CT_MAX }; #define NFT_CT_MAX (__NFT_CT_MAX - 1) That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer, I can also solve this conflict here, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-06netfilter: ctnetlink: Resolve conntrack L3-protocol flush regressionKristian Evensen
Commit 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush") introduced a user-space regression when flushing connection track entries. Before this commit, the nfgen_family field was not used by the kernel and all entries were removed. Since this commit, nfgen_family is used to filter out entries that should not be removed. One example a broken tool is conntrack. conntrack always sets nfgen_family to AF_INET, so after 59c08c69c278 only IPv4 entries were removed with the -F parameter. Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the regression, and this commit implements his suggestion. nfgenmsg->version is so far set to zero, so it is well-suited to be used as a flag for selecting old or new flush behavior. If version is 0, nfgen_family is ignored and all entries are used. If user-space sets the version to one (or any other value than 0), then the new behavior is used. As version only can have two valid values, I chose not to add a new NFNETLINK_VERSION-constant. Fixes: 59c08c69c278 ("netfilter: ctnetlink: Support L3 protocol-filter on flush") Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-27netlink: make validation more configurable for future strictnessJohannes Berg
We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15netfilter: ctnetlink: don't use conntrack/expect object addresses as idFlorian Westphal
else, we leak the addresses to userspace via ctnetlink events and dumps. Compute an ID on demand based on the immutable parts of nf_conn struct. Another advantage compared to using an address is that there is no immediate re-use of the same ID in case the conntrack entry is freed and reallocated again immediately. Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID") Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)Florian Westphal
NF_NAT_NEEDED is true whenever nat support for either ipv4 or ipv6 is enabled. Now that the af-specific nat configuration switches have been removed, IS_ENABLED(CONFIG_NF_NAT) has the same effect. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.hFlorian Westphal
The l3proto name is gone, its header file is the last trace. While at it, also remove nf_nat_core.h, its very small and all users include nf_nat.h too. before: text data bss dec hex filename 22948 1612 4136 28696 7018 nf_nat.ko after removal of l3proto register/unregister functions: text data bss dec hex filename 22196 1516 4136 27848 6cc8 nf_nat.ko checkpatch complains about overly long lines, but line breaks do not make things more readable and the line length gets smaller here, not larger. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12netfilter: conntrack: fix indentation issueColin Ian King
A statement in an if block is not indented correctly. Fix this. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove nf_ct_l4proto_find_getFlorian Westphal
Its now same as __nf_ct_l4proto_find(), so rename that to nf_ct_l4proto_find and use it everywhere. It never returns NULL and doesn't need locks or reference counts. Before this series: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko After: 294864 net/netfilter/nf_conntrack.ko text data bss dec hex filename 106979 19557 240 126776 1ef38 nf_conntrack.ko so, even with builtin gre, total size got reduced. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-17netfilter: nat: remove nf_nat_l4proto structFlorian Westphal
This removes the (now empty) nf_nat_l4proto struct, all its instances and all the no longer needed runtime (un)register functionality. nf_nat_need_gre() can be axed as well: the module that calls it (to load the no-longer-existing nat_gre module) also calls other nat core functions. GRE nat is now always available if kernel is built with it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12netfilter: ctnetlink: always honor CTA_MARK_MASKAndreas Jaggi
Useful to only set a particular range of the conntrack mark while leaving existing parts of the value alone, e.g. when updating conntrack marks via netlink from userspace. For NFQUEUE it was already implemented in commit 534473c6080e ("netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark"). This now adds the same functionality also for the other netlink conntrack mark changes. Signed-off-by: Andreas Jaggi <andreas.jaggi@waterwave.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-21netfilter: ctnetlink: must check mark attributes vs NULLFlorian Westphal
else we will oops (null deref) when the attributes aren't present. Also add back the EOPNOTSUPP in case MARK filtering is requested but kernel doesn't support it. Fixes: 59c08c69c2788 ("netfilter: ctnetlink: Support L3 protocol-filter on flush") Reported-by: syzbot+e45eda8eda6e93a03959@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove l3->l4 mapping informationFlorian Westphal
l4 protocols are demuxed by l3num, l4num pair. However, almost all l4 trackers are l3 agnostic. Only exceptions are: - gre, icmp (ipv4 only) - icmpv6 (ipv6 only) This commit gets rid of the l3 mapping, l4 trackers can now be looked up by their IPPROTO_XXX value alone, which gets rid of the additional l3 indirection. For icmp, ipcmp6 and gre, add a check on state->pf and return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4, this seems more fitting than using the generic tracker. Additionally we can kill the 2nd l4proto definitions that were needed for v4/v6 split -- they are now the same so we can use single l4proto struct for each protocol, rather than two. The EXPORT_SYMBOLs can be removed as all these object files are part of nf_conntrack with no external references. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>