summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-01-12sit: allow encapsulated IPv6 traffic to be delivered locallyIgnat Korchagin
While experimenting with FOU encapsulation Amir noticed that encapsulated IPv6 traffic fails to be delivered, if the peer IP address is configured locally. It can be easily verified by creating a sit interface like below: $ sudo ip link add name fou_test type sit remote 127.0.0.1 encap fou encap-sport auto encap-dport 1111 $ sudo ip link set fou_test up and sending some IPv4 and IPv6 traffic to it $ ping -I fou_test -c 1 1.1.1.1 $ ping6 -I fou_test -c 1 fe80::d0b0:dfff:fe4c:fcbc "tcpdump -i any udp dst port 1111" will confirm that only the first IPv4 ping was encapsulated and attempted to be delivered. This seems like a limitation: for example, in a cloud environment the "peer" service may be arbitrarily scheduled on any server within the cluster, where all nodes are trying to send encapsulated traffic. And the unlucky node will not be able to. Moreover, delivering encapsulated IPv4 traffic locally is allowed. But I may not have all the context about this restriction and this code predates the observable git history. Reported-by: Amir Razmjou <arazmjou@cloudflare.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220107123842.211335-1-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-12Merge tag 'tty-5.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big set of tty/serial driver updates for 5.17-rc1. Nothing major in here, just lots of good updates and fixes, including: - more tty core cleanups from Jiri as well as mxser driver cleanups. This is the majority of the core diffstat - tty documentation updates from Jiri - platform_get_irq() updates - various serial driver updates for new features and hardware - fifo usage for 8250 console, reducing cpu load a lot - LED fix for keyboards, long-time bugfix that went through many revisions - minor cleanups All have been in linux-next for a while with no reported problems" * tag 'tty-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits) serial: core: Keep mctrl register state and cached copy in sync serial: stm32: correct loop for dma error handling serial: stm32: fix flow control transfer in DMA mode serial: stm32: rework TX DMA state condition serial: stm32: move tx dma terminate DMA to shutdown serial: pl011: Drop redundant DTR/RTS preservation on close/open serial: pl011: Drop CR register reset on set_termios serial: pl010: Drop CR register reset on set_termios serial: liteuart: fix MODULE_ALIAS serial: 8250_bcm7271: Fix return error code in case of dma_alloc_coherent() failure Revert "serdev: BREAK/FRAME/PARITY/OVERRUN notification prototype V2" tty: goldfish: Use platform_get_irq() to get the interrupt serdev: BREAK/FRAME/PARITY/OVERRUN notification prototype V2 tty: serial: meson: Drop the legacy compatible strings and clock code serial: pmac_zilog: Use platform_get_irq() to get the interrupt serial: bcm63xx: Use platform_get_irq() to get the interrupt serial: ar933x: Use platform_get_irq() to get the interrupt serial: vt8500: Use platform_get_irq() to get the interrupt serial: altera_jtaguart: Use platform_get_irq_optional() to get the interrupt serial: pxa: Use platform_get_irq() to get the interrupt ...
2022-01-12net/smc: fix possible NULL deref in smc_pnet_add_eth()Eric Dumazet
I missed that @ndev value can be NULL. I prefer not factorizing this NULL check, and instead clearly document where a NULL might be expected. general protection fault, probably for non-canonical address 0xdffffc00000000ba: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000005d0-0x00000000000005d7] CPU: 0 PID: 19875 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__lock_acquire+0xd7a/0x5470 kernel/locking/lockdep.c:4897 Code: 14 0e 41 bf 01 00 00 00 0f 86 c8 00 00 00 89 05 5c 20 14 0e e9 bd 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 9f 2e 00 00 49 81 3e 20 c5 1a 8f 0f 84 52 f3 ff RSP: 0018:ffffc900057071d0 EFLAGS: 00010002 RAX: dffffc0000000000 RBX: 1ffff92000ae0e65 RCX: 1ffff92000ae0e4c RDX: 00000000000000ba RSI: 0000000000000000 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: fffffbfff1b24ae2 R11: 000000000008808a R12: 0000000000000000 R13: ffff888040ca4000 R14: 00000000000005d0 R15: 0000000000000000 FS: 00007fbd683e0700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2be22000 CR3: 0000000013fea000 CR4: 00000000003526f0 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5637 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162 ref_tracker_alloc+0x182/0x440 lib/ref_tracker.c:84 netdev_tracker_alloc include/linux/netdevice.h:3859 [inline] smc_pnet_add_eth net/smc/smc_pnet.c:372 [inline] smc_pnet_enter net/smc/smc_pnet.c:492 [inline] smc_pnet_add+0x49a/0x14d0 net/smc/smc_pnet.c:555 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: b60645248af3 ("net/smc: add net device tracker to struct smc_pnetentry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-12net: bridge: fix net device refcount tracking issue in error pathEric Dumazet
I left one dev_put() in br_add_if() error path and sure enough syzbot found its way. As the tracker is allocated in new_nbp(), we must make sure to properly free it. We have to call dev_put_track(dev, &p->dev_tracker) before @p object is freed, of course. This is not an issue because br_add_if() owns a reference on @dev. Fixes: b2dcdc7f731d ("net: bridge: add net device refcount tracker") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-12net: fix sock_timestamping_bind_phc() to release deviceMiroslav Lichvar
Don't forget to release the device in sock_timestamping_bind_phc() after it was used to get the vclock indices. Fixes: d463126e23f1 ("net: sock: extend SO_TIMESTAMPING for PHC binding") Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-12Revert "of: net: support NVMEM cells with MAC in text format"Michael Walle
This reverts commit 9ed319e411915e882bb4ed99be3ae78667a70022. We can already post process a nvmem cell value in a particular driver. Instead of having yet another place to convert the values, the post processing hook of the nvmem provider should be used in this case. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-12netfilter: nf_tables: set last expression in register tracking areaPablo Neira Ayuso
nft_rule_for_each_expr() sets on last to nft_rule_last(), however, this is coming after track.last field is set on. Use nft_expr_last() to set track.last accordingly. Fixes: 12e4ecfa244b ("netfilter: nf_tables: add register tracking infrastructure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-11gre: Don't accidentally set RTO_ONLINK in gre_fill_metadata_dst()Guillaume Nault
Mask the ECN bits before initialising ->flowi4_tos. The tunnel key may have the last ECN bit set, which will interfere with the route lookup process as ip_route_output_key_hash() interpretes this bit specially (to restrict the route scope). Found by code inspection, compile tested only. Fixes: 962924fa2b7a ("ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-11xfrm: Don't accidentally set RTO_ONLINK in decode_session4()Guillaume Nault
Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"), clear the ECN bits from iph->tos when setting ->flowi4_tos. This ensures that the last bit of ->flowi4_tos is cleared, so ip_route_output_key_hash() isn't going to restrict the scope of the route lookup. Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason to clear the high order bits. Found by code inspection, compile tested only. Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-11mctp: test: zero out sockaddrMatt Johnston
MCTP now requires that padding bytes are zero. Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Fixes: 1e4b50f06d97 ("mctp: handle the struct sockaddr_mctp padding fields") Link: https://lore.kernel.org/r/20220110021806.2343023-1-matt@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-11Merge tag 'selinux-pr-20220110' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "Nothing too significant, but five SELinux patches for v5.17 that do the following: - Harden the code through additional use of the struct_size() macro - Plug some memory leaks - Clean up the code via removal of the security_add_mnt_opt() LSM hook and minor tweaks to selinux_add_opt() - Rename security_task_getsecid_subj() to better reflect its actual behavior/use - now called security_current_getsecid_subj()" * tag 'selinux-pr-20220110' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: minor tweaks to selinux_add_opt() selinux: fix potential memleak in selinux_add_opt() security,selinux: remove security_add_mnt_opt() selinux: Use struct_size() helper in kmalloc() lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()
2022-01-11xdp: check prog type before updating BPF linkToke Høiland-Jørgensen
The bpf_xdp_link_update() function didn't check the program type before updating the program, which made it possible to install any program type as an XDP program, which is obviously not good. Syzbot managed to trigger this by swapping in an LWT program on the XDP hook which would crash in a helper call. Fix this by adding a check and bailing out if the types don't match. Fixes: 026a4c28e1db ("bpf, xdp: Implement LINK_UPDATE for BPF XDP link") Reported-by: syzbot+983941aa85af6ded1fd9@syzkaller.appspotmail.com Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220107221115.326171-1-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-11netfilter: nf_tables: remove unused variablePablo Neira Ayuso
> Remove unused variable and fix missing initialization. > > >> net/netfilter/nf_tables_api.c:8266:6: warning: variable 'i' set but not used [-Wunused-but-set-variable] > int i; > ^ Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-11netfilter: nf_conntrack_netbios_ns: fix helper module aliasFlorian Westphal
The helper gets registered as 'netbios-ns', not netbios_ns. Intentionally not adding a fixes-tag because i don't want this to go to stable. This wasn't noticed for a very long time so no so no need to risk regressions. Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-10netfilter: nf_tables: typo NULL check in _clone() functionPablo Neira Ayuso
This should check for NULL in case memory allocation fails. Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Fixes: 3b9e2ea6c11b ("netfilter: nft_limit: move stateful fields out of expression data") Fixes: 37f319f37d90 ("netfilter: nft_connlimit: move stateful fields out of expression data") Fixes: 33a24de37e81 ("netfilter: nft_last: move stateful fields out of expression data") Fixes: ed0a0c60f0e5 ("netfilter: nft_quota: move stateful fields out of expression data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20220110194817.53481-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-10netfilter: nf_tables: don't use 'data_size' uninitializedLinus Torvalds
Commit 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") never initialized the new 'data_size' variable. I'm not sure how it ever worked, but it might have worked almost by accident - gcc seems to occasionally miss these kinds of 'variable used uninitialized' situations, but I've seen it do so because it ended up zero-initializing them due to some other simplification. But clang is very unhappy about it all, and correctly reports net/netfilter/nf_tables_api.c:8278:4: error: variable 'data_size' is uninitialized when used here [-Werror,-Wuninitialized] data_size += sizeof(*prule) + rule->dlen; ^~~~~~~~~ net/netfilter/nf_tables_api.c:8263:30: note: initialize the variable 'data_size' to silence this warning unsigned int size, data_size; ^ = 0 1 error generated. and this fix just initializes 'data_size' to zero before the loop. Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-10SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace pointChuck Lever
While testing, I got an unexpected KASAN splat: Jan 08 13:50:27 oracle-102.nfsv4.dev kernel: BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_svc_xprt_create_err+0x190/0x210 [sunrpc] Jan 08 13:50:27 oracle-102.nfsv4.dev kernel: Read of size 28 at addr ffffc9000008f728 by task mount.nfs/4628 The memcpy() in the TP_fast_assign section of this trace point copies the size of the destination buffer in order that the buffer won't be overrun. In other similar trace points, the source buffer for this memcpy is a "struct sockaddr_storage" so the actual length of the source buffer is always long enough to prevent the memcpy from reading uninitialized or unallocated memory. However, for this trace point, the source buffer can be as small as a "struct sockaddr_in". For AF_INET sockaddrs, the memcpy() reads memory that follows the source buffer, which is not always valid memory. To avoid copying past the end of the passed-in sockaddr, make the source address's length available to the memcpy(). It would be a little nicer if the tracing infrastructure was more friendly about storing socket addresses that are not AF_INET, but I could not find a way to make printk("%pIS") work with a dynamic array. Reported-by: KASAN Fixes: 4b8f380e46e4 ("SUNRPC: Tracepoint to record errors in svc_xpo_create()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in fixes directly in prep for the 5.17 merge window. No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-10net/9p: show error message if user 'msize' cannot be satisfiedChristian Schoenebeck
If user supplied a large value with the 'msize' option, then client would silently limit that 'msize' value to the maximum value supported by transport. That's a bit confusing for users of not having any indication why the preferred 'msize' value could not be satisfied. Link: https://lkml.kernel.org/r/783ba37c1566dd715b9a67d437efa3b77e3cd1a7.1640870037.git.linux_oss@crudebyte.com Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-01-10net/p9: load default transportsThomas Weißschuh
Now that all transports are split into modules it may happen that no transports are registered when v9fs_get_default_trans() is called. When that is the case try to load more transports from modules. Link: https://lkml.kernel.org/r/20211103193823.111007-5-linux@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> [Dominique: constify v9fs_get_trans_by_name argument as per patch1v2] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-01-109p/xen: autoload when xenbus service is availableThomas Weißschuh
Link: https://lkml.kernel.org/r/20211103193823.111007-4-linux@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-01-109p/trans_fd: split into dedicated moduleThomas Weißschuh
This allows these transports only to be used when needed. Link: https://lkml.kernel.org/r/20211103193823.111007-3-linux@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> [Dominique: Kconfig NET_9P_FD: -depends VIRTIO, +default NET_9P] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-01-09tcp: tcp_send_challenge_ack delete useless param `skb`Benjamin Yim
After this parameter is passed in, there is no usage, and deleting it will not bring any impact. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Benjamin Yim <yan2228598786@gmail.com> Link: https://lore.kernel.org/r/20220109130824.2776-1-yan2228598786@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09page_pool: remove spinlock in page_pool_refill_alloc_cache()Yunsheng Lin
As page_pool_refill_alloc_cache() is only called by __page_pool_get_cached(), which assumes non-concurrent access as suggested by the comment in __page_pool_get_cached(), and ptr_ring allows concurrent access between consumer and producer, so remove the spinlock in page_pool_refill_alloc_cache(). Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20220107090042.13605-1-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09net: skb: use kfree_skb_reason() in __udp4_lib_rcv()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in __udp4_lib_rcv. New drop reason 'SKB_DROP_REASON_UDP_CSUM' is added for udp csum error. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09net: skb: use kfree_skb_reason() in tcp_v4_rcv()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in tcp_v4_rcv(). Following drop reasons are added: SKB_DROP_REASON_NO_SOCKET SKB_DROP_REASON_PKT_TOO_SMALL SKB_DROP_REASON_TCP_CSUM SKB_DROP_REASON_TCP_FILTER After this patch, 'kfree_skb' event will print message like this: $ TASK-PID CPU# ||||| TIMESTAMP FUNCTION $ | | | ||||| | | <idle>-0 [000] ..s1. 36.113438: kfree_skb: skbaddr=(____ptrval____) protocol=2048 location=(____ptrval____) reason: NO_SOCKET The reason of skb drop is printed too. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09net: skb: introduce kfree_skb_reason()Menglong Dong
Introduce the interface kfree_skb_reason(), which is able to pass the reason why the skb is dropped to 'kfree_skb' tracepoint. Add the 'reason' field to 'trace_kfree_skb', therefor user can get more detail information about abnormal skb with 'drop_monitor' or eBPF. All drop reasons are defined in the enum 'skb_drop_reason', and they will be print as string in 'kfree_skb' tracepoint in format of 'reason: XXX'. ( Maybe the reasons should be defined in a uapi header file, so that user space can use them? ) Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09net: openvswitch: Fix ct_state nat flags for conns arriving from tcPaul Blakey
Netfilter conntrack maintains NAT flags per connection indicating whether NAT was configured for the connection. Openvswitch maintains NAT flags on the per packet flow key ct_state field, indicating whether NAT was actually executed on the packet. When a packet misses from tc to ovs the conntrack NAT flags are set. However, NAT was not necessarily executed on the packet because the connection's state might still be in NEW state. As such, openvswitch wrongly assumes that NAT was executed and sets an incorrect flow key NAT flags. Fix this, by flagging to openvswitch which NAT was actually done in act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so the packet flow key NAT flags will be correctly set. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220106153804.26451-1-paulb@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. This includes one patch to update ovs and act_ct to use nf_ct_put() instead of nf_conntrack_put(). 1) Add netns_tracker to nfnetlink_log and masquerade, from Eric Dumazet. 2) Remove redundant rcu read-size lock in nf_tables packet path. 3) Replace BUG() by WARN_ON_ONCE() in nft_payload. 4) Consolidate rule verdict tracing. 5) Replace WARN_ON() by WARN_ON_ONCE() in nf_tables core. 6) Make counter support built-in in nf_tables. 7) Add new field to conntrack object to identify locally generated traffic, from Florian Westphal. 8) Prevent NAT from shadowing well-known ports, from Florian Westphal. 9) Merge nf_flow_table_{ipv4,ipv6} into nf_flow_table_inet, also from Florian. 10) Remove redundant pointer in nft_pipapo AVX2 support, from Colin Ian King. 11) Replace opencoded max() in conntrack, from Jiapeng Chong. 12) Update conntrack to use refcount_t API, from Florian Westphal. 13) Move ip_ct_attach indirection into the nf_ct_hook structure. 14) Constify several pointer object in the netfilter codebase, from Florian Westphal. 15) Tree-wide replacement of nf_conntrack_put() by nf_ct_put(), also from Florian. 16) Fix egress splat due to incorrect rcu notation, from Florian. 17) Move stateful fields of connlimit, last, quota, numgen and limit out of the expression data area. 18) Build a blob to represent the ruleset in nf_tables, this is a requirement of the new register tracking infrastructure. 19) Add NFT_REG32_NUM to define the maximum number of 32-bit registers. 20) Add register tracking infrastructure to skip redundant store-to-register operations, this includes support for payload, meta and bitwise expresssions. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: (32 commits) netfilter: nft_meta: cancel register tracking after meta update netfilter: nft_payload: cancel register tracking after payload update netfilter: nft_bitwise: track register operations netfilter: nft_meta: track register operations netfilter: nft_payload: track register operations netfilter: nf_tables: add register tracking infrastructure netfilter: nf_tables: add NFT_REG32_NUM netfilter: nf_tables: add rule blob layout netfilter: nft_limit: move stateful fields out of expression data netfilter: nft_limit: rename stateful structure netfilter: nft_numgen: move stateful fields out of expression data netfilter: nft_quota: move stateful fields out of expression data netfilter: nft_last: move stateful fields out of expression data netfilter: nft_connlimit: move stateful fields out of expression data netfilter: egress: avoid a lockdep splat net: prefer nf_ct_put instead of nf_conntrack_put netfilter: conntrack: avoid useless indirection during conntrack destruction netfilter: make function op structures const netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook netfilter: conntrack: convert to refcount_t api ... ==================== Link: https://lore.kernel.org/r/20220109231640.104123-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-09netfilter: nft_meta: cancel register tracking after meta updatePablo Neira Ayuso
The meta expression might mangle the packet metadata, cancel register tracking since any metadata in the registers is stale. Finer grain register tracking cancellation by inspecting the meta type on the register is also possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_payload: cancel register tracking after payload updatePablo Neira Ayuso
The payload expression might mangle the packet, cancel register tracking since any payload data in the registers is stale. Finer grain register tracking cancellation by inspecting the payload base, offset and length on the register is also possible. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_bitwise: track register operationsPablo Neira Ayuso
Check if the destination register already contains the data that this bitwise expression performs. This allows to skip this redundant operation. If the destination contains a different bitwise operation, cancel the register tracking information. If the destination contains no bitwise operation, update the register tracking information. Update the payload and meta expression to check if this bitwise operation has been already performed on the register. Hence, both the payload/meta and the bitwise expressions are reduced. There is also a special case: If source register != destination register and source register is not updated by a previous bitwise operation, then transfer selector from the source register to the destination register. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_meta: track register operationsPablo Neira Ayuso
Check if the destination register already contains the data that this meta store expression performs. This allows to skip this redundant operation. If the destination contains a different selector, update the register tracking information. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_payload: track register operationsPablo Neira Ayuso
Check if the destination register already contains the data that this payload store expression performs. This allows to skip this redundant operation. If the destination contains a different selector, update the register tracking information. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nf_tables: add register tracking infrastructurePablo Neira Ayuso
This patch adds new infrastructure to skip redundant selector store operations on the same register to achieve a performance boost from the packet path. This is particularly noticeable in pure linear rulesets but it also helps in rulesets which are already heaving relying in maps to avoid ruleset linear inspection. The idea is to keep data of the most recurrent store operations on register to reuse them with cmp and lookup expressions. This infrastructure allows for dynamic ruleset updates since the ruleset blob reduction happens from the kernel. Userspace still needs to be updated to maximize register utilization to cooperate to improve register data reuse / reduce number of store on register operations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nf_tables: add rule blob layoutPablo Neira Ayuso
This patch adds a blob layout per chain to represent the ruleset in the packet datapath. size (unsigned long) struct nft_rule_dp struct nft_expr ... struct nft_rule_dp struct nft_expr ... struct nft_rule_dp (is_last=1) The new structure nft_rule_dp represents the rule in a more compact way (smaller memory footprint) compared to the control-plane nft_rule structure. The ruleset blob is a read-only data structure. The first field contains the blob size, then the rules containing expressions. There is a trailing rule which is used by the tracing infrastructure which is equivalent to the NULL rule marker in the previous representation. The blob size field does not include the size of this trailing rule marker. The ruleset blob is generated from the commit path. This patch reuses the infrastructure available since 0cbc06b3faba ("netfilter: nf_tables: remove synchronize_rcu in commit phase") to build the array of rules per chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_limit: move stateful fields out of expression dataPablo Neira Ayuso
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_limit: rename stateful structurePablo Neira Ayuso
From struct nft_limit to nft_limit_priv. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_numgen: move stateful fields out of expression dataPablo Neira Ayuso
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_quota: move stateful fields out of expression dataPablo Neira Ayuso
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_last: move stateful fields out of expression dataPablo Neira Ayuso
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: nft_connlimit: move stateful fields out of expression dataPablo Neira Ayuso
In preparation for the rule blob representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09net: prefer nf_ct_put instead of nf_conntrack_putFlorian Westphal
Its the same as nf_conntrack_put(), but without the need for an indirect call. The downside is a module dependency on nf_conntrack, but all of these already depend on conntrack anyway. Cc: Paul Blakey <paulb@mellanox.com> Cc: dev@openvswitch.org Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: conntrack: avoid useless indirection during conntrack destructionFlorian Westphal
nf_ct_put() results in a usesless indirection: nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock + indirect call of ct_hooks->destroy(). There are two _put helpers: nf_ct_put and nf_conntrack_put. The latter is what should be used in code that MUST NOT cause a linker dependency on the conntrack module (e.g. calls from core network stack). Everyone else should call nf_ct_put() instead. A followup patch will convert a few nf_conntrack_put() calls to nf_ct_put(), in particular from modules that already have a conntrack dependency such as act_ct or even nf_conntrack itself. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: make function op structures constFlorian Westphal
No functional changes, these structures should be const. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: core: move ip_ct_attach indirection to struct nf_ct_hookFlorian Westphal
ip_ct_attach predates struct nf_ct_hook, we can place it there and remove the exported symbol. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: conntrack: convert to refcount_t apiFlorian Westphal
Convert nf_conn reference counting from atomic_t to refcount_t based api. refcount_t api provides more runtime sanity checks and will warn on certain constructs, e.g. refcount_inc() on a zero reference count, which usually indicates use-after-free. For this reason template allocation is changed to init the refcount to 1, the subsequenct add operations are removed. Likewise, init_conntrack() is changed to set the initial refcount to 1 instead refcount_inc(). This is safe because the new entry is not (yet) visible to other cpus. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09Merge tag 'for-net-next-2022-01-07' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn QCA 0xe0d0 - Fix HCI init sequence on MacBook Air 8,1 and 8,2 - Fix Intel firmware loading on legacy ROM devices * tag 'for-net-next-2022-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: Bluetooth: hci_sock: fix endian bug in hci_sock_setsockopt() Bluetooth: L2CAP: uninitialized variables in l2cap_sock_setsockopt() Bluetooth: btqca: sequential validation Bluetooth: btusb: Add support for Foxconn QCA 0xe0d0 Bluetooth: btintel: Fix broken LED quirk for legacy ROM devices Bluetooth: hci_event: Rework hci_inquiry_result_with_rssi_evt Bluetooth: btbcm: disable read tx power for MacBook Air 8,1 and 8,2 Bluetooth: hci_qca: Fix NULL vs IS_ERR_OR_NULL check in qca_serdev_probe Bluetooth: hci_bcm: Check for error irq ==================== Link: https://lore.kernel.org/r/20220107210942.3750887-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-08kbuild: do not quote string values in include/config/auto.confMasahiro Yamada
The previous commit fixed up all shell scripts to not include include/config/auto.conf. Now that include/config/auto.conf is only included by Makefiles, we can change it into a more Make-friendly form. Previously, Kconfig output string values enclosed with double-quotes (both in the .config and include/config/auto.conf): CONFIG_X="foo bar" Unlike shell, Make handles double-quotes (and single-quotes as well) verbatim. We must rip them off when used. There are some patterns: [1] $(patsubst "%",%,$(CONFIG_X)) [2] $(CONFIG_X:"%"=%) [3] $(subst ",,$(CONFIG_X)) [4] $(shell echo $(CONFIG_X)) These are not only ugly, but also fragile. [1] and [2] do not work if the value contains spaces, like CONFIG_X=" foo bar " [3] does not work correctly if the value contains double-quotes like CONFIG_X="foo\"bar" [4] seems to work better, but has a cost of forking a process. Anyway, quoted strings were always PITA for our Makefiles. This commit changes Kconfig to stop quoting in include/config/auto.conf. These are the string type symbols referenced in Makefiles or scripts: ACPI_CUSTOM_DSDT_FILE ARC_BUILTIN_DTB_NAME ARC_TUNE_MCPU BUILTIN_DTB_SOURCE CC_IMPLICIT_FALLTHROUGH CC_VERSION_TEXT CFG80211_EXTRA_REGDB_KEYDIR EXTRA_FIRMWARE EXTRA_FIRMWARE_DIR EXTRA_TARGETS H8300_BUILTIN_DTB INITRAMFS_SOURCE LOCALVERSION MODULE_SIG_HASH MODULE_SIG_KEY NDS32_BUILTIN_DTB NIOS2_DTB_SOURCE OPENRISC_BUILTIN_DTB SOC_CANAAN_K210_DTB_SOURCE SYSTEM_BLACKLIST_HASH_LIST SYSTEM_REVOCATION_KEYS SYSTEM_TRUSTED_KEYS TARGET_CPU UNUSED_KSYMS_WHITELIST XILINX_MICROBLAZE0_FAMILY XILINX_MICROBLAZE0_HW_VER XTENSA_VARIANT_NAME I checked them one by one, and fixed up the code where necessary. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-01-07af_packet: fix tracking issues in packet_do_bind()Eric Dumazet
It appears that my changes in packet_do_bind() were slightly wrong. syzbot found that calling bind() twice would trigger a false positive. Remove proto_curr/dev_curr variables and rewrite things to be less confusing (like not having to use netdev_tracker_alloc(), and instead use the standard dev_hold_track()) Fixes: f1d9268e0618 ("net: add net device refcount tracker to struct packet_type") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220107183953.3886647-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>