summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-12-12wifi: mac80211: check if the existing link config remains unchangedEdward Adam Davis
[Syz report] WARNING: CPU: 1 PID: 5067 at net/mac80211/rate.c:48 rate_control_rate_init+0x540/0x690 net/mac80211/rate.c:48 Modules linked in: CPU: 1 PID: 5067 Comm: syz-executor413 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:rate_control_rate_init+0x540/0x690 net/mac80211/rate.c:48 Code: 48 c7 c2 00 46 0c 8c be 08 03 00 00 48 c7 c7 c0 45 0c 8c c6 05 70 79 0b 05 01 e8 1b a0 6f f7 e9 e0 fd ff ff e8 61 b3 8f f7 90 <0f> 0b 90 e9 36 ff ff ff e8 53 b3 8f f7 e8 5e 0b 78 f7 31 ff 89 c3 RSP: 0018:ffffc90003c57248 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888016bc4000 RCX: ffffffff89f7d519 RDX: ffff888076d43b80 RSI: ffffffff89f7d6df RDI: 0000000000000005 RBP: ffff88801daaae20 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000000 R14: ffff888020030e20 R15: ffff888078f08000 FS: 0000555556b94380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000005fdeb8 CR3: 0000000076d22000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sta_apply_auth_flags.constprop.0+0x4b7/0x510 net/mac80211/cfg.c:1674 sta_apply_parameters+0xaf1/0x16c0 net/mac80211/cfg.c:2002 ieee80211_add_station+0x3fa/0x6c0 net/mac80211/cfg.c:2068 rdev_add_station net/wireless/rdev-ops.h:201 [inline] nl80211_new_station+0x13ba/0x1a70 net/wireless/nl80211.c:7603 genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972 genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline] genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0xd5/0x180 net/socket.c:745 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2584 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2638 __sys_sendmsg+0x117/0x1e0 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b [Analysis] It is inappropriate to make a link configuration change judgment on an non-existent and non new link. [Fix] Quickly exit when there is a existent link and the link configuration has not changed. Fixes: b303835dabe0 ("wifi: mac80211: accept STA changes without link changes") Reported-and-tested-by: syzbot+62d7eef57b09bfebcd84@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://msgid.link/tencent_DE67FF86DB92ED465489A36ECD2EDDCC8C06@qq.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: cfg80211: Add my certificateChen-Yu Tsai
As announced [1][2], I have taken over maintainership of the wireless-regdb project. Add my certificate so that newer releases are valid to the kernel. Seth's certificate should be kept around for awhile, at least until a few new releases by me happen. This should also be applied to stable trees so that stable kernels can utilize newly released database binaries. [1] https://lore.kernel.org/linux-wireless/CAGb2v657baNMPKU3QADijx7hZa=GUcSv2LEDdn6N=QQaFX8r-g@mail.gmail.com/ [2] https://lore.kernel.org/linux-wireless/ZWmRR5ul7EDfxCan@wens.tw/ Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wens@kernel.org> Acked-by: Seth Forshee <sforshee@kernel.org> Link: https://msgid.link/ZXHGsqs34qZyzZng@wens.tw Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-11net/sched: cls_api: conditional notification of eventsPedro Tammela
As of today tc-filter/chain events are unconditionally built and sent to RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check before-hand if they are really needed. This will help to alleviate system pressure when filters are concurrently added without the rtnl lock as in tc-flower. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231208192847.714940-8-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: cls_api: remove 'unicast' argument from delete notificationPedro Tammela
This argument is never called while set to true, so remove it as there's no need for it. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208192847.714940-7-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: act_api: conditional notification of eventsPedro Tammela
As of today tc-action events are unconditionally built and sent to RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check before-hand if they are really needed. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208192847.714940-6-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: act_api: don't open code max()Pedro Tammela
Use max() in a couple of places that are open coding it with the ternary operator. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208192847.714940-5-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11ipv6: annotate data-races around np->ucast_oifEric Dumazet
np->ucast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_UNICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11ipv6: annotate data-races around np->mcast_oifEric Dumazet
np->mcast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_MULTICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11Revert "net: rtnetlink: remove local list in __linkwatch_run_queue()"Johannes Berg
This reverts commit b8dbbbc535a9 ("net: rtnetlink: remove local list in __linkwatch_run_queue()"). It's evidently broken when there's a non-urgent work that gets added back, and then the loop can never finish. While reverting, add a note about that. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: b8dbbbc535a9 ("net: rtnetlink: remove local list in __linkwatch_run_queue()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11net/sched: act_ct: Take per-cb reference to tcf_ct_flow_tableVlad Buslov
The referenced change added custom cleanup code to act_ct to delete any callbacks registered on the parent block when deleting the tcf_ct_flow_table instance. However, the underlying issue is that the drivers don't obtain the reference to the tcf_ct_flow_table instance when registering callbacks which means that not only driver callbacks may still be on the table when deleting it but also that the driver can still have pointers to its internal nf_flowtable and can use it concurrently which results either warning in netfilter[0] or use-after-free. Fix the issue by taking a reference to the underlying struct tcf_ct_flow_table instance when registering the callback and release the reference when unregistering. Expose new API required for such reference counting by adding two new callbacks to nf_flowtable_type and implementing them for act_ct flowtable_ct type. This fixes the issue by extending the lifetime of nf_flowtable until all users have unregistered. [0]: [106170.938634] ------------[ cut here ]------------ [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis try overlay mlx5_core [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1 [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00 [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202 [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48 [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000 [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700 [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58 [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000 [106170.951616] FS: 0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000 [106170.952329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0 [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [106170.954766] Call Trace: [106170.955057] <TASK> [106170.955315] ? __warn+0x79/0x120 [106170.955648] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.956172] ? report_bug+0x17c/0x190 [106170.956537] ? handle_bug+0x3c/0x60 [106170.956891] ? exc_invalid_op+0x14/0x70 [106170.957264] ? asm_exc_invalid_op+0x16/0x20 [106170.957666] ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core] [106170.958172] ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core] [106170.958788] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.959339] ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core] [106170.959854] ? mapping_remove+0x154/0x1d0 [mlx5_core] [106170.960342] ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core] [106170.960927] mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core] [106170.961441] mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core] [106170.962001] mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core] [106170.962524] mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core] [106170.963034] mlx5e_flow_put+0x73/0xe0 [mlx5_core] [106170.963506] mlx5e_put_flow_list+0x38/0x70 [mlx5_core] [106170.964002] mlx5e_rep_update_flows+0xec/0x290 [mlx5_core] [106170.964525] mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core] [106170.965056] process_one_work+0x13a/0x2c0 [106170.965443] worker_thread+0x2e5/0x3f0 [106170.965808] ? rescuer_thread+0x410/0x410 [106170.966192] kthread+0xc6/0xf0 [106170.966515] ? kthread_complete_and_exit+0x20/0x20 [106170.966970] ret_from_fork+0x2d/0x50 [106170.967332] ? kthread_complete_and_exit+0x20/0x20 [106170.967774] ret_from_fork_asm+0x11/0x20 [106170.970466] </TASK> [106170.970726] ---[ end trace 0000000000000000 ]--- Fixes: 77ac5e40c44e ("net/sched: act_ct: remove and free nf_table callbacks") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11Merge 6.7-rc5 into tty-nextGreg Kroah-Hartman
We need the serial fixes in here as well to build off of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-09io_uring/af_unix: disable sending io_uring over socketsPavel Begunkov
File reference cycles have caused lots of problems for io_uring in the past, and it still doesn't work exactly right and races with unix_stream_read_generic(). The safest fix would be to completely disallow sending io_uring files via sockets via SCM_RIGHT, so there are no possible cycles invloving registered files and thus rendering SCM accounting on the io_uring side unnecessary. Cc: stable@vger.kernel.org Fixes: 0091bfc81741b ("io_uring/af_unix: defer registered files gc to io_uring release") Reported-and-suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08tcp: fix tcp_disordered_ack() vs usec TS resolutionEric Dumazet
After commit 939463016b7a ("tcp: change data receiver flowlabel after one dup") we noticed an increase of TCPACKSkippedPAWS events. Neal Cardwell tracked the issue to tcp_disordered_ack() assumption about remote peer TS clock. RFC 1323 & 7323 are suggesting the following: "timestamp clock frequency in the range 1 ms to 1 sec per tick between 1ms and 1sec." This has to be adjusted for 1 MHz clock frequency. This hints at reorders of SACK packets on send side, this might deserve a future patch. (skb->ooo_okay is always set for pure ACK packets) Fixes: 614e8316aa4c ("tcp: add support for usec resolution in TCP TS values") Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Morley <morleyd@google.com> Link: https://lore.kernel.org/r/20231207181342.525181-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08net: sysfs: fix locking in carrier readJohannes Berg
My previous patch added a call to linkwatch_sync_dev(), but that of course needs to be called under RTNL, which I missed earlier, but now saw RCU warnings from. Fix that by acquiring the RTNL in a similar fashion to how other files do it here. Fixes: facd15dfd691 ("net: core: synchronize link-watch when carrier is queried") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20231206172122.859df6ba937f.I9c80608bcfbab171943ff4942b52dbd5e97fe06e@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08Merge tag 'io_uring-6.7-2023-12-08' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Two minor fixes for issues introduced in this release cycle, and two fixes for issues or potential issues that are heading to stable. One of these ends up disabling passing io_uring file descriptors via SCM_RIGHTS. There really shouldn't be an overlap between that kind of historic use case and modern usage of io_uring, which is why this was deemed appropriate" * tag 'io_uring-6.7-2023-12-08' of git://git.kernel.dk/linux: io_uring/af_unix: disable sending io_uring over sockets io_uring/kbuf: check for buffer list readiness after NULL check io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring() io_uring: fix mutex_unlock with unreferenced ctx
2023-12-08Use READ/WRITE_ONCE() for IP local_port_range.David Laight
Commit 227b60f5102cd added a seqlock to ensure that the low and high port numbers were always updated together. This is overkill because the two 16bit port numbers can be held in a u32 and read/written in a single instruction. More recently 91d0b78c5177f added support for finer per-socket limits. The user-supplied value is 'high << 16 | low' but they are held separately and the socket options protected by the socket lock. Use a u32 containing 'high << 16 | low' for both the 'net' and 'sk' fields and use READ_ONCE()/WRITE_ONCE() to ensure both values are always updated together. Change (the now trival) inet_get_local_port_range() to a static inline to optimise the calling code. (In particular avoiding returning integers by reference.) Signed-off-by: David Laight <david.laight@aculab.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/4e505d4198e946a8be03fb1b4c3072b0@AcuMS.aculab.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08tty: make tty_operations::send_xchar accept u8 charJiri Slaby (SUSE)
tty_operations::send_xchar is one of the last users of 'char' type for characters in the tty layer. Convert it to u8 now. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: netdev@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20231206073712.17776-5-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08ethtool: Implement ethtool_puts()justinstitt@google.com
Use strscpy() to implement ethtool_puts(). Functionally the same as ethtool_sprintf() when it's used with two arguments or with just "%s" format specifier. Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Madhuri Sripada <madhuri.sripada@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIXMaciej Żenczykowski
Lorenzo points out that we effectively clear all unknown flags from PIO when copying them to userspace in the netlink RTM_NEWPREFIX notification. We could fix this one at a time as new flags are defined, or in one fell swoop - I choose the latter. We could either define 6 new reserved flags (reserved1..6) and handle them individually (and rename them as new flags are defined), or we could simply copy the entire unmodified byte over - I choose the latter. This unfortunately requires some anonymous union/struct magic, so we add a static assert on the struct size for a little extra safety. Cc: David Ahern <dsahern@kernel.org> Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08neighbour: Don't let neigh_forced_gc() disable preemption for longJudy Hsiao
We are seeing cases where neigh_cleanup_and_release() is called by neigh_forced_gc() many times in a row with preemption turned off. When running on a low powered CPU at a low CPU frequency, this has been measured to keep preemption off for ~10 ms. That's not great on a system with HZ=1000 which expects tasks to be able to schedule in with ~1ms latency. Suggested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Judy Hsiao <judyhsiao@chromium.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/stmicro/stmmac/dwmac5.c drivers/net/ethernet/stmicro/stmmac/dwmac5.h drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c drivers/net/ethernet/stmicro/stmmac/hwif.h 37e4b8df27bc ("net: stmmac: fix FPE events losing") c3f3b97238f6 ("net: stmmac: Refactor EST implementation") https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/ Adjacent changes: net/ipv4/tcp_ao.c 9396c4ee93f9 ("net/tcp: Don't store TCP-AO maclen on reqsk") 7b0f570f879a ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07calipso: fix memory leak in netlbl_calipso_add_pass()Gavrilov Ilia
If IPv6 support is disabled at boot (ipv6.disable=1), the calipso_init() -> netlbl_calipso_ops_register() function isn't called, and the netlbl_calipso_ops_get() function always returns NULL. In this case, the netlbl_calipso_add_pass() function allocates memory for the doi_def variable but doesn't free it with the calipso_doi_free(). BUG: memory leak unreferenced object 0xffff888011d68180 (size 64): comm "syz-executor.1", pid 10746, jiffies 4295410986 (age 17.928s) hex dump (first 32 bytes): 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<...>] kmalloc include/linux/slab.h:552 [inline] [<...>] netlbl_calipso_add_pass net/netlabel/netlabel_calipso.c:76 [inline] [<...>] netlbl_calipso_add+0x22e/0x4f0 net/netlabel/netlabel_calipso.c:111 [<...>] genl_family_rcv_msg_doit+0x22f/0x330 net/netlink/genetlink.c:739 [<...>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<...>] genl_rcv_msg+0x341/0x5a0 net/netlink/genetlink.c:800 [<...>] netlink_rcv_skb+0x14d/0x440 net/netlink/af_netlink.c:2515 [<...>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:811 [<...>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] [<...>] netlink_unicast+0x54b/0x800 net/netlink/af_netlink.c:1339 [<...>] netlink_sendmsg+0x90a/0xdf0 net/netlink/af_netlink.c:1934 [<...>] sock_sendmsg_nosec net/socket.c:651 [inline] [<...>] sock_sendmsg+0x157/0x190 net/socket.c:671 [<...>] ____sys_sendmsg+0x712/0x870 net/socket.c:2342 [<...>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2396 [<...>] __sys_sendmsg+0xea/0x1b0 net/socket.c:2429 [<...>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 [<...>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller Fixes: cb72d38211ea ("netlabel: Initial support for the CALIPSO netlink protocol.") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> [PM: merged via the LSM tree at Jakub Kicinski request] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-12-07vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warningStefano Garzarella
After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") in CentOS Stream 9, CI reported the following error: In file included from ./include/linux/kernel.h:17, from ./include/linux/list.h:9, from ./include/linux/preempt.h:11, from ./include/linux/spinlock.h:56, from net/vmw_vsock/virtio_transport_common.c:9: net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy‘: ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck‘ 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp‘ 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ ./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp‘ 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min‘ 63 | int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); We could solve it by using min_t(), but this operation seems entirely unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), which performs almost the same check, returning at most MAX_SKB_FRAGS elements. So, let's eliminate this unnecessary comparison. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Cc: avkrasnov@salutedevices.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07net/smc: fix missing byte order conversion in CLC handshakeWen Gu
The byte order conversions of ISM GID and DMB token are missing in process of CLC accept and confirm. So fix it. Fixes: 3d9725a6a133 ("net/smc: common routine for CLC accept and confirm") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" groupIdo Schimmel
The "NET_DM" generic netlink family notifies drop locations over the "events" multicast group. This is problematic since by default generic netlink allows non-root users to listen to these notifications. Fix by adding a new field to the generic netlink multicast group structure that when set prevents non-root users or root without the 'CAP_SYS_ADMIN' capability (in the user namespace owning the network namespace) from joining the group. Set this field for the "events" group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the nature of the information that is shared over this group. Note that the capability check in this case will always be performed against the initial user namespace since the family is not netns aware and only operates in the initial network namespace. A new field is added to the structure rather than using the "flags" field because the existing field uses uAPI flags and it is inappropriate to add a new uAPI flag for an internal kernel check. In net-next we can rework the "flags" field to use internal flags and fold the new field into it. But for now, in order to reduce the amount of changes, add a new field. Since the information can only be consumed by root, mark the control plane operations that start and stop the tracing as root-only using the 'GENL_ADMIN_PERM' flag. Tested using [1]. Before: # capsh -- -c ./dm_repo # capsh --drop=cap_sys_admin -- -c ./dm_repo After: # capsh -- -c ./dm_repo # capsh --drop=cap_sys_admin -- -c ./dm_repo Failed to join "events" multicast group [1] $ cat dm.c #include <stdio.h> #include <netlink/genl/ctrl.h> #include <netlink/genl/genl.h> #include <netlink/socket.h> int main(int argc, char **argv) { struct nl_sock *sk; int grp, err; sk = nl_socket_alloc(); if (!sk) { fprintf(stderr, "Failed to allocate socket\n"); return -1; } err = genl_connect(sk); if (err) { fprintf(stderr, "Failed to connect socket\n"); return err; } grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events"); if (grp < 0) { fprintf(stderr, "Failed to resolve \"events\" multicast group\n"); return grp; } err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE); if (err) { fprintf(stderr, "Failed to join \"events\" multicast group\n"); return err; } return 0; } $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07psample: Require 'CAP_NET_ADMIN' when joining "packets" groupIdo Schimmel
The "psample" generic netlink family notifies sampled packets over the "packets" multicast group. This is problematic since by default generic netlink allows non-root users to listen to these notifications. Fix by marking the group with the 'GENL_UNS_ADMIN_PERM' flag. This will prevent non-root users or root without the 'CAP_NET_ADMIN' capability (in the user namespace owning the network namespace) from joining the group. Tested using [1]. Before: # capsh -- -c ./psample_repo # capsh --drop=cap_net_admin -- -c ./psample_repo After: # capsh -- -c ./psample_repo # capsh --drop=cap_net_admin -- -c ./psample_repo Failed to join "packets" multicast group [1] $ cat psample.c #include <stdio.h> #include <netlink/genl/ctrl.h> #include <netlink/genl/genl.h> #include <netlink/socket.h> int join_grp(struct nl_sock *sk, const char *grp_name) { int grp, err; grp = genl_ctrl_resolve_grp(sk, "psample", grp_name); if (grp < 0) { fprintf(stderr, "Failed to resolve \"%s\" multicast group\n", grp_name); return grp; } err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE); if (err) { fprintf(stderr, "Failed to join \"%s\" multicast group\n", grp_name); return err; } return 0; } int main(int argc, char **argv) { struct nl_sock *sk; int err; sk = nl_socket_alloc(); if (!sk) { fprintf(stderr, "Failed to allocate socket\n"); return -1; } err = genl_connect(sk); if (err) { fprintf(stderr, "Failed to connect socket\n"); return err; } err = join_grp(sk, "config"); if (err) return err; err = join_grp(sk, "packets"); if (err) return err; return 0; } $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o psample_repo psample.c Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling") Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231206213102.1824398-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07bpf: sockmap, updating the sg structure should also update currJohn Fastabend
Curr pointer should be updated when the sg structure is shifted. Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20231206232706.374377-3-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07net: tls, update curr on splice as wellJohn Fastabend
The curr pointer must also be updated on the splice similar to how we do this for other copy types. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07Merge tag 'nf-23-12-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Incorrect nf_defrag registration for bpf link infra, from D. Wythe. 2) Skip inactive elements in pipapo set backend walk to avoid double deactivation, from Florian Westphal. 3) Fix NFT_*_F_PRESENT check with big endian arch, also from Florian. 4) Bail out if number of expressions in NFTA_DYNSET_EXPRESSIONS mismatch stateful expressions in set declaration. 5) Honor family in table lookup by handle. Broken since 4.16. 6) Use sk_callback_lock to protect access to sk->sk_socket in xt_owner. sock_orphan() might zap this pointer, from Phil Sutter. All of these fixes address broken stuff for several releases. * tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: xt_owner: Fix for unsafe access of sk->sk_socket netfilter: nf_tables: validate family when identifying table via handle netfilter: nf_tables: bail out on mismatching dynset and set expressions netfilter: nf_tables: fix 'exist' matching on bigendian arches netfilter: nft_set_pipapo: skip inactive elements during set walk netfilter: bpf: fix bad registration on nf_defrag ==================== Link: https://lore.kernel.org/r/20231206180357.959930-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07io_uring/af_unix: disable sending io_uring over socketsPavel Begunkov
File reference cycles have caused lots of problems for io_uring in the past, and it still doesn't work exactly right and races with unix_stream_read_generic(). The safest fix would be to completely disallow sending io_uring files via sockets via SCM_RIGHT, so there are no possible cycles invloving registered files and thus rendering SCM accounting on the io_uring side unnecessary. Cc: <stable@vger.kernel.org> Fixes: 0091bfc81741b ("io_uring/af_unix: defer registered files gc to io_uring release") Reported-and-suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c716c88321939156909cfa1bd8b0faaf1c804103.1701868795.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-07Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-12-06 We've added 4 non-merge commits during the last 6 day(s) which contain a total of 7 files changed, 185 insertions(+), 55 deletions(-). The main changes are: 1) Fix race found by syzkaller on prog_array_map_poke_run when a BPF program's kallsym symbols were still missing, from Jiri Olsa. 2) Fix BPF verifier's branch offset comparison for BPF_JMP32 | BPF_JA, from Yonghong Song. 3) Fix xsk's poll handling to only set mask on bound xsk sockets, from Yewon Choi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add test for early update in prog_array_map_poke_run bpf: Fix prog_array_map_poke_run map poke update xsk: Skip polling event check for unbound socket bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4 ==================== Link: https://lore.kernel.org/r/20231206220528.12093-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06net: rtnetlink: remove local list in __linkwatch_run_queue()Johannes Berg
Due to linkwatch_forget_dev() (and perhaps others?) checking for list_empty(&dev->link_watch_list), we must have all manipulations of even the local on-stack list 'wrk' here under spinlock, since even that list can be reached otherwise via dev->link_watch_list. This is already the case, but makes this a bit counter-intuitive, often local lists are used to _not_ have to use locking for their local use. Remove the local list as it doesn't seem to serve any purpose. While at it, move a variable declaration into the loop using it. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20231205170011.56576dcc1727.I698b72219d9f6ce789bd209b8f6dffd0ca32a8f2@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06tcp: do not accept ACK of bytes we never sentEric Dumazet
This patch is based on a detailed report and ideas from Yepeng Pan and Christian Rossow. ACK seq validation is currently following RFC 5961 5.2 guidelines: The ACK value is considered acceptable only if it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT). All incoming segments whose ACK value doesn't satisfy the above condition MUST be discarded and an ACK sent back. It needs to be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a duplicate (SEG.ACK < SND.UNA), it can be ignored. If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return". The "ignored" above implies that the processing of the incoming data segment continues, which means the ACK value is treated as acceptable. This mitigation makes the ACK check more stringent since any ACK < SND.UNA wouldn't be accepted, instead only ACKs that are in the range ((SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through. This can be refined for new (and possibly spoofed) flows, by not accepting ACK for bytes that were never sent. This greatly improves TCP security at a little cost. I added a Fixes: tag to make sure this patch will reach stable trees, even if the 'blamed' patch was adhering to the RFC. tp->bytes_acked was added in linux-4.2 Following packetdrill test (courtesy of Yepeng Pan) shows the issue at hand: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1024) = 0 // ---------------- Handshake ------------------- // // when window scale is set to 14 the window size can be extended to // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet // with ack number in (Server_ISN+1-1073725440. Server_ISN+1) // ,though this ack number acknowledges some data never // sent by the server. +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14> +0 > S. 0:0(0) ack 1 <...> +0 < . 1:1(0) ack 1 win 65535 +0 accept(3, ..., ...) = 4 // For the established connection, we send an ACK packet, // the ack packet uses ack number 1 - 1073725300 + 2^32, // where 2^32 is used to wrap around. // Note: we used 1073725300 instead of 1073725440 to avoid possible // edge cases. // 1 - 1073725300 + 2^32 = 3221241997 // Oops, old kernels happily accept this packet. +0 < . 1:1001(1000) ack 3221241997 win 65535 // After the kernel fix the following will be replaced by a challenge ACK, // and prior malicious frame would be dropped. +0 > . 1:1(0) ack 1001 Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yepeng Pan <yepeng.pan@cispa.de> Reported-by: Christian Rossow <rossow@cispa.de> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20231205161841.2702925-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06bpf: Add helpers for trampoline image managementSong Liu
As BPF trampoline of different archs moves from bpf_jit_[alloc|free]_exec() to bpf_prog_pack_[alloc|free](), we need to use different _alloc, _free for different archs during the transition. Add the following helpers for this transition: void *arch_alloc_bpf_trampoline(unsigned int size); void arch_free_bpf_trampoline(void *image, unsigned int size); void arch_protect_bpf_trampoline(void *image, unsigned int size); void arch_unprotect_bpf_trampoline(void *image, unsigned int size); The fallback version of these helpers require size <= PAGE_SIZE, but they are only called with size == PAGE_SIZE. They will be called with size < PAGE_SIZE when arch_bpf_trampoline_size() helper is introduced later. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # on s390x Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231206224054.492250-4-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: consistently use BPF token throughout BPF verifier logicAndrii Nakryiko
Remove remaining direct queries to perfmon_capable() and bpf_capable() in BPF verifier logic and instead use BPF token (if available) to make decisions about privileges. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: take into account BPF token when fetching helper protosAndrii Nakryiko
Instead of performing unconditional system-wide bpf_capable() and perfmon_capable() calls inside bpf_base_func_proto() function (and other similar ones) to determine eligibility of a given BPF helper for a given program, use previously recorded BPF token during BPF_PROG_LOAD command handling to inform the decision. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06netfilter: xt_owner: Fix for unsafe access of sk->sk_socketPhil Sutter
A concurrently running sock_orphan() may NULL the sk_socket pointer in between check and deref. Follow other users (like nft_meta.c for instance) and acquire sk_callback_lock before dereferencing sk_socket. Fixes: 0265ab44bacc ("[NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nf_tables: validate family when identifying table via handlePablo Neira Ayuso
Validate table family when looking up for it via NFTA_TABLE_HANDLE. Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle") Reported-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nf_tables: bail out on mismatching dynset and set expressionsPablo Neira Ayuso
If dynset expressions provided by userspace is larger than the declared set expressions, then bail out. Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions") Reported-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nf_tables: fix 'exist' matching on bigendian archesFlorian Westphal
Maze reports "tcp option fastopen exists" fails to match on OpenWrt 22.03.5, r20134-5f15225c1e (5.10.176) router. "tcp option fastopen exists" translates to: inet [ exthdr load tcpopt 1b @ 34 + 0 present => reg 1 ] [ cmp eq reg 1 0x00000001 ] .. but existing nft userspace generates a 1-byte compare. On LSB (x86), "*reg32 = 1" is identical to nft_reg_store8(reg32, 1), but not on MSB, which will place the 1 last. IOW, on bigendian aches the cmp8 is awalys false. Make sure we store this in a consistent fashion, so existing userspace will also work on MSB (bigendian). Regardless of this patch we can also change nft userspace to generate 'reg32 == 0' and 'reg32 != 0' instead of u8 == 0 // u8 == 1 when adding 'option x missing/exists' expressions as well. Fixes: 3c1fece8819e ("netfilter: nft_exthdr: Allow checking TCP option presence, too") Fixes: b9f9a485fb0e ("netfilter: nft_exthdr: add boolean DCCP option matching") Fixes: 055c4b34b94f ("netfilter: nft_fib: Support existence check") Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/CAHo-OozyEqHUjL2-ntATzeZOiuftLWZ_HU6TOM_js4qLfDEAJg@mail.gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nft_set_pipapo: skip inactive elements during set walkFlorian Westphal
Otherwise set elements can be deactivated twice which will cause a crash. Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: bpf: fix bad registration on nf_defragD. Wythe
We should pass a pointer to global_hook to the get_proto_defrag_hook() instead of its value, since the passed value won't be updated even if the request module was loaded successfully. Log: [ 54.915713] nf_defrag_ipv4 has bad registration [ 54.915779] WARNING: CPU: 3 PID: 6323 at net/netfilter/nf_bpf_link.c:62 get_proto_defrag_hook+0x137/0x160 [ 54.915835] CPU: 3 PID: 6323 Comm: fentry Kdump: loaded Tainted: G E 6.7.0-rc2+ #35 [ 54.915839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 54.915841] RIP: 0010:get_proto_defrag_hook+0x137/0x160 [ 54.915844] Code: 4f 8c e8 2c cf 68 ff 80 3d db 83 9a 01 00 0f 85 74 ff ff ff 48 89 ee 48 c7 c7 8f 12 4f 8c c6 05 c4 83 9a 01 01 e8 09 ee 5f ff <0f> 0b e9 57 ff ff ff 49 8b 3c 24 4c 63 e5 e8 36 28 6c ff 4c 89 e0 [ 54.915849] RSP: 0018:ffffb676003fbdb0 EFLAGS: 00010286 [ 54.915852] RAX: 0000000000000023 RBX: ffff9596503d5600 RCX: ffff95996fce08c8 [ 54.915854] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff95996fce08c0 [ 54.915855] RBP: ffffffff8c4f12de R08: 0000000000000000 R09: 00000000fffeffff [ 54.915859] R10: ffffb676003fbc70 R11: ffffffff8d363ae8 R12: 0000000000000000 [ 54.915861] R13: ffffffff8e1f75c0 R14: ffffb676003c9000 R15: 00007ffd15e78ef0 [ 54.915864] FS: 00007fb6e9cab740(0000) GS:ffff95996fcc0000(0000) knlGS:0000000000000000 [ 54.915867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.915868] CR2: 00007ffd15e75c40 CR3: 0000000101e62006 CR4: 0000000000360ef0 [ 54.915870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 54.915871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 54.915873] Call Trace: [ 54.915891] <TASK> [ 54.915894] ? __warn+0x84/0x140 [ 54.915905] ? get_proto_defrag_hook+0x137/0x160 [ 54.915908] ? __report_bug+0xea/0x100 [ 54.915925] ? report_bug+0x2b/0x80 [ 54.915928] ? handle_bug+0x3c/0x70 [ 54.915939] ? exc_invalid_op+0x18/0x70 [ 54.915942] ? asm_exc_invalid_op+0x1a/0x20 [ 54.915948] ? get_proto_defrag_hook+0x137/0x160 [ 54.915950] bpf_nf_link_attach+0x1eb/0x240 [ 54.915953] link_create+0x173/0x290 [ 54.915969] __sys_bpf+0x588/0x8f0 [ 54.915974] __x64_sys_bpf+0x20/0x30 [ 54.915977] do_syscall_64+0x45/0xf0 [ 54.915989] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 54.915998] RIP: 0033:0x7fb6e9daa51d [ 54.916001] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 89 0c 00 f7 d8 64 89 01 48 [ 54.916003] RSP: 002b:00007ffd15e78ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 54.916006] RAX: ffffffffffffffda RBX: 00007ffd15e78fc0 RCX: 00007fb6e9daa51d [ 54.916007] RDX: 0000000000000040 RSI: 00007ffd15e78ef0 RDI: 000000000000001c [ 54.916009] RBP: 000000000000002d R08: 00007fb6e9e73a60 R09: 0000000000000001 [ 54.916010] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006 [ 54.916012] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000 [ 54.916014] </TASK> [ 54.916015] ---[ end trace 0000000000000000 ]--- Fixes: 91721c2d02d3 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06net/tcp: Don't store TCP-AO maclen on reqskDmitry Safonov
This extra check doesn't work for a handshake when SYN segment has (current_key.maclen != rnext_key.maclen). It could be amended to preserve rnext_key.maclen instead of current_key.maclen, but that requires a lookup on listen socket. Originally, this extra maclen check was introduced just because it was cheap. Drop it and convert tcp_request_sock::maclen into boolean tcp_request_sock::used_tcp_ao. Fixes: 06b22ef29591 ("net/tcp: Wire TCP-AO to request sockets") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06net/tcp: Don't add key with non-matching VRF on connected socketsDmitry Safonov
If the connection was established, don't allow adding TCP-AO keys that don't match the peer. Currently, there are checks for ip-address matching, but L3 index check is missing. Add it to restrict userspace shooting itself somewhere. Yet, nothing restricts the CAP_NET_RAW user from trying to shoot themselves by performing setsockopt(SO_BINDTODEVICE) or setsockopt(SO_BINDTOIFINDEX) over an established TCP-AO connection. So, this is just "minimum effort" to potentially save someone's debugging time, rather than a full restriction on doing weird things. Fixes: 248411b8cb89 ("net/tcp: Wire up l3index to TCP-AO") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06net/tcp: Limit TCP_AO_REPAIR to non-listen socketsDmitry Safonov
Listen socket is not an established TCP connection, so setsockopt(TCP_AO_REPAIR) doesn't have any impact. Restrict this uAPI for listen sockets. Fixes: faadfaba5e01 ("net/tcp: Add TCP_AO_REPAIR") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06net/tcp: Consistently align TCP-AO option in the headerDmitry Safonov
Currently functions that pre-calculate TCP header options length use unaligned TCP-AO header + MAC-length for skb reservation. And the functions that actually write TCP-AO options into skb do align the header. Nothing good can come out of this for ((maclen % 4) != 0). Provide tcp_ao_len_aligned() helper and use it everywhere for TCP header options space calculations. Fixes: 1e03d32bea8e ("net/tcp: Add TCP-AO sign to outgoing packets") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06wifi: cfg80211: make RX assoc data constJohannes Berg
This is just a collection of data and we only read it, so make it const. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-06wifi: nl80211: refactor nl80211_send_mlme_event() argumentsJohannes Berg
This function has so many arguments already, before adding yet another one, refactor it to take a struct instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-06ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()Shigeru Yoshida
In ipgre_xmit(), skb_pull() may fail even if pskb_inet_may_pull() returns true. For example, applications can use PF_PACKET to create a malformed packet with no IP header. This type of packet causes a problem such as uninit-value access. This patch ensures that skb_pull() can pull the required size by checking the skb with pskb_network_may_pull() before skb_pull(). Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20231202161441.221135-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05packet: add a generic drop reason for receiveYan Zhai
Commit da37845fdce2 ("packet: uses kfree_skb() for errors.") switches from consume_skb to kfree_skb to improve error handling. However, this could bring a lot of noises when we monitor real packet drops in kfree_skb[1], because in tpacket_rcv or packet_rcv only packet clones can be freed, not actual packets. Adding a generic drop reason to allow distinguish these "clone drops". [1]: https://lore.kernel.org/netdev/CABWYdi00L+O30Q=Zah28QwZ_5RU-xcxLFUK2Zj08A8MrLk9jzg@mail.gmail.com/ Fixes: da37845fdce2 ("packet: uses kfree_skb() for errors.") Suggested-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/ZW4piNbx3IenYnuw@debian.debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>