summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2023-12-12wifi: cfg80211: expose nl80211_chan_width_to_mhz for wide sharingEvan Quan
The newly added WBRF feature needs this interface for channel width calculation. Signed-off-by: Evan Quan <quanliangl@hotmail.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://msgid.link/20231211100630.2170152-4-Jun.Ma2@amd.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-11net, xdp: Allow metadata > 32Aleksander Lobakin
32 bytes may be not enough for some custom metadata. Relax the restriction, allow metadata larger than 32 bytes and make __skb_metadata_differs() work with bigger lengths. Now size of metadata is only limited by the fact it is stored as u8 in skb_shared_info, so maximum possible value is 255. Size still has to be aligned to 4, so the actual upper limit becomes 252. Most driver implementations will offer less, none can offer more. Other important conditions, such as having enough space for xdp_frame building, are already checked in bpf_xdp_adjust_meta(). Signed-off-by: Aleksander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/eb87653c-8ff8-447d-a7a1-25961f60518a@kernel.org Link: https://lore.kernel.org/bpf/20231206205919.404415-3-larysa.zaremba@intel.com
2023-12-11net/sched: act_ct: Take per-cb reference to tcf_ct_flow_tableVlad Buslov
The referenced change added custom cleanup code to act_ct to delete any callbacks registered on the parent block when deleting the tcf_ct_flow_table instance. However, the underlying issue is that the drivers don't obtain the reference to the tcf_ct_flow_table instance when registering callbacks which means that not only driver callbacks may still be on the table when deleting it but also that the driver can still have pointers to its internal nf_flowtable and can use it concurrently which results either warning in netfilter[0] or use-after-free. Fix the issue by taking a reference to the underlying struct tcf_ct_flow_table instance when registering the callback and release the reference when unregistering. Expose new API required for such reference counting by adding two new callbacks to nf_flowtable_type and implementing them for act_ct flowtable_ct type. This fixes the issue by extending the lifetime of nf_flowtable until all users have unregistered. [0]: [106170.938634] ------------[ cut here ]------------ [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis try overlay mlx5_core [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1 [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00 [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202 [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48 [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000 [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700 [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58 [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000 [106170.951616] FS: 0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000 [106170.952329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0 [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [106170.954766] Call Trace: [106170.955057] <TASK> [106170.955315] ? __warn+0x79/0x120 [106170.955648] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.956172] ? report_bug+0x17c/0x190 [106170.956537] ? handle_bug+0x3c/0x60 [106170.956891] ? exc_invalid_op+0x14/0x70 [106170.957264] ? asm_exc_invalid_op+0x16/0x20 [106170.957666] ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core] [106170.958172] ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core] [106170.958788] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.959339] ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core] [106170.959854] ? mapping_remove+0x154/0x1d0 [mlx5_core] [106170.960342] ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core] [106170.960927] mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core] [106170.961441] mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core] [106170.962001] mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core] [106170.962524] mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core] [106170.963034] mlx5e_flow_put+0x73/0xe0 [mlx5_core] [106170.963506] mlx5e_put_flow_list+0x38/0x70 [mlx5_core] [106170.964002] mlx5e_rep_update_flows+0xec/0x290 [mlx5_core] [106170.964525] mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core] [106170.965056] process_one_work+0x13a/0x2c0 [106170.965443] worker_thread+0x2e5/0x3f0 [106170.965808] ? rescuer_thread+0x410/0x410 [106170.966192] kthread+0xc6/0xf0 [106170.966515] ? kthread_complete_and_exit+0x20/0x20 [106170.966970] ret_from_fork+0x2d/0x50 [106170.967332] ? kthread_complete_and_exit+0x20/0x20 [106170.967774] ret_from_fork_asm+0x11/0x20 [106170.970466] </TASK> [106170.970726] ---[ end trace 0000000000000000 ]--- Fixes: 77ac5e40c44e ("net/sched: act_ct: remove and free nf_table callbacks") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08ipv6: do not check fib6_has_expires() in fib6_info_release()Eric Dumazet
My prior patch went a bit too far, because apparently fib6_has_expires() could be true while f6i->gc_link is not hashed yet. fib6_set_expires_locked() can indeed set RTF_EXPIRES while f6i->fib6_table is NULL. Original syzbot reports were about corruptions caused by dangling f6i->gc_link. Fixes: 5a08d0065a91 ("ipv6: add debug checks in fib6_info_release()") Reported-by: syzbot+c15aa445274af8674f41@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kui-Feng Lee <thinker.li@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231207201322.549000-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08netlink: Return unsigned value for nla_len()Kees Cook
The return value from nla_len() is never expected to be negative, and can never be more than struct nlattr::nla_len (a u16). Adjust the prototype on the function. This will let GCC's value range optimization passes know that the return can never be negative, and can never be larger than u16. As recently discussed[1], this silences the following warning in GCC 12+: net/wireless/nl80211.c: In function 'nl80211_set_cqm_rssi.isra': net/wireless/nl80211.c:12892:17: warning: 'memcpy' specified bound 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=] 12892 | memcpy(cqm_config->rssi_thresholds, thresholds, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 12893 | flex_array_size(cqm_config, rssi_thresholds, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 12894 | n_thresholds)); | ~~~~~~~~~~~~~~ A future change would be to clamp the subtraction to make sure it never wraps around if nla_len is somehow less than NLA_HDRLEN, which would have the additional benefit of being defensive in the face of nlattr corruption or logic errors. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311090752.hWcJWAHL-lkp@intel.com/ [1] Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jeff Johnson <quic_jjohnson@quicinc.com> Cc: Michael Walle <mwalle@kernel.org> Cc: Max Schulze <max.schulze@online.de> Link: https://lore.kernel.org/r/20231202202539.it.704-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231206205904.make.018-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08Use READ/WRITE_ONCE() for IP local_port_range.David Laight
Commit 227b60f5102cd added a seqlock to ensure that the low and high port numbers were always updated together. This is overkill because the two 16bit port numbers can be held in a u32 and read/written in a single instruction. More recently 91d0b78c5177f added support for finer per-socket limits. The user-supplied value is 'high << 16 | low' but they are held separately and the socket options protected by the socket lock. Use a u32 containing 'high << 16 | low' for both the 'net' and 'sk' fields and use READ_ONCE()/WRITE_ONCE() to ensure both values are always updated together. Change (the now trival) inet_get_local_port_range() to a static inline to optimise the calling code. (In particular avoiding returning integers by reference.) Signed-off-by: David Laight <david.laight@aculab.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/4e505d4198e946a8be03fb1b4c3072b0@AcuMS.aculab.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIXMaciej Żenczykowski
Lorenzo points out that we effectively clear all unknown flags from PIO when copying them to userspace in the netlink RTM_NEWPREFIX notification. We could fix this one at a time as new flags are defined, or in one fell swoop - I choose the latter. We could either define 6 new reserved flags (reserved1..6) and handle them individually (and rename them as new flags are defined), or we could simply copy the entire unmodified byte over - I choose the latter. This unfortunately requires some anonymous union/struct magic, so we add a static assert on the struct size for a little extra safety. Cc: David Ahern <dsahern@kernel.org> Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/stmicro/stmmac/dwmac5.c drivers/net/ethernet/stmicro/stmmac/dwmac5.h drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c drivers/net/ethernet/stmicro/stmmac/hwif.h 37e4b8df27bc ("net: stmmac: fix FPE events losing") c3f3b97238f6 ("net: stmmac: Refactor EST implementation") https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/ Adjacent changes: net/ipv4/tcp_ao.c 9396c4ee93f9 ("net/tcp: Don't store TCP-AO maclen on reqsk") 7b0f570f879a ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" groupIdo Schimmel
The "NET_DM" generic netlink family notifies drop locations over the "events" multicast group. This is problematic since by default generic netlink allows non-root users to listen to these notifications. Fix by adding a new field to the generic netlink multicast group structure that when set prevents non-root users or root without the 'CAP_SYS_ADMIN' capability (in the user namespace owning the network namespace) from joining the group. Set this field for the "events" group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the nature of the information that is shared over this group. Note that the capability check in this case will always be performed against the initial user namespace since the family is not netns aware and only operates in the initial network namespace. A new field is added to the structure rather than using the "flags" field because the existing field uses uAPI flags and it is inappropriate to add a new uAPI flag for an internal kernel check. In net-next we can rework the "flags" field to use internal flags and fold the new field into it. But for now, in order to reduce the amount of changes, add a new field. Since the information can only be consumed by root, mark the control plane operations that start and stop the tracing as root-only using the 'GENL_ADMIN_PERM' flag. Tested using [1]. Before: # capsh -- -c ./dm_repo # capsh --drop=cap_sys_admin -- -c ./dm_repo After: # capsh -- -c ./dm_repo # capsh --drop=cap_sys_admin -- -c ./dm_repo Failed to join "events" multicast group [1] $ cat dm.c #include <stdio.h> #include <netlink/genl/ctrl.h> #include <netlink/genl/genl.h> #include <netlink/socket.h> int main(int argc, char **argv) { struct nl_sock *sk; int grp, err; sk = nl_socket_alloc(); if (!sk) { fprintf(stderr, "Failed to allocate socket\n"); return -1; } err = genl_connect(sk); if (err) { fprintf(stderr, "Failed to connect socket\n"); return err; } grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events"); if (grp < 0) { fprintf(stderr, "Failed to resolve \"events\" multicast group\n"); return grp; } err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE); if (err) { fprintf(stderr, "Failed to join \"events\" multicast group\n"); return err; } return 0; } $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06ipv6: add debug checks in fib6_info_release()Eric Dumazet
Some elusive syzbot reports are hinting to fib6_info_release(), with a potential dangling f6i->gc_link anchor. Add debug checks so that syzbot can catch the issue earlier eventually. BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline] BUG: KASAN: slab-use-after-free in hlist_del_init include/linux/list.h:1016 [inline] BUG: KASAN: slab-use-after-free in fib6_clean_expires_locked include/net/ip6_fib.h:533 [inline] BUG: KASAN: slab-use-after-free in fib6_purge_rt+0x986/0x9c0 net/ipv6/ip6_fib.c:1064 Write of size 8 at addr ffff88802805a840 by task syz-executor.1/10057 CPU: 1 PID: 10057 Comm: syz-executor.1 Not tainted 6.7.0-rc2-syzkaller-00029-g9b6de136b5f0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0xc4/0x620 mm/kasan/report.c:475 kasan_report+0xda/0x110 mm/kasan/report.c:588 __hlist_del include/linux/list.h:990 [inline] hlist_del_init include/linux/list.h:1016 [inline] fib6_clean_expires_locked include/net/ip6_fib.h:533 [inline] fib6_purge_rt+0x986/0x9c0 net/ipv6/ip6_fib.c:1064 fib6_del_route net/ipv6/ip6_fib.c:1993 [inline] fib6_del+0xa7a/0x1750 net/ipv6/ip6_fib.c:2038 __ip6_del_rt net/ipv6/route.c:3866 [inline] ip6_del_rt+0xf7/0x200 net/ipv6/route.c:3881 ndisc_router_discovery+0x295b/0x3560 net/ipv6/ndisc.c:1372 ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856 icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979 ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438 ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492 ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529 __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643 netif_receive_skb_internal net/core/dev.c:5729 [inline] netif_receive_skb+0x133/0x700 net/core/dev.c:5788 tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579 tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002 tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x64f/0xdf0 fs/read_write.c:584 ksys_write+0x12f/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f38e387b82f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b9 80 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c 81 02 00 48 RSP: 002b:00007f38e45c9090 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f38e399bf80 RCX: 00007f38e387b82f RDX: 00000000000003b6 RSI: 0000000020000680 RDI: 00000000000000c8 RBP: 00007f38e38c847a R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000003b6 R11: 0000000000000293 R12: 0000000000000000 R13: 000000000000000b R14: 00007f38e399bf80 R15: 00007f38e3abfa48 </TASK> Allocated by task 10044: kasan_save_stack+0x33/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383 kasan_kmalloc include/linux/kasan.h:198 [inline] __do_kmalloc_node mm/slab_common.c:1007 [inline] __kmalloc+0x59/0x90 mm/slab_common.c:1020 kmalloc include/linux/slab.h:604 [inline] kzalloc include/linux/slab.h:721 [inline] fib6_info_alloc+0x40/0x160 net/ipv6/ip6_fib.c:155 ip6_route_info_create+0x337/0x1e70 net/ipv6/route.c:3749 ip6_route_add+0x26/0x150 net/ipv6/route.c:3843 rt6_add_route_info+0x2e7/0x4b0 net/ipv6/route.c:4316 rt6_route_rcv+0x76c/0xbf0 net/ipv6/route.c:985 ndisc_router_discovery+0x138b/0x3560 net/ipv6/ndisc.c:1529 ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856 icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979 ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438 ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492 ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529 __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643 netif_receive_skb_internal net/core/dev.c:5729 [inline] netif_receive_skb+0x133/0x700 net/core/dev.c:5788 tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579 tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002 tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x64f/0xdf0 fs/read_write.c:584 ksys_write+0x12f/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Freed by task 5123: kasan_save_stack+0x33/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1800 [inline] slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826 slab_free mm/slub.c:3809 [inline] __kmem_cache_free+0xc0/0x180 mm/slub.c:3822 rcu_do_batch kernel/rcu/tree.c:2158 [inline] rcu_core+0x819/0x1680 kernel/rcu/tree.c:2431 __do_softirq+0x21a/0x8de kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x33/0x50 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:492 __call_rcu_common.constprop.0+0x9a/0x7a0 kernel/rcu/tree.c:2681 fib6_info_release include/net/ip6_fib.h:332 [inline] fib6_info_release include/net/ip6_fib.h:329 [inline] rt6_route_rcv+0xa4e/0xbf0 net/ipv6/route.c:997 ndisc_router_discovery+0x138b/0x3560 net/ipv6/ndisc.c:1529 ndisc_rcv+0x3de/0x5f0 net/ipv6/ndisc.c:1856 icmpv6_rcv+0x1470/0x19c0 net/ipv6/icmp.c:979 ip6_protocol_deliver_rcu+0x170/0x13e0 net/ipv6/ip6_input.c:438 ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip6_input+0xa1/0xc0 net/ipv6/ip6_input.c:492 ip6_mc_input+0x48b/0xf40 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:461 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ipv6_rcv+0x24e/0x380 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5529 __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5643 netif_receive_skb_internal net/core/dev.c:5729 [inline] netif_receive_skb+0x133/0x700 net/core/dev.c:5788 tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579 tun_get_user+0x29e3/0x3bc0 drivers/net/tun.c:2002 tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x64f/0xdf0 fs/read_write.c:584 ksys_write+0x12f/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Second to last potentially related work creation: kasan_save_stack+0x33/0x50 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:492 insert_work+0x38/0x230 kernel/workqueue.c:1647 __queue_work+0xcdc/0x11f0 kernel/workqueue.c:1803 call_timer_fn+0x193/0x590 kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1746 [inline] __run_timers+0x585/0xb20 kernel/time/timer.c:2022 run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035 __do_softirq+0x21a/0x8de kernel/softirq.c:553 The buggy address belongs to the object at ffff88802805a800 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 64 bytes inside of freed 512-byte region [ffff88802805a800, ffff88802805aa00) The buggy address belongs to the physical page: page:ffffea0000a01600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x28058 head:ffffea0000a01600 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 00fff00000000840 ffff888013041c80 ffffea0001e02600 dead000000000002 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 18706, tgid 18699 (syz-executor.2), ts 999991973280, free_ts 996884464281 set_page_owner include/linux/page_owner.h:31 [inline] post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1544 [inline] get_page_from_freelist+0xa25/0x36d0 mm/page_alloc.c:3312 __alloc_pages+0x22e/0x2420 mm/page_alloc.c:4568 alloc_pages_mpol+0x258/0x5f0 mm/mempolicy.c:2133 alloc_slab_page mm/slub.c:1870 [inline] allocate_slab mm/slub.c:2017 [inline] new_slab+0x283/0x3c0 mm/slub.c:2070 ___slab_alloc+0x979/0x1500 mm/slub.c:3223 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3322 __slab_alloc_node mm/slub.c:3375 [inline] slab_alloc_node mm/slub.c:3468 [inline] __kmem_cache_alloc_node+0x131/0x310 mm/slub.c:3517 __do_kmalloc_node mm/slab_common.c:1006 [inline] __kmalloc+0x49/0x90 mm/slab_common.c:1020 kmalloc include/linux/slab.h:604 [inline] kzalloc include/linux/slab.h:721 [inline] copy_splice_read+0x1ac/0x8f0 fs/splice.c:338 vfs_splice_read fs/splice.c:992 [inline] vfs_splice_read+0x2ea/0x3b0 fs/splice.c:962 splice_direct_to_actor+0x2a5/0xa30 fs/splice.c:1069 do_splice_direct+0x1af/0x280 fs/splice.c:1194 do_sendfile+0xb3e/0x1310 fs/read_write.c:1254 __do_sys_sendfile64 fs/read_write.c:1322 [inline] __se_sys_sendfile64 fs/read_write.c:1308 [inline] __x64_sys_sendfile64+0x1d6/0x220 fs/read_write.c:1308 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1137 [inline] free_unref_page_prepare+0x4fa/0xaa0 mm/page_alloc.c:2347 free_unref_page_list+0xe6/0xb40 mm/page_alloc.c:2533 release_pages+0x32a/0x14f0 mm/swap.c:1042 tlb_batch_pages_flush+0x9a/0x190 mm/mmu_gather.c:98 tlb_flush_mmu_free mm/mmu_gather.c:293 [inline] tlb_flush_mmu mm/mmu_gather.c:300 [inline] tlb_finish_mmu+0x14b/0x6f0 mm/mmu_gather.c:392 exit_mmap+0x38b/0xa70 mm/mmap.c:3321 __mmput+0x12a/0x4d0 kernel/fork.c:1349 mmput+0x62/0x70 kernel/fork.c:1371 exit_mm kernel/exit.c:567 [inline] do_exit+0x9ad/0x2ae0 kernel/exit.c:858 do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 get_signal+0x23be/0x2790 kernel/signal.c:2904 arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204 irqentry_exit_to_user_mode+0xa/0x40 kernel/entry/common.c:309 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231205173250.2982846-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06net/tcp: Consistently align TCP-AO option in the headerDmitry Safonov
Currently functions that pre-calculate TCP header options length use unaligned TCP-AO header + MAC-length for skb reservation. And the functions that actually write TCP-AO options into skb do align the header. Nothing good can come out of this for ((maclen % 4) != 0). Provide tcp_ao_len_aligned() helper and use it everywhere for TCP header options space calculations. Fixes: 1e03d32bea8e ("net/tcp: Add TCP-AO sign to outgoing packets") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06wifi: cfg80211: make RX assoc data constJohannes Berg
This is just a collection of data and we only read it, so make it const. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-05packet: add a generic drop reason for receiveYan Zhai
Commit da37845fdce2 ("packet: uses kfree_skb() for errors.") switches from consume_skb to kfree_skb to improve error handling. However, this could bring a lot of noises when we monitor real packet drops in kfree_skb[1], because in tpacket_rcv or packet_rcv only packet clones can be freed, not actual packets. Adding a generic drop reason to allow distinguish these "clone drops". [1]: https://lore.kernel.org/netdev/CABWYdi00L+O30Q=Zah28QwZ_5RU-xcxLFUK2Zj08A8MrLk9jzg@mail.gmail.com/ Fixes: da37845fdce2 ("packet: uses kfree_skb() for errors.") Suggested-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/ZW4piNbx3IenYnuw@debian.debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05tcp: fix mid stream window clamp.Paolo Abeni
After the blamed commit below, if the user-space application performs window clamping when tp->rcv_wnd is 0, the TCP socket will never be able to announce a non 0 receive window, even after completely emptying the receive buffer and re-setting the window clamp to higher values. Refactor tcp_set_window_clamp() to address the issue: when the user decreases the current clamp value, set rcv_ssthresh according to the same logic used at buffer initialization, but ensuring reserved mem provisioning. To avoid code duplication factor-out the relevant bits from tcp_adjust_rcv_ssthresh() in a new helper and reuse it in the above scenario. When increasing the clamp value, give the rcv_ssthresh a chance to grow according to previously implemented heuristic. Fixes: 3aa7857fe1d7 ("tcp: enable mid stream window clamp") Reported-by: David Gibson <david@gibson.dropbear.id.au> Reported-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/705dad54e6e6e9a010e571bf58e0b35a8ae70503.1701706073.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04net: Add queue and napi associationAmritha Nambiar
Add the napi pointer in netdev queue for tracking the napi instance for each queue. This achieves the queue<->napi mapping. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147331483.5260.15723438819994285695.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04tcp: Dump bound-only sockets in inet_diag.Guillaume Nault
Walk the hashinfo->bhash2 table so that inet_diag can dump TCP sockets that are bound but haven't yet called connect() or listen(). The code is inspired by the ->lhash2 loop. However there's no manual test of the source port, since this kind of filtering is already handled by inet_diag_bc_sk(). Also, a maximum of 16 sockets are dumped at a time, to avoid running with bh disabled for too long. There's no TCP state for bound but otherwise inactive sockets. Such sockets normally map to TCP_CLOSE. However, "ss -l", which is supposed to only dump listening sockets, actually requests the kernel to dump sockets in either the TCP_LISTEN or TCP_CLOSE states. To avoid dumping bound-only sockets with "ss -l", we therefore need to define a new pseudo-state (TCP_BOUND_INACTIVE) that user space will be able to set explicitly. With an IPv4, an IPv6 and an IPv6-only socket, bound respectively to 40000, 64000, 60000, an updated version of iproute2 could work as follow: $ ss -t state bound-inactive Recv-Q Send-Q Local Address:Port Peer Address:Port Process 0 0 0.0.0.0:40000 0.0.0.0:* 0 0 [::]:60000 [::]:* 0 0 *:64000 *:* Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/b3a84ae61e19c06806eea9c602b3b66e8f0cfc81.1701362867.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-02netns-ipv4: reorganize netns_ipv4 fast path variablesCoco Li
Reorganize fast path variables on tx-txrx-rx order. Fastpath cacheline ends after sysctl_tcp_rmem. There are only read-only variables here. (write is on the control path and not considered in this case) Below data generated with pahole on x86 architecture. Fast path variables span cache lines before change: 4 Fast path variables span cache lines after change: 2 Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-30Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-11-30 We've added 30 non-merge commits during the last 7 day(s) which contain a total of 58 files changed, 1598 insertions(+), 154 deletions(-). The main changes are: 1) Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload, from Stanislav Fomichev with stmmac implementation from Song Yoong Siang. 2) Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques, from Andrii Nakryiko. 3) Add BPF link_info support for uprobe multi link along with bpftool integration for the latter, from Jiri Olsa. 4) Use pkg-config in BPF selftests to determine ld flags which is in particular needed for linking statically, from Akihiko Odaki. 5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits) bpf/tests: Remove duplicate JSGT tests selftests/bpf: Add TX side to xdp_hw_metadata selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP selftests/bpf: Add TX side to xdp_metadata selftests/bpf: Add csum helpers selftests/xsk: Support tx_metadata_len xsk: Add option to calculate TX checksum in SW xsk: Validate xsk_tx_metadata flags xsk: Document tx_metadata_len layout net: stmmac: Add Tx HWTS support to XDP ZC net/mlx5e: Implement AF_XDP TX timestamp and checksum offload tools: ynl: Print xsk-features from the sample xsk: Add TX timestamp and TX checksum offload support xsk: Support tx_metadata_len selftests/bpf: Use pkg-config for libelf selftests/bpf: Override PKG_CONFIG for static builds selftests/bpf: Choose pkg-config for the target bpftool: Add support to display uprobe_multi links selftests/bpf: Add link_info test for uprobe_multi link selftests/bpf: Use bpf_link__destroy in fill_link_info tests ... ==================== Conflicts: Documentation/netlink/specs/netdev.yaml: 839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools") 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/ While at it also regen, tree is dirty after: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") looks like code wasn't re-rendered after "render-max" was removed. Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29tcp: Factorise cookie-dependent fields initialisation in cookie_v[46]_check()Kuniyuki Iwashima
We will support arbitrary SYN Cookie with BPF, and then kfunc at TC will preallocate reqsk and initialise some fields that should not be overwritten later by cookie_v[46]_check(). To simplify the flow in cookie_v[46]_check(), we move such fields' initialisation to cookie_tcp_reqsk_alloc() and factorise non-BPF SYN Cookie handling into cookie_tcp_check(), where we validate the cookie and allocate reqsk, as done by kfunc later. Note that we set ireq->ecn_ok in two steps, the latter of which will be shared by the BPF case. As cookie_ecn_ok() is one-liner, now it's inlined. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().Kuniyuki Iwashima
We initialise treq->af_specific in cookie_tcp_reqsk_alloc() so that we can look up a key later in tcp_create_openreq_child(). Initially, that change was added for MD5 by commit ba5a4fdd63ae ("tcp: make sure treq->af_specific is initialized"), but it has not been used since commit d0f2b7a9ca0a ("tcp: Disable header prediction for MD5 flow."). Now, treq->af_specific is used only by TCP-AO, so, we can move that initialisation into tcp_ao_syncookie(). In addition to that, l3index in cookie_v[46]_check() is only used for tcp_ao_syncookie(), so let's move it as well. While at it, we move down tcp_ao_syncookie() in cookie_v4_check() so that it will be called after security_inet_conn_request() to make functions order consistent with cookie_v6_check(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29tcp: Don't initialise tp->tsoffset in tcp_get_cookie_sock().Kuniyuki Iwashima
When we create a full socket from SYN Cookie, we initialise tcp_sk(sk)->tsoffset redundantly in tcp_get_cookie_sock() as the field is inherited from tcp_rsk(req)->ts_off. cookie_v[46]_check |- treq->ts_off = 0 `- tcp_get_cookie_sock |- tcp_v[46]_syn_recv_sock | `- tcp_create_openreq_child | `- newtp->tsoffset = treq->ts_off `- tcp_sk(child)->tsoffset = tsoff Let's initialise tcp_rsk(req)->ts_off with the correct offset and remove the second initialisation of tcp_sk(sk)->tsoffset. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29tcp: Don't pass cookie to __cookie_v[46]_check().Kuniyuki Iwashima
tcp_hdr(skb) and SYN Cookie are passed to __cookie_v[46]_check(), but none of the callers passes cookie other than ntohl(th->ack_seq) - 1. Let's fetch it in __cookie_v[46]_check() instead of passing the cookie over and over. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29net: mana: Fix spelling mistake "enforecement" -> "enforcement"Colin Ian King
There is a spelling mistake in struct field hc_tx_err_sqpdid_enforecement. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Link: https://lore.kernel.org/r/20231128095304.515492-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29Merge tag 'wireless-2023-11-29' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== wireless fixes: - debugfs had a deadlock (removal vs. use of files), fixes going through wireless ACKed by Greg - support for HT STAs on 320 MHz channels, even if it's not clear that should ever happen (that's 6 GHz), best not to WARN() - fix for the previous CQM fix that broke most cases - various wiphy locking fixes - various small driver fixes * tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: use wiphy locked debugfs for sdata/link wifi: mac80211: use wiphy locked debugfs helpers for agg_status wifi: cfg80211: add locked debugfs wrappers debugfs: add API to allow debugfs operations cancellation debugfs: annotate debugfs handlers vs. removal with lockdep debugfs: fix automount d_fsdata usage wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap wifi: avoid offset calculation on NULL pointer wifi: cfg80211: hold wiphy mutex for send_interface wifi: cfg80211: lock wiphy mutex for rfkill poll wifi: cfg80211: fix CQM for non-range use wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta() wifi: mt76: mt7925: fix typo in mt7925_init_he_caps wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config ==================== Link: https://lore.kernel.org/r/20231129150809.31083-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-11-30 We've added 5 non-merge commits during the last 7 day(s) which contain a total of 10 files changed, 66 insertions(+), 15 deletions(-). The main changes are: 1) Fix AF_UNIX splat from use after free in BPF sockmap, from John Fastabend. 2) Fix a syzkaller splat in netdevsim by properly handling offloaded programs (and not device-bound ones), from Stanislav Fomichev. 3) Fix bpf_mem_cache_alloc_flags() to initialize the allocation hint, from Hou Tao. 4) Fix netkit by rejecting IFLA_NETKIT_PEER_INFO in changelink, from Daniel Borkmann. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Add af_unix test with both sockets in map bpf, sockmap: af_unix stream sockets need to hold ref for pair sock netkit: Reject IFLA_NETKIT_PEER_INFO in netkit_change_link bpf: Add missed allocation hint for bpf_mem_cache_alloc_flags() netdevsim: Don't accept device bound programs ==================== Link: https://lore.kernel.org/r/20231129234916.16128-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30bpf, sockmap: af_unix stream sockets need to hold ref for pair sockJohn Fastabend
AF_UNIX stream sockets are a paired socket. So sending on one of the pairs will lookup the paired socket as part of the send operation. It is possible however to put just one of the pairs in a BPF map. This currently increments the refcnt on the sock in the sockmap to ensure it is not free'd by the stack before sockmap cleans up its state and stops any skbs being sent/recv'd to that socket. But we missed a case. If the peer socket is closed it will be free'd by the stack. However, the paired socket can still be referenced from BPF sockmap side because we hold a reference there. Then if we are sending traffic through BPF sockmap to that socket it will try to dereference the free'd pair in its send logic creating a use after free. And following splat: [59.900375] BUG: KASAN: slab-use-after-free in sk_wake_async+0x31/0x1b0 [59.901211] Read of size 8 at addr ffff88811acbf060 by task kworker/1:2/954 [...] [59.905468] Call Trace: [59.905787] <TASK> [59.906066] dump_stack_lvl+0x130/0x1d0 [59.908877] print_report+0x16f/0x740 [59.910629] kasan_report+0x118/0x160 [59.912576] sk_wake_async+0x31/0x1b0 [59.913554] sock_def_readable+0x156/0x2a0 [59.914060] unix_stream_sendmsg+0x3f9/0x12a0 [59.916398] sock_sendmsg+0x20e/0x250 [59.916854] skb_send_sock+0x236/0xac0 [59.920527] sk_psock_backlog+0x287/0xaa0 To fix let BPF sockmap hold a refcnt on both the socket in the sockmap and its paired socket. It wasn't obvious how to contain the fix to bpf_unix logic. The primarily problem with keeping this logic in bpf_unix was: In the sock close() we could handle the deref by having a close handler. But, when we are destroying the psock through a map delete operation we wouldn't have gotten any signal thorugh the proto struct other than it being replaced. If we do the deref from the proto replace its too early because we need to deref the sk_pair after the backlog worker has been stopped. Given all this it seems best to just cache it at the end of the psock and eat 8B for the af_unix and vsock users. Notice dgram sockets are OK because they handle locking already. Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20231129012557.95371-2-john.fastabend@gmail.com
2023-11-29xsk: Add option to calculate TX checksum in SWStanislav Fomichev
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM to call skb_checksum_help in transmit path. Might be useful to debugging issues with real hardware. I also use this mode in the selftests. Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-29xsk: Validate xsk_tx_metadata flagsStanislav Fomichev
Accept only the flags that the kernel knows about to make sure we can extend this field in the future. Note that only in XDP_COPY mode we propagate the error signal back to the user (via sendmsg). For zerocopy mode we silently skip the metadata for the descriptors that have wrong flags (since we process the descriptors deep in the driver). Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231127190319.1190813-8-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-29xsk: Add TX timestamp and TX checksum offload supportStanislav Fomichev
This change actually defines the (initial) metadata layout that should be used by AF_XDP userspace (xsk_tx_metadata). The first field is flags which requests appropriate offloads, followed by the offload-specific fields. The supported per-device offloads are exported via netlink (new xsk-flags). The offloads themselves are still implemented in a bit of a framework-y fashion that's left from my initial kfunc attempt. I'm introducing new xsk_tx_metadata_ops which drivers are supposed to implement. The drivers are also supposed to call xsk_tx_metadata_request/xsk_tx_metadata_complete in the right places. Since xsk_tx_metadata_{request,_complete} are static inline, we don't incur any extra overhead doing indirect calls. The benefit of this scheme is as follows: - keeps all metadata layout parsing away from driver code - makes it easy to grep and see which drivers implement what - don't need any extra flags to maintain to keep track of what offloads are implemented; if the callback is implemented - the offload is supported (used by netlink reporting code) Two offloads are defined right now: 1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset 2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata area upon completion (tx_timestamp field) XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass metadata pointer). The struct is forward-compatible and can be extended in the future by appending more fields. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-29xsk: Support tx_metadata_lenStanislav Fomichev
For zerocopy mode, tx_desc->addr can point to an arbitrary offset and carry some TX metadata in the headroom. For copy mode, there is no way currently to populate skb metadata. Introduce new tx_metadata_len umem config option that indicates how many bytes to treat as metadata. Metadata bytes come prior to tx_desc address (same as in RX case). The size of the metadata has mostly the same constraints as XDP: - less than 256 bytes - 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte timestamp in the completion) - non-zero This data is not interpreted in any way right now. Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-28net: page_pool: expose page pool stats via netlinkJakub Kicinski
Dump the stats into netlink. More clever approaches like dumping the stats per-CPU for each CPU individually to see where the packets get consumed can be implemented in the future. A trimmed example from a real (but recently booted system): $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-stats-get [{'info': {'id': 19, 'ifindex': 2}, 'alloc-empty': 48, 'alloc-fast': 3024, 'alloc-refill': 0, 'alloc-slow': 48, 'alloc-slow-high-order': 0, 'alloc-waive': 0, 'recycle-cache-full': 0, 'recycle-cached': 0, 'recycle-released-refcnt': 0, 'recycle-ring': 0, 'recycle-ring-full': 0}, {'info': {'id': 18, 'ifindex': 2}, 'alloc-empty': 66, 'alloc-fast': 11811, 'alloc-refill': 35, 'alloc-slow': 66, 'alloc-slow-high-order': 0, 'alloc-waive': 0, 'recycle-cache-full': 1145, 'recycle-cached': 6541, 'recycle-released-refcnt': 0, 'recycle-ring': 1275, 'recycle-ring-full': 0}, {'info': {'id': 17, 'ifindex': 2}, 'alloc-empty': 73, 'alloc-fast': 62099, 'alloc-refill': 413, ... Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-28net: page_pool: report when page pool was destroyedJakub Kicinski
Report when page pool was destroyed. Together with the inflight / memory use reporting this can serve as a replacement for the warning about leaked page pools we currently print to dmesg. Example output for a fake leaked page pool using some hacks in netdevsim (one "live" pool, and one "leaked" on the same dev): $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-get [{'id': 2, 'ifindex': 3}, {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}] Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-28net: page_pool: stash the NAPI ID for easier accessJakub Kicinski
To avoid any issues with race conditions on accessing napi and having to think about the lifetime of NAPI objects in netlink GET - stash the napi_id to which page pool was linked at creation time. Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-28net: page_pool: record pools per netdevJakub Kicinski
Link the page pools with netdevs. This needs to be netns compatible so we have two options. Either we record the pools per netns and have to worry about moving them as the netdev gets moved. Or we record them directly on the netdev so they move with the netdev without any extra work. Implement the latter option. Since pools may outlast netdev we need a place to store orphans. In time honored tradition use loopback for this purpose. Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-28net: page_pool: id the page poolsJakub Kicinski
To give ourselves the flexibility of creating netlink commands and ability to refer to page pool instances in uAPIs create IDs for page pools. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-28neighbour: Fix __randomize_layout crash in struct neighbourGustavo A. R. Silva
Previously, one-element and zero-length arrays were treated as true flexible arrays, even though they are actually "fake" flex arrays. The __randomize_layout would leave them untouched at the end of the struct, similarly to proper C99 flex-array members. However, this approach changed with commit 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays"). Now, only C99 flexible-array members will remain untouched at the end of the struct, while one-element and zero-length arrays will be subject to randomization. Fix a `__randomize_layout` crash in `struct neighbour` by transforming zero-length array `primary_key` into a proper C99 flexible-array member. Fixes: 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays") Closes: https://lore.kernel.org/linux-hardening/20231124102458.GB1503258@e124191.cambridge.arm.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/ZWJoRsJGnCPdJ3+2@work Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-27Merge tag 'wireless-next-2023-11-27' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.8 The first features pull request for v6.8. Not so big in number of commits but we removed quite a few ancient drivers: libertas 16-bit PCMCIA support, atmel, hostap, zd1201, orinoco, ray_cs, wl3501 and rndis_wlan. Major changes: cfg80211/mac80211 - extend support for scanning while Multi-Link Operation (MLO) connected * tag 'wireless-next-2023-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (68 commits) wifi: nl80211: Documentation update for NL80211_CMD_PORT_AUTHORIZED event wifi: mac80211: Extend support for scanning while MLO connected wifi: cfg80211: Extend support for scanning while MLO connected wifi: ieee80211: fix PV1 frame control field name rfkill: return ENOTTY on invalid ioctl MAINTAINERS: update iwlwifi maintainers wifi: rtw89: 8922a: read efuse content from physical map wifi: rtw89: 8922a: read efuse content via efuse map struct from logic map wifi: rtw89: 8852c: read RX gain offset from efuse for 6GHz channels wifi: rtw89: mac: add to access efuse for WiFi 7 chips wifi: rtw89: mac: use mac_gen pointer to access about efuse wifi: rtw89: 8922a: add 8922A basic chip info wifi: rtlwifi: drop unused const_amdpci_aspm wifi: mwifiex: mwifiex_process_sleep_confirm_resp(): remove unused priv variable wifi: rtw89: regd: update regulatory map to R65-R44 wifi: rtw89: regd: handle policy of 6 GHz according to BIOS wifi: rtw89: acpi: process 6 GHz band policy from DSM wifi: rtlwifi: simplify rtl_action_proc() and rtl_tx_agg_start() wifi: rtw89: pci: update interrupt mitigation register for 8922AE wifi: rtw89: pci: correct interrupt mitigation register for 8852CE ... ==================== Link: https://lore.kernel.org/r/20231127180056.0B48DC433C8@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-27net :mana :Add remaining GDMA stats for MANA to ethtoolShradha Gupta
Extend performance counter stats in 'ethtool -S <interface>' for MANA VF to include all GDMA stat counter. Tested-on: Ubuntu22 Testcases: 1. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic 2. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Link: https://lore.kernel.org/r/1700830950-803-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-27wifi: cfg80211: add locked debugfs wrappersJohannes Berg
Add wrappers for debugfs files that should be called with the wiphy mutex held, while the file is also to be removed under the wiphy mutex. This could otherwise deadlock when a file is trying to acquire the wiphy mutex while the code removing it holds the mutex but waits for the removal. This actually works by pushing the execution of the read or write handler to a wiphy work that can be cancelled using the debugfs cancellation API. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-24wifi: cfg80211: Extend support for scanning while MLO connectedIlan Peer
To extend the support of TSF accounting in scan results for MLO connections, allow to indicate in the scan request the link ID corresponding to the BSS whose TSF should be used for the TSF accounting. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20231113112844.d4490bcdefb1.I8fcd158b810adddef4963727e9153096416b30ce@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-24net/smc: add sysctl for max conns per lgr for SMC-R v2.1Guangguan Wang
Add a new sysctl: net.smc.smcr_max_conns_per_lgr, which is used to control the preferred max connections per lgr for SMC-R v2.1. The default value of this sysctl is 255, and the acceptable value ranges from 16 to 255. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-24net/smc: add sysctl for max links per lgr for SMC-R v2.1Guangguan Wang
Add a new sysctl: net.smc.smcr_max_links_per_lgr, which is used to control the preferred max links per lgr for SMC-R v2.1. The default value of this sysctl is 2, and the acceptable value ranges from 1 to 2. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/intel/ice/ice_main.c c9663f79cd82 ("ice: adjust switchdev rebuild path") 7758017911a4 ("ice: restore timestamp configuration after device reset") https://lore.kernel.org/all/20231121211259.3348630-1-anthony.l.nguyen@intel.com/ Adjacent changes: kernel/bpf/verifier.c bb124da69c47 ("bpf: keep track of max number of bpf_loop callback iterations") 5f99f312bd3b ("bpf: add register bounds sanity checks and sanitization") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21net: page_pool: avoid touching slow on the fastpathJakub Kicinski
To fully benefit from previous commit add one byte of state in the first cache line recording if we need to look at the slow part. The packing isn't all that impressive right now, we create a 7B hole. I'm expecting Olek's rework will reshuffle this, anyway. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21net: page_pool: split the page_pool_params into fast and slowJakub Kicinski
struct page_pool is rather performance critical and we use 16B of the first cache line to store 2 pointers used only by test code. Future patches will add more informational (non-fast path) attributes. It's convenient for the user of the API to not have to worry which fields are fast and which are slow path. Use struct groups to split the params into the two categories internally. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20231121000048.789613-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-20bpf, netkit: Add indirect call wrapper for fetching peer devDaniel Borkmann
ndo_get_peer_dev is used in tcx BPF fast path, therefore make use of indirect call wrapper and therefore optimize the bpf_redirect_peer() internal handling a bit. Add a small skb_get_peer_dev() wrapper which utilizes the INDIRECT_CALL_1() macro instead of open coding. Future work could potentially add a peer pointer directly into struct net_device in future and convert veth and netkit over to use it so that eventually ndo_get_peer_dev can be removed. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231114004220.6495-7-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-20ieee802154: Give the user the association listMiquel Raynal
Upon request, we must be able to provide to the user the list of associations currently in place. Let's add a new netlink command and attribute for this purpose. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/linux-wpan/20230927181214.129346-12-miquel.raynal@bootlin.com
2023-11-20mac802154: Follow the number of associated devicesMiquel Raynal
Track the count of associated devices. Limit the number of associations using the value provided by the user if any. If we reach the maximum number of associations, we tell the device we are at capacity. If the user do not want to accept any more associations, it may specify the value 0 to the maximum number of associations, which will lead to an access denied error status returned to the peers trying to associate. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/linux-wpan/20230927181214.129346-10-miquel.raynal@bootlin.com
2023-11-20ieee802154: Add support for limiting the number of associated devicesMiquel Raynal
Coordinators may refuse associations. We need a user input for that. Let's add a new netlink command which can provide a maximum number of devices we accept to associate with as a first step. Later, we could also forward the request to userspace and check whether the association should be accepted or not. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/linux-wpan/20230927181214.129346-9-miquel.raynal@bootlin.com