summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-10-12inet: ping: fix recent breakageEric Dumazet
Blamed commit broke the assumption used by ping sendmsg() that allocated skb would have MAX_HEADER bytes in skb->head. This patch changes the way ping works, by making sure the skb head contains space for the icmp header, and adjusting ping_getfrag() which was desperate about going past the icmp header :/ This is adopting what UDP does, mostly. syzbot is able to crash a host using both kfence and following repro in a loop. fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) connect(fd, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28 sendmsg(fd, {msg_name=NULL, msg_namelen=0, msg_iov=[ {iov_base="\200\0\0\0\23\0\0\0\0\0\0\0\0\0"..., iov_len=65496}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 When kfence triggers, skb->head only has 64 bytes, immediately followed by struct skb_shared_info (no extra headroom based on ksize(ptr)) Then icmpv6_push_pending_frames() is overwriting first bytes of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS, and/or setting shinfo->gso_size to a non zero value. If nr_frags is mangled, a crash happens in skb_release_data() If gso_size is mangled, we have the following report: lo: caps=(0x00000516401d7c69, 0x00000516401d7c69) WARNING: CPU: 0 PID: 7548 at net/core/dev.c:3239 skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239 Modules linked in: CPU: 0 PID: 7548 Comm: syz-executor268 Not tainted 6.0.0-syzkaller-02754-g557f050166e5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 RIP: 0010:skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239 Code: 70 03 00 00 e8 58 c3 24 fa 4c 8d a5 e8 00 00 00 e8 4c c3 24 fa 4c 89 e9 4c 89 e2 4c 89 f6 48 c7 c7 00 53 f5 8a e8 13 ac e7 01 <0f> 0b 5b 5d 41 5c 41 5d 41 5e e9 28 c3 24 fa e8 23 c3 24 fa 48 89 RSP: 0018:ffffc9000366f3e8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88807a9d9d00 RCX: 0000000000000000 RDX: ffff8880780c0000 RSI: ffffffff8160f6f8 RDI: fffff520006cde6f RBP: ffff888079952000 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12: ffff8880799520e8 R13: ffff88807a9da070 R14: ffff888079952000 R15: 0000000000000000 FS: 0000555556be6300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020010000 CR3: 000000006eb7b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> gso_features_check net/core/dev.c:3521 [inline] netif_skb_features+0x83e/0xb90 net/core/dev.c:3554 validate_xmit_skb+0x2b/0xf10 net/core/dev.c:3659 __dev_queue_xmit+0x998/0x3ad0 net/core/dev.c:4248 dev_queue_xmit include/linux/netdevice.h:3008 [inline] neigh_hh_output include/net/neighbour.h:530 [inline] neigh_output include/net/neighbour.h:544 [inline] ip6_finish_output2+0xf97/0x1520 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xb7/0x340 net/ipv6/ip6_output.c:1966 ip6_push_pending_frames+0xdd/0x100 net/ipv6/ip6_output.c:1986 icmpv6_push_pending_frames+0x2af/0x490 net/ipv6/icmp.c:303 ping_v6_sendmsg+0xc44/0x1190 net/ipv6/ping.c:190 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x712/0x8c0 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f21aab42b89 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff1729d038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f21aab42b89 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 000000000000000d R11: 0000000000000246 R12: 00007fff1729d050 R13: 00000000000f4240 R14: 0000000000021dd1 R15: 00007fff1729d044 </TASK> Fixes: 47cf88993c91 ("net: unify alloclen calculation for paged requests") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-12ipv6: ping: fix wrong checksum for large framesEric Dumazet
For a given ping datagram, ping_getfrag() is called once per skb fragment. A large datagram requiring more than one page fragment is currently getting the checksum of the last fragment, instead of the cumulative one. After this patch, "ping -s 35000 ::1" is working correctly. Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-11Merge tag 'wireless-2022-10-11' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.1 First set of fixes for v6.1. Quite a lot of fixes in stack but also for mt76. cfg80211/mac80211 - fix locking error in mac80211's hw addr change - fix TX queue stop for internal TXQs - handling of very small (e.g. STP TCN) packets - two memcpy() hardening fixes - fix probe request 6 GHz capability warning - fix various connection prints - fix decapsulation offload for AP VLAN mt76 - fix rate reporting, LLC packets and receive checksum offload on specific chipsets iwlwifi - fix crash due to list corruption ath11k - fix a compiler warning with GCC 11 and KASAN * tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases) wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921 wifi: mt76: fix receiving LLC packets on mt7615/mt7915 wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array wifi: wext: use flex array destination for memcpy() wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets wifi: mac80211: netdev compatible TX stop for iTXQ drivers wifi: mac80211: fix decap offload for stations on AP_VLAN interfaces wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change() wifi: mac80211: remove/avoid misleading prints wifi: mac80211: fix probe req HE capabilities access wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer ==================== Link: https://lore.kernel.org/r/20221011163123.A093CC433D6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-11treewide: use get_random_bytes() when possibleJason A. Donenfeld
The prandom_bytes() function has been a deprecated inline wrapper around get_random_bytes() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use get_random_{u8,u16}() when possible, part 2Jason A. Donenfeld
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done by hand, identifying all of the places where one of the random integer functions was used in a non-32-bit context. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use get_random_{u8,u16}() when possible, part 1Jason A. Donenfeld
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() | - (get_random_u32() & 0xff) + get_random_u8() | - (get_random_u32() % 65536) + get_random_u16() | - (get_random_u32() % 256) + get_random_u8() | - (get_random_u32() >> 16) + get_random_u16() | - (get_random_u32() >> 24) + get_random_u8() | - (u16)get_random_u32() + get_random_u16() | - (u8)get_random_u32() + get_random_u8() | - (__be16)get_random_u32() + (__be16)get_random_u16() | - (__le16)get_random_u32() + (__le16)get_random_u16() | - prandom_u32_max(65536) + get_random_u16() | - prandom_u32_max(256) + get_random_u8() | - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-10Merge tag '9p-for-6.1' of https://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: "Smaller buffers for small messages and fixes. The highlight of this is Christian's patch to allocate smaller buffers for most metadata requests: 9p with a big msize would try to allocate large buffers when just 4 or 8k would be more than enough; this brings in nice performance improvements. There's also a few fixes for problems reported by syzkaller (thanks to Schspa Shi, Tetsuo Handa for tests and feedback/patches) as well as some minor cleanup" * tag '9p-for-6.1' of https://github.com/martinetd/linux: net/9p: clarify trans_fd parse_opt failure handling net/9p: add __init/__exit annotations to module init/exit funcs net/9p: use a dedicated spinlock for trans_fd 9p/trans_fd: always use O_NONBLOCK read/write net/9p: allocate appropriate reduced message buffers net/9p: add 'pooled_rbuffers' flag to struct p9_trans_module net/9p: add p9_msg_buf_size() 9p: add P9_ERRMAX for 9p2000 and 9p2000.u net/9p: split message size argument into 't_size' and 'r_size' pair 9p: trans_fd/p9_conn_cancel: drop client lock earlier
2022-10-10Merge tag 'cgroup-for-6.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cpuset now support isolated cpus.partition type, which will enable dynamic CPU isolation - pids.peak added to remember the max number of pids used - holes in cgroup namespace plugged - internal cleanups * tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits) cgroup: use strscpy() is more robust and safer iocost_monitor: reorder BlkgIterator cgroup: simplify code in cgroup_apply_control cgroup: Make cgroup_get_from_id() prettier cgroup/cpuset: remove unreachable code cgroup: Remove CFTYPE_PRESSURE cgroup: Improve cftype add/rm error handling kselftest/cgroup: Add cpuset v2 partition root state test cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule cgroup/cpuset: Relocate a code block in validate_change() cgroup/cpuset: Show invalid partition reason string cgroup/cpuset: Add a new isolated cpus.partition type cgroup/cpuset: Relax constraints to partition & cpus changes cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective cgroup/cpuset: Miscellaneous cleanups & add helper functions cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset cgroup: add pids.peak interface for pids controller cgroup: Remove data-race around cgrp_dfl_visible cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG ...
2022-10-10Merge tag 'sched-core-2022-10-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Debuggability: - Change most occurances of BUG_ON() to WARN_ON_ONCE() - Reorganize & fix TASK_ state comparisons, turn it into a bitmap - Update/fix misc scheduler debugging facilities Load-balancing & regular scheduling: - Improve the behavior of the scheduler in presence of lot of SCHED_IDLE tasks - in particular they should not impact other scheduling classes. - Optimize task load tracking, cleanups & fixes - Clean up & simplify misc load-balancing code Freezer: - Rewrite the core freezer to behave better wrt thawing and be simpler in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting all the fallout. Deadline scheduler: - Fix the DL capacity-aware code - Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period() - Relax/optimize locking in task_non_contending() Cleanups: - Factor out the update_current_exec_runtime() helper - Various cleanups, simplifications" * tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched: Fix more TASK_state comparisons sched: Fix TASK_state comparisons sched/fair: Move call to list_last_entry() in detach_tasks sched/fair: Cleanup loop_max and loop_break sched/fair: Make sure to try to detach at least one movable task sched: Show PF_flag holes freezer,sched: Rewrite core freezer logic sched: Widen TAKS_state literals sched/wait: Add wait_event_state() sched/completion: Add wait_for_completion_state() sched: Add TASK_ANY for wait_task_inactive() sched: Change wait_task_inactive()s match_state freezer,umh: Clean up freezer/initrd interaction freezer: Have {,un}lock_system_sleep() save/restore flags sched: Rename task_running() to task_on_cpu() sched/fair: Cleanup for SIS_PROP sched/fair: Default to false in test_idle_cores() sched/fair: Remove useless check in select_idle_core() sched/fair: Avoid double search on same cpu sched/fair: Remove redundant check in select_idle_smt() ...
2022-10-10wifi: cfg80211: update hidden BSSes to avoid WARN_ONJohannes Berg
When updating beacon elements in a non-transmitted BSS, also update the hidden sub-entries to the same beacon elements, so that a future update through other paths won't trigger a WARN_ON(). The warning is triggered because the beacon elements in the hidden BSSes that are children of the BSS should always be the same as in the parent. Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: mac80211: fix crash in beacon protection for P2P-deviceJohannes Berg
If beacon protection is active but the beacon cannot be decrypted or is otherwise malformed, we call the cfg80211 API to report this to userspace, but that uses a netdev pointer, which isn't present for P2P-Device. Fix this to call it only conditionally to ensure cfg80211 won't crash in the case of P2P-Device. This fixes CVE-2022-42722. Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 9eaf183af741 ("mac80211: Report beacon protection failures to user space") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: cfg80211: avoid nontransmitted BSS list corruptionJohannes Berg
If a non-transmitted BSS shares enough information (both SSID and BSSID!) with another non-transmitted BSS of a different AP, then we can find and update it, and then try to add it to the non-transmitted BSS list. We do a search for it on the transmitted BSS, but if it's not there (but belongs to another transmitted BSS), the list gets corrupted. Since this is an erroneous situation, simply fail the list insertion in this case and free the non-transmitted BSS. This fixes CVE-2022-42721. Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: cfg80211: fix BSS refcounting bugsJohannes Berg
There are multiple refcounting bugs related to multi-BSSID: - In bss_ref_get(), if the BSS has a hidden_beacon_bss, then the bss pointer is overwritten before checking for the transmitted BSS, which is clearly wrong. Fix this by using the bss_from_pub() macro. - In cfg80211_bss_update() we copy the transmitted_bss pointer from tmp into new, but then if we release new, we'll unref it erroneously. We already set the pointer and ref it, but need to NULL it since it was copied from the tmp data. - In cfg80211_inform_single_bss_data(), if adding to the non- transmitted list fails, we unlink the BSS and yet still we return it, but this results in returning an entry without a reference. We shouldn't return it anyway if it was broken enough to not get added there. This fixes CVE-2022-42720. Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: a3584f56de1c ("cfg80211: Properly track transmitting and non-transmitting BSS") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: cfg80211: ensure length byte is present before accessJohannes Berg
When iterating the elements here, ensure the length byte is present before checking it to see if the entire element will fit into the buffer. Longer term, we should rewrite this code using the type-safe element iteration macros that check all of this. Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Reported-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: mac80211: fix MBSSID parsing use-after-freeJohannes Berg
When we parse a multi-BSSID element, we might point some element pointers into the allocated nontransmitted_profile. However, we free this before returning, causing UAF when the relevant pointers in the parsed elements are accessed. Fix this by not allocating the scratch buffer separately but as part of the returned structure instead, that way, there are no lifetime issues with it. The scratch buffer introduction as part of the returned data here is taken from MLO feature work done by Ilan. This fixes CVE-2022-42719. Fixes: 5023b14cf4df ("mac80211: support profile split between elements") Co-developed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: cfg80211/mac80211: reject bad MBSSID elementsJohannes Berg
Per spec, the maximum value for the MaxBSSID ('n') indicator is 8, and the minimum is 1 since a multiple BSSID set with just one BSSID doesn't make sense (the # of BSSIDs is limited by 2^n). Limit this in the parsing in both cfg80211 and mac80211, rejecting any elements with an invalid value. This fixes potentially bad shifts in the processing of these inside the cfg80211_gen_new_bssid() function later. I found this during the investigation of CVE-2022-41674 fixed by the previous patch. Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Fixes: 78ac51f81532 ("mac80211: support multi-bssid") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-10wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans()Johannes Berg
In the copy code of the elements, we do the following calculation to reach the end of the MBSSID element: /* copy the IEs after MBSSID */ cpy_len = mbssid[1] + 2; This looks fine, however, cpy_len is a u8, the same as mbssid[1], so the addition of two can overflow. In this case the subsequent memcpy() will overflow the allocated buffer, since it copies 256 bytes too much due to the way the allocation and memcpy() sizes are calculated. Fix this by using size_t for the cpy_len variable. This fixes CVE-2022-41674. Reported-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-09net: dsa: fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create()Yang Yingliang
Fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create() to print error message. Fixes: cf5ca4ddc37a ("net: dsa: don't leave dangling pointers in dp->pl when failing") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-07Merge tag 'driver-core-6.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core and debug printk changes for 6.1-rc1. Included in here is: - dynamic debug updates for the core and the drm subsystem. The drm changes have all been acked by the relevant maintainers - kernfs fixes for syzbot reported problems - kernfs refactors and updates for cgroup requirements - magic number cleanups and removals from the kernel tree (they were not being used and they really did not actually do anything) - other tiny cleanups All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits) docs: filesystems: sysfs: Make text and code for ->show() consistent Documentation: NBD_REQUEST_MAGIC isn't a magic number a.out: restore CMAGIC device property: Add const qualifier to device_get_match_data() parameter drm_print: add _ddebug descriptor to drm_*dbg prototypes drm_print: prefer bare printk KERN_DEBUG on generic fn drm_print: optimize drm_debug_enabled for jump-label drm-print: add drm_dbg_driver to improve namespace symmetry drm-print.h: include dyndbg header drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro drm_print: interpose drm_*dbg with forwarding macros drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers. drm_print: condense enum drm_debug_category debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs() Documentation: ENI155_MAGIC isn't a magic number Documentation: NBD_REPLY_MAGIC isn't a magic number nbd: remove define-only NBD_MAGIC, previously magic number Documentation: FW_HEADER_MAGIC isn't a magic number Documentation: EEPROM_MAGIC_VALUE isn't a magic number ...
2022-10-07Merge tag 'tty-6.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big set of TTY and Serial driver updates for 6.1-rc1. Lots of cleanups in here, no real new functionality this time around, with the diffstat being that we removed more lines than we added! Included in here are: - termios unification cleanups from Al Viro, it's nice to finally get this work done - tty serial transmit cleanups in various drivers in preparation for more cleanup and unification in future releases (that work was not ready for this release) - n_gsm fixes and updates - ktermios cleanups and code reductions - dt bindings json conversions and updates for new devices - some serial driver updates for new devices - lots of other tiny cleanups and janitorial stuff. Full details in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'tty-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (102 commits) serial: cpm_uart: Don't request IRQ too early for console port tty: serial: do unlock on a common path in altera_jtaguart_console_putc() tty: serial: unify TX space reads under altera_jtaguart_tx_space() tty: serial: use FIELD_GET() in lqasc_tx_ready() tty: serial: extend lqasc_tx_ready() to lqasc_console_putchar() tty: serial: allow pxa.c to be COMPILE_TESTed serial: stm32: Fix unused-variable warning tty: serial: atmel: Add COMMON_CLK dependency to SERIAL_ATMEL serial: 8250: Fix restoring termios speed after suspend serial: Deassert Transmit Enable on probe in driver-specific way serial: 8250_dma: Convert to use uart_xmit_advance() serial: 8250_omap: Convert to use uart_xmit_advance() MAINTAINERS: Solve warning regarding inexistent atmel-usart binding serial: stm32: Deassert Transmit Enable on ->rs485_config() serial: ar933x: Deassert Transmit Enable on ->rs485_config() tty: serial: atmel: Use FIELD_PREP/FIELD_GET tty: serial: atmel: Make the driver aware of the existence of GCLK tty: serial: atmel: Only divide Clock Divisor if the IP is USART tty: serial: atmel: Separate mode clearing between UART and USART dt-bindings: serial: atmel,at91-usart: Add gclk as a possible USART clock ...
2022-10-07wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token ↵Kees Cook
flexible array To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), split the memcpy() of the header and the payload so no false positive run-time overflow warning will be generated. [1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org/ Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: wext: use flex array destination for memcpy()Hawkins Jiawei
Syzkaller reports buffer overflow false positive as follows: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 8) of single field "&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4) WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623 wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623 Modules linked in: CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted 6.0.0-rc6-next-20220921-syzkaller #0 [...] Call Trace: <TASK> ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022 wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955 wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline] wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline] wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049 sock_ioctl+0x285/0x640 net/socket.c:1220 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> Wireless events will be sent on the appropriate channels in wireless_send_event(). Different wireless events may have different payload structure and size, so kernel uses **len** and **cmd** field in struct __compat_iw_event as wireless event common LCP part, uses **pointer** as a label to mark the position of remaining different part. Yet the problem is that, **pointer** is a compat_caddr_t type, which may be smaller than the relative structure at the same position. So during wireless_send_event() tries to parse the wireless events payload, it may trigger the memcpy() run-time destination buffer bounds checking when the relative structure's data is copied to the position marked by **pointer**. This patch solves it by introducing flexible-array field **ptr_bytes**, to mark the position of the wireless events remaining part next to LCP part. What's more, this patch also adds **ptr_len** variable in wireless_send_event() to improve its maintainability. Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/ Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packetsFelix Fietkau
STP topology change notification packets only have a payload of 7 bytes, so they get dropped due to the skb->len < hdrlen + 8 check. Fix this by removing the extra 8 from the skb->len check and checking the return code on the skb_copy_bits calls. Fixes: 2d1c304cb2d5 ("cfg80211: add function for 802.3 conversion with separate output buffer") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: netdev compatible TX stop for iTXQ driversAlexander Wetzel
Properly handle TX stop for internal queues (iTXQs) within mac80211. mac80211 must not stop netdev queues when using mac80211 iTXQs. For these drivers the netdev interface is created with IFF_NO_QUEUE. While netdev still drops frames for IFF_NO_QUEUE interfaces when we stop the netdev queues, it also prints a warning when this happens: Assuming the mac80211 interface is called wlan0 we would get "Virtual device wlan0 asks to queue packet!" when netdev has to drop a frame. This patch is keeping the harmless netdev queue starts for iTXQ drivers. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: fix decap offload for stations on AP_VLAN interfacesFelix Fietkau
Since AP_VLAN interfaces are not passed to the driver, check offload_flags on the bss vif instead. Reported-by: Howard Hsu <howard-yh.hsu@mediatek.com> Fixes: 80a915ec4427 ("mac80211: add rx decapsulation offload support") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()Dan Carpenter
Unlock before returning -EOPNOTSUPP. Fixes: 3c06e91b40db ("wifi: mac80211: Support POWERED_ADDR_CHANGE feature") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: remove/avoid misleading printsJames Prestwood
At some point a few kernel debug prints started appearing which indicated something was sending invalid IEs: "bad VHT capabilities, disabling VHT" "Invalid HE elem, Disable HE" Turns out these were being printed because the local hardware supported HE/VHT but the peer/AP did not. Bad/invalid indicates, to me at least, that the IE is in some way malformed, not missing. For the HE print (ieee80211_verify_peer_he_mcs_support) it will now silently fail if the HE capability element is missing (still prints if the element size is wrong). For the VHT print, it has been removed completely and will silently set the DISABLE_VHT flag which is consistent with how DISABLE_HT is set. Signed-off-by: James Prestwood <prestwoj@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: fix probe req HE capabilities accessJames Prestwood
When building the probe request IEs HE support is checked for the 6GHz band (wiphy->bands[NL80211_BAND_6GHZ]). If supported the HE capability IE should be included according to the spec. The problem is the 16-bit capability is obtained from the band object (sband) that was passed in, not the 6GHz band object (sband6). If the sband object doesn't support HE it will result in a warning. Fixes: 7d29bc50b30e ("mac80211: always include HE 6GHz capability in probe request") Signed-off-by: James Prestwood <prestwoj@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rxFelix Fietkau
Since STP TCN frames are only 7 bytes, the pskb_may_pull call returns an error. Instead of dropping those packets, bump them back to the slow path for proper processing. Fixes: 49ddf8e6e234 ("mac80211: add fast-rx path") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-10-07net/9p: clarify trans_fd parse_opt failure handlingLi Zhong
This parse_opts will set invalid opts.rfd/wfd in case of failure which we already check, but it is not clear for readers that parse_opts error are handled in p9_fd_create: clarify this by explicitely checking the return value. Link: https://lkml.kernel.org/r/20220921210921.1654735-1-floridsleeves@gmail.com Signed-off-by: Li Zhong <floridsleeves@gmail.com> [Dominique: reworded commit message to clarify this is NOOP] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-10-07net/9p: add __init/__exit annotations to module init/exit funcsXiu Jianfeng
xen transport was missing annotations Link: https://lkml.kernel.org/r/20220909103546.73015-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-10-07net/9p: use a dedicated spinlock for trans_fdDominique Martinet
Shamelessly copying the explanation from Tetsuo Handa's suggested patch[1] (slightly reworded): syzbot is reporting inconsistent lock state in p9_req_put()[2], for p9_tag_remove() from p9_req_put() from IRQ context is using spin_lock_irqsave() on "struct p9_client"->lock but trans_fd (not from IRQ context) is using spin_lock(). Since the locks actually protect different things in client.c and in trans_fd.c, just replace trans_fd.c's lock by a new one specific to the transport (client.c's protect the idr for fid/tag allocations, while trans_fd.c's protects its own req list and request status field that acts as the transport's state machine) Link: https://lore.kernel.org/r/20220904112928.1308799-1-asmadeus@codewreck.org Link: https://lkml.kernel.org/r/2470e028-9b05-2013-7198-1fdad071d999@I-love.SAKURA.ne.jp [1] Link: https://syzkaller.appspot.com/bug?extid=2f20b523930c32c160cc [2] Reported-by: syzbot <syzbot+2f20b523930c32c160cc@syzkaller.appspotmail.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-10-07ipv4: Handle attempt to delete multipath route when fib_info contains an nh ↵David Ahern
reference Gwangun Jung reported a slab-out-of-bounds access in fib_nh_match: fib_nh_match+0xf98/0x1130 linux-6.0-rc7/net/ipv4/fib_semantics.c:961 fib_table_delete+0x5f3/0xa40 linux-6.0-rc7/net/ipv4/fib_trie.c:1753 inet_rtm_delroute+0x2b3/0x380 linux-6.0-rc7/net/ipv4/fib_frontend.c:874 Separate nexthop objects are mutually exclusive with the legacy multipath spec. Fix fib_nh_match to return if the config for the to be deleted route contains a multipath spec while the fib_info is using a nexthop object. Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Fixes: 6bf92d70e690 ("net: ipv4: fix route with nexthop object delete warning") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-07net: ieee802154: fix error return code in dgram_bind()Wei Yongjun
Fix to return error code -EINVAL from the error handling case instead of 0, as done elsewhere in this function. Fixes: 94160108a70c ("net/ieee802154: fix uninit value bug in dgram_sendmsg") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Link: https://lore.kernel.org/r/20220919160830.1436109-1-weiyongjun@huaweicloud.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-10-079p/trans_fd: always use O_NONBLOCK read/writeTetsuo Handa
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is failing to interrupt already started kernel_read() from p9_fd_read() from p9_read_work() and/or kernel_write() from p9_fd_write() from p9_write_work() requests. Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open() does not set O_NONBLOCK flag, but pipe blocks unless signal is pending, p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when the file descriptor refers to a pipe. In other words, pipe file descriptor needs to be handled as if socket file descriptor. We somehow need to interrupt kernel_read()/kernel_write() on pipes. A minimal change, which this patch is doing, is to set O_NONBLOCK flag from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing of regular files. But this approach changes O_NONBLOCK flag on userspace- supplied file descriptors (which might break userspace programs), and O_NONBLOCK flag could be changed by userspace. It would be possible to set O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still remains small race window for clearing O_NONBLOCK flag. If we don't want to manipulate O_NONBLOCK flag, we might be able to surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING) and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are processed by kernel threads which process global system_wq workqueue, signals could not be delivered from remote threads when p9_mux_poll_stop() from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be needed if we count on signals for making kernel_read()/kernel_write() non-blocking. Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1] Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: add comment at Christian's suggestion] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-10-06Merge tag 'pull-d_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs d_path updates from Al Viro. * tag 'pull-d_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: d_path.c: typo fix... dynamic_dname(): drop unused dentry argument
2022-10-06SUNRPC: Add API to force the client to disconnectTrond Myklebust
Allow the caller to force a disconnection of the RPC client so that we can clear any pending requests that are buffered in the socket. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC callsTrond Myklebust
Add the helper rpc_cancel_tasks(), which uses a caller-defined selection function to define a set of in-flight RPC calls to cancel. This is mainly intended for pNFS drivers which are subject to a layout recall, and which may therefore want to cancel all pending I/O using that layout in order to redrive it after the layout recall has been satisfied. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06SUNRPC: Fix races with rpc_killall_tasks()Trond Myklebust
Ensure that we immediately call rpc_exit_task() after waking up, and that the tk_rpc_status cannot get clobbered by some other function. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05Merge tag 'ieee802154-for-net-2022-10-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2022-10-05 Only two patches this time around. A revert from Alexander Aring to a patch that hit net and the updated patch to fix the problem from Tetsuo Handa. * tag 'ieee802154-for-net-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: net/ieee802154: don't warn zero-sized raw_sendmsg() Revert "net/ieee802154: reject zero-sized raw_sendmsg()" ==================== Link: https://lore.kernel.org/r/20221005144508.787376-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo ↵Vladimir Oltean
child qdiscs" taprio_attach() has this logic at the end, which should have been removed with the blamed patch (which is now being reverted): /* access to the child qdiscs is not needed in offload mode */ if (FULL_OFFLOAD_IS_ENABLED(q->flags)) { kfree(q->qdiscs); q->qdiscs = NULL; } because otherwise, we make use of q->qdiscs[] even after this array was deallocated, namely in taprio_leaf(). Therefore, whenever one would try to attach a valid child qdisc to a fully offloaded taprio root, one would immediately dereference a NULL pointer. $ tc qdisc replace dev eno0 handle 8001: parent root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ max-sdu 0 0 0 0 0 200 0 0 \ base-time 200 \ sched-entry S 80 20000 \ sched-entry S a0 20000 \ sched-entry S 5f 60000 \ flags 2 $ max_frame_size=1500 $ data_rate_kbps=20000 $ port_transmit_rate_kbps=1000000 $ idleslope=$data_rate_kbps $ sendslope=$(($idleslope - $port_transmit_rate_kbps)) $ locredit=$(($max_frame_size * $sendslope / $port_transmit_rate_kbps)) $ hicredit=$(($max_frame_size * $idleslope / $port_transmit_rate_kbps)) $ tc qdisc replace dev eno0 parent 8001:7 cbs \ idleslope $idleslope \ sendslope $sendslope \ hicredit $hicredit \ locredit $locredit \ offload 0 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030 pc : taprio_leaf+0x28/0x40 lr : qdisc_leaf+0x3c/0x60 Call trace: taprio_leaf+0x28/0x40 tc_modify_qdisc+0xf0/0x72c rtnetlink_rcv_msg+0x12c/0x390 netlink_rcv_skb+0x5c/0x130 rtnetlink_rcv+0x1c/0x2c The solution is not as obvious as the problem. The code which deallocates q->qdiscs[] is in fact copied and pasted from mqprio, which also deallocates the array in mqprio_attach() and never uses it afterwards. Therefore, the identical cleanup logic of priv->qdiscs[] that mqprio_destroy() has is deceptive because it will never take place at qdisc_destroy() time, but just at raw ops->destroy() time (otherwise said, priv->qdiscs[] do not last for the entire lifetime of the mqprio root), but rather, this is just the twisted way in which the Qdisc API understands error path cleanup should be done (Qdisc_ops :: destroy() is called even when Qdisc_ops :: init() never succeeded). Side note, in fact this is also what the comment in mqprio_init() says: /* pre-allocate qdisc, attachment can't fail */ Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as data passing between Qdisc_ops :: init() and Qdisc_ops :: attach(). [ this comment was also copied and pasted into the initial taprio commit, even though taprio_attach() came way later ] The problem is that taprio also makes extensive use of the q->qdiscs[] array in the software fast path (taprio_enqueue() and taprio_dequeue()), but it does not keep a reference of its own on q->qdiscs[i] (you'd think that since it creates these Qdiscs, it holds the reference, but nope, this is not completely true). To understand the difference between taprio_destroy() and mqprio_destroy() one must look before commit 13511704f8d7 ("net: taprio offload: enforce qdisc to netdev queue mapping"), because that just muddied the waters. In the "original" taprio design, taprio always attached itself (the root Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go through taprio_enqueue(). It also called qdisc_refcount_inc() on itself for as many times as there were netdev TX queues, in order to counter-balance what tc_get_qdisc() does when destroying a Qdisc (simplified for brevity below): if (n->nlmsg_type == RTM_DELQDISC) err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack); qdisc_graft(where "new" is NULL so this deletes the Qdisc): for (i = 0; i < num_q; i++) { struct netdev_queue *dev_queue; dev_queue = netdev_get_tx_queue(dev, i); old = dev_graft_qdisc(dev_queue, new); if (new && i > 0) qdisc_refcount_inc(new); qdisc_put(old); ~~~~~~~~~~~~~~ this decrements taprio's refcount once for each TX queue } notify_and_destroy(net, skb, n, classid, rtnl_dereference(dev->qdisc), new); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and this finally decrements it to zero, making qdisc_put() call qdisc_destroy() The q->qdiscs[] created using qdisc_create_dflt() (or their replacements, if taprio_graft() was ever to get called) were then privately freed by taprio_destroy(). This is still what is happening after commit 13511704f8d7 ("net: taprio offload: enforce qdisc to netdev queue mapping"), but only for software mode. In full offload mode, the per-txq "qdisc_put(old)" calls from qdisc_graft() now deallocate the child Qdiscs rather than decrement taprio's refcount. So when notify_and_destroy(taprio) finally calls taprio_destroy(), the difference is that the child Qdiscs were already deallocated. And this is exactly why the taprio_attach() comment "access to the child qdiscs is not needed in offload mode" is deceptive too. Not only the q->qdiscs[] array is not needed, but it is also necessary to get rid of it as soon as possible, because otherwise, we will also call qdisc_put() on the child Qdiscs in qdisc_destroy() -> taprio_destroy(), and this will cause a nasty use-after-free/refcount-saturate/whatever. In short, the problem is that since the blamed commit, taprio_leaf() needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy() -> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach() for full offload. Fixing one problem triggers the other. All of this can be solved by making taprio keep its q->qdiscs[i] with a refcount elevated at 2 (in offloaded mode where they are attached to the netdev TX queues), both in taprio_attach() and in taprio_graft(). The generic qdisc_graft() would just decrement the child qdiscs' refcounts to 1, and taprio_destroy() would give them the final coup de grace. However the rabbit hole of changes is getting quite deep, and the complexity increases. The blamed commit was supposed to be a bug fix in the first place, and the bug it addressed is not so significant so as to justify further rework in stable trees. So I'd rather just revert it. I don't know enough about multi-queue Qdisc design to make a proper judgement right now regarding what is/isn't idiomatic use of Qdisc concepts in taprio. I will try to study the problem more and come with a different solution in net-next. Fixes: 1461d212ab27 ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs") Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-05xprtrdma: Fix uninitialized variableChuck Lever
net/sunrpc/xprtrdma/frwr_ops.c:151:32: warning: variable 'rc' is uninitialized when used here [-Wuninitialized] trace_xprtrdma_frwr_alloc(mr, rc); ^~ net/sunrpc/xprtrdma/frwr_ops.c:127:8: note: initialize the variable 'rc' to silence this warning int rc; ^ = 0 1 warning generated. The tracepoint is intended to record the error returned from ib_alloc_mr(). In the current code there is no other purpose for @rc, so simply replace it. Reported-by: kernel test robot <lkp@intel.com> Fixes: d8cf39a280c3b0 ('xprtrdma: MR-related memory allocation should be allowed to fail') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05xprtrdma: Prevent memory allocations from driving a reclaimChuck Lever
Many memory allocations that xprtrdma does can fail safely. Let's use this fact to avoid some potential deadlocks: Replace GFP_KERNEL with GFP flags that do not try hard to acquire memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05xprtrdma: Memory allocation should be allowed to fail during connectChuck Lever
An attempt to establish a connection can always fail and then be retried. GFP_KERNEL allocation is not necessary here. Like MR allocation, establishing a connection is always done in a worker thread. The new GFP flags align with the flags that would be returned by rpc_task_gfp_mask() in this case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05xprtrdma: MR-related memory allocation should be allowed to failChuck Lever
xprtrdma always drives a retry of MR allocation if it should fail. It should be safe to not use GFP_KERNEL for this purpose rather than sleeping in the memory allocator. In theory, if these weaker allocations are attempted first, memory exhaustion is likely to cause xprtrdma to fail fast and not then invoke the RDMA core APIs, which still might use GFP_KERNEL. Also note that rpc_task_gfp_mask() always sets __GFP_NORETRY and __GFP_NOWARN when an RPC-related allocation is being done in a worker thread. MR allocation is already always done in worker threads. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc()Chuck Lever
Currently all rpcrdma_regbuf_alloc() call sites pass the same value as their third argument. That argument can therefore be eliminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05xprtrdma: Clean up synopsis of rpcrdma_req_create()Chuck Lever
Commit 1769e6a816df ("xprtrdma: Clean up rpcrdma_create_req()") added rpcrdma_req_create() with a GFP flags argument in case a caller might want to avoid waiting for memory. There has never been a caller that does not pass GFP_KERNEL as the third argument. That argument can therefore be eliminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-05svcrdma: Clean up RPCRDMA_DEF_GFPChuck Lever
xprt_rdma_bc_allocate() is now the only user of RPCRDMA_DEF_GFP. Replace that macro with the raw flags. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>