summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-01-15bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()Ziyang Xuan
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-13sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htbRahul Rameshbabu
Peek at old qdisc and graft only when deleting a leaf class in the htb, rather than when deleting the htb itself. Do not peek at the qdisc of the netdev queue when destroying the htb. The caller may already have grafted a new qdisc that is not part of the htb structure being destroyed. This fix resolves two use cases. 1. Using tc to destroy the htb. - Netdev was being prematurely activated before the htb was fully destroyed. 2. Using tc to replace the htb with another qdisc (which also leads to the htb being destroyed). - Premature netdev activation like previous case. Newly grafted qdisc was also getting accidentally overwritten when destroying the htb. Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230113005528.302625-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13mptcp: netlink: respect v4/v6-only socketsMatthieu Baerts
If an MPTCP socket has been created with AF_INET6 and the IPV6_V6ONLY option has been set, the userspace PM would allow creating subflows using IPv4 addresses, e.g. mapped in v6. The kernel side of userspace PM will also accept creating subflows with local and remote addresses having different families. Depending on the subflow socket's family, different behaviours are expected: - If AF_INET is forced with a v6 address, the kernel will take the last byte of the IP and try to connect to that: a new subflow is created but to a non expected address. - If AF_INET6 is forced with a v4 address, the kernel will try to connect to a v4 address (v4-mapped-v6). A -EBADF error from the connect() part is then expected. It is then required to check the given families can be accepted. This is done by using a new helper for addresses family matching, taking care of IPv4 vs IPv4-mapped-IPv6 addresses. This helper will be re-used later by the in-kernel path-manager to use mixed IPv4 and IPv6 addresses. While at it, a clear error message is now reported if there are some conflicts with the families that have been passed by the userspace. Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13mptcp: explicitly specify sock family at subflow creation timePaolo Abeni
Let the caller specify the to-be-created subflow family. For a given MPTCP socket created with the AF_INET6 family, the current userspace PM can already ask the kernel to create subflows in v4 and v6. If "plain" IPv4 addresses are passed to the kernel, they are automatically mapped in v6 addresses "by accident". This can be problematic because the userspace will need to pass different addresses, now the v4-mapped-v6 addresses to destroy this new subflow. On the other hand, if the MPTCP socket has been created with the AF_INET family, the command to create a subflow in v6 will be accepted but the result will not be the one as expected as new subflow will be created in IPv4 using part of the v6 addresses passed to the kernel: not creating the expected subflow then. No functional change intended for the in-kernel PM where an explicit enforcement is currently in place. This arbitrary enforcement will be leveraged by other patches in a future version. Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13plca.c: fix obvious mistake in checking retvalPiergiorgio Beruto
Revert a wrong fix that was done during the review process. The intention was to substitute "if(ret < 0)" with "if(ret)". Unfortunately, the intended fix did not meet the code. Besides, after additional review, it was decided that "if(ret < 0)" was actually the right thing to do. Fixes: 8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS") Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/f2277af8951a51cfee2fb905af8d7a812b7beaf4.1673616357.git.piergiorgio.beruto@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13ipv6: remove max_size check inline with ipv4Jon Maxwell
In ip6_dst_gc() replace: if (entries > gc_thresh) With: if (entries > ops->gc_thresh) Sending Ipv6 packets in a loop via a raw socket triggers an issue where a route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly consumes the Ipv6 max_size threshold which defaults to 4096 resulting in these warnings: [1] 99.187805] dst_alloc: 7728 callbacks suppressed [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. . . [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. When this happens the packet is dropped and sendto() gets a network is unreachable error: remaining pkt 200557 errno 101 remaining pkt 196462 errno 101 . . remaining pkt 126821 errno 101 Implement David Aherns suggestion to remove max_size check seeing that Ipv6 has a GC to manage memory usage. Ipv4 already does not check max_size. Here are some memory comparisons for Ipv4 vs Ipv6 with the patch: Test by running 5 instances of a program that sends UDP packets to a raw socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar program. Ipv4: Before test: MemFree: 29427108 kB Slab: 237612 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 2881 3990 192 42 2 : tunables 0 0 0 During test: MemFree: 29417608 kB Slab: 247712 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 44394 44394 192 42 2 : tunables 0 0 0 After test: MemFree: 29422308 kB Slab: 238104 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 Ipv6 with patch: Errno 101 errors are not observed anymore with the patch. Before test: MemFree: 29422308 kB Slab: 238104 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 During Test: MemFree: 29431516 kB Slab: 240940 kB ip6_dst_cache 11980 12064 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 After Test: MemFree: 29441816 kB Slab: 238132 kB ip6_dst_cache 1902 2432 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13net: nfc: Fix use-after-free in local_cleanup()Jisoo Jang
Fix a use-after-free that occurs in kfree_skb() called from local_cleanup(). This could happen when killing nfc daemon (e.g. neard) after detaching an nfc device. When detaching an nfc device, local_cleanup() called from nfc_llcp_unregister_device() frees local->rx_pending and decreases local->ref by kref_put() in nfc_llcp_local_put(). In the terminating process, nfc daemon releases all sockets and it leads to decreasing local->ref. After the last release of local->ref, local_cleanup() called from local_release() frees local->rx_pending again, which leads to the bug. Setting local->rx_pending to NULL in local_cleanup() could prevent use-after-free when local_cleanup() is called twice. Found by a modified version of syzkaller. BUG: KASAN: use-after-free in kfree_skb() Call Trace: dump_stack_lvl (lib/dump_stack.c:106) print_address_description.constprop.0.cold (mm/kasan/report.c:306) kasan_check_range (mm/kasan/generic.c:189) kfree_skb (net/core/skbuff.c:955) local_cleanup (net/nfc/llcp_core.c:159) nfc_llcp_local_put.part.0 (net/nfc/llcp_core.c:172) nfc_llcp_local_put (net/nfc/llcp_core.c:181) llcp_sock_destruct (net/nfc/llcp_sock.c:959) __sk_destruct (net/core/sock.c:2133) sk_destruct (net/core/sock.c:2181) __sk_free (net/core/sock.c:2192) sk_free (net/core/sock.c:2203) llcp_sock_release (net/nfc/llcp_sock.c:646) __sock_release (net/socket.c:650) sock_close (net/socket.c:1365) __fput (fs/file_table.c:306) task_work_run (kernel/task_work.c:179) ptrace_notify (kernel/signal.c:2354) syscall_exit_to_user_mode_prepare (kernel/entry/common.c:278) syscall_exit_to_user_mode (kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:86) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:106) Allocated by task 4719: kasan_save_stack (mm/kasan/common.c:45) __kasan_slab_alloc (mm/kasan/common.c:325) slab_post_alloc_hook (mm/slab.h:766) kmem_cache_alloc_node (mm/slub.c:3497) __alloc_skb (net/core/skbuff.c:552) pn533_recv_response (drivers/nfc/pn533/usb.c:65) __usb_hcd_giveback_urb (drivers/usb/core/hcd.c:1671) usb_giveback_urb_bh (drivers/usb/core/hcd.c:1704) tasklet_action_common.isra.0 (kernel/softirq.c:797) __do_softirq (kernel/softirq.c:571) Freed by task 1901: kasan_save_stack (mm/kasan/common.c:45) kasan_set_track (mm/kasan/common.c:52) kasan_save_free_info (mm/kasan/genericdd.c:518) __kasan_slab_free (mm/kasan/common.c:236) kmem_cache_free (mm/slub.c:3809) kfree_skbmem (net/core/skbuff.c:874) kfree_skb (net/core/skbuff.c:931) local_cleanup (net/nfc/llcp_core.c:159) nfc_llcp_unregister_device (net/nfc/llcp_core.c:1617) nfc_unregister_device (net/nfc/core.c:1179) pn53x_unregister_nfc (drivers/nfc/pn533/pn533.c:2846) pn533_usb_disconnect (drivers/nfc/pn533/usb.c:579) usb_unbind_interface (drivers/usb/core/driver.c:458) device_release_driver_internal (drivers/base/dd.c:1279) bus_remove_device (drivers/base/bus.c:529) device_del (drivers/base/core.c:3665) usb_disable_device (drivers/usb/core/message.c:1420) usb_disconnect (drivers/usb/core.c:2261) hub_event (drivers/usb/core/hub.c:5833) process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:212 include/trace/events/workqueue.h:108 kernel/workqueue.c:2281) worker_thread (include/linux/list.h:282 kernel/workqueue.c:2423) kthread (kernel/kthread.c:319) ret_from_fork (arch/x86/entry/entry_64.S:301) Fixes: 3536da06db0b ("NFC: llcp: Clean local timers and works when removing a device") Signed-off-by: Jisoo Jang <jisoo.jang@yonsei.ac.kr> Link: https://lore.kernel.org/r/20230111131914.3338838-1-jisoo.jang@yonsei.ac.kr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13caif: don't assume iov_iter typeKeith Busch
The details of the iov_iter types are appropriately abstracted, so there's no need to check for specific type fields. Just let the abstractions handle it. This is preparing for io_uring/net's io_send to utilize the more efficient ITER_UBUF. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230111184245.3784393-1-kbusch@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13sock: add tracepoint for send recv lengthYunhui Cui
Add 2 tracepoints to monitor the tcp/udp traffic of per process and per cgroup. Regarding monitoring the tcp/udp traffic of each process, there are two existing solutions, the first one is https://www.atoptool.nl/netatop.php. The second is via kprobe/kretprobe. Netatop solution is implemented by registering the hook function at the hook point provided by the netfilter framework. These hook functions may be in the soft interrupt context and cannot directly obtain the pid. Some data structures are added to bind packets and processes. For example, struct taskinfobucket, struct taskinfo ... Every time the process sends and receives packets it needs multiple hashmaps,resulting in low performance and it has the problem fo inaccurate tcp/udp traffic statistics(for example: multiple threads share sockets). We can obtain the information with kretprobe, but as we know, kprobe gets the result by trappig in an exception, which loses performance compared to tracepoint. We compared the performance of tracepoints with the above two methods, and the results are as follows: ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html without trace: Time per request: 39.660 [ms] (mean) Time per request: 0.040 [ms] (mean, across all concurrent requests) netatop: Time per request: 50.717 [ms] (mean) Time per request: 0.051 [ms] (mean, across all concurrent requests) kr: Time per request: 43.168 [ms] (mean) Time per request: 0.043 [ms] (mean, across all concurrent requests) tracepoint: Time per request: 41.004 [ms] (mean) Time per request: 0.041 [ms] (mean, across all concurrent requests It can be seen that tracepoint has better performance. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13ethtool: add tx aggregation parametersDaniele Palmas
Add the following ethtool tx aggregation parameters: ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES Maximum size in bytes of a tx aggregated block of frames. ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES Maximum number of frames that can be aggregated into a block. ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS Time in usecs after the first packet arrival in an aggregated block for the block to be sent. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tagChristian Eggers
For PDelay_Resp messages we will likely have a negative value in the correction field. The switch hardware cannot correctly update such values (produces an off by one error in the UDP checksum), so it must be moved to the time stamp field in the tail tag. Format of the correction field is 48 bit ns + 16 bit fractional ns. After updating the correction field, clone is no longer required hence it is freed. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add packet transmission timestampingChristian Eggers
This patch adds the routines for transmission of ptp packets. When the ptp pdelay_req packet to be transmitted, it uses the deferred xmit worker to schedule the packets. During irq_setup, interrupt for Sync, Pdelay_req and Pdelay_rsp are enabled. So interrupt is triggered for all three packets. But for p2p1step, we require only time stamp of Pdelay_req packet. Hence to avoid posting of the completion from ISR routine for Sync and Pdelay_resp packets, ts_en flag is introduced. This controls which packets need to processed for timestamp. After the packet is transmitted, ISR is triggered. The time at which packet transmitted is recorded to separate register. This value is reconstructed to absolute time and posted to the user application through socket error queue. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add packet reception timestampingChristian Eggers
Rx Timestamping is done through 4 additional bytes in tail tag. Whenever the ptp packet is received, the 4 byte hardware time stamped value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates it as PTP frame. This 4 byte value is extracted from the tail tag and reconstructed to absolute time and assigned to skb hwtstamp. If the packet received in PDelay_Resp, then partial ingress timestamp is subtracted from the correction field. Since user space tools expects to be done in hardware. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabledArun Ramadoss
When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the transmit frame needs additional 4 bytes before the tail tag. It is needed for all the transmission packets irrespective of PTP packets or not. The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For the one-step Pdelay_Resp, the switch needs the receive timestamp of the Pdelay_Req message so that it can put the turnaround time in the correction field. Since PTP has to be enabled for both Transmission and reception timestamping, driver needs to track of the tx and rx setting of the all the user ports in the switch. Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the timestampping setting of each port. When any one of ports has tx or rx timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to the tail tag. When tx and rx timestamping of all the ports are disabled, then 4 bytes are not added. Tested using hwstamp -i <interface> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-12ethtool: add netlink attr in rss get reply only if value is not nullSudheer Mogilappagari
Current code for RSS_GET ethtool command includes netlink attributes in reply message to user space even if they are null. Added checks to include netlink attribute in reply message only if a value is received from driver. Drivers might return null for RSS indirection table or hash key. Instead of including attributes with empty value in the reply message, add netlink attribute only if there is content. Fixes: 7112a04664bf ("ethtool: add netlink based get rss support") Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/20230111235607.85509-1-sudheer.mogilappagari@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12rxrpc: Fix wrong error return in rxrpc_connect_call()David Howells
Fix rxrpc_connect_call() to return -ENOMEM rather than 0 if it fails to look up a peer. This generated a smatch warning: net/rxrpc/call_object.c:303 rxrpc_connect_call() warn: missing error code 'ret' I think this also fixes a syzbot-found bug: rxrpc: Assertion failed - 1(0x1) == 11(0xb) is false ------------[ cut here ]------------ kernel BUG at net/rxrpc/call_object.c:645! where the call being put is in the wrong state - as would be the case if we failed to clear up correctly after the error in rxrpc_connect_call(). Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reported-and-tested-by: syzbot+4bb6356bb29d6299360e@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/202301111153.9eZRYLf1-lkp@intel.com/ Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/2438405.1673460435@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12Merge tag 'wireless-2023-01-12' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Some fixes, stack only for now: * iTXQ conversion fixes, various bugs reported * properly reset multiple BSSID settings * fix for a link_sta crash * fix for AP VLAN checks * fix for MLO address translation * tag 'wireless-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: fix MLO + AP_VLAN check mac80211: Fix MLO address translation for multiple bss case wifi: mac80211: reset multiple BSSID options in stop_ap() wifi: mac80211: Fix iTXQ AMPDU fragmentation handling wifi: mac80211: sdata can be NULL during AMPDU start wifi: mac80211: Proper mark iTXQs for resumption wifi: mac80211: fix initialization of rx->link and rx->link_sta ==================== Link: https://lore.kernel.org/r/20230112111941.82408-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/usb/r8152.c be53771c87f4 ("r8152: add vendor/device ID pair for Microsoft Devkit") ec51fbd1b8a2 ("r8152: add USB device driver for config selection") https://lore.kernel.org/all/20230113113339.658c4723@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12Merge tag 'net-6.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc. The rxrpc changes are noticeable large: to address a recent regression has been necessary completing the threaded refactor. Current release - regressions: - rxrpc: - only disconnect calls in the I/O thread - move client call connection to the I/O thread - fix incoming call setup race - eth: mlx5: - restore pkt rate policing support - fix memory leak on updating vport counters Previous releases - regressions: - gro: take care of DODGY packets - ipv6: deduct extension header length in rawv6_push_pending_frames - tipc: fix unexpected link reset due to discovery messages Previous releases - always broken: - sched: disallow noqueue for qdisc classes - eth: ice: fix potential memory leak in ice_gnss_tty_write() - eth: ixgbe: fix pci device refcount leak - eth: mlx5: - fix command stats access after free - fix macsec possible null dereference when updating MAC security entity (SecY)" * tag 'net-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) r8152: add vendor/device ID pair for Microsoft Devkit net: stmmac: add aux timestamps fifo clearance wait bnxt: make sure we return pages to the pool net: hns3: fix wrong use of rss size during VF rss config ipv6: raw: Deduct extension header length in rawv6_push_pending_frames net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit() net: sched: disallow noqueue for qdisc classes iavf/iavf_main: actually log ->src mask when talking about it igc: Fix PPS delta between two synchronized end-points ixgbe: fix pci device refcount leak octeontx2-pf: Fix resource leakage in VF driver unbind selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure. selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns. selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad". net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY) net/mlx5e: Fix macsec ssci attribute handling in offload path net/mlx5: E-switch, Coverity: overlapping copy net/mlx5e: Don't support encap rules with gbp option net/mlx5: Fix ptp max frequency adjustment range net/mlx5e: Fix memory leak on updating vport counters ...
2023-01-12Merge tag 'for-linus-6.2-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - two cleanup patches - a fix of a memory leak in the Xen pvfront driver - a fix of a locking issue in the Xen hypervisor console driver * tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: free active map buffer on pvcalls_front_free_map hvc/xen: lock console list traversal x86/xen: Remove the unused function p2m_index() xen: make remove callback of xen driver void returned
2023-01-12xfrm: compat: change expression for switch in xfrm_xlate64Anastasia Belova
Compare XFRM_MSG_NEWSPDINFO (value from netlink configuration messages enum) with nlh_src->nlmsg_type instead of nlh_src->nlmsg_type - XFRM_MSG_BASE. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 4e9505064f58 ("net/xfrm/compat: Copy xfrm_spdattr_type_t atributes") Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Acked-by: Dmitry Safonov <0x7f454c46@gmail.com> Tested-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-12vsock: return errors other than -ENOMEM to socketBobby Eshleman
This removes behaviour, where error code returned from any transport was always switched to ENOMEM. For example when user tries to send too big message via SEQPACKET socket, transport layers return EMSGSIZE, but this error code was always replaced with ENOMEM and returned to user. Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com> Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-12net: rfkill: gpio: add DT supportPhilipp Zabel
Allow probing rfkill-gpio via device tree. This hooks up the already existing support that was started in commit 262c91ee5e52 ("net: rfkill: gpio: prepare for DT and ACPI support") via the "rfkill-gpio" compatible, with the "name" and "type" properties renamed to "label" and "radio-type", respectively, in the device tree case. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/20230102-rfkill-gpio-dt-v2-2-d1b83758c16d@pengutronix.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-12wifi: mac80211: fix double space in commentNick Hainke
Remove a space in "the frames". Signed-off-by: Nick Hainke <vincent@systemli.org> Link: https://lore.kernel.org/r/20221222092957.870790-1-vincent@systemli.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-12wifi: mac80211: Drop stations iterator where the iterator function may sleepMartin Blumenstingl
This reverts commit acb99b9b2a08f ("mac80211: Add stations iterator where the iterator function may sleep"). A different approach was found for the rtw88 driver where most of the problematic locks were converted to a driver-local mutex. Drop ieee80211_iterate_stations() because there are no users of that function. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/20221226191609.2934234-1-martin.blumenstingl@googlemail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-11devlink: keep the instance mutex alive until references are goneJakub Kicinski
The reference needs to keep the instance memory around, but also the instance lock must remain valid. Users will take the lock, check registration status and release the lock. mutex_destroy() etc. belong in the same place as the freeing of the memory. Unfortunately lockdep_unregister_key() sleeps so we need to switch the an rcu_work. Note that the problem is a bit hard to repro, because devlink_pernet_pre_exit() iterates over registered instances. AFAIU the instances must get devlink_free()d concurrently with the namespace getting deleted for the problem to occur. Reported-by: syzbot+d94d214ea473e218fc89@syzkaller.appspotmail.com Reported-by: syzbot+9f0dd863b87113935acf@syzkaller.appspotmail.com Fixes: 9053637e0da7 ("devlink: remove the registration guarantee of references") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230111042908.988199-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-11netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bitsPablo Neira Ayuso
If the offset + length goes over the ethernet + vlan header, then the length is adjusted to copy the bytes that are within the boundaries of the vlan_ethhdr scratchpad area. The remaining bytes beyond ethernet + vlan header are copied directly from the skbuff data area. Fix incorrect arithmetic operator: subtract, not add, the size of the vlan header in case of double-tagged packets to adjust the length accordingly to address CVE-2023-0179. Reported-by: Davide Ornaghi <d.ornaghi97@gmail.com> Fixes: f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-11netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.Gavrilov Ilia
When first_ip is 0, last_ip is 0xFFFFFFFF, and netmask is 31, the value of an arithmetic expression 2 << (netmask - mask_bits - 1) is subject to overflow due to a failure casting operands to a larger data type before performing the arithmetic. Note that it's harmless since the value will be checked at the next step. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b9fed748185a ("netfilter: ipset: Check and reject crazy /0 input parameters") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-11ipv6: raw: Deduct extension header length in rawv6_push_pending_framesHerbert Xu
The total cork length created by ip6_append_data includes extension headers, so we must exclude them when comparing them against the IPV6_CHECKSUM offset which does not include extension headers. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Fixes: 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can corrupt kernel memory") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHYPiergiorgio Beruto
This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S Ethernet PHY. According to the specifications, the 10BASE-T1S supports Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-11net/ethtool: add netlink interface for the PLCA RSPiergiorgio Beruto
Add support for configuring the PLCA Reconciliation Sublayer on multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g., 10BASE-T1S). This patch adds the appropriate netlink interface to ethtool. Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-10net: sched: disallow noqueue for qdisc classesFrederick Lawler
While experimenting with applying noqueue to a classful queue discipline, we discovered a NULL pointer dereference in the __dev_queue_xmit() path that generates a kernel OOPS: # dev=enp0s5 # tc qdisc replace dev $dev root handle 1: htb default 1 # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit # tc qdisc add dev $dev parent 1:1 handle 10: noqueue # ping -I $dev -w 1 -c 1 1.1.1.1 [ 2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 2.173217] #PF: supervisor instruction fetch in kernel mode ... [ 2.178451] Call Trace: [ 2.178577] <TASK> [ 2.178686] htb_enqueue+0x1c8/0x370 [ 2.178880] dev_qdisc_enqueue+0x15/0x90 [ 2.179093] __dev_queue_xmit+0x798/0xd00 [ 2.179305] ? _raw_write_lock_bh+0xe/0x30 [ 2.179522] ? __local_bh_enable_ip+0x32/0x70 [ 2.179759] ? ___neigh_create+0x610/0x840 [ 2.179968] ? eth_header+0x21/0xc0 [ 2.180144] ip_finish_output2+0x15e/0x4f0 [ 2.180348] ? dst_output+0x30/0x30 [ 2.180525] ip_push_pending_frames+0x9d/0xb0 [ 2.180739] raw_sendmsg+0x601/0xcb0 [ 2.180916] ? _raw_spin_trylock+0xe/0x50 [ 2.181112] ? _raw_spin_unlock_irqrestore+0x16/0x30 [ 2.181354] ? get_page_from_freelist+0xcd6/0xdf0 [ 2.181594] ? sock_sendmsg+0x56/0x60 [ 2.181781] sock_sendmsg+0x56/0x60 [ 2.181958] __sys_sendto+0xf7/0x160 [ 2.182139] ? handle_mm_fault+0x6e/0x1d0 [ 2.182366] ? do_user_addr_fault+0x1e1/0x660 [ 2.182627] __x64_sys_sendto+0x1b/0x30 [ 2.182881] do_syscall_64+0x38/0x90 [ 2.183085] entry_SYSCALL_64_after_hwframe+0x63/0xcd ... [ 2.187402] </TASK> Previously in commit d66d6c3152e8 ("net: sched: register noqueue qdisc"), NULL was set for the noqueue discipline on noqueue init so that __dev_queue_xmit() falls through for the noqueue case. This also sets a bypass of the enqueue NULL check in the register_qdisc() function for the struct noqueue_disc_ops. Classful queue disciplines make it past the NULL check in __dev_queue_xmit() because the discipline is set to htb (in this case), and then in the call to __dev_xmit_skb(), it calls into htb_enqueue() which grabs a leaf node for a class and then calls qdisc_enqueue() by passing in a queue discipline which assumes ->enqueue() is not set to NULL. Fix this by not allowing classes to be assigned to the noqueue discipline. Linux TC Notes states that classes cannot be set to the noqueue discipline. [1] Let's enforce that here. Links: 1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt Fixes: d66d6c3152e8 ("net: sched: register noqueue qdisc") Cc: stable@vger.kernel.org Signed-off-by: Frederick Lawler <fred@cloudflare.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-10bpf: Remove the unnecessary insn buffer comparisonHaiyue Wang
The variable 'insn' is initialized to 'insn_buf' without being changed, only some helper macros are defined, so the insn buffer comparison is unnecessary. Just remove it. This missed removal back in 2377b81de527 ("bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation"). Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230108151258.96570-1-haiyue.wang@intel.com
2023-01-10Merge tag 'nfsd-6.2-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a race when creating NFSv4 files - Revert the use of relaxed bitops * tag 'nfsd-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Use set_bit(RQ_DROPME) Revert "SUNRPC: Use RMW bitops in single-threaded hot paths" nfsd: fix handling of cached open files in nfsd4_open codepath
2023-01-10wifi: mac80211: fix MLO + AP_VLAN checkFelix Fietkau
Instead of preventing adding AP_VLAN to MLO enabled APs, this check was preventing adding more than one 4-addr AP_VLAN regardless of the MLO status. Fix this by adding missing extra checks. Fixes: ae960ee90bb1 ("wifi: mac80211: prevent VLANs on MLDs") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20221214130326.37756-1-nbd@nbd.name Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-10mac80211: Fix MLO address translation for multiple bss caseSriram R
When multiple interfaces are present in the local interface list, new skb copy is taken before rx processing except for the first interface. The address translation happens each time only on the original skb since the hdr pointer is not updated properly to the newly created skb. As a result frames start to drop in userspace when address based checks or search fails. Signed-off-by: Sriram R <quic_srirrama@quicinc.com> Link: https://lore.kernel.org/r/20221208040050.25922-1-quic_srirrama@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-10wifi: mac80211: reset multiple BSSID options in stop_ap()Aloka Dixit
Reset multiple BSSID options when all AP related configurations are reset in ieee80211_stop_ap(). Stale values result in HWSIM test failures (e.g. p2p_group_cli_invalid), if run after 'he_ap_ema'. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Link: https://lore.kernel.org/r/20221221185616.11514-1-quic_alokad@quicinc.com Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-10wifi: mac80211: Fix iTXQ AMPDU fragmentation handlingAlexander Wetzel
mac80211 must not enable aggregation wile transmitting a fragmented MPDU. Enforce that for mac80211 internal TX queues (iTXQs). Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202301021738.7cd3e6ae-oliver.sang@intel.com Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20230106223141.98696-1-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-10wifi: mac80211: sdata can be NULL during AMPDU startAlexander Wetzel
ieee80211_tx_ba_session_handle_start() may get NULL for sdata when a deauthentication is ongoing. Here a trace triggering the race with the hostapd test multi_ap_fronthaul_on_ap: (gdb) list *drv_ampdu_action+0x46 0x8b16 is in drv_ampdu_action (net/mac80211/driver-ops.c:396). 391 int ret = -EOPNOTSUPP; 392 393 might_sleep(); 394 395 sdata = get_bss_sdata(sdata); 396 if (!check_sdata_in_driver(sdata)) 397 return -EIO; 398 399 trace_drv_ampdu_action(local, sdata, params); 400 wlan0: moving STA 02:00:00:00:03:00 to state 3 wlan0: associated wlan0: deauthenticating from 02:00:00:00:03:00 by local choice (Reason: 3=DEAUTH_LEAVING) wlan3.sta1: Open BA session requested for 02:00:00:00:00:00 tid 0 wlan3.sta1: dropped frame to 02:00:00:00:00:00 (unauthorized port) wlan0: moving STA 02:00:00:00:03:00 to state 2 wlan0: moving STA 02:00:00:00:03:00 to state 1 wlan0: Removed STA 02:00:00:00:03:00 wlan0: Destroyed STA 02:00:00:00:03:00 BUG: unable to handle page fault for address: fffffffffffffb48 PGD 11814067 P4D 11814067 PUD 11816067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 2 PID: 133397 Comm: kworker/u16:1 Tainted: G W 6.1.0-rc8-wt+ #59 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Workqueue: phy3 ieee80211_ba_session_work [mac80211] RIP: 0010:drv_ampdu_action+0x46/0x280 [mac80211] Code: 53 48 89 f3 be 89 01 00 00 e8 d6 43 bf ef e8 21 46 81 f0 83 bb a0 1b 00 00 04 75 0e 48 8b 9b 28 0d 00 00 48 81 eb 10 0e 00 00 <8b> 93 58 09 00 00 f6 c2 20 0f 84 3b 01 00 00 8b 05 dd 1c 0f 00 85 RSP: 0018:ffffc900025ebd20 EFLAGS: 00010287 RAX: 0000000000000000 RBX: fffffffffffff1f0 RCX: ffff888102228240 RDX: 0000000080000000 RSI: ffffffff918c5de0 RDI: ffff888102228b40 RBP: ffffc900025ebd40 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888118c18ec0 R13: 0000000000000000 R14: ffffc900025ebd60 R15: ffff888018b7efb8 FS: 0000000000000000(0000) GS:ffff88817a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffb48 CR3: 0000000105228006 CR4: 0000000000170ee0 Call Trace: <TASK> ieee80211_tx_ba_session_handle_start+0xd0/0x190 [mac80211] ieee80211_ba_session_work+0xff/0x2e0 [mac80211] process_one_work+0x29f/0x620 worker_thread+0x4d/0x3d0 ? process_one_work+0x620/0x620 kthread+0xfb/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20221230121850.218810-2-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-10wifi: mac80211: Proper mark iTXQs for resumptionAlexander Wetzel
When a running wake_tx_queue() call is aborted due to a hw queue stop the corresponding iTXQ is not always correctly marked for resumption: wake_tx_push_queue() can stops the queue run without setting @IEEE80211_TXQ_STOP_NETIF_TX. Without the @IEEE80211_TXQ_STOP_NETIF_TX flag __ieee80211_wake_txqs() will not schedule a new queue run and remaining frames in the queue get stuck till another frame is queued to it. Fix the issue for all drivers - also the ones with custom wake_tx_queue callbacks - by moving the logic into ieee80211_tx_dequeue() and drop the redundant @txqs_stopped. @IEEE80211_TXQ_STOP_NETIF_TX is also renamed to @IEEE80211_TXQ_DIRTY to better describe the flag. Fixes: c850e31f79f0 ("wifi: mac80211: add internal handler for wake_tx_queue") Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20221230121850.218810-1-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-10wifi: mac80211: fix initialization of rx->link and rx->link_staFelix Fietkau
There are some codepaths that do not initialize rx->link_sta properly. This causes a crash in places which assume that rx->link_sta is valid if rx->sta is valid. One known instance is triggered by __ieee80211_rx_h_amsdu being called from fast-rx. It results in a crash like this one: BUG: kernel NULL pointer dereference, address: 00000000000000a8 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 1 PID: 506 Comm: mt76-usb-rx phy Tainted: G E 6.1.0-debian64x+1.7 #3 Hardware name: ZOTAC ZBOX-ID92/ZBOX-IQ01/ZBOX-ID92/ZBOX-IQ01, BIOS B220P007 05/21/2014 RIP: 0010:ieee80211_deliver_skb+0x62/0x1f0 [mac80211] Code: 00 48 89 04 24 e8 9e a7 c3 df 89 c0 48 03 1c c5 a0 ea 39 a1 4c 01 6b 08 48 ff 03 48 83 7d 28 00 74 11 48 8b 45 30 48 63 55 44 <48> 83 84 d0 a8 00 00 00 01 41 8b 86 c0 11 00 00 8d 50 fd 83 fa 01 RSP: 0018:ffff999040803b10 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffb9903f496480 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff999040803ce0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d21828ac900 R13: 000000000000004a R14: ffff8d2198ed89c0 R15: ffff8d2198ed8000 FS: 0000000000000000(0000) GS:ffff8d24afe80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 0000000429810002 CR4: 00000000001706e0 Call Trace: <TASK> __ieee80211_rx_h_amsdu+0x1b5/0x240 [mac80211] ? ieee80211_prepare_and_rx_handle+0xcdd/0x1320 [mac80211] ? __local_bh_enable_ip+0x3b/0xa0 ieee80211_prepare_and_rx_handle+0xcdd/0x1320 [mac80211] ? prepare_transfer+0x109/0x1a0 [xhci_hcd] ieee80211_rx_list+0xa80/0xda0 [mac80211] mt76_rx_complete+0x207/0x2e0 [mt76] mt76_rx_poll_complete+0x357/0x5a0 [mt76] mt76u_rx_worker+0x4f5/0x600 [mt76_usb] ? mt76_get_min_avg_rssi+0x140/0x140 [mt76] __mt76_worker_fn+0x50/0x80 [mt76] kthread+0xed/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Since the initialization of rx->link and rx->link_sta is rather convoluted and duplicated in many places, clean it up by using a helper function to set it. Fixes: ccdde7c74ffd ("wifi: mac80211: properly implement MLO key handling") Fixes: b320d6c456ff ("wifi: mac80211: use correct rx link_sta instead of default") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20221230200747.19040-1-nbd@nbd.name [remove unnecessary rx->sta->sta.mlo check] Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-09net/sched: act_mpls: Fix warning during failed attribute validationIdo Schimmel
The 'TCA_MPLS_LABEL' attribute is of 'NLA_U32' type, but has a validation type of 'NLA_VALIDATE_FUNCTION'. This is an invalid combination according to the comment above 'struct nla_policy': " Meaning of `validate' field, use via NLA_POLICY_VALIDATE_FN: NLA_BINARY Validation function called for the attribute. All other Unused - but note that it's a union " This can trigger the warning [1] in nla_get_range_unsigned() when validation of the attribute fails. Despite being of 'NLA_U32' type, the associated 'min'/'max' fields in the policy are negative as they are aliased by the 'validate' field. Fix by changing the attribute type to 'NLA_BINARY' which is consistent with the above comment and all other users of NLA_POLICY_VALIDATE_FN(). As a result, move the length validation to the validation function. No regressions in MPLS tests: # ./tdc.py -f tc-tests/actions/mpls.json [...] # echo $? 0 [1] WARNING: CPU: 0 PID: 17743 at lib/nlattr.c:118 nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117 Modules linked in: CPU: 0 PID: 17743 Comm: syz-executor.0 Not tainted 6.1.0-rc8 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117 [...] Call Trace: <TASK> __netlink_policy_dump_write_attr+0x23d/0x990 net/netlink/policy.c:310 netlink_policy_dump_write_attr+0x22/0x30 net/netlink/policy.c:411 netlink_ack_tlv_fill net/netlink/af_netlink.c:2454 [inline] netlink_ack+0x546/0x760 net/netlink/af_netlink.c:2506 netlink_rcv_skb+0x1b7/0x240 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6109 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2482 ___sys_sendmsg net/socket.c:2536 [inline] __sys_sendmsg+0x197/0x230 net/socket.c:2565 __do_sys_sendmsg net/socket.c:2574 [inline] __se_sys_sendmsg net/socket.c:2572 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2572 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lore.kernel.org/netdev/CAO4mrfdmjvRUNbDyP0R03_DrD_eFCLCguz6OxZ2TYRSv0K9gxA@mail.gmail.com/ Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reported-by: Wei Chen <harperchen1110@gmail.com> Tested-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230107171004.608436-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-09net: skb: remove old comments about frag_size for build_skb()Jakub Kicinski
Since commit ce098da1497c ("skbuff: Introduce slab_build_skb()") drivers trying to build skb around slab-backed buffers should go via slab_build_skb() rather than passing frag_size = 0 to the main build_skb(). Remove the copy'n'pasted comments about 0 meaning slab. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09gro: take care of DODGY packetsEric Dumazet
Jaroslav reported a recent throughput regression with virtio_net caused by blamed commit. It is unclear if DODGY GSO packets coming from user space can be accepted by GRO engine in the future with minimal changes, and if there is any expected gain from it. In the meantime, make sure to detect and flush DODGY packets. Fixes: 5eddb24901ee ("gro: add support of (hw)gro packets to gro stack") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09mptcp: add statistics for mptcp socket in useMenglong Dong
Do the statistics of mptcp socket in use with sock_prot_inuse_add(). Therefore, we can get the count of used mptcp socket from /proc/net/protocols: & cat /proc/net/protocols protocol size sockets memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em MPTCPv6 2048 0 0 no 0 yes kernel y n y y y y y y y y y y n n n y y y n MPTCP 1896 1 0 no 0 yes kernel y n y y y y y y y y y y n n n y y y n Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect()Menglong Dong
'ssk' should be more appropriate to be the name of the first argument in mptcp_token_new_connect(). Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09mptcp: init sk->sk_prot in build_msk()Menglong Dong
The 'sk_prot' field in token KUNIT self-tests will be dereferenced in mptcp_token_new_connect(). Therefore, init it with tcp_prot. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen()Menglong Dong
'sock->sk' is used frequently in mptcp_listen(). Therefore, we can introduce the 'sk' and replace 'sock->sk' with it. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09mptcp: use local variable ssk in write_optionsGeliang Tang
The local variable 'ssk' has been defined at the beginning of the function mptcp_write_options(), use it instead of getting 'ssk' again. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-09mptcp: use net instead of sock_netGeliang Tang
Use the local variable 'net' instead of sock_net() in the functions where the variable 'struct net *net' has been defined. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>