summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-06-15bpf: Fix request_sock leak in sk lookup helpersJon Maxwell
A customer reported a request_socket leak in a Calico cloud environment. We found that a BPF program was doing a socket lookup with takes a refcnt on the socket and that it was finding the request_socket but returning the parent LISTEN socket via sk_to_full_sk() without decrementing the child request socket 1st, resulting in request_sock slab object leak. This patch retains the existing behaviour of returning full socks to the caller but it also decrements the child request_socket if one is present before doing so to prevent the leak. Thanks to Curtis Taylor for all the help in diagnosing and testing this. And thanks to Antoine Tenart for the reproducer and patch input. v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to validate RCU flags on the listen socket so that it balances with bpf_sk_release() and update comments as per Martin KaFai Lau's suggestion. One small change to Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra instruction. Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()") Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper") Co-developed-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com
2022-06-15net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsgDuoming Zhou
The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock and block until it receives a packet from the remote. If the client doesn`t connect to server and calls read() directly, it will not receive any packets forever. As a result, the deadlock will happen. The fail log caused by deadlock is shown below: [ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds. [ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.613058] Call Trace: [ 369.613315] <TASK> [ 369.614072] __schedule+0x2f9/0xb20 [ 369.615029] schedule+0x49/0xb0 [ 369.615734] __lock_sock+0x92/0x100 [ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20 [ 369.617941] lock_sock_nested+0x6e/0x70 [ 369.618809] ax25_bind+0xaa/0x210 [ 369.619736] __sys_bind+0xca/0xf0 [ 369.620039] ? do_futex+0xae/0x1b0 [ 369.620387] ? __x64_sys_futex+0x7c/0x1c0 [ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40 [ 369.620613] __x64_sys_bind+0x11/0x20 [ 369.621791] do_syscall_64+0x3b/0x90 [ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 369.623319] RIP: 0033:0x7f43c8aa8af7 [ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 [ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7 [ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005 [ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700 [ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe [ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700 This patch replaces skb_recv_datagram() with an open-coded variant of it releasing the socket lock before the __skb_wait_for_more_packets() call and re-acquiring it after such call in order that other functions that need socket lock could be executed. what's more, the socket lock will be released only when recvmsg() will block and that should produce nicer overall behavior. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Thomas Osterried <thomas@osterried.de> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reported-by: Thomas Habets <thomas@@habets.se> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-15net: don't check skb_count twiceSieng Piaw Liew
NAPI cache skb_count is being checked twice without condition. Change to checking the second time only if the first check is run. Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-15net: bridge: allow add/remove permanent mdb entries on disabled portsCasper Andersson
Adding mdb entries on disabled ports allows you to do setup before accepting any traffic, avoiding any time where the port is not in the multicast group. Signed-off-by: Casper Andersson <casper.casan@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-14xsk: Fix generic transmit when completion queue reservation failsCiara Loftus
Two points of potential failure in the generic transmit function are: 1. completion queue (cq) reservation failure. 2. skb allocation failure Originally the cq reservation was performed first, followed by the skb allocation. Commit 675716400da6 ("xdp: fix possible cq entry leak") reversed the order because at the time there was no mechanism available to undo the cq reservation which could have led to possible cq entry leaks in the event of skb allocation failure. However if the skb allocation is performed first and the cq reservation then fails, the xsk skb destructor is called which blindly adds the skb address to the already full cq leading to undefined behavior. This commit restores the original order (cq reservation followed by skb allocation) and uses the xskq_prod_cancel helper to undo the cq reserve in event of skb allocation failure. Fixes: 675716400da6 ("xdp: fix possible cq entry leak") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220614070746.8871-1-ciara.loftus@intel.com
2022-06-14mac802154: fix atomic_dec_and_test checksAlexander Aring
We need to call wake_up() when hold_txs reaches zero. The semantic of atomic_dec_and_test() is that it returns true when it's zero. Fixes: f0feb3490473 ("net: mac802154: Introduce a tx queue flushing mechanism") Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220613043735.1039895-3-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-14mac802154: util: fix release queue handlingAlexander Aring
The semantic of atomic_dec_and_test() is to return true if zero is reached and we need call ieee802154_wake_queue() when zero is reached. Fixes: 20a19d1df3e4 ("net: mac802154: Bring the ability to hold the transmit queue") Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220613043735.1039895-2-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-13ethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()Marco Bonelli
Fix the implementation of ethtool_convert_link_mode_to_legacy_u32(), which is supposed to return false if src has bits higher than 31 set. The current implementation uses the complement of bitmap_fill(ext, 32) to test high bits of src, which is wrong as bitmap_fill() fills _with long granularity_, and sizeof(long) can be > 4. No users of this function currently check the return value, so the bug was dormant. Also remove the check for __ETHTOOL_LINK_MODE_MASK_NBITS > 32, as the enum ethtool_link_mode_bit_indices contains far beyond 32 values. Using find_next_bit() to test the src bitmask works regardless of this anyway. Signed-off-by: Marco Bonelli <marco@mebeim.net> Link: https://lore.kernel.org/r/20220609134900.11201-1-marco@mebeim.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13net: make __sys_accept4_file() staticYajun Deng
__sys_accept4_file() isn't used outside of the file, make it static. As the same time, move file_flags and nofile parameters into __sys_accept4_file(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13tcp: sk_forced_mem_schedule() optimizationEric Dumazet
sk_memory_allocated_add() has three callers, and returns to them @memory_allocated. sk_forced_mem_schedule() is one of them, and ignores the returned value. Change sk_memory_allocated_add() to return void. Change sock_reserve_memory() and __sk_mem_raise_allocated() to call sk_memory_allocated(). This removes one cache line miss [1] for RPC workloads, as first skbs in TCP write queue and receive queue go through sk_forced_mem_schedule(). [1] Cache line holding tcp_memory_allocated. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-11net: Kconfig: move the CAN device menu to the "Device Drivers" sectionVincent Mailhol
The devices are meant to be under the "Device Drivers" category of the menuconfig. The CAN subsystem is currently one of the rare exception with all of its devices under the "Networking support" category. The CAN_DEV menuentry gets moved to fix this discrepancy. The CAN menu is added just before MCTP in order to be consistent with the current order under the "Networking support" menu. A dependency on CAN is added to CAN_DEV so that the CAN devices only show up if the CAN subsystem is enabled. Link: https://lore.kernel.org/all/20220610143009.323579-6-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Acked-by: Max Staudt <max@enpas.org> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-06-10Merge tag 'nfsd-5.19-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Notable changes: - There is now a backup maintainer for NFSD Notable fixes: - Prevent array overruns in svc_rdma_build_writes() - Prevent buffer overruns when encoding NFSv3 READDIR results - Fix a potential UAF in nfsd_file_put()" * tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer() SUNRPC: Clean up xdr_get_next_encode_buffer() SUNRPC: Clean up xdr_commit_encode() SUNRPC: Optimize xdr_reserve_space() SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer() SUNRPC: Trap RDMA segment overflows NFSD: Fix potential use-after-free in nfsd_file_put() MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
2022-06-10net: unexport __sk_mem_{raise|reduce}_allocatedEric Dumazet
These two helpers are only used from core networking. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: keep sk->sk_forward_alloc as small as possibleEric Dumazet
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast. Each TCP socket can forward allocate up to 2 MB of memory, even after flow became less active. 10,000 sockets can have reserved 20 GB of memory, and we have no shrinker in place to reclaim that. Instead of trying to reclaim the extra allocations in some places, just keep sk->sk_forward_alloc values as small as possible. This should not impact performance too much now we have per-cpu reserves: Changes to tcp_memory_allocated should not be too frequent. For sockets not using SO_RESERVE_MEM: - idle sockets (no packets in tx/rx queues) have zero forward alloc. - non idle sockets have a forward alloc smaller than one page. Note: - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: add per_cpu_fw_alloc field to struct protoEric Dumazet
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFTEric Dumazet
Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE. This might change in the future, but it seems better to avoid the confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge tag 'wireless-next-2022-06-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== wireless-next patches for v5.20 Here's a first set of patches for v5.20. This is just a queue flush, before we get things back from net-next that are causing conflicts, and then can start merging a lot of MLO (multi-link operation, part of 802.11be) code. Lots of cleanups all over. The only notable change is perhaps wilc1000 being the first driver to disable WEP (while enabling WPA3). * tag 'wireless-next-2022-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (29 commits) wifi: mac80211_hwsim: Directly use ida_alloc()/free() wifi: mac80211: refactor some key code wifi: mac80211: remove cipher scheme support wifi: nl80211: fix typo in comment wifi: virt_wifi: fix typo in comment rtw89: add new state to CFO state machine for UL-OFDMA rtw89: 8852c: add trigger frame counter ieee80211: add trigger frame definition wifi: wfx: Remove redundant NULL check before release_firmware() call wifi: rtw89: support MULTI_BSSID and correct BSSID mask of H2C wifi: ray_cs: Drop useless status variable in parse_addr() wifi: ray_cs: Utilize strnlen() in parse_addr() wifi: rtw88: use %*ph to print small buffer wifi: wilc1000: add IGTK support wifi: wilc1000: add WPA3 SAE support wifi: wilc1000: remove WEP security support wifi: wilc1000: use correct sequence of RESET for chip Power-UP/Down wifi: rtlwifi: fix error codes in rtl_debugfs_set_write_h2c() wifi: rtw88: Fix Sparse warning for rtw8821c_hw_spec wifi: rtw88: Fix Sparse warning for rtw8723d_hw_spec ... ==================== Link: https://lore.kernel.org/r/20220610142838.330862-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10wifi: mac80211: refactor some key codeJohannes Berg
There's some pretty close code here, with the exception of RCU dereference vs. protected dereference. Refactor this to just return a pointer and then do the deref in the caller later. Change-Id: Ide5315e2792da6ac66eaf852293306a3ac71ced9 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10net: ipconfig: Relax fw_devlink if we need to mount a network rootfsSaravana Kannan
If there are network devices that could probe without some of their suppliers probing and those network devices are needed to mount a network rootfs, then fw_devlink=on might break that usecase by blocking the network devices from probing by the time IP auto config starts. So, if no network devices are available when IP auto config is enabled and we have a network rootfs, make sure fw_devlink doesn't block the probing of any device that has a driver and then retry finding a network device. Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20220601070707.3946847-6-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-10wifi: mac80211: remove cipher scheme supportJohannes Berg
The only driver using this was iwlwifi, where we just removed the support because it was never really used. Remove the code from mac80211 as well. Change-Id: I1667417a5932315ee9d81f5c233c56a354923f09 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULEThomas Gleixner
Based on the normalized pattern: netapp provides this source code under the gpl v2 license the gpl v2 license is available at https://opensource org/licenses/gpl-license php this software is provided by the copyright holders and contributors as is and any express or implied warranties including but not limited to the implied warranties of merchantability and fitness for a particular purpose are disclaimed in no event shall the copyright owner or contributors be liable for any direct indirect incidental special exemplary or consequential damages (including but not limited to procurement of substitute goods or services loss of use data or profits or business interruption) however caused and on any theory of liability whether in contract strict liability or tort (including negligence or otherwise) arising in any way out of the use of this software even if advised of the possibility of such damage extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal <allison@lohutok.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-10treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_30.RULE ↵Thomas Gleixner
(part 2) Based on the normalized pattern: this program is free software you can redistribute it and/or modify it under the terms of the gnu general public license as published by the free software foundation version 2 this program is distributed as is without any warranty of any kind whether express or implied without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal <allison@lohutok.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-10xfrm: no need to set DST_NOPOLICY in IPv4Eyal Birger
This is a cleanup patch following commit e6175a2ed1f1 ("xfrm: fix "disable_policy" flag use when arriving from different devices") which made DST_NOPOLICY no longer be used for inbound policy checks. On outbound the flag was set, but never used. As such, avoid setting it altogether and remove the nopolicy argument from rt_dst_alloc(). Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-06-10net: mac802154: Add a warning in the slow pathMiquel Raynal
In order to be able to detect possible conflicts between the net interface core and the ieee802154 core, let's add a warning in the slow path: we want to be sure that whenever we start an asynchronous MLME transmission (which can be fully asynchronous) the net core somehow agrees that this transmission is possible, ie. the device was not stopped. Warning in this case would allow us to track down more easily possible issues with the MLME logic if we ever get reports. Unlike in the hot path, such a situation cannot be handled. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-12-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Add a warning in the hot pathMiquel Raynal
We should never start a transmission after the queue has been stopped. But because it might work we don't kill the function here but rather warn loudly the user that something is wrong. Set a flag when the queue should remain stopped. Reset this flag when the queue actually gets restarded. Just check this value to know if a transmission is legitimate, warn if it is not. Turn the flags variable into an unsigned long to allow the use of atomic helpers on it. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-11-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Introduce a synchronous API for MLME commandsMiquel Raynal
This is the slow path, we need to wait for each command to be processed before continuing so let's introduce an helper which does the transmission and blocks until it gets notified of its asynchronous completion. This helper is going to be used when introducing scan support. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-10-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Introduce a tx queue flushing mechanismMiquel Raynal
Right now we are able to stop a queue but we have no indication if a transmission is ongoing or not. Thanks to recent additions, we can track the number of ongoing transmissions so we know if the last transmission is over. Adding on top of it an internal wait queue also allows to be woken up asynchronously when this happens. If, beforehands, we marked the queue to be held and stopped it, we end up flushing and stopping the tx queue. Thanks to this feature, we will soon be able to introduce a synchronous transmit API. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-9-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Introduce a helper to disable the queueMiquel Raynal
Sometimes calling the stop queue helper is not enough because it does not hold any lock. In order to be safe and avoid racy situations when trying to (soon) sync the Tx queue, for instance before sending an MLME frame, let's now introduce an helper which actually hold the necessary locks when doing so. Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-8-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Create a hot tx pathMiquel Raynal
Let's rename the current Tx path to show that this is the "hot" Tx path. We will soon introduce a slower Tx path for MLME commands. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Bring the ability to hold the transmit queueMiquel Raynal
Create a hold_txs atomic variable and increment/decrement it when relevant, ie. when we want to hold the queue or release it: currently all the "stopped" situations are suitable, but very soon we will more extensively use this feature for MLME purposes. Upon release, the atomic counter is decremented and checked. If it is back to 0, then the netif queue gets woken up. This makes the whole process fully transparent, provided that all the users of ieee802154_wake/stop_queue() now call ieee802154_hold/release_queue() instead. In no situation individual drivers should call any of these helpers manually in order to avoid messing with the counters. There are other functions more suited for this purpose which have been introduced, such as the _xmit_complete() and _xmit_error() helpers which will handle all that for them. One advantage is that, as no more drivers call the stop/wake helpers directly, we can safely stop exporting them and only declare the hold/release ones in a header only accessible to the core. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-6-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Follow the count of ongoing transmissionsMiquel Raynal
In order to create a synchronous API for MLME command purposes, we need to be able to track the end of the ongoing transmissions. Let's introduce an atomic variable which is incremented when a transmission starts and decremented when relevant so that we know at any moment whether there is an ongoing transmission. The counter gets decremented in the following situations: - The operation is asynchronous and there was a failure during the offloading process. - The operation is synchronous and the synchronous operation failed. - The operation finished, either successfully or not. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-5-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Enhance the error path in the main tx helperMiquel Raynal
Before adding more logic in the error path, let's move the wake queue call there, rename the default label and create an additional one. There is no functional change. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Rename the main tx_work structMiquel Raynal
This entry is dedicated to synchronous transmissions done by drivers without async hook. Make this clearer that this is not a work that any driver can use by at least prefixing it with "sync_". While at it, let's enhance the comment explaining why we choose one or the other. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Rename the synchronous xmit workerMiquel Raynal
There are currently two driver hooks: one is synchronous, the other is not. We cannot rely on driver implementations to provide a synchronous API (which is related to the bus medium more than a wish to have a synchronized implementation) so we are going to introduce a sync API above any kind of driver transmit function. In order to clarify what this worker is for (synchronous driver implementation), let's rename it so that people don't get bothered by the fact that their driver does not make use of the "xmit worker" which is a too generic name. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09Merge tag 'ieee802154-for-net-next-2022-06-09' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2022-06-09 This is a separate pull request for 6lowpan changes. We agreed with the bluetooth maintainers to switch the trees these changing are going into from bluetooth to ieee802154. Jukka Rissanen stepped down as a co-maintainer of 6lowpan (Thanks for the work!). Alexander is staying as maintainer. Alexander reworked the nhc_id lookup in 6lowpan to be way simpler. Moved the data structure from rb to an array, which is all we need in this case. * tag 'ieee802154-for-net-next-2022-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: MAINTAINERS: Remove Jukka Rissanen as 6lowpan maintainer net: 6lowpan: constify lowpan_nhc structures net: 6lowpan: use array for find nhc id net: 6lowpan: remove const from scalars ==================== Link: https://lore.kernel.org/r/20220609202956.1512156-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdevAndrea Mayer
Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") adds a new entry (flowi_l3mdev) in the common flow struct used for indicating the l3mdev index for later rule and table matching. The l3mdev_update_flow() has been adapted to properly set the flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid flowi_iif is supplied to the l3mdev_update_flow(), this function can update the flowi_l3mdev entry only if it has not yet been set (i.e., the flowi_l3mdev entry is equal to 0). The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to force the routing lookup into the associated routing table. This routing operation is performed by seg6_lookup_any_nextop() preparing a flowi6 data structure used by ip6_route_input_lookup() which, in turn, (indirectly) invokes l3mdev_update_flow(). However, seg6_lookup_any_nexthop() does not initialize the new flowi_l3mdev entry which is filled with random garbage data. This prevents l3mdev_update_flow() from properly updating the flowi_l3mdev with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are broken. This patch correctly initializes the flowi6 instance allocated and used by seg6_lookup_any_nexhtop(). Specifically, the entire flowi6 instance is wiped out: in case new entries are added to flowi/flowi6 (as happened with the flowi_l3mdev entry), we should no longer have incorrectly initialized values. As a result of this operation, the value of flowi_l3mdev is also set to 0. The proposed fix can be tested easily. Starting from the commit referenced in the Fixes, selftests [1],[2] indicate that the SRv6 End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. By applying this patch, those behaviors are back to work properly again. [1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh [2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Anton Makarov <am@3a-alliance.com> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: add napi_get_frags_check() helperEric Dumazet
This is a follow up of commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") When/if we increase MAX_SKB_FRAGS, we better make sure the old bug will not come back. Adding a check in napi_get_frags() would be costly, even if using DEBUG_NET_WARN_ON_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: add debug checks in napi_consume_skb and __napi_alloc_skb()Eric Dumazet
Commit 6454eca81eae ("net: Use lockdep_assert_in_softirq() in napi_consume_skb()") added a check in napi_consume_skb() which is a bit weak. napi_consume_skb() and __napi_alloc_skb() should only be used from BH context, not from hard irq or nmi context, otherwise we could have races. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use DEBUG_NET_WARN_ON_ONCE() in skb_release_head_state()Eric Dumazet
Remove this check from fast path unless CONFIG_DEBUG_NET=y Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09af_unix: use DEBUG_NET_WARN_ON_ONCE()Eric Dumazet
Replace four WARN_ON() that have not triggered recently with DEBUG_NET_WARN_ON_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use WARN_ON_ONCE() in sk_stream_kill_queues()Eric Dumazet
sk_stream_kill_queues() has three checks which have been useful to detect kernel bugs in the past. However they are potentially a problem because they could flood the syslog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use WARN_ON_ONCE() in inet_sock_destruct()Eric Dumazet
inet_sock_destruct() has four warnings which have been useful to point to kernel bugs in the past. However they are potentially a problem because they could flood the syslog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use DEBUG_NET_WARN_ON_ONCE() in dev_loopback_xmit()Eric Dumazet
One check in dev_loopback_xmit() has not caught issues in the past. Keep it for CONFIG_DEBUG_NET=y builds only. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use DEBUG_NET_WARN_ON_ONCE() in __release_sock()Eric Dumazet
Check against skb dst in socket backlog has never triggered in past years. Keep the check omly for CONFIG_DEBUG_NET=y builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09drop_monitor: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09devlink: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: adopt u64_stats_t in struct pcpu_sw_netstatsEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09ip6_tunnel: use dev_sw_netstats_rx_add()Eric Dumazet
We have a convenient helper, let's use it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09sit: use dev_sw_netstats_rx_add()Eric Dumazet
We have a convenient helper, let's use it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>