summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-06-10treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULEThomas Gleixner
Based on the normalized pattern: netapp provides this source code under the gpl v2 license the gpl v2 license is available at https://opensource org/licenses/gpl-license php this software is provided by the copyright holders and contributors as is and any express or implied warranties including but not limited to the implied warranties of merchantability and fitness for a particular purpose are disclaimed in no event shall the copyright owner or contributors be liable for any direct indirect incidental special exemplary or consequential damages (including but not limited to procurement of substitute goods or services loss of use data or profits or business interruption) however caused and on any theory of liability whether in contract strict liability or tort (including negligence or otherwise) arising in any way out of the use of this software even if advised of the possibility of such damage extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal <allison@lohutok.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-10treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_30.RULE ↵Thomas Gleixner
(part 2) Based on the normalized pattern: this program is free software you can redistribute it and/or modify it under the terms of the gnu general public license as published by the free software foundation version 2 this program is distributed as is without any warranty of any kind whether express or implied without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal <allison@lohutok.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-10xfrm: no need to set DST_NOPOLICY in IPv4Eyal Birger
This is a cleanup patch following commit e6175a2ed1f1 ("xfrm: fix "disable_policy" flag use when arriving from different devices") which made DST_NOPOLICY no longer be used for inbound policy checks. On outbound the flag was set, but never used. As such, avoid setting it altogether and remove the nopolicy argument from rt_dst_alloc(). Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-06-10net: mac802154: Add a warning in the slow pathMiquel Raynal
In order to be able to detect possible conflicts between the net interface core and the ieee802154 core, let's add a warning in the slow path: we want to be sure that whenever we start an asynchronous MLME transmission (which can be fully asynchronous) the net core somehow agrees that this transmission is possible, ie. the device was not stopped. Warning in this case would allow us to track down more easily possible issues with the MLME logic if we ever get reports. Unlike in the hot path, such a situation cannot be handled. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-12-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Add a warning in the hot pathMiquel Raynal
We should never start a transmission after the queue has been stopped. But because it might work we don't kill the function here but rather warn loudly the user that something is wrong. Set a flag when the queue should remain stopped. Reset this flag when the queue actually gets restarded. Just check this value to know if a transmission is legitimate, warn if it is not. Turn the flags variable into an unsigned long to allow the use of atomic helpers on it. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-11-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Introduce a synchronous API for MLME commandsMiquel Raynal
This is the slow path, we need to wait for each command to be processed before continuing so let's introduce an helper which does the transmission and blocks until it gets notified of its asynchronous completion. This helper is going to be used when introducing scan support. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-10-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Introduce a tx queue flushing mechanismMiquel Raynal
Right now we are able to stop a queue but we have no indication if a transmission is ongoing or not. Thanks to recent additions, we can track the number of ongoing transmissions so we know if the last transmission is over. Adding on top of it an internal wait queue also allows to be woken up asynchronously when this happens. If, beforehands, we marked the queue to be held and stopped it, we end up flushing and stopping the tx queue. Thanks to this feature, we will soon be able to introduce a synchronous transmit API. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-9-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Introduce a helper to disable the queueMiquel Raynal
Sometimes calling the stop queue helper is not enough because it does not hold any lock. In order to be safe and avoid racy situations when trying to (soon) sync the Tx queue, for instance before sending an MLME frame, let's now introduce an helper which actually hold the necessary locks when doing so. Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-8-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Create a hot tx pathMiquel Raynal
Let's rename the current Tx path to show that this is the "hot" Tx path. We will soon introduce a slower Tx path for MLME commands. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Bring the ability to hold the transmit queueMiquel Raynal
Create a hold_txs atomic variable and increment/decrement it when relevant, ie. when we want to hold the queue or release it: currently all the "stopped" situations are suitable, but very soon we will more extensively use this feature for MLME purposes. Upon release, the atomic counter is decremented and checked. If it is back to 0, then the netif queue gets woken up. This makes the whole process fully transparent, provided that all the users of ieee802154_wake/stop_queue() now call ieee802154_hold/release_queue() instead. In no situation individual drivers should call any of these helpers manually in order to avoid messing with the counters. There are other functions more suited for this purpose which have been introduced, such as the _xmit_complete() and _xmit_error() helpers which will handle all that for them. One advantage is that, as no more drivers call the stop/wake helpers directly, we can safely stop exporting them and only declare the hold/release ones in a header only accessible to the core. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-6-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Follow the count of ongoing transmissionsMiquel Raynal
In order to create a synchronous API for MLME command purposes, we need to be able to track the end of the ongoing transmissions. Let's introduce an atomic variable which is incremented when a transmission starts and decremented when relevant so that we know at any moment whether there is an ongoing transmission. The counter gets decremented in the following situations: - The operation is asynchronous and there was a failure during the offloading process. - The operation is synchronous and the synchronous operation failed. - The operation finished, either successfully or not. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-5-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Enhance the error path in the main tx helperMiquel Raynal
Before adding more logic in the error path, let's move the wake queue call there, rename the default label and create an additional one. There is no functional change. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Rename the main tx_work structMiquel Raynal
This entry is dedicated to synchronous transmissions done by drivers without async hook. Make this clearer that this is not a work that any driver can use by at least prefixing it with "sync_". While at it, let's enhance the comment explaining why we choose one or the other. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-10net: mac802154: Rename the synchronous xmit workerMiquel Raynal
There are currently two driver hooks: one is synchronous, the other is not. We cannot rely on driver implementations to provide a synchronous API (which is related to the bus medium more than a wish to have a synchronized implementation) so we are going to introduce a sync API above any kind of driver transmit function. In order to clarify what this worker is for (synchronous driver implementation), let's rename it so that people don't get bothered by the fact that their driver does not make use of the "xmit worker" which is a too generic name. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20220519150516.443078-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09Merge tag 'ieee802154-for-net-next-2022-06-09' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2022-06-09 This is a separate pull request for 6lowpan changes. We agreed with the bluetooth maintainers to switch the trees these changing are going into from bluetooth to ieee802154. Jukka Rissanen stepped down as a co-maintainer of 6lowpan (Thanks for the work!). Alexander is staying as maintainer. Alexander reworked the nhc_id lookup in 6lowpan to be way simpler. Moved the data structure from rb to an array, which is all we need in this case. * tag 'ieee802154-for-net-next-2022-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: MAINTAINERS: Remove Jukka Rissanen as 6lowpan maintainer net: 6lowpan: constify lowpan_nhc structures net: 6lowpan: use array for find nhc id net: 6lowpan: remove const from scalars ==================== Link: https://lore.kernel.org/r/20220609202956.1512156-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdevAndrea Mayer
Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") adds a new entry (flowi_l3mdev) in the common flow struct used for indicating the l3mdev index for later rule and table matching. The l3mdev_update_flow() has been adapted to properly set the flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid flowi_iif is supplied to the l3mdev_update_flow(), this function can update the flowi_l3mdev entry only if it has not yet been set (i.e., the flowi_l3mdev entry is equal to 0). The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to force the routing lookup into the associated routing table. This routing operation is performed by seg6_lookup_any_nextop() preparing a flowi6 data structure used by ip6_route_input_lookup() which, in turn, (indirectly) invokes l3mdev_update_flow(). However, seg6_lookup_any_nexthop() does not initialize the new flowi_l3mdev entry which is filled with random garbage data. This prevents l3mdev_update_flow() from properly updating the flowi_l3mdev with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are broken. This patch correctly initializes the flowi6 instance allocated and used by seg6_lookup_any_nexhtop(). Specifically, the entire flowi6 instance is wiped out: in case new entries are added to flowi/flowi6 (as happened with the flowi_l3mdev entry), we should no longer have incorrectly initialized values. As a result of this operation, the value of flowi_l3mdev is also set to 0. The proposed fix can be tested easily. Starting from the commit referenced in the Fixes, selftests [1],[2] indicate that the SRv6 End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. By applying this patch, those behaviors are back to work properly again. [1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh [2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Anton Makarov <am@3a-alliance.com> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: add napi_get_frags_check() helperEric Dumazet
This is a follow up of commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") When/if we increase MAX_SKB_FRAGS, we better make sure the old bug will not come back. Adding a check in napi_get_frags() would be costly, even if using DEBUG_NET_WARN_ON_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: add debug checks in napi_consume_skb and __napi_alloc_skb()Eric Dumazet
Commit 6454eca81eae ("net: Use lockdep_assert_in_softirq() in napi_consume_skb()") added a check in napi_consume_skb() which is a bit weak. napi_consume_skb() and __napi_alloc_skb() should only be used from BH context, not from hard irq or nmi context, otherwise we could have races. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use DEBUG_NET_WARN_ON_ONCE() in skb_release_head_state()Eric Dumazet
Remove this check from fast path unless CONFIG_DEBUG_NET=y Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09af_unix: use DEBUG_NET_WARN_ON_ONCE()Eric Dumazet
Replace four WARN_ON() that have not triggered recently with DEBUG_NET_WARN_ON_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use WARN_ON_ONCE() in sk_stream_kill_queues()Eric Dumazet
sk_stream_kill_queues() has three checks which have been useful to detect kernel bugs in the past. However they are potentially a problem because they could flood the syslog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use WARN_ON_ONCE() in inet_sock_destruct()Eric Dumazet
inet_sock_destruct() has four warnings which have been useful to point to kernel bugs in the past. However they are potentially a problem because they could flood the syslog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use DEBUG_NET_WARN_ON_ONCE() in dev_loopback_xmit()Eric Dumazet
One check in dev_loopback_xmit() has not caught issues in the past. Keep it for CONFIG_DEBUG_NET=y builds only. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: use DEBUG_NET_WARN_ON_ONCE() in __release_sock()Eric Dumazet
Check against skb dst in socket backlog has never triggered in past years. Keep the check omly for CONFIG_DEBUG_NET=y builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09drop_monitor: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09devlink: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: adopt u64_stats_t in struct pcpu_sw_netstatsEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09ip6_tunnel: use dev_sw_netstats_rx_add()Eric Dumazet
We have a convenient helper, let's use it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09sit: use dev_sw_netstats_rx_add()Eric Dumazet
We have a convenient helper, let's use it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09vlan: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Add READ_ONCE() when reading rx_errors & tx_dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: rename reference+tracking helpersJakub Kicinski
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TXMaxim Mikityanskiy
To embrace possible future optimizations of TLS, rename zerocopy sendfile definitions to more generic ones: * setsockopt: TLS_TX_ZEROCOPY_SENDFILE- > TLS_TX_ZEROCOPY_RO * sock_diag: TLS_INFO_ZC_SENDFILE -> TLS_INFO_ZC_RO_TX RO stands for readonly and emphasizes that the application shouldn't modify the data being transmitted with zerocopy to avoid potential disconnection. Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: 6lowpan: constify lowpan_nhc structuresAlexander Aring
This patch constify the lowpan_nhc declarations. Since we drop the rb node datastructure there is no need for runtime manipulation of this structure. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Link: https://lore.kernel.org/r/20220428030534.3220410-4-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09net: 6lowpan: use array for find nhc idAlexander Aring
This patch will remove the complete overengineered and overthinking rb data structure for looking up the nhc by nhcid. Instead we using the existing nhc next header array and iterate over it. It works now for 1 byte values only. However there are only 1 byte nhc id values currently supported and IANA also does not specify large than 1 byte values yet. If there are 2 byte values for nhc ids specified we can revisit this data structure and add support for it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Link: https://lore.kernel.org/r/20220428030534.3220410-3-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09net: 6lowpan: remove const from scalarsAlexander Aring
The keyword const makes no sense for scalar types inside the lowpan_nhc structure. Most compilers will ignore it so we remove the keyword from the scalar types. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Link: https://lore.kernel.org/r/20220428030534.3220410-2-aahringo@redhat.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09Merge tag 'net-5.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-099p: handling Rerror without copy_from_iter_full()Al Viro
p9_client_zc_rpc()/p9_check_zc_errors() are playing fast and loose with copy_from_iter_full(). Reading from file is done by sending Tread request. Response consists of fixed-sized header (including the amount of data actually read) followed by the data itself. For zero-copy case we arrange the things so that the first 11 bytes of reply go into the fixed-sized buffer, with the rest going straight into the pages we want to read into. What makes the things inconvenient is that sglist describing what should go where has to be set *before* the reply arrives. As the result, if reply is an error, the things get interesting. On success we get size[4] Rread tag[2] count[4] data[count] For error layout varies depending upon the protocol variant - in original 9P and 9P2000 it's size[4] Rerror tag[2] len[2] error[len] in 9P2000.U size[4] Rerror tag[2] len[2] error[len] errno[4] in 9P2000.L size[4] Rlerror tag[2] errno[4] The last case is nice and simple - we have an 11-byte response that fits into the fixed-sized buffer we hoped to get an Rread into. In other two, though, we get a variable-length string spill into the pages we'd prepared for the data to be read. Had that been in fixed-sized buffer (which is actually 4K), we would've dealt with that the same way we handle non-zerocopy case. However, for zerocopy it doesn't end up there, so we need to copy it from those pages. The trouble is, by the time we get around to that, the references to pages in question are already dropped. As the result, p9_zc_check_errors() tries to get the data using copy_from_iter_full(). Unfortunately, the iov_iter it's trying to read from might *NOT* be capable of that. It is, after all, a data destination, not data source. In particular, if it's an ITER_PIPE one, copy_from_iter_full() will simply fail. In ->zc_request() itself we do have those pages and dealing with the problem in there would be a simple matter of memcpy_from_page() into the fixed-sized buffer. Moreover, it isn't hard to recognize the (rare) case when such copying is needed. That way we get rid of p9_zc_check_errors() entirely - p9_check_errors() can be used instead both for zero-copy and non-zero-copy cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-06-08tcp: use alloc_large_system_hash() to allocate table_perturbMuchun Song
In our server, there may be no high order (>= 6) memory since we reserve lots of HugeTLB pages when booting. Then the system panic. So use alloc_large_system_hash() to allocate table_perturb. Fixes: e9261476184b ("tcp: dynamically allocate the perturb table used by source ports") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: openvswitch: fix misuse of the cached connection on tuple changesIlya Maximets
If packet headers changed, the cached nfct is no longer relevant for the packet and attempt to re-use it leads to the incorrect packet classification. This issue is causing broken connectivity in OpenStack deployments with OVS/OVN due to hairpin traffic being unexpectedly dropped. The setup has datapath flows with several conntrack actions and tuple changes between them: actions:ct(commit,zone=8,mark=0/0x1,nat(src)), set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), ct(zone=8),recirc(0x4) After the first ct() action the packet headers are almost fully re-written. The next ct() tries to re-use the existing nfct entry and marks the packet as invalid, so it gets dropped later in the pipeline. Clearing the cached conntrack entry whenever packet tuple is changed to avoid the issue. The flow key should not be cleared though, because we should still be able to match on the ct_state if the recirculation happens after the tuple change but before the next ct() action. Cc: stable@vger.kernel.org Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Frode Nordahl <frode.nordahl@canonical.com> Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ip_gre: test csum_start instead of transport headerWillem de Bruijn
GRE with TUNNEL_CSUM will apply local checksum offload on CHECKSUM_PARTIAL packets. ipgre_xmit must validate csum_start after an optional skb_pull, else lco_csum may trigger an overflow. The original check was if (csum && skb_checksum_start(skb) < skb->data) return -EINVAL; This had false positives when skb_checksum_start is undefined: when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement was straightforward if (csum && skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start(skb) < skb->data) return -EINVAL; But was eventually revised more thoroughly: - restrict the check to the only branch where needed, in an uncommon GRE path that uses header_ops and calls skb_pull. - test skb_transport_header, which is set along with csum_start in skb_partial_csum_set in the normal header_ops datapath. Turns out skbs can arrive in this branch without the transport header set, e.g., through BPF redirection. Revise the check back to check csum_start directly, and only if CHECKSUM_PARTIAL. Do leave the check in the updated location. Check field regardless of whether TUNNEL_CSUM is configured. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u Fixes: 8a0ed250f911 ("ip_gre: validate csum_start only on pull") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2022-06-09 We've added 6 non-merge commits during the last 2 day(s) which contain a total of 8 files changed, 49 insertions(+), 15 deletions(-). The main changes are: 1) Fix an illegal copy_to_user() attempt seen by syzkaller through arm64 BPF JIT compiler, from Eric Dumazet. 2) Fix calling global functions from BPF_PROG_TYPE_EXT programs by using the correct program context type, from Toke Høiland-Jørgensen. 3) Fix XSK TX batching invalid descriptor handling, from Maciej Fijalkowski. 4) Fix potential integer overflows in multi-kprobe link code by using safer kvmalloc_array() allocation helpers, from Dan Carpenter. 5) Add Quentin as bpftool maintainer, from Quentin Monnet. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: MAINTAINERS: Add a maintainer for bpftool xsk: Fix handling of invalid descriptors in XSK TX batching API selftests/bpf: Add selftest for calling global functions from freplace bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs bpf: Use safer kvmalloc_array() where possible bpf, arm64: Clear prog->jited_len along prog->jited ==================== Link: https://lore.kernel.org/r/20220608234133.32265-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in l2tp_ip6_sendmsgWang Yufen
When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be overflow. To fix, we can follow what udpv6 does and subtract the transhdrlen from the max. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in __ip6_append_dataWang Yufen
Resurrect ubsan overflow checks and ubsan report this warning, fix it by change the variable [length] type to size_t. UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19 2147479552 + 8567 cannot be represented in type 'int' CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x214/0x230 show_stack+0x30/0x78 dump_stack_lvl+0xf8/0x118 dump_stack+0x18/0x30 ubsan_epilogue+0x18/0x60 handle_overflow+0xd0/0xf0 __ubsan_handle_add_overflow+0x34/0x44 __ip6_append_data.isra.48+0x1598/0x1688 ip6_append_data+0x128/0x260 udpv6_sendmsg+0x680/0xdd0 inet6_sendmsg+0x54/0x90 sock_sendmsg+0x70/0x88 ____sys_sendmsg+0xe8/0x368 ___sys_sendmsg+0x98/0xe0 __sys_sendmmsg+0xf4/0x3b8 __arm64_sys_sendmmsg+0x34/0x48 invoke_syscall+0x64/0x160 el0_svc_common.constprop.4+0x124/0x300 do_el0_svc+0x44/0xc8 el0_svc+0x3c/0x1e8 el0t_64_sync_handler+0x88/0xb0 el0t_64_sync+0x16c/0x170 Changes since v1: -Change the variable [length] type to unsigned, as Eric Dumazet suggested. Changes since v2: -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested. Changes since v3: -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as Jakub Kicinski suggested. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: ipv6: unexport __init-annotated seg6_hmac_init()Masahiro Yamada
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the caller (net/ipv6/seg6.c) and the callee (net/ipv6/seg6_hmac.c) belong to the same module. It seems an internal function call in ipv6.ko. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: xfrm: unexport __init-annotated xfrm4_protocol_init()Masahiro Yamada
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, net/ipv4/xfrm4_policy.c is never compiled as modular. (CONFIG_XFRM is boolean) Fixes: 2f32b51b609f ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()Chuck Lever
To make the code easier to read, remove visual clutter by changing the declared type of @p. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
2022-06-08SUNRPC: Clean up xdr_get_next_encode_buffer()Chuck Lever
The value of @p is not used until the "location of the next item" is computed. Help human readers by moving its initial assignment to the paragraph where that value is used and by clarifying the antecedents in the documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
2022-06-08SUNRPC: Clean up xdr_commit_encode()Chuck Lever
Both the kvec::iov_len field and the third parameter of memcpy() and memmove() are size_t. There's no reason for the implicit conversion from size_t to int and back. Change the type of @shift to make the code easier to read and understand. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
2022-06-08SUNRPC: Optimize xdr_reserve_space()Chuck Lever
Transitioning between encode buffers is quite infrequent. It happens about 1 time in 400 calls to xdr_reserve_space(), measured on NFSD with a typical build/test workload. Force the compiler to remove that code from xdr_reserve_space(), which is a hot path on both the server and the client. This change reduces the size of xdr_reserve_space() from 10 cache lines to 2 when compiled with -Os. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: J. Bruce Fields <bfields@fieldses.org>