summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-01-23ipv6: fix reachability confirmation with proxy_ndpGergely Risko
When proxying IPv6 NDP requests, the adverts to the initial multicast solicits are correct and working. On the other hand, when later a reachability confirmation is requested (on unicast), no reply is sent. This causes the neighbor entry expiring on the sending node, which is mostly a non-issue, as a new multicast request is sent. There are routers, where the multicast requests are intentionally delayed, and in these environments the current implementation causes periodic packet loss for the proxied endpoints. The root cause is the erroneous decrease of the hop limit, as this is checked in ndisc.c and no answer is generated when it's 254 instead of the correct 255. Cc: stable@vger.kernel.org Fixes: 46c7655f0b56 ("ipv6: decrease hop limit counter in ip6_forward()") Signed-off-by: Gergely Risko <gergely.risko@gmail.com> Tested-by: Gergely Risko <gergely.risko@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23net: ethtool: netlink: introduce ethnl_update_bool()Vladimir Oltean
Due to the fact that the kernel-side data structures have been carried over from the ioctl-based ethtool, we are now in the situation where we have an ethnl_update_bool32() function, but the plain function that operates on a boolean value kept in an actual u8 netlink attribute doesn't exist. With new ethtool features that are exposed solely over netlink, the kernel data structures will use the "bool" type, so we will need this kind of helper. Introduce it now; it's needed for things like verify-disabled for the MAC merge configuration. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()Eric Dumazet
int type = nla_type(nla); if (type > XFRMA_MAX) { return -EOPNOTSUPP; } @type is then used as an array index and can be used as a Spectre v1 gadget. if (nla_len(nla) < compat_policy[type].len) { array_index_nospec() can be used to prevent leaking content of kernel memory to malicious users. Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dmitry Safonov <dima@arista.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-22Merge 6.2-rc5 into tty-nextGreg Kroah-Hartman
We need the serial/tty changes into this branch as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-21batman-adv: tvlv: prepare for tvlv enabled multicast packet typeLinus Lüssing
Prepare TVLV infrastructure for more packet types, in particular the upcoming batman-adv multicast packet type. For that swap the OGM vs. unicast-tvlv packet boolean indicator to an explicit unsigned integer packet type variable. And provide the skb to a call to batadv_tvlv_containers_process(), as later the multicast packet's TVLV handler will need to have access not only to the TVLV but the full skb for forwarding. Forwarding will be invoked from the multicast packet's TVLVs' contents later. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-21batman-adv: mcast: remove now redundant single ucast forwardingLinus Lüssing
The multicast code to send a multicast packet via multiple batman-adv unicast packets is not only capable of sending to multiple but also to a single node. Therefore we can safely remove the old, specialized, now redundant multicast-to-single-unicast code. The only functional change of this simplification is that the edge case of allowing a multicast packet with an unsnoopable destination address (224.0.0.0/24 or ff02::1) where only a single node has signaled interest in it via the batman-adv want-all-unsnoopables multicast flag is now transmitted via a batman-adv broadcast instead of a batman-adv unicast packet. Maintaining this edge case feature does not seem worth the extra lines of code and people should just not expect to be able to snoop and optimize such unsnoopable multicast addresses when bridges are involved. While at it also renaming a few items in the batadv_forw_mode enum to prepare for the new batman-adv multicast packet type. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2023-01-20net: fix UaF in netns ops registration error pathPaolo Abeni
If net_assign_generic() fails, the current error path in ops_init() tries to clear the gen pointer slot. Anyway, in such error path, the gen pointer itself has not been modified yet, and the existing and accessed one is smaller than the accessed index, causing an out-of-bounds error: BUG: KASAN: slab-out-of-bounds in ops_init+0x2de/0x320 Write of size 8 at addr ffff888109124978 by task modprobe/1018 CPU: 2 PID: 1018 Comm: modprobe Not tainted 6.2.0-rc2.mptcp_ae5ac65fbed5+ #1641 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x6a/0x9f print_address_description.constprop.0+0x86/0x2b5 print_report+0x11b/0x1fb kasan_report+0x87/0xc0 ops_init+0x2de/0x320 register_pernet_operations+0x2e4/0x750 register_pernet_subsys+0x24/0x40 tcf_register_action+0x9f/0x560 do_one_initcall+0xf9/0x570 do_init_module+0x190/0x650 load_module+0x1fa5/0x23c0 __do_sys_finit_module+0x10d/0x1b0 do_syscall_64+0x58/0x80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f42518f778d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff96869688 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 00005568ef7f7c90 RCX: 00007f42518f778d RDX: 0000000000000000 RSI: 00005568ef41d796 RDI: 0000000000000003 RBP: 00005568ef41d796 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 R13: 00005568ef7f7d30 R14: 0000000000040000 R15: 0000000000000000 </TASK> This change addresses the issue by skipping the gen pointer de-reference in the mentioned error-path. Found by code inspection and verified with explicit error injection on a kasan-enabled kernel. Fixes: d266935ac43d ("net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/cec4e0f3bb2c77ac03a6154a8508d3930beb5f0f.1674154348.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ipa/ipa_interrupt.c drivers/net/ipa/ipa_interrupt.h 9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend") 8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()") d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration") https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/ https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20tcp: fix rate_app_limited to default to 1David Morley
The initial default value of 0 for tp->rate_app_limited was incorrect, since a flow is indeed application-limited until it first sends data. Fixing the default to be 1 is generally correct but also specifically will help user-space applications avoid using the initial tcpi_delivery_rate value of 0 that persists until the connection has some non-zero bandwidth sample. Fixes: eb8329e0a04d ("tcp: export data delivery rate") Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David Morley <morleyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Tested-by: David Morley <morleyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-20net: dcb: add helper functions to retrieve PCP and DSCP rewrite mapsDaniel Machon
Add two new helper functions to retrieve a mapping of priority to PCP and DSCP bitmasks, where each bitmap contains ones in positions that match a rewrite entry. dcb_ieee_getrewr_prio_dscp_mask_map() reuses the dcb_ieee_app_prio_map, as this struct is already used for a similar mapping in the app table. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-20net: dcb: add new rewrite tableDaniel Machon
Add new rewrite table and all the required functions, offload hooks and bookkeeping for maintaining it. The rewrite table reuses the app struct, and the entire set of app selectors. As such, some bookeeping code can be shared between the rewrite- and the APP table. New functions for getting, setting and deleting entries has been added. Apart from operating on the rewrite list, these functions do not emit a DCB_APP_EVENT when the list os modified. The new dcb_getrewr does a lookup based on selector and priority and returns the protocol, so that mappings from priority to protocol, for a given selector and ifindex is obtained. Also, a new nested attribute has been added, that encapsulates one or more app structs. This attribute is used to distinguish the two tables. The dcb_lock used for the APP table is reused for the rewrite table. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-20net: dcb: add new common function for set/del of app/rewr entriesDaniel Machon
In preparation for DCB rewrite. Add a new function for setting and deleting both app and rewrite entries. Moving this into a separate function reduces duplicate code, as both type of entries requires the same set of checks. The function will now iterate through a configurable nested attribute (app or rewrite attr), validate each attribute and call the appropriate set- or delete function. Note that this function always checks for nla_len(attr_itr) < sizeof(struct dcb_app), which was only done in dcbnl_ieee_set and not in dcbnl_ieee_del prior to this patch. This means, that any userspace tool that used to shove in data < sizeof(struct dcb_app) would now receive -ERANGE. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-20net: dcb: modify dcb_app_add to take list_head ptr as parameterDaniel Machon
In preparation to DCB rewrite. Modify dcb_app_add to take new struct list_head * as parameter, to make the used list configurable. This is done to allow reusing the function for adding rewrite entries to the rewrite table, which is introduced in a later patch. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-19devlink: add instance lock assertion in devl_is_registered()Jiri Pirko
After region and linecard lock removals, this helper is always supposed to be called with instance lock held. So put the assertion here and remove the comment which is no longer accurate. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove devlink_dump_for_each_instance_get() helperJiri Pirko
devlink_dump_for_each_instance_get() is currently called from a single place in netlink.c. As there is no need to use this helper anywhere else in the future, remove it and call devlinks_xa_find_get() directly from while loop in devlink_nl_instance_iter_dump(). Also remove redundant idx clear on loop end as it is already done in devlink_nl_instance_iter_dump(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: convert reporters dump to devlink_nl_instance_iter_dump()Jiri Pirko
Benefit from recently introduced instance iteration and convert reporters .dumpit generic netlink callback to use it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: convert linecards dump to devlink_nl_instance_iter_dump()Jiri Pirko
Benefit from recently introduced instance iteration and convert linecards .dumpit generic netlink callback to use it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove reporter reference countingJiri Pirko
As long as the reporter life time is protected by devlink instance lock, the reference counting is no longer needed. Remove it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove devl*_port_health_reporter_destroy()Jiri Pirko
Remove port-specific health reporter destroy function as it is currently the same as the instance one so no longer needed. Inline __devlink_health_reporter_destroy() as it is no longer called from multiple places. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove reporters_lockJiri Pirko
Similar to other devlink objects, rely on devlink instance lock and remove object specific reporters_lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: protect health reporter operation with instance lockJiri Pirko
Similar to other devlink objects, protect the reporters list by devlink instance lock. Alongside add unlocked versions of health reporter create/destroy functions and use them in drivers on call paths where the instance lock is held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove linecard reference countingJiri Pirko
As long as the linecard life time is protected by devlink instance lock, the reference counting is no longer needed. Remove it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19devlink: remove linecards lockJiri Pirko
Similar to other devlink objects, convert the linecards list to be protected by devlink instance lock. Alongside with that rename the create/destroy() functions to devl_* to indicate the devlink instance lock needs to be held while calling them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19wifi: wireless: deny wireless extensions on MLO-capable devicesJohannes Berg
These are WiFi 7 devices that will be introduced into the market in 2023, with new drivers. Wireless extensions haven't been in real development since 2006. Since wireless has evolved a lot, and continues to evolve significantly with Multi-Link Operation, there's really no good way to still support wireless extensions for devices that do MLO. Stop supporting wireless extensions for new devices. We don't consider this a regression since no such devices (apart from hwsim) exist yet. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230118105152.45f85078a1e0.Ib9eabc2ec5bf6b0244e4d973e93baaa3d8c91bd8@changeid
2023-01-19wifi: wireless: warn on most wireless extension usageJohannes Berg
With WiFi 7 (802.11ax, MLO/EHT) around the corner, we're going to remove support for wireless extensions with new devices since MLO (multi-link operation) cannot be properly indicated using them. Add a warning to indicate which processes are still using wireless extensions, if being used with modern (i.e. cfg80211) drivers. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230118105152.a7158a929a6f.Ifcf30eeeb8fc7019e4dcf2782b04515254d165e1@changeid
2023-01-19net/ulp: use consistent error code when blocking ULPPaolo Abeni
The referenced commit changed the error code returned by the kernel when preventing a non-established socket from attaching the ktls ULP. Before to such a commit, the user-space got ENOTCONN instead of EINVAL. The existing self-tests depend on such error code, and the change caused a failure: RUN global.non_established ... tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107) non_established: Test failed at step #3 FAIL global.non_established In the unlikely event existing applications do the same, address the issue by restoring the prior error code in the above scenario. Note that the only other ULP performing similar checks at init time - smc_ulp_ops - also fails with ENOTCONN when trying to attach the ULP to a non-established socket. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Fixes: 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-19tty: Convert ->carrier_raised() and callchains to boolIlpo Järvinen
Return boolean from ->carrier_raised() instead of 0 and 1. Make the return type change also to tty_port_carrier_raised() that makes the ->carrier_raised() call (+ cd variable in moxa into which its return value is stored). Also cleans up a few unnecessary constructs related to this change: return xx ? 1 : 0; -> return xx; if (xx) return 1; return 0; -> return xx; Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230117090358.4796-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-19wifi: mac80211: drop extra 'e' from ieeee80211... nameJohannes Berg
Somehow an extra 'e' slipped in there without anyone noticing, drop that from ieeee80211_obss_color_collision_notify(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-19wifi: cfg80211: Deduplicate certificate loadingLukas Wunner
load_keys_from_buffer() in net/wireless/reg.c duplicates x509_load_certificate_list() in crypto/asymmetric_keys/x509_loader.c for no apparent reason. Deduplicate it. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/e7280be84acda02634bc7cb52c97656182b9c700.1673197326.git.lukas@wunner.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-19tcp: avoid the lookup process failing to get sk in ehash tableJason Xing
While one cpu is working on looking up the right socket from ehash table, another cpu is done deleting the request socket and is about to add (or is adding) the big socket from the table. It means that we could miss both of them, even though it has little chance. Let me draw a call trace map of the server side. CPU 0 CPU 1 ----- ----- tcp_v4_rcv() syn_recv_sock() inet_ehash_insert() -> sk_nulls_del_node_init_rcu(osk) __inet_lookup_established() -> __sk_nulls_add_node_rcu(sk, list) Notice that the CPU 0 is receiving the data after the final ack during 3-way shakehands and CPU 1 is still handling the final ack. Why could this be a real problem? This case is happening only when the final ack and the first data receiving by different CPUs. Then the server receiving data with ACK flag tries to search one proper established socket from ehash table, but apparently it fails as my map shows above. After that, the server fetches a listener socket and then sends a RST because it finds a ACK flag in the skb (data), which obeys RST definition in RFC 793. Besides, Eric pointed out there's one more race condition where it handles tw socket hashdance. Only by adding to the tail of the list before deleting the old one can we avoid the race if the reader has already begun the bucket traversal and it would possibly miss the head. Many thanks to Eric for great help from beginning to end. Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/ Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18net: sched: gred: prevent races when adding offloads to statsJakub Kicinski
Naresh reports seeing a warning that gred is calling u64_stats_update_begin() with preemption enabled. Arnd points out it's coming from _bstats_update(). We should be holding the qdisc lock when writing to stats, they are also updated from the datapath. Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/all/CA+G9fYsTr9_r893+62u6UGD3dVaCE-kN9C-Apmb2m=hxjc1Cqg@mail.gmail.com/ Fixes: e49efd5288bd ("net: sched: gred: support reporting stats from offloads") Link: https://lore.kernel.org/r/20230113044137.1383067-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-18Merge tag 'wireless-2023-01-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.2 Third set of fixes for v6.2. This time most of them are for drivers, only one revert for mac80211. For an important mt76 fix we had to cherry pick two commits from wireless-next. * tag 'wireless-2023-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()" wifi: mt76: dma: fix a regression in adding rx buffers wifi: mt76: handle possible mt76_rx_token_consume failures wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices wifi: brcmfmac: avoid NULL-deref in survey dump for 2G only device wifi: brcmfmac: avoid handling disabled channels for survey dump ==================== Link: https://lore.kernel.org/r/20230118073749.AF061C433EF@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-18mm: remove zap_page_range and create zap_vma_pagesMike Kravetz
zap_page_range was originally designed to unmap pages within an address range that could span multiple vmas. While working on [1], it was discovered that all callers of zap_page_range pass a range entirely within a single vma. In addition, the mmu notification call within zap_page range does not correctly handle ranges that span multiple vmas. When crossing a vma boundary, a new mmu_notifier_range_init/end call pair with the new vma should be made. Instead of fixing zap_page_range, do the following: - Create a new routine zap_vma_pages() that will remove all pages within the passed vma. Most users of zap_page_range pass the entire vma and can use this new routine. - For callers of zap_page_range not passing the entire vma, instead call zap_page_range_single(). - Remove zap_page_range. [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/ Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Peter Xu <peterx@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18fs: port vfs_*() helpers to struct mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18mac80211: support minimal EHT rate reporting on RXJohannes Berg
Add minimal support for RX EHT rate reporting, not yet adding (modifying) any radiotap headers, just statistics for cfg80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: mac80211: Add HE MU-MIMO related flags in ieee80211_bss_confMuna Sinada
Adding flags for SU Beamformer, SU Beamformee, MU Beamformer and Full Bandwidth UL MU-MIMO for HE. This is utilized to pass MU-MIMO configurations from user space to driver in AP mode. Signed-off-by: Muna Sinada <quic_msinada@quicinc.com> Link: https://lore.kernel.org/r/1665006886-23874-2-git-send-email-quic_msinada@quicinc.com [fixed indentation, removed redundant !!] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: mac80211: Add VHT MU-MIMO related flags in ieee80211_bss_confMuna Sinada
Adding flags for SU Beamformer, SU Beamformee, MU Beamformer and MU Beamformee for VHT. This is utilized to pass MU-MIMO configurations from user space to driver in AP mode. Signed-off-by: Muna Sinada <quic_msinada@quicinc.com> Link: https://lore.kernel.org/r/1665006886-23874-1-git-send-email-quic_msinada@quicinc.com [fixed indentation, removed redundant !!] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: Support 32 bytes KCK key in GTK rekey offloadShivani Baranwal
Currently, maximum KCK key length supported for GTK rekey offload is 24 bytes but with some newer AKMs the KCK key length can be 32 bytes. e.g., 00-0F-AC:24 AKM suite with SAE finite cyclic group 21. Add support to allow 32 bytes KCK keys in GTK rekey offload. Signed-off-by: Shivani Baranwal <quic_shivbara@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20221206143715.1802987-3-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: Fix extended KCK key length check in nl80211_set_rekey_data()Shivani Baranwal
The extended KCK key length check wrongly using the KEK key attribute for validation. Due to this GTK rekey offload is failing when the KCK key length is 24 bytes even though the driver advertising WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK flag. Use correct attribute to fix the same. Fixes: 093a48d2aa4b ("cfg80211: support bigger kek/kck key length") Signed-off-by: Shivani Baranwal <quic_shivbara@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20221206143715.1802987-2-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18wifi: cfg80211: remove support for static WEPJohannes Berg
This reverts commit b8676221f00d ("cfg80211: Add support for static WEP in the driver") since no driver ever ended up using it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-01-18l2tp: prevent lockdep issue in l2tp_tunnel_register()Eric Dumazet
lockdep complains with the following lock/unlock sequence: lock_sock(sk); write_lock_bh(&sk->sk_callback_lock); [1] release_sock(sk); [2] write_unlock_bh(&sk->sk_callback_lock); We need to swap [1] and [2] to fix this issue. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Reported-by: syzbot+bbd35b345c7cab0d9a08@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20230114030137.672706-1-xiyou.wangcong@gmail.com/T/#m1164ff20628671b0f326a24cb106ab3239c70ce3 Cc: Cong Wang <cong.wang@bytedance.com> Cc: Guillaume Nault <gnault@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18xdp: document xdp_do_flush() before napi_complete_done()Magnus Karlsson
Document in the XDP_REDIRECT manual section that drivers must call xdp_do_flush() before napi_complete_done(). The two reasons behind this can be found following the links below. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/ Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Niera Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Fix syn-retransmits until initiator gives up when connection is re-used due to rst marked as invalid, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller
Florian Westphal says: ==================== Netfilter updates for net-next following patch set includes netfilter updates for your *net-next* tree. 1. Replace pr_debug use with nf_log infra for debugging in sctp conntrack. 2. Remove pr_debug calls, they are either useless or we have better options in place. 3. Avoid repeated load of ct->status in some spots. Some bit-flags cannot change during the lifeetime of a connection, so no need to re-fetch those. 4. Avoid uneeded nesting of rcu_read_lock during tuple lookup. 5. Remove the CLUSTERIP target. Marked as obsolete for years, and we still have WARN splats wrt. races of the out-of-band /proc interface installed by this target. 6. Add static key to nf_tables to avoid the retpoline mitigation if/else if cascade provided the cpu doesn't need the retpoline thunk. 7. add nf_tables objref calls to the retpoline mitigation workaround. 8. Split parts of nft_ct.c that do not need symbols exported by the conntrack modules and place them in nf_tables directly. This allows to avoid indirect call for 'ct status' checks. 9. Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not indicate an error if the referenced object (set, chain, rule...) did not exist, from Fernando. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18ipv6: Remove extra counter pull before gcTanmay Bhushan
Per cpu entries are no longer used in consideration for doing gc or not. Remove the extra per cpu entries pull to directly check for time and perform gc. Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18netfilter: nf_tables: add support to destroy operationFernando Fernandez Mancera
Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a delete operation but ignoring the ENOENT errors. This is useful for the transaction semantics, where failing to delete an object which does not exist results in aborting the transaction. This new command allows the transaction to proceed in case the object does not exist. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: nf_tables: avoid retpoline overhead for some ct expression callsFlorian Westphal
nft_ct expression cannot be made builtin to nf_tables without also forcing the conntrack itself to be builtin. However, this can be avoided by splitting retrieval of a few selector keys that only need to access the nf_conn structure, i.e. no function calls to nf_conntrack code. Many rulesets start with something like "ct status established,related accept" With this change, this no longer requires an indirect call, which gives about 1.8% more throughput with a simple conntrack-enabled forwarding test (retpoline thunk used). Signed-off-by: Florian Westphal <fw@strlen.de>
2023-01-18netfilter: nf_tables: avoid retpoline overhead for objref callsFlorian Westphal
objref expression is builtin, so avoid calls to it for RETOLINE=y builds. Signed-off-by: Florian Westphal <fw@strlen.de>