summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-03-25wifi: mac80211: don't ask driver about no-op link changesJohannes Berg
If the links won't actually change, nothing will happen. This was previously done in the inner function (twice in some cases), but we shouldn't bother the driver with it. Clean that up. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.a8190a312a27.If4e6f5ce8228eda7afac0fc8c17dd731c5da9ed9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: don't select link ID if not provided in scan requestAyala Beker
If scan request doesn't include a link ID to be used for TSF reporting, don't select it as it might become inactive before scan is actually started by the driver. Instead, let the driver select one of the active links. Fixes: cbde0b49f276 ("wifi: mac80211: Extend support for scanning while MLO connected") Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.a6b643a15755.Ic28ed9a611432387b7f85e9ca9a97a4ce34a6e0f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: clarify IEEE80211_STATUS_SUBDATA_MASKJohannes Berg
We have 13 bits for the status_data, so restricting type to 4 and subdata to 8 bits is confusing, even if we don't need more bits now. Change subdata mask to be 9 bits instead, just to make things match up. If we actually need more types or more subdata bits we can later also reshuffle the bits between these, but we should probably keep them at 13 bits together. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.28ac7b665039.I1abbb13e90f016cab552492e05f5cb5b52de6463@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: don't enter idle during link switchJohannes Berg
When doing link switch with a disjoint set of links before and after the switch, we end up removing all channel contexts, adding new ones later. This looks like 'idle' to the code now, and we enter idle which also includes flushing queues. But we can't actually flush since we don't have a link active (bound to a channel context), and entering idle just to leave it again is also wrong. Fix this by passing through an indication that we shouldn't do any idle checks in this case. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240320091155.170328bac555.If4a522a9dd3133b91983854b909a4de13aa635da@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: add support for tearing down negotiated TTLMAyala Beker
In order to activate a link that is currently inactive due to a negotiated TTLM request, need to first tear down the negotiated TTLM request. Add support for sending TTLM teardown request and update the links state accordingly. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.d480cbf46fcf.Idedad472469d2c27dd2a088cf80a13a1e1cf9b78@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: ignore non-TX BSSs in per-STA profileBenjamin Berg
If a non-TX BSS is included in a per-STA profile, then we cannot set transmitted_bss for it. Even worse, if we do things properly we should be configuring both bssid_index and max_bssid_indicator correctly. We do not actually have both pieces of information (and, some APs currently do not include either). So, ignore any per-STA profile where the RNR says that the BSS is not transmitted. Also fix transmitted_bss to never be set for per-STA profiles. This fixes issues where mac80211 was setting the reference BSSID to an incorrect value. Fixes: 2481b5da9c6b ("wifi: cfg80211: handle BSS data contained in ML probe responses") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.6a0babed655a.Iad447fea417c63f683da793556b97c31d07a4aab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: check BSSID Index against MaxBSSIDBenjamin Berg
Add a verification that the BSSID Index does not exceed the maximum number of BSSIDs in the Multiple-BSSID set. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.a7574d415adc.I02f40c2920a9f602898190679cc27d0c8ee2c67d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: improve association error reporting slightlyBenjamin Berg
There is no reason to check the request flags for each of the links, so pull that out of the loop. Also, within the loop we can set the per-link error everywhere. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.695faa9be279.I71b11a8d66a9cae4c27e242a47d1d92922609b03@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: add flag to disallow puncturing in 5 GHzJohannes Berg
Some devices may not be capable of handling puncturing in 5 GHz only (vs. the current flag that just removes puncturing support completely). Add a flag to support such devices: check and then downgrade the channel width if needed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.49759510da7d.I12c5a61f0be512e0c4e574c2f794ef4b37ecaf6b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: handle indoor AFC/LPI AP in probe response and beaconAnjaneyulu
Mark Indoor LPI and Indoor AFC power types as valid based on channel flags. While on it, added default case. Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.091cfaaa5f45.I23cfa1104a16fd4eb9751b3d0d7b158db4ff3ecd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: handle indoor AFC/LPI AP on assoc successAnjaneyulu
Update power_type in bss_conf based on Indoor AFC and LPI power types received in HE 6 GHz operation element on assoc success. Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.89c25dae34ff.Ifd8b2983f400623ac03dc032fc9a20025c9ca365@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: spectmgmt: simplify 6 GHz HE/EHT handlingJohannes Berg
Clean up the code here a bit to have only a single call to ieee80211_chandef_he_6ghz_oper() by using a local pointer variable for the difference. Link: https://msgid.link/20240312112048.94c421d767f9.Ia7ca2f315b392c74d39b44fa9eb872a2e62e75c1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: supplement parsing of puncturing bitmapKang Yang
Current mac80211 won't parsing puncturing bitmap when process EHT Operation element in 6 GHz band or Bandwidth Indication element. This leads to puncturing bitmap cannot be updated in related situations, such as connecting to an EHT AP in 6 GHz band. So supplement parsing of puncturing bitmap for these elements. Signed-off-by: Kang Yang <quic_kangyang@quicinc.com> Link: https://msgid.link/20240312045947.576231-2-quic_kangyang@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: correctly set active links upon TTLMAyala Beker
Fix ieee80211_ttlm_set_links() to not set all active links, but instead let the driver know that valid links status changed and select the active links properly. Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Ayala Beker <ayala.beker@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.acddbbf39584.Ide858f95248fcb3e483c97fcaa14b0cd4e964b10@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: fix prep_connection error pathJohannes Berg
If prep_channel fails in prep_connection, the code releases the deflink's chanctx, which is wrong since we may be using a different link. It's already wrong to even do that always though, since we might still have the station. Remove it only if prep_channel succeeded and later updates fail. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240318184907.2780c1f08c3d.I033c9b15483933088f32a2c0789612a33dd33d82@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: fix rdev_dump_mpp() arguments orderIgor Artemiev
Fix the order of arguments in the TP_ARGS macro for the rdev_dump_mpp tracepoint event. Found by Linux Verification Center (linuxtesting.org). Signed-off-by: Igor Artemiev <Igor.A.Artemiev@mcst.ru> Link: https://msgid.link/20240311164519.118398-1-Igor.A.Artemiev@mcst.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: cfg80211: add a flag to disable wireless extensionsJohannes Berg
Wireless extensions are already disabled if MLO is enabled, given that we cannot support MLO there with all the hard- coded assumptions about BSSID etc. However, the WiFi7 ecosystem is still stabilizing, and some devices may need MLO disabled while that happens. In that case, we might end up with a device that supports wext (but not MLO) in one kernel, and then breaks wext in the future (by enabling MLO), which is not desirable. Add a flag to let such drivers/devices disable wext even if MLO isn't yet enabled. Cc: stable@vger.kernel.org Link: https://msgid.link/20240314110951.b50f1dc4ec21.I656ddd8178eedb49dc5c6c0e70f8ce5807afb54f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: fix ieee80211_bss_*_flags kernel-docJeff Johnson
Running kernel-doc on ieee80211_i.h flagged the following: net/mac80211/ieee80211_i.h:145: warning: expecting prototype for enum ieee80211_corrupt_data_flags. Prototype was for enum ieee80211_bss_corrupt_data_flags instead net/mac80211/ieee80211_i.h:162: warning: expecting prototype for enum ieee80211_valid_data_flags. Prototype was for enum ieee80211_bss_valid_data_flags instead Fix these warnings. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://msgid.link/20240314-kdoc-ieee80211_i-v1-1-72b91b55b257@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changesFelix Fietkau
When moving a station out of a VLAN and deleting the VLAN afterwards, the fast_rx entry still holds a pointer to the VLAN's netdev, which can cause use-after-free bugs. Fix this by immediately calling ieee80211_check_fast_rx after the VLAN change. Cc: stable@vger.kernel.org Reported-by: ranygh@riseup.net Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://msgid.link/20240316074336.40442-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: mac80211: fix mlme_link_id_dbg()Johan Hovold
Make sure that the new mlme_link_id_dbg() macro honours CONFIG_MAC80211_MLME_DEBUG as intended to avoid spamming the log with messages like: wlan0: no EHT support, limiting to HE wlan0: determined local STA to be HE, BW limited to 160 MHz wlan0: determined AP xx:xx:xx:xx:xx:xx to be VHT wlan0: connecting with VHT mode, max bandwidth 160 MHz Fixes: 310c8387c638 ("wifi: mac80211: clean up connection process") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://msgid.link/20240325085948.26203-1-johan+linaro@kernel.org Tested-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-22ipv6: Fix address dump when IPv6 is disabled on an interfaceIdo Schimmel
Cited commit started returning an error when user space requests to dump the interface's IPv6 addresses and IPv6 is disabled on the interface. Restore the previous behavior and do not return an error. Before cited commit: # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff inet6 fe80::1852:2ff:fe5a:c26e/64 scope link proto kernel_ll valid_lft forever preferred_lft forever # ip link set dev dummy1 mtu 1000 # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff After cited commit: # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 1e:9b:94:00:ac:e8 brd ff:ff:ff:ff:ff:ff inet6 fe80::1c9b:94ff:fe00:ace8/64 scope link proto kernel_ll valid_lft forever preferred_lft forever # ip link set dev dummy1 mtu 1000 # ip address show dev dummy1 RTNETLINK answers: No such device Dump terminated With this patch: # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff inet6 fe80::4035:fcff:fe53:66cf/64 scope link proto kernel_ll valid_lft forever preferred_lft forever # ip link set dev dummy1 mtu 1000 # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff Fixes: 9cc4cc329d30 ("ipv6: use xa_array iterator to implement inet6_dump_addr()") Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/netdev/7e261328-42eb-411d-b1b4-ad884eeaae4d@linux.dev/ Tested-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240321173042.2151756-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-22nexthop: fix uninitialized variable in nla_put_nh_group_stats()Dan Carpenter
The "*hw_stats_used" value needs to be set on the success paths to prevent an uninitialized variable bug in the caller, nla_put_nh_group_stats(). Fixes: 5072ae00aea4 ("net: nexthop: Expose nexthop group HW stats to user space") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/f08ac289-d57f-4a1a-830f-cf9a0563cb9c@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-22SUNRPC: Revert 561141dd494382217bace4d1a51d08168420eaceChuck Lever
Scott reports an occasional scatterlist BUG that is triggered by the RFC 8009 Kunit test, then says: > Looking through the git history of the auth_gss code, there are various > places where static buffers were replaced by dynamically allocated ones > because they're being used with scatterlists. Reported-by: Scott Mayhew <smayhew@redhat.com> Fixes: 561141dd4943 ("SUNRPC: Use a static buffer for the checksum initialization vector") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-22nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packetRyosuke Yasuoka
syzbot reported the following uninit-value access issue [1][2]: nci_rx_work() parses and processes received packet. When the payload length is zero, each message type handler reads uninitialized payload and KMSAN detects this issue. The receipt of a packet with a zero-size payload is considered unexpected, and therefore, such packets should be silently discarded. This patch resolved this issue by checking payload size before calling each message type handler codes. Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reported-and-tested-by: syzbot+7ea9413ea6749baf5574@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+29b5ca705d2e0f4a44d2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7ea9413ea6749baf5574 [1] Closes: https://syzkaller.appspot.com/bug?extid=29b5ca705d2e0f4a44d2 [2] Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Jeremy Cline <jeremy@jcline.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-21Merge tag 'net-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from CAN, netfilter, wireguard and IPsec. I'd like to highlight [ lowlight? - Linus ] Florian W stepping down as a netfilter maintainer due to constant stream of bug reports. Not sure what we can do but IIUC this is not the first such case. Current release - regressions: - rxrpc: fix use of page_frag_alloc_align(), it changed semantics and we added a new caller in a different subtree - xfrm: allow UDP encapsulation only in offload modes Current release - new code bugs: - tcp: fix refcnt handling in __inet_hash_connect() - Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets", conflicted with some expectations in BPF uAPI Previous releases - regressions: - ipv4: raw: fix sending packets from raw sockets via IPsec tunnels - devlink: fix devlink's parallel command processing - veth: do not manipulate GRO when using XDP - esp: fix bad handling of pages from page_pool Previous releases - always broken: - report RCU QS for busy network kthreads (with Paul McK's blessing) - tcp/rds: fix use-after-free on netns with kernel TCP reqsk - virt: vmxnet3: fix missing reserved tailroom with XDP Misc: - couple of build fixes for Documentation" * tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) selftests: forwarding: Fix ping failure due to short timeout MAINTAINERS: step down as netfilter maintainer netfilter: nf_tables: Fix a memory leak in nf_tables_updchain net: dsa: mt7530: fix handling of all link-local frames net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports bpf: report RCU QS in cpumap kthread net: report RCU QS on threaded NAPI repolling rcu: add a helper to report consolidated flavor QS ionic: update documentation for XDP support lib/bitmap: Fix bitmap_scatter() and bitmap_gather() kernel doc netfilter: nf_tables: do not compare internal table flags on updates netfilter: nft_set_pipapo: release elements in clone only from destroy path octeontx2-af: Use separate handlers for interrupts octeontx2-pf: Send UP messages to VF only when VF is up. octeontx2-pf: Use default max_active works instead of one octeontx2-pf: Wait till detach_resources msg is complete octeontx2: Detect the mbox up or down message via register devlink: fix port new reply cmd type tcp: Clear req->syncookie in reqsk_alloc(). net/bnx2x: Prevent access to a freed page in page_pool ...
2024-03-21Merge tag 'kbuild-v6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Generate a list of built DTB files (arch/*/boot/dts/dtbs-list) - Use more threads when building Debian packages in parallel - Fix warnings shown during the RPM kernel package uninstallation - Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to Makefile - Support GCC's -fmin-function-alignment flag - Fix a null pointer dereference bug in modpost - Add the DTB support to the RPM package - Various fixes and cleanups in Kconfig * tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits) kconfig: tests: test dependency after shuffling choices kconfig: tests: add a test for randconfig with dependent choices kconfig: tests: support KCONFIG_SEED for the randconfig runner kbuild: rpm-pkg: add dtb files in kernel rpm kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig() kconfig: check prompt for choice while parsing kconfig: lxdialog: remove unused dialog colors kconfig: lxdialog: fix button color for blackbg theme modpost: fix null pointer dereference kbuild: remove GCC's default -Wpacked-bitfield-compat flag kbuild: unexport abs_srctree and abs_objtree kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1 kconfig: remove named choice support kconfig: use linked list in get_symbol_str() to iterate over menus kconfig: link menus to a symbol kbuild: fix inconsistent indentation in top Makefile kbuild: Use -fmin-function-alignment when available alpha: merge two entries for CONFIG_ALPHA_GAMMA alpha: merge two entries for CONFIG_ALPHA_EV4 kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj) ...
2024-03-21Merge tag 'nf-24-03-21' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net. There is a larger batch of fixes still pending that will follow up asap, this is what I deemed to be more urgent at this time: 1) Use clone view in pipapo set backend to release elements from destroy path, otherwise it is possible to destroy elements twice. 2) Incorrect check for internal table flags lead to bogus transaction objects. 3) Fix counters memleak in netdev basechain update error path, from Quan Tian. netfilter pull request 24-03-21 * tag 'nf-24-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: Fix a memory leak in nf_tables_updchain netfilter: nf_tables: do not compare internal table flags on updates netfilter: nft_set_pipapo: release elements in clone only from destroy path ==================== Link: https://lore.kernel.org/r/20240321112117.36737-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21netfilter: nf_tables: Fix a memory leak in nf_tables_updchainQuan Tian
If nft_netdev_register_hooks() fails, the memory associated with nft_stats is not freed, causing a memory leak. This patch fixes it by moving nft_stats_alloc() down after nft_netdev_register_hooks() succeeds. Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Quan Tian <tianquan23@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-20net: report RCU QS on threaded NAPI repollingYan Zhai
NAPI threads can keep polling packets under load. Currently it is only calling cond_resched() before repolling, but it is not sufficient to clear out the holdout of RCU tasks, which prevent BPF tracing programs from detaching for long period. This can be reproduced easily with following set up: ip netns add test1 ip netns add test2 ip -n test1 link add veth1 type veth peer name veth2 netns test2 ip -n test1 link set veth1 up ip -n test1 link set lo up ip -n test2 link set veth2 up ip -n test2 link set lo up ip -n test1 addr add 192.168.1.2/31 dev veth1 ip -n test1 addr add 1.1.1.1/32 dev lo ip -n test2 addr add 192.168.1.3/31 dev veth2 ip -n test2 addr add 2.2.2.2/31 dev lo ip -n test1 route add default via 192.168.1.3 ip -n test2 route add default via 192.168.1.2 for i in `seq 10 210`; do for j in `seq 10 210`; do ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201 done done ip netns exec test2 ethtool -K veth2 gro on ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded' ip netns exec test1 ethtool -K veth1 tso off Then run an iperf3 client/server and a bpftrace script can trigger it: ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null& ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null& bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}' Report RCU quiescent states periodically will resolve the issue. Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/4c3b0d3f32d3b18949d75b18e5e1d9f13a24f025.1710877680.git.yan@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-21netfilter: nf_tables: do not compare internal table flags on updatesPablo Neira Ayuso
Restore skipping transaction if table update does not modify flags. Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-21netfilter: nft_set_pipapo: release elements in clone only from destroy pathPablo Neira Ayuso
Clone already always provides a current view of the lookup table, use it to destroy the set, otherwise it is possible to destroy elements twice. This fix requires: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol") which came after: 9827a0e6e23b ("netfilter: nft_set_pipapo: release elements in clone from abort path"). Fixes: 9827a0e6e23b ("netfilter: nft_set_pipapo: release elements in clone from abort path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-19Merge tag 'ipsec-2024-03-19' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-03-19 1) Fix possible page_pool leak triggered by esp_output. From Dragos Tatulea. 2) Fix UDP encapsulation in software GSO path. From Leon Romanovsky. * tag 'ipsec-2024-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Allow UDP encapsulation only in offload modes net: esp: fix bad handling of pages from page_pool ==================== Link: https://lore.kernel.org/r/20240319110151.409825-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19devlink: fix port new reply cmd typeJiri Pirko
Due to a c&p error, port new reply fills-up cmd with wrong value, any other existing port command replies and notifications. Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW. Skimmed through devlink userspace implementations, none of them cares about this cmd value. Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Closes: https://lore.kernel.org/all/ZfZcDxGV3tSy4qsV@cy-server/ Fixes: cd76dcd68d96 ("devlink: Support add and delete devlink port") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20240318091908.2736542-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19tcp: Clear req->syncookie in reqsk_alloc().Kuniyuki Iwashima
syzkaller reported a read of uninit req->syncookie. [0] Originally, req->syncookie was used only in tcp_conn_request() to indicate if we need to encode SYN cookie in SYN+ACK, so the field remains uninitialised in other places. The commit 695751e31a63 ("bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().") added another meaning in ACK path; req->syncookie is set true if SYN cookie is validated by BPF kfunc. After the change, cookie_v[46]_check() always read req->syncookie, but it is not initialised in the normal SYN cookie case as reported by KMSAN. Let's make sure we always initialise req->syncookie in reqsk_alloc(). [0]: BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477 cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477 tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline] tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914 tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5538 [inline] __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652 process_backlog+0x480/0x8b0 net/core/dev.c:5981 __napi_poll+0xe7/0x980 net/core/dev.c:6632 napi_poll net/core/dev.c:6701 [inline] net_rx_action+0x89d/0x1820 net/core/dev.c:6813 __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554 do_softirq+0x9a/0x100 kernel/softirq.c:455 __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline] __dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362 dev_queue_xmit include/linux/netdevice.h:3091 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235 __ip_finish_output+0x287/0x810 ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:450 [inline] ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535 ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549 __tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462 tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline] tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792 __tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline] tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578 tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716 inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923 __sys_shutdown_sock net/socket.c:2425 [inline] __sys_shutdown net/socket.c:2437 [inline] __do_sys_shutdown net/socket.c:2445 [inline] __se_sys_shutdown+0x2a4/0x440 net/socket.c:2443 __x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was stored to memory at: reqsk_alloc include/net/request_sock.h:148 [inline] inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978 cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328 cookie_tcp_check net/ipv4/syncookies.c:388 [inline] cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420 tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline] tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914 tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5538 [inline] __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652 process_backlog+0x480/0x8b0 net/core/dev.c:5981 __napi_poll+0xe7/0x980 net/core/dev.c:6632 napi_poll net/core/dev.c:6701 [inline] net_rx_action+0x89d/0x1820 net/core/dev.c:6813 __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554 Uninit was created at: __alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2175 [inline] allocate_slab mm/slub.c:2338 [inline] new_slab+0x2de/0x1400 mm/slub.c:2391 ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525 __slab_alloc mm/slub.c:3610 [inline] __slab_alloc_node mm/slub.c:3663 [inline] slab_alloc_node mm/slub.c:3835 [inline] kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852 reqsk_alloc include/net/request_sock.h:131 [inline] inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978 tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135 tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716 tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655 tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929 tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline] ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674 __netif_receive_skb_list_ptype net/core/dev.c:5581 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629 __netif_receive_skb_list net/core/dev.c:5681 [inline] netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773 gro_normal_list include/net/gro.h:438 [inline] napi_complete_done+0x425/0x880 net/core/dev.c:6113 virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline] virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211 __napi_poll+0xe7/0x980 net/core/dev.c:6632 napi_poll net/core/dev.c:6701 [inline] net_rx_action+0x89d/0x1820 net/core/dev.c:6813 __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554 CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted 6.8.0-syzkaller-05562-g61387b8dcf1d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Fixes: 695751e31a63 ("bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/bpf/CANn89iKdN9c+C_2JAUbc+VY3DDQjAQukMtiBbormAmAk9CdvQA@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240315224710.55209-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19bpf: Allow helper bpf_get_[ns_]current_pid_tgid() for all prog typesYonghong Song
Currently bpf_get_current_pid_tgid() is allowed in tracing, cgroup and sk_msg progs while bpf_get_ns_current_pid_tgid() is only allowed in tracing progs. We have an internal use case where for an application running in a container (with pid namespace), user wants to get the pid associated with the pid namespace in a cgroup bpf program. Currently, cgroup bpf progs already allow bpf_get_current_pid_tgid(). Let us allow bpf_get_ns_current_pid_tgid() as well. With auditing the code, bpf_get_current_pid_tgid() is also used by sk_msg prog. But there are no side effect to expose these two helpers to all prog types since they do not reveal any kernel specific data. The detailed discussion is in [1]. So with this patch, both bpf_get_current_pid_tgid() and bpf_get_ns_current_pid_tgid() are put in bpf_base_func_proto(), making them available to all program types. [1] https://lore.kernel.org/bpf/20240307232659.1115872-1-yonghong.song@linux.dev/ Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240315184854.2975190-1-yonghong.song@linux.dev
2024-03-19Merge tag 's390-6.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Various virtual vs physical address usage fixes - Add new bitwise types and helper functions and use them in s390 specific drivers and code to make it easier to find virtual vs physical address usage bugs. Right now virtual and physical addresses are identical for s390, except for module, vmalloc, and similar areas. This will be changed, hopefully with the next merge window, so that e.g. the kernel image and modules will be located close to each other, allowing for direct branches and also for some other simplifications. As a prerequisite this requires to fix all misuses of virtual and physical addresses. As it turned out people are so used to the concept that virtual and physical addresses are the same, that new bugs got added to code which was already fixed. In order to avoid that even more code gets merged which adds such bugs add and use new bitwise types, so that sparse can be used to find such usage bugs. Most likely the new types can go away again after some time - Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation - Fix kprobe branch handling: if an out-of-line single stepped relative branch instruction has a target address within a certain address area in the entry code, the program check handler may incorrectly execute cleanup code as if KVM code was executed, leading to crashes - Fix reference counting of zcrypt card objects - Various other small fixes and cleanups * tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390/entry: compare gmap asce to determine guest/host fault s390/entry: remove OUTSIDE macro s390/entry: add CIF_SIE flag and remove sie64a() address check s390/cio: use while (i--) pattern to clean up s390/raw3270: make class3270 constant s390/raw3270: improve raw3270_init() readability s390/tape: make tape_class constant s390/vmlogrdr: make vmlogrdr_class constant s390/vmur: make vmur_class constant s390/zcrypt: make zcrypt_class constant s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support s390/vfio_ccw_cp: use new address translation helpers s390/iucv: use new address translation helpers s390/ctcm: use new address translation helpers s390/lcs: use new address translation helpers s390/qeth: use new address translation helpers s390/zfcp: use new address translation helpers s390/tape: fix virtual vs physical address confusion s390/3270: use new address translation helpers s390/3215: use new address translation helpers ...
2024-03-19net/sched: Add module alias for sch_fq_pieMichal Koutný
The commit 2c15a5aee2f3 ("net/sched: Load modules via their alias") starts loading modules via aliases and not canonical names. The new aliases were added in commit 241a94abcf46 ("net/sched: Add module aliases for cls_,sch_,act_ modules") via a Coccinele script. sch_fq_pie.c is missing module.h header and thus Coccinele did not patch it. Add the include and module alias manually, so that autoloading works for sch_fq_pie too. (Note: commit message in commit 241a94abcf46 ("net/sched: Add module aliases for cls_,sch_,act_ modules") was mangled due to '#' misinterpretation. The predicate haskernel is: | @ haskernel @ | @@ | | #include <linux/module.h> | .) Fixes: 241a94abcf46 ("net/sched: Add module aliases for cls_,sch_,act_ modules") Signed-off-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/r/20240315160210.8379-1-mkoutny@suse.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19ipv4: raw: Fix sending packets from raw sockets via IPsec tunnelsTobias Brunner
Since the referenced commit, the xfrm_inner_extract_output() function uses the protocol field to determine the address family. So not setting it for IPv4 raw sockets meant that such packets couldn't be tunneled via IPsec anymore. IPv6 raw sockets are not affected as they already set the protocol since 9c9c9ad5fae7 ("ipv6: set skb->protocol on tcp, raw and ip6_append_data genereated skbs"). Fixes: f4796398f21b ("xfrm: Remove inner/outer modes from output path") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/c5d9a947-eb19-4164-ac99-468ea814ce20@strongswan.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19hsr: Handle failures in module initFelix Maurer
A failure during registration of the netdev notifier was not handled at all. A failure during netlink initialization did not unregister the netdev notifier. Handle failures of netdev notifier registration and netlink initialization. Both functions should only return negative values on failure and thereby lead to the hsr module not being loaded. Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/3ce097c15e3f7ace98fc7fd9bcbf299f092e63d1.1710504184.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19rds: introduce acquire/release ordering in acquire/release_in_xmit()Yewon Choi
acquire/release_in_xmit() work as bit lock in rds_send_xmit(), so they are expected to ensure acquire/release memory ordering semantics. However, test_and_set_bit/clear_bit() don't imply such semantics, on top of this, following smp_mb__after_atomic() does not guarantee release ordering (memory barrier actually should be placed before clear_bit()). Instead, we use clear_bit_unlock/test_and_set_bit_lock() here. Fixes: 0f4b1c7e89e6 ("rds: fix rds_send_xmit() serialization") Fixes: 1f9ecd7eacfd ("RDS: Pass rds_conn_path to rds_send_xmit()") Signed-off-by: Yewon Choi <woni9911@gmail.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://lore.kernel.org/r/ZfQUxnNTO9AJmzwc@libra05 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-19net: move dev->state into net_device_read_txrx groupEric Dumazet
dev->state can be read in rx and tx fast paths. netif_running() which needs dev->state is called from - enqueue_to_backlog() [RX path] - __dev_direct_xmit() [TX path] Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-18bpf: Check return from set_memory_rox()Christophe Leroy
arch_protect_bpf_trampoline() and alloc_new_pack() call set_memory_rox() which can fail, leading to unprotected memory. Take into account return from set_memory_rox() function and add __must_check flag to arch_protect_bpf_trampoline(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/fe1c163c83767fde5cab31d209a4a6be3ddb3a73.1710574353.git.christophe.leroy@csgroup.eu Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-18Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets"Abhishek Chauhan
This reverts commit 885c36e59f46375c138de18ff1692f18eff67b7f. The patch currently broke the bpf selftest test_tc_dtime because uapi field __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono with the original fix as the bit was re-used for userspace timestamp as well to avoid tstamp reset in the forwarding path. To solve this we need to keep mono_delivery_time as is and introduce another bit called user_delivery_time and fall back to the initial proposal of setting the user_delivery_time bit based on sk_clockid set from userspace. Fixes: 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets") Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-18xfrm: Allow UDP encapsulation only in offload modesLeon Romanovsky
The missing check of x->encap caused to the situation where GSO packets were created with UDP encapsulation. As a solution return the encap check for non-offloaded SA. Fixes: 983a73da1f99 ("xfrm: Pass UDP encapsulation in TX packet offload") Closes: https://lore.kernel.org/all/a650221ae500f0c7cf496c61c96c1b103dcb6f67.camel@redhat.com Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-18net: esp: fix bad handling of pages from page_poolDragos Tatulea
When the skb is reorganized during esp_output (!esp->inline), the pages coming from the original skb fragments are supposed to be released back to the system through put_page. But if the skb fragment pages are originating from a page_pool, calling put_page on them will trigger a page_pool leak which will eventually result in a crash. This leak can be easily observed when using CONFIG_DEBUG_VM and doing ipsec + gre (non offloaded) forwarding: BUG: Bad page state in process ksoftirqd/16 pfn:1451b6 page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6 flags: 0x200000000000000(node=0|zone=2) page_type: 0xffffffff() raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000 raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000 page dumped because: page_pool leak Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core] CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ #22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x36/0x50 bad_page+0x70/0xf0 free_unref_page_prepare+0x27a/0x460 free_unref_page+0x38/0x120 esp_ssg_unref.isra.0+0x15f/0x200 esp_output_tail+0x66d/0x780 esp_xmit+0x2c5/0x360 validate_xmit_xfrm+0x313/0x370 ? validate_xmit_skb+0x1d/0x330 validate_xmit_skb_list+0x4c/0x70 sch_direct_xmit+0x23e/0x350 __dev_queue_xmit+0x337/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x25e/0x580 iptunnel_xmit+0x19b/0x240 ip_tunnel_xmit+0x5fb/0xb60 ipgre_xmit+0x14d/0x280 [ip_gre] dev_hard_start_xmit+0xc3/0x1c0 __dev_queue_xmit+0x208/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x1ca/0x580 ip_sublist_rcv_finish+0x32/0x40 ip_sublist_rcv+0x1b2/0x1f0 ? ip_rcv_finish_core.constprop.0+0x460/0x460 ip_list_rcv+0x103/0x130 __netif_receive_skb_list_core+0x181/0x1e0 netif_receive_skb_list_internal+0x1b3/0x2c0 napi_gro_receive+0xc8/0x200 gro_cell_poll+0x52/0x90 __napi_poll+0x25/0x1a0 net_rx_action+0x28e/0x300 __do_softirq+0xc3/0x276 ? sort_range+0x20/0x20 run_ksoftirqd+0x1e/0x30 smpboot_thread_fn+0xa6/0x130 kthread+0xcd/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x31/0x50 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> The suggested fix is to introduce a new wrapper (skb_page_unref) that covers page refcounting for page_pool pages as well. Cc: stable@vger.kernel.org Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") Reported-and-tested-by: Anatoli N.Chechelnickiy <Anatoli.Chechelnickiy@m.interpipe.biz> Reported-by: Ian Kumlien <ian.kumlien@gmail.com> Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-18packet: annotate data-races around ignore_outgoingEric Dumazet
ignore_outgoing is read locklessly from dev_queue_xmit_nit() and packet_getsockopt() Add appropriate READ_ONCE()/WRITE_ONCE() annotations. syzbot reported: BUG: KCSAN: data-race in dev_queue_xmit_nit / packet_setsockopt write to 0xffff888107804542 of 1 bytes by task 22618 on cpu 0: packet_setsockopt+0xd83/0xfd0 net/packet/af_packet.c:4003 do_sock_setsockopt net/socket.c:2311 [inline] __sys_setsockopt+0x1d8/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2340 do_syscall_64+0xd3/0x1d0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 read to 0xffff888107804542 of 1 bytes by task 27 on cpu 1: dev_queue_xmit_nit+0x82/0x620 net/core/dev.c:2248 xmit_one net/core/dev.c:3527 [inline] dev_hard_start_xmit+0xcc/0x3f0 net/core/dev.c:3547 __dev_queue_xmit+0xf24/0x1dd0 net/core/dev.c:4335 dev_queue_xmit include/linux/netdevice.h:3091 [inline] batadv_send_skb_packet+0x264/0x300 net/batman-adv/send.c:108 batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127 batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:392 [inline] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:420 [inline] batadv_iv_send_outstanding_bat_ogm_packet+0x3f0/0x4b0 net/batman-adv/bat_iv_ogm.c:1700 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0x465/0x990 kernel/workqueue.c:3335 worker_thread+0x526/0x730 kernel/workqueue.c:3416 kthread+0x1d1/0x210 kernel/kthread.c:388 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 value changed: 0x00 -> 0x01 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 27 Comm: kworker/u8:1 Tainted: G W 6.8.0-syzkaller-08073-g480e035fc4c7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet Fixes: fa788d986a3a ("packet: add sockopt to ignore outgoing packets") Reported-by: syzbot+c669c1136495a2e7c31f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/CANn89i+Z7MfbkBLOv=p7KZ7=K1rKHO4P1OL5LYDCtBiyqsa9oQ@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-16Merge tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Bugfixes: - Fix for an Oops in the NFSv4.2 listxattr handler - Correct an incorrect buffer size in listxattr - Fix for an Oops in the pNFS flexfiles layout - Fix a refcount leak in NFS O_DIRECT writes - Fix missing locking in NFS O_DIRECT - Avoid an infinite loop in pnfs_update_layout - Fix an overflow in the RPC waitqueue queue length counter - Ensure that pNFS I/O is also protected by TLS when xprtsec is specified by the mount options - Fix a leaked folio lock in the netfs read code - Fix a potential deadlock in fscache - Allow setting the fscache uniquifier in NFSv4 - Fix an off by one in root_nfs_cat() - Fix another off by one in rpc_sockaddr2uaddr() - nfs4_do_open() can incorrectly trigger state recovery - Various fixes for connection shutdown Features and cleanups: - Ensure that containers only see their own RPC and NFS stats - Enable nconnect for RDMA - Remove dead code from nfs_writepage_locked() - Various tracepoint additions to track EXCHANGE_ID, GETDEVICEINFO, and mount options" * tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits) nfs: fix panic when nfs4_ff_layout_prepare_ds() fails NFS: trace the uniquifier of fscache NFS: Read unlock folio on nfs_page_create_from_folio() error NFS: remove unused variable nfs_rpcstat nfs: fix UAF in direct writes nfs: properly protect nfs_direct_req fields NFS: enable nconnect for RDMA NFSv4: nfs4_do_open() is incorrectly triggering state recovery NFS: avoid infinite loop in pnfs_update_layout. NFS: remove sync_mode test from nfs_writepage_locked() NFSv4.1/pnfs: fix NFS with TLS in pnfs NFS: Fix an off by one in root_nfs_cat() nfs: make the rpc_stat per net namespace nfs: expose /proc/net/sunrpc/nfs in net namespaces sunrpc: add a struct rpc_stats arg to rpc_create_args nfs: remove unused NFS_CALL macro NFSv4.1: add tracepoint to trunked nfs4_exchange_id calls NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int nfs: fix regression in handling of fsc= option in NFSv4 ...
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-14net: remove {revc,send}msg_copy_msghdr() from exportsJens Axboe
The only user of these was io_uring, and it's not using them anymore. Make them static and remove them from the socket header file. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/1b6089d3-c1cf-464a-abd3-b0f0b6bb2523@kernel.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-14rxrpc: Fix error check on ->alloc_txbuf()David Howells
rxrpc_alloc_*_txbuf() and ->alloc_txbuf() return NULL to indicate no memory, but rxrpc_send_data() uses IS_ERR(). Fix rxrpc_send_data() to check for NULL only and set -ENOMEM if it sees that. Fixes: 49489bb03a50 ("rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags") Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>