summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-03-07wifi: mac80211: add support for driver adding radiotap TLVsMordechay Goodstein
The new TLV format enables adding TLVs after the fixed fields in radiotap, as part of the radiotap header. Support this and move vendor data to the TLV format, allowing a reuse of the RX_FLAG_RADIOTAP_VENDOR_DATA as the new RX_FLAG_RADIOTAP_TLV_AT_END flag. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.b18fd5da8477.I576400ec40a7b35ef97a3b09a99b3a49e9174786@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07netfilter: conntrack: adopt safer max chain lengthEric Dumazet
Customers using GKE 1.25 and 1.26 are facing conntrack issues root caused to commit c9c3b6811f74 ("netfilter: conntrack: make max chain length random"). Even if we assume Uniform Hashing, a bucket often reachs 8 chained items while the load factor of the hash table is smaller than 0.5 With a limit of 16, we reach load factors of 3. With a limit of 32, we reach load factors of 11. With a limit of 40, we reach load factors of 15. With a limit of 50, we reach load factors of 24. This patch changes MIN_CHAINLEN to 50, to minimize risks. Ideally, we could in the future add a cushion based on expected load factor (2 * nf_conntrack_max / nf_conntrack_buckets), because some setups might expect unusual values. Fixes: c9c3b6811f74 ("netfilter: conntrack: make max chain length random") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-07wifi: mac80211: fix ieee80211_link_set_associated() typeJohannes Berg
The return type here should be u64 for the flags, even if it doesn't matter right now because it doesn't return any flags that don't fit into u32. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.d67ccae57d60.Ia4768e547ba8b1deb2b84ce3bbfbe216d5bfff6a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: simplify reasoning about EHT capa handlingJohannes Berg
Given the code in cfg80211, EHT capa cannot be non-NULL when HE capa is NULL, but it's easier to reason about it if both are checked and the compiler will likely integrate the check with the previous one for HE capa anyway. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.7413d50d23bc.I6fef7484721be9bd5364f64921fc5e9168495f62@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: mlme: remove pointless sta checkJohannes Berg
We already exited the function if sta ended up NULL, so just remove the extra check. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.4cbac9cfd03a.I21ec81c96d246afdabc2b0807d3856e6b1182cb7@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add netdev per-link debugfs data and driver hookBenjamin Berg
This adds the infrastructure to have netdev specific per-link data both for mac80211 and the driver in debugfs. For the driver, a new callback is added which is only used if MLO is supported. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.fb4c947e4df8.I69b3516ddf4c8a7501b395f652d6063444ecad63@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: remove SMPS from AP debugfsBenjamin Berg
The spatial multiplexing power save feature does not apply to AP mode. Remove it from debugfs in this case. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.01b167027dd5.Iee69f2e4df98581f259ef2c76309b940b20174be@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add pointer from bss_conf to vifBenjamin Berg
While often not needed, this considerably simplifies going from a link specific bss_config to the vif. This helps with e.g. creating link specific debugfs entries inside drivers. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.46f701a10ed5.I20390b2a8165ff222d66585915689206ea93222b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: warn only once on AP probeJohannes Berg
We should perhaps support this API for MLO, but it's not clear that it makes sense, in any case then we'd have to update it to probe the correct BSS. For now, if it happens, warn only once so that we don't get flooded with messages if the driver misbehaves and calls this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.1c8499b6fbe6.I1a76a2be3b42ff93904870ac069f0319507adc23@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: cfg80211/mac80211: report link ID on control port RXJohannes Berg
For control port RX, report the link ID for MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.fe06dfc3791b.Iddcab94789cafe336417be406072ce8a6312fc2d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add support for set_hw_timestamp commandAvraham Stern
Support the set_hw_timestamp callback for enabling and disabling HW timestamping if the low level driver supports it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.700ded7badde.Ib2f7c228256ce313a04d3d9f9ecc6c7b9aa602bb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: nl80211: add a command to enable/disable HW timestampingAvraham Stern
Add a command to enable and disable HW timestamping of TM and FTM frames. HW timestamping can be enabled for a specific mac address or for all addresses. The low level driver will indicate how many peers HW timestamping can be enabled concurrently, and this information will be passed to userspace. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.05678d7b1c17.Iccc08869ea8156f1c71a3111a47f86dd56234bd0@changeid [switch to needing netdev UP, minor edits] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: wireless: cleanup unused function parametersMordechay Goodstein
In the past ftype was used for deciding about 6G DUP beacon, but the logic was removed and ftype is not needed anymore. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.98d4761b809b.I255f5ecd77cb24fcf2f1641bb5833ea2d121296e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: wireless: correct primary channel validation on 6 GHzMordechay Goodstein
The check that beacon primary channel is in the range of 80 MHz (abs < 80) is invalid for 320 MHz since duplicate beacon transmit means that the AP transmits it on all the 20 MHz sub-channels: 9.4.2.249 HE Operation element - ... AP transmits Beacon frames in non-HT duplicate PPDU with a TXVECTOR parameter CH_BANDWIDTH value that is up to the BSS bandwidth. So in case of 320 MHz the DUP beacon can be in upper 160 for primary channel in the lower 160 giving possibly an absolute range of over 80 MHz. Also this check is redundant alltogether, if AP has a wrong primary channel in the beacon it's a faulty AP, and we would fail in next steps to connect. While at it, fix the frequency comparison to no longer compare between KHz and MHz, which was introduced by commit 7f599aeccbd2 ("cfg80211: Use the HE operation IE to determine a 6GHz BSS channel"). Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.314faf725255.I5e27251ac558297553b590d3917a7b6d1aae0e74@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: wireless: return primary channel regardless of DUPMordechay Goodstein
Currently in case DUP bit is not set we don't return the primary channel for 6 GHz Band, but the spec says that the DUP bit has no effect on this field: 9.4.2.249 HE Operation element: The Duplicate Beacon subfield is set to 1 if the AP transmits Beacon frames in non-HT duplicate PPDU with a TXVECTOR parameter CH_BANDWIDTH value that is up to the BSS bandwidth and is set to 0 otherwise. So remove the condition for returning primary channel based on DUP. Since the caller code already marks the signal as invalid in case the indicated frequency is not the tuned frequency, there's no need to additionally handle this case here since that's already true for duplicated beacons on the non-primary channel(s). Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.66d7f05f7d11.I5e0add054f72ede95611391b99804c61c40cc959@changeid [clarify commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: allow beacon protection HW offloadJohannes Berg
In case of beacon protection, check if the key was offloaded to the hardware and in that case set control.hw_key so that the encryption function will see it and only do the needed steps that aren't done in hardware. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.b2becd9a22fb.I6c0b9c50c6a481128ba912a11cb7afc92c4b6da7@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: check key taint for beacon protectionJohannes Berg
This will likely never happen, but for completeness check the key taint flag before using a key for beacon protection. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.cf2c3fee6f1f.I2f19b3e04e31c99bed3c9dc71935bf513b2cd177@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: clear all bits that relate rtap fields on skbMordechay Goodstein
Since we remove radiotap from skb data, clear all RX_FLAG_X related info that indicate info on the skb data. Also we need to do it only once so remove the clear from cooked_monitor. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.74d3efe19eae.Ie17a35864d2e120f9858516a2e3d3047d83cf805@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: adjust scan cancel comment/checkJohannes Berg
Instead of the comment about holding RTNL, which is now wrong, add a proper lockdep assertion for the wiphy mutex. Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.84352e46f342.Id90fef8c581cebe19cb30274340cf43885d55c74@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-06Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-03-06 We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-). The main changes are: 1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong. 2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko. 3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov. 4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi. 5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman. 6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller. 7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo. 8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo. 9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong. 10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang. 11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen. 12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi. 13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui. 14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song. 15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest. 16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao. 17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich. 18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ==================== Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-03-06 We've added 8 non-merge commits during the last 7 day(s) which contain a total of 9 files changed, 64 insertions(+), 18 deletions(-). The main changes are: 1) Fix BTF resolver for DATASEC sections when a VAR points at a modifier, that is, keep resolving such instances instead of bailing out, from Lorenz Bauer. 2) Fix BPF test framework with regards to xdp_frame info misplacement in the "live packet" code, from Alexander Lobakin. 3) Fix an infinite loop in BPF sockmap code for TCP/UDP/AF_UNIX, from Liu Jian. 4) Fix a build error for riscv BPF JIT under PERF_EVENTS=n, from Randy Dunlap. 5) Several BPF doc fixes with either broken links or external instead of internal doc links, from Bagas Sanjaya. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: check that modifier resolves after pointer btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES bpf, doc: Link to submitting-patches.rst for general patch submission info bpf, doc: Do not link to docs.kernel.org for kselftest link bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() riscv, bpf: Fix patch_text implicit declaration bpf, docs: Fix link to BTF doc ==================== Link: https://lore.kernel.org/r/20230306215944.11981-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06net: tls: fix device-offloaded sendpage straddling recordsJakub Kicinski
Adrien reports that incorrect data is transmitted when a single page straddles multiple records. We would transmit the same data in all iterations of the loop. Reported-by: Adrien Moulin <amoulin@corp.free.fr> Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()") Tested-by: Adrien Moulin <amoulin@corp.free.fr> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMESAlexander Lobakin
&xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed *before* the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-06netfilter: tproxy: fix deadlock due to missing BH disableFlorian Westphal
The xtables packet traverser performs an unconditional local_bh_disable(), but the nf_tables evaluation loop does not. Functions that are called from either xtables or nftables must assume that they can be called in process context. inet_twsk_deschedule_put() assumes that no softirq interrupt can occur. If tproxy is used from nf_tables its possible that we'll deadlock trying to aquire a lock already held in process context. Add a small helper that takes care of this and use it. Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/ Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-and-tested-by: Major Dávid <major.david@balasys.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-06netfilter: ctnetlink: revert to dumping mark regardless of event typeIvan Delalande
It seems that change was unintentional, we have userspace code that needs the mark while listening for events like REPLY, DESTROY, etc. Also include 0-marks in requested dumps, as they were before that fix. Fixes: 1feeae071507 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark") Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-03bpf: allow ctx writes using BPF_ST_MEM instructionEduard Zingerman
Lift verifier restriction to use BPF_ST_MEM instructions to write to context data structures. This requires the following changes: - verifier.c:do_check() for BPF_ST updated to: - no longer forbid writes to registers of type PTR_TO_CTX; - track dst_reg type in the env->insn_aux_data[...].ptr_type field (same way it is done for BPF_STX and BPF_LDX instructions). - verifier.c:convert_ctx_access() and various callbacks invoked by it are updated to handled BPF_ST instruction alongside BPF_STX. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03bpf: Introduce kptr_rcu.Alexei Starovoitov
The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Derefrence of other kptr-s returns PTR_UNTRUSTED. For example: struct map_value { struct cgroup __kptr *cgrp; }; SEC("tp_btf/cgroup_mkdir") int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path) { struct cgroup *cg, *cg2; cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0 bpf_kptr_xchg(&v->cgrp, cg); cg2 = v->cgrp; // This is new feature introduced by this patch. // cg2 is PTR_MAYBE_NULL | MEM_RCU. // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero if (cg2) bpf_cgroup_ancestor(cg2, level); // safe to do. } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
2023-03-03bpf, sockmap: Fix an infinite loop error when len is 0 in ↵Liu Jian
tcp_bpf_recvmsg_parser() When the buffer length of the recvmsg system call is 0, we got the flollowing soft lockup problem: watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149] CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:remove_wait_queue+0xb/0xc0 Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20 RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768 RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040 RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7 R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800 R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0 FS: 00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_msg_wait_data+0x279/0x2f0 tcp_bpf_recvmsg_parser+0x3c6/0x490 inet_recvmsg+0x280/0x290 sock_recvmsg+0xfc/0x120 ____sys_recvmsg+0x160/0x3d0 ___sys_recvmsg+0xf0/0x180 __sys_recvmsg+0xea/0x1a0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The logic in tcp_bpf_recvmsg_parser is as follows: msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { wait data; goto msg_bytes_ready; } In this case, "copied" always is 0, the infinite loop occurs. According to the Linux system call man page, 0 should be returned in this case. Therefore, in tcp_bpf_recvmsg_parser(), if the length is 0, directly return. Also modify several other functions with the same problem. Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap") Fixes: 9825d866ce0d ("af_unix: Implement unix_dgram_bpf_recvmsg()") Fixes: c5d2177a72a1 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self") Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
2023-03-02bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program typesTejun Heo
These helpers are safe to call from any context and there's no reason to restrict access to them. Remove them from bpf_trace and filter lists and add to bpf_base_func_proto() under perfmon_capable(). v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that they require bpf_capable() but not perfomon_capable() as it doesn't read from or affect others on the system. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02Merge tag 'ieee802154-for-net-2023-03-02' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan Stefan Schmidt says: ==================== ieee802154 for net 2023-03-02 Two small fixes this time. Alexander Aring fixed a potential negative array access in the ca8210 driver. Miquel Raynal fixed a crash that could have been triggered through the extended netlink API for 802154. This only came in this merge window. Found by syzkaller. * tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan: ieee802154: Prevent user from crashing the host ca8210: fix mac_len negative array access ==================== Link: https://lore.kernel.org/r/20230302153032.1312755-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-02net: caif: Fix use-after-free in cfusbl_device_notify()Shigeru Yoshida
syzbot reported use-after-free in cfusbl_device_notify() [1]. This causes a stack trace like below: BUG: KASAN: use-after-free in cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138 Read of size 8 at addr ffff88807ac4e6f0 by task kworker/u4:6/1214 CPU: 0 PID: 1214 Comm: kworker/u4:6 Not tainted 5.19.0-rc3-syzkaller-00146-g92f20ff72066 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313 print_report mm/kasan/report.c:429 [inline] kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491 cfusbl_device_notify+0x7c9/0x870 net/caif/caif_usb.c:138 notifier_call_chain+0xb5/0x200 kernel/notifier.c:87 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1945 call_netdevice_notifiers_extack net/core/dev.c:1983 [inline] call_netdevice_notifiers net/core/dev.c:1997 [inline] netdev_wait_allrefs_any net/core/dev.c:10227 [inline] netdev_run_todo+0xbc0/0x10f0 net/core/dev.c:10341 default_device_exit_batch+0x44e/0x590 net/core/dev.c:11334 ops_exit_list+0x125/0x170 net/core/net_namespace.c:167 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302 </TASK> When unregistering a net device, unregister_netdevice_many_notify() sets the device's reg_state to NETREG_UNREGISTERING, calls notifiers with NETDEV_UNREGISTER, and adds the device to the todo list. Later on, devices in the todo list are processed by netdev_run_todo(). netdev_run_todo() waits devices' reference count become 1 while rebdoadcasting NETDEV_UNREGISTER notification. When cfusbl_device_notify() is called with NETDEV_UNREGISTER multiple times, the parent device might be freed. This could cause UAF. Processing NETDEV_UNREGISTER multiple times also causes inbalance of reference count for the module. This patch fixes the issue by accepting only first NETDEV_UNREGISTER notification. Fixes: 7ad65bf68d70 ("caif: Add support for CAIF over CDC NCM USB interface") CC: sjur.brandeland@stericsson.com <sjur.brandeland@stericsson.com> Reported-by: syzbot+b563d33852b893653a9e@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=c3bfd8e2450adab3bffe4d80821fbbced600407f [1] Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://lore.kernel.org/r/20230301163913.391304-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-02ieee802154: Prevent user from crashing the hostMiquel Raynal
Avoid crashing the machine by checking info->attrs[NL802154_ATTR_SCAN_TYPE] presence before de-referencing it, which was the primary intend of the blamed patch. Reported-by: Sanan Hasanov <sanan.hasanov@Knights.ucf.edu> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: a0b6106672b5 ("ieee802154: Convert scan error messages to extack") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230301154450.547716-1-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-03-02net: use indirect calls helpers for sk_exit_memory_pressure()Brian Vazquez
Florian reported a regression and sent a patch with the following changelog: <quote> There is a noticeable tcp performance regression (loopback or cross-netns), seen with iperf3 -Z (sendfile mode) when generic retpolines are needed. With SK_RECLAIM_THRESHOLD checks gone number of calls to enter/leave memory pressure happen much more often. For TCP indirect calls are used. We can't remove the if-set-return short-circuit check in tcp_enter_memory_pressure because there are callers other than sk_enter_memory_pressure. Doing a check in the sk wrapper too reduces the indirect calls enough to recover some performance. Before, 0.00-60.00 sec 322 GBytes 46.1 Gbits/sec receiver After: 0.00-60.04 sec 359 GBytes 51.4 Gbits/sec receiver "iperf3 -c $peer -t 60 -Z -f g", connected via veth in another netns. </quote> It seems we forgot to upstream this indirect call mitigation we had for years, lets do this instead. [edumazet] - It seems we forgot to upstream this indirect call mitigation we had for years, let's do this instead. - Changed to INDIRECT_CALL_INET_1() to avoid bots reports. Fixes: 4890b686f408 ("net: keep sk->sk_forward_alloc as small as possible") Reported-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/netdev/20230227152741.4a53634b@kernel.org/T/ Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230301133247.2346111-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix bogus error report in selftests/netfilter/nft_nat.sh, from Hangbin Liu. 2) Initialize last and quota expressions from template when expr_ops::clone is called, otherwise, states are not restored accordingly when loading a dynamic set with elements using these two expressions. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_quota: copy content when cloning expression netfilter: nft_last: copy content when cloning expression selftests: nft_nat: ensuring the listening side is up before starting the client ==================== Link: https://lore.kernel.org/r/20230301222021.154670-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-01net: tls: avoid hanging tasks on the tx_lockJakub Kicinski
syzbot sent a hung task report and Eric explains that adversarial receiver may keep RWIN at 0 for a long time, so we are not guaranteed to make forward progress. Thread which took tx_lock and went to sleep may not release tx_lock for hours. Use interruptible sleep where possible and reschedule the work if it can't take the lock. Testing: existing selftest passes Reported-by: syzbot+9c0268252b8ef967c62e@syzkaller.appspotmail.com Fixes: 79ffe6087e91 ("net/tls: add a TX lock") Link: https://lore.kernel.org/all/000000000000e412e905f5b46201@google.com/ Cc: stable@vger.kernel.org # wait 4 weeks Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230301002857.2101894-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-01net: tls: fix possible race condition between do_tls_getsockopt_conf() and ↵Hangyu Hua
do_tls_setsockopt_conf() ctx->crypto_send.info is not protected by lock_sock in do_tls_getsockopt_conf(). A race condition between do_tls_getsockopt_conf() and error paths of do_tls_setsockopt_conf() may lead to a use-after-free or null-deref. More discussion: https://lore.kernel.org/all/Y/ht6gQL+u6fj3dG@hog/ Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20230228023344.9623-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-01Merge tag 'nfsd-6.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Make new GSS Kerberos Kunit tests work on non-x86 platforms * tag 'nfsd-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Properly terminate test case arrays SUNRPC: Let Kunit tests run with some enctypes compiled out
2023-03-01bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwrJoanne Koong
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr. The user must pass in a buffer to store the contents of the data slice if a direct pointer to the data cannot be obtained. For skb and xdp type dynptrs, these two APIs are the only way to obtain a data slice. However, for other types of dynptrs, there is no difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data. For skb type dynptrs, the data is copied into the user provided buffer if any of the data is not in the linear portion of the skb. For xdp type dynptrs, the data is copied into the user provided buffer if the data is between xdp frags. If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then the skb will be uncloned (see bpf_unclone_prologue()). Please note that any bpf_dynptr_write() automatically invalidates any prior data slices of the skb dynptr. This is because the skb may be cloned or may need to pull its paged buffer into the head. As such, any bpf_dynptr_write() will automatically have its prior data slices invalidated, even if the write is to data in the skb head of an uncloned skb. Please note as well that any other helper calls that change the underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data slices of the skb dynptr as well, for the same reasons. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add xdp dynptrsJoanne Koong
Add xdp dynptrs, which are dynptrs whose underlying pointer points to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() should be used. For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add skb dynptrsJoanne Koong
Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01Merge tag '9p-6.3-for-linus-part1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p updates from Eric Van Hensbergen: - some fixes and cleanup setting up for a larger set of performance patches I've been working on - a contributed fixes relating to 9p/rdma - some contributed fixes relating to 9p/xen * tag '9p-6.3-for-linus-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: fix error reporting in v9fs_dir_release net/9p: fix bug in client create for .L 9p/rdma: unmap receive dma buffer in rdma_request()/post_recv() 9p/xen: fix connection sequence 9p/xen: fix version parsing fs/9p: Expand setup of writeback cache to all levels net/9p: Adjust maximum MSIZE to account for p9 header
2023-03-01netfilter: nft_quota: copy content when cloning expressionPablo Neira Ayuso
If the ruleset contains consumed quota, restore them accordingly. Otherwise, listing after restoration shows never used items. Restore the user-defined quota and flags too. Fixes: ed0a0c60f0e5 ("netfilter: nft_quota: move stateful fields out of expression data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-01netfilter: nft_last: copy content when cloning expressionPablo Neira Ayuso
If the ruleset contains last timestamps, restore them accordingly. Otherwise, listing after restoration shows never used items. Fixes: 33a24de37e81 ("netfilter: nft_last: move stateful fields out of expression data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-01net/sched: flower: fix fl_change() error recovery pathEric Dumazet
The two "goto errout;" paths in fl_change() became wrong after cited commit. Indeed we only must not call __fl_put() until the net pointer has been set in tcf_exts_init_ex() This is a minimal fix. We might in the future validate TCA_FLOWER_FLAGS before we allocate @fnew. BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:72 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in refcount_read include/linux/refcount.h:147 [inline] BUG: KASAN: null-ptr-deref in __refcount_add_not_zero include/linux/refcount.h:152 [inline] BUG: KASAN: null-ptr-deref in __refcount_inc_not_zero include/linux/refcount.h:227 [inline] BUG: KASAN: null-ptr-deref in refcount_inc_not_zero include/linux/refcount.h:245 [inline] BUG: KASAN: null-ptr-deref in maybe_get_net include/net/net_namespace.h:269 [inline] BUG: KASAN: null-ptr-deref in tcf_exts_get_net include/net/pkt_cls.h:260 [inline] BUG: KASAN: null-ptr-deref in __fl_put net/sched/cls_flower.c:513 [inline] BUG: KASAN: null-ptr-deref in __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508 Read of size 4 at addr 000000000000014c by task syz-executor548/5082 CPU: 0 PID: 5082 Comm: syz-executor548 Not tainted 6.2.0-syzkaller-05251-g5b7c4cabbb65 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_report mm/kasan/report.c:420 [inline] kasan_report+0xec/0x130 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:72 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] refcount_read include/linux/refcount.h:147 [inline] __refcount_add_not_zero include/linux/refcount.h:152 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] maybe_get_net include/net/net_namespace.h:269 [inline] tcf_exts_get_net include/net/pkt_cls.h:260 [inline] __fl_put net/sched/cls_flower.c:513 [inline] __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508 fl_change+0x101b/0x4ab0 net/sched/cls_flower.c:2341 tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310 rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 ____sys_sendmsg+0x334/0x900 net/socket.c:2504 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2558 __sys_sendmmsg+0x18f/0x460 net/socket.c:2644 __do_sys_sendmmsg net/socket.c:2673 [inline] __se_sys_sendmmsg net/socket.c:2670 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2670 Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier") Reported-by: syzbot+baabf3efa7c1e57d28b2@syzkaller.appspotmail.com Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Blakey <paulb@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-01ila: do not generate empty messages in ila_xlat_nl_cmd_get_mapping()Eric Dumazet
ila_xlat_nl_cmd_get_mapping() generates an empty skb, triggerring a recent sanity check [1]. Instead, return an error code, so that user space can get it. [1] skb_assert_len WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 skb_assert_len include/linux/skbuff.h:2527 [inline] WARNING: CPU: 0 PID: 5923 at include/linux/skbuff.h:2527 __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156 Modules linked in: CPU: 0 PID: 5923 Comm: syz-executor269 Not tainted 6.2.0-syzkaller-18300-g2ebd1fbb946d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_assert_len include/linux/skbuff.h:2527 [inline] pc : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156 lr : skb_assert_len include/linux/skbuff.h:2527 [inline] lr : __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156 sp : ffff80001e0d6c40 x29: ffff80001e0d6e60 x28: dfff800000000000 x27: ffff0000c86328c0 x26: dfff800000000000 x25: ffff0000c8632990 x24: ffff0000c8632a00 x23: 0000000000000000 x22: 1fffe000190c6542 x21: ffff0000c8632a10 x20: ffff0000c8632a00 x19: ffff80001856e000 x18: ffff80001e0d5fc0 x17: 0000000000000000 x16: ffff80001235d16c x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000001 x11: ff80800008353a30 x10: 0000000000000000 x9 : 21567eaf25bfb600 x8 : 21567eaf25bfb600 x7 : 0000000000000001 x6 : 0000000000000001 x5 : ffff80001e0d6558 x4 : ffff800015c74760 x3 : ffff800008596744 x2 : 0000000000000001 x1 : 0000000100000000 x0 : 000000000000000e Call trace: skb_assert_len include/linux/skbuff.h:2527 [inline] __dev_queue_xmit+0x1bc0/0x3488 net/core/dev.c:4156 dev_queue_xmit include/linux/netdevice.h:3033 [inline] __netlink_deliver_tap_skb net/netlink/af_netlink.c:307 [inline] __netlink_deliver_tap+0x45c/0x6f8 net/netlink/af_netlink.c:325 netlink_deliver_tap+0xf4/0x174 net/netlink/af_netlink.c:338 __netlink_sendskb net/netlink/af_netlink.c:1283 [inline] netlink_sendskb+0x6c/0x154 net/netlink/af_netlink.c:1292 netlink_unicast+0x334/0x8d4 net/netlink/af_netlink.c:1380 nlmsg_unicast include/net/netlink.h:1099 [inline] genlmsg_unicast include/net/genetlink.h:433 [inline] genlmsg_reply include/net/genetlink.h:443 [inline] ila_xlat_nl_cmd_get_mapping+0x620/0x7d0 net/ipv6/ila/ila_xlat.c:493 genl_family_rcv_msg_doit net/netlink/genetlink.c:968 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline] genl_rcv_msg+0x938/0xc1c net/netlink/genetlink.c:1065 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574 genl_rcv+0x38/0x50 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x558/0x844 net/socket.c:2479 ___sys_sendmsg net/socket.c:2533 [inline] __sys_sendmsg+0x26c/0x33c net/socket.c:2562 __do_sys_sendmsg net/socket.c:2571 [inline] __se_sys_sendmsg net/socket.c:2569 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2569 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193 el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 irq event stamp: 136484 hardirqs last enabled at (136483): [<ffff800008350244>] __up_console_sem+0x60/0xb4 kernel/printk/printk.c:345 hardirqs last disabled at (136484): [<ffff800012358d60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:405 softirqs last enabled at (136418): [<ffff800008020ea8>] softirq_handle_end kernel/softirq.c:414 [inline] softirqs last enabled at (136418): [<ffff800008020ea8>] __do_softirq+0xd4c/0xfa4 kernel/softirq.c:600 softirqs last disabled at (136371): [<ffff80000802b4a4>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80 ---[ end trace 0000000000000000 ]--- skb len=0 headroom=0 headlen=0 tailroom=192 mac=(0,0) net=(0,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0010 pkttype=6 iif=0 dev name=nlmon0 feat=0x0000000000005861 Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-01net/sched: act_connmark: handle errno on tcf_idr_check_allocPedro Tammela
Smatch reports that 'ci' can be used uninitialized. The current code ignores errno coming from tcf_idr_check_alloc, which will lead to the incorrect usage of 'ci'. Handle the errno as it should. Fixes: 288864effe33 ("net/sched: act_connmark: transition to percpu stats and rcu") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-01net: avoid skb end_offset change in __skb_unclone_keeptruesize()Eric Dumazet
Once initial skb->head has been allocated from skb_small_head_cache, we need to make sure to use the same strategy whenever skb->head has to be re-allocated, as found by syzbot [1] This means kmalloc_reserve() can not fallback from using skb_small_head_cache to generic (power-of-two) kmem caches. It seems that we probably want to rework things in the future, to partially revert following patch, because we no longer use ksize() for skb allocated in TX path. 2b88cba55883 ("net: preserve skb_end_offset() in skb_unclone_keeptruesize()") Ideally, TCP stack should never put payload in skb->head, this effort has to be completed. In the mean time, add a sanity check. [1] BUG: KASAN: invalid-free in slab_free mm/slub.c:3787 [inline] BUG: KASAN: invalid-free in kmem_cache_free+0xee/0x5c0 mm/slub.c:3809 Free of addr ffff88806cdee800 by task syz-executor239/5189 CPU: 0 PID: 5189 Comm: syz-executor239 Not tainted 6.2.0-rc8-syzkaller-02400-gd1fabc68f8e0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x15e/0x45d mm/kasan/report.c:417 kasan_report_invalid_free+0x9b/0x1b0 mm/kasan/report.c:482 ____kasan_slab_free+0x1a5/0x1c0 mm/kasan/common.c:216 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1807 slab_free mm/slub.c:3787 [inline] kmem_cache_free+0xee/0x5c0 mm/slub.c:3809 skb_kfree_head net/core/skbuff.c:857 [inline] skb_kfree_head net/core/skbuff.c:853 [inline] skb_free_head+0x16f/0x1a0 net/core/skbuff.c:872 skb_release_data+0x57a/0x820 net/core/skbuff.c:901 skb_release_all net/core/skbuff.c:966 [inline] __kfree_skb+0x4f/0x70 net/core/skbuff.c:980 tcp_wmem_free_skb include/net/tcp.h:302 [inline] tcp_rtx_queue_purge net/ipv4/tcp.c:3061 [inline] tcp_write_queue_purge+0x617/0xcf0 net/ipv4/tcp.c:3074 tcp_v4_destroy_sock+0x125/0x810 net/ipv4/tcp_ipv4.c:2302 inet_csk_destroy_sock+0x19a/0x440 net/ipv4/inet_connection_sock.c:1195 __tcp_close+0xb96/0xf50 net/ipv4/tcp.c:3021 tcp_close+0x2d/0xc0 net/ipv4/tcp.c:3033 inet_release+0x132/0x270 net/ipv4/af_inet.c:426 __sock_release+0xcd/0x280 net/socket.c:651 sock_close+0x1c/0x20 net/socket.c:1393 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f2511f546c3 Code: c7 c2 c0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 RSP: 002b:00007ffef0103d48 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f2511f546c3 RDX: 0000000000000978 RSI: 00000000200000c0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000003434 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffef0103d6c R13: 00007ffef0103d80 R14: 00007ffef0103dc0 R15: 0000000000000003 </TASK> Allocated by task 5189: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] ____kasan_kmalloc mm/kasan/common.c:333 [inline] __kasan_kmalloc+0xa5/0xb0 mm/kasan/common.c:383 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc_node_track_caller+0x5b/0xc0 mm/slab_common.c:988 kmalloc_reserve+0xf1/0x230 net/core/skbuff.c:539 pskb_expand_head+0x237/0x1160 net/core/skbuff.c:1995 __skb_unclone_keeptruesize+0x93/0x220 net/core/skbuff.c:2094 skb_unclone_keeptruesize include/linux/skbuff.h:1910 [inline] skb_prepare_for_shift net/core/skbuff.c:3804 [inline] skb_shift+0xef8/0x1e20 net/core/skbuff.c:3877 tcp_skb_shift net/ipv4/tcp_input.c:1538 [inline] tcp_shift_skb_data net/ipv4/tcp_input.c:1646 [inline] tcp_sacktag_walk+0x93b/0x18a0 net/ipv4/tcp_input.c:1713 tcp_sacktag_write_queue+0x1599/0x31d0 net/ipv4/tcp_input.c:1974 tcp_ack+0x2e9f/0x5a10 net/ipv4/tcp_input.c:3847 tcp_rcv_established+0x667/0x2230 net/ipv4/tcp_input.c:6006 tcp_v4_do_rcv+0x670/0x9b0 net/ipv4/tcp_ipv4.c:1721 sk_backlog_rcv include/net/sock.h:1113 [inline] __release_sock+0x133/0x3b0 net/core/sock.c:2921 release_sock+0x58/0x1b0 net/core/sock.c:3488 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1485 inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 sock_write_iter+0x295/0x3d0 net/socket.c:1136 call_write_iter include/linux/fs.h:2189 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff88806cdee800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 0 bytes inside of 1024-byte region [ffff88806cdee800, ffff88806cdeec00) The buggy address belongs to the physical page: page:ffffea0001b37a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x6cde8 head:ffffea0001b37a00 order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 ffff888012441dc0 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1f2a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_MEMALLOC|__GFP_HARDWALL), pid 75, tgid 75 (kworker/u4:4), ts 96369578780, free_ts 26734162530 prep_new_page mm/page_alloc.c:2531 [inline] get_page_from_freelist+0x119c/0x2ce0 mm/page_alloc.c:4283 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2287 alloc_slab_page mm/slub.c:1851 [inline] allocate_slab+0x25f/0x350 mm/slub.c:1998 new_slab mm/slub.c:2051 [inline] ___slab_alloc+0xa91/0x1400 mm/slub.c:3193 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292 __slab_alloc_node mm/slub.c:3345 [inline] slab_alloc_node mm/slub.c:3442 [inline] __kmem_cache_alloc_node+0x1a4/0x430 mm/slub.c:3491 __do_kmalloc_node mm/slab_common.c:967 [inline] __kmalloc_node_track_caller+0x4b/0xc0 mm/slab_common.c:988 kmalloc_reserve+0xf1/0x230 net/core/skbuff.c:539 __alloc_skb+0x129/0x330 net/core/skbuff.c:608 __netdev_alloc_skb+0x74/0x410 net/core/skbuff.c:672 __netdev_alloc_skb_ip_align include/linux/skbuff.h:3203 [inline] netdev_alloc_skb_ip_align include/linux/skbuff.h:3213 [inline] batadv_iv_ogm_aggregate_new+0x106/0x4e0 net/batman-adv/bat_iv_ogm.c:558 batadv_iv_ogm_queue_add net/batman-adv/bat_iv_ogm.c:670 [inline] batadv_iv_ogm_schedule_buff+0xe6b/0x1450 net/batman-adv/bat_iv_ogm.c:849 batadv_iv_ogm_schedule net/batman-adv/bat_iv_ogm.c:868 [inline] batadv_iv_ogm_schedule net/batman-adv/bat_iv_ogm.c:861 [inline] batadv_iv_send_outstanding_bat_ogm_packet+0x744/0x910 net/batman-adv/bat_iv_ogm.c:1712 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1446 [inline] free_pcp_prepare+0x66a/0xc20 mm/page_alloc.c:1496 free_unref_page_prepare mm/page_alloc.c:3369 [inline] free_unref_page+0x1d/0x490 mm/page_alloc.c:3464 free_contig_range+0xb5/0x180 mm/page_alloc.c:9488 destroy_args+0xa8/0x64c mm/debug_vm_pgtable.c:998 debug_vm_pgtable+0x28de/0x296f mm/debug_vm_pgtable.c:1318 do_one_initcall+0x141/0x790 init/main.c:1306 do_initcall_level init/main.c:1379 [inline] do_initcalls init/main.c:1395 [inline] do_basic_setup init/main.c:1414 [inline] kernel_init_freeable+0x6f9/0x782 init/main.c:1634 kernel_init+0x1e/0x1d0 init/main.c:1522 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 Memory state around the buggy address: ffff88806cdee700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88806cdee780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88806cdee800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffff88806cdee880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-28tls: rx: fix return value for async cryptoJakub Kicinski
Gaurav reports that TLS Rx is broken with async crypto accelerators. The commit under fixes missed updating the retval byte counting logic when updating how records are stored. Even tho both before and after the change 'decrypted' was updated inside the main loop, it was completely overwritten when processing the async completions. Now that the rx_list only holds non-zero-copy records we need to add, not overwrite. Reported-and-bisected-by: Gaurav Jain <gaurav.jain@nxp.com> Fixes: cbbdee9918a2 ("tls: rx: async: don't put async zc on the list") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217064 Tested-by: Gaurav Jain <gaurav.jain@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230227181201.1793772-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-27Merge tag 'net-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. The notable fixes here are the EEE fix which restores boot for many embedded platforms (real and QEMU); WiFi warning suppression and the ICE Kconfig cleanup. Current release - regressions: - phy: multiple fixes for EEE rework - wifi: wext: warn about usage only once - wifi: ath11k: allow system suspend to survive ath11k Current release - new code bugs: - mlx5: Fix memory leak in IPsec RoCE creation - ibmvnic: assign XPS map to correct queue index Previous releases - regressions: - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces - netfilter: ctnetlink: make event listener tracking global - nf_tables: allow to fetch set elements when table has an owner - mlx5: - fix skb leak while fifo resync and push - fix possible ptp queue fifo use-after-free Previous releases - always broken: - sched: fix action bind logic - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also uses a mutex - netfilter: conntrack: fix rmmod double-free race - netfilter: xt_length: use skb len to match in length_mt6, avoid issues with BIG TCP Misc: - ice: remove unnecessary CONFIG_ICE_GNSS - mlx5e: remove hairpin write debugfs files - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy" * tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) tcp: tcp_check_req() can be called from process context net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard xen-netback: remove unused variables pending_idx and index net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy net: dsa: ocelot_ext: remove unnecessary phylink.h include net: mscc: ocelot: fix duplicate driver name error net: dsa: felix: fix internal MDIO controller resource length net: dsa: seville: ignore mscc-miim read errors from Lynx PCS net/sched: act_sample: fix action bind logic net/sched: act_mpls: fix action bind logic net/sched: act_pedit: fix action bind logic wifi: wext: warn about usage only once wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue qede: avoid uninitialized entries in coal_entry array nfc: fix memory leak of se_io context in nfc_genl_se_io ice: remove unnecessary CONFIG_ICE_GNSS net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy() ibmvnic: Assign XPS map to correct queue index docs: net: fix inaccuracies in msg_zerocopy.rst tools: net: add __pycache__ to gitignore ...
2023-02-27SUNRPC: Properly terminate test case arraysChuck Lever
Unable to handle kernel paging request at virtual address 73657420 when execute [73657420] *pgd=00000000 Internal error: Oops: 80000005 [#1] ARM CPU: 0 PID: 1 Comm: swapper Tainted: G N 6.2.0-rc7-00133-g373f26a81164-dirty #9 Hardware name: Generic DT based system PC is at 0x73657420 LR is at kunit_run_tests+0x3e0/0x5f4 On x86 with GCC 12, the missing array terminators did not seem to matter. Other platforms appear to be more picky. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>