Age | Commit message (Collapse) | Author |
|
It is possible for syzbot to side-step the restriction imposed by the
blamed commit in the Fixes: tag, because the taprio UAPI permits a
cycle-time different from (and potentially shorter than) the sum of
entry intervals.
We need one more restriction, which is that the cycle time itself must
be larger than N * ETH_ZLEN bit times, where N is the number of schedule
entries. This restriction needs to apply regardless of whether the cycle
time came from the user or was the implicit, auto-calculated value, so
we move the existing "cycle == 0" check outside the "if "(!new->cycle_time)"
branch. This way covers both conditions and scenarios.
Add a selftest which illustrates the issue triggered by syzbot.
Fixes: b5b73b26b3ca ("taprio: Fix allowing too small intervals")
Reported-by: syzbot+a7d2b1d5d1af83035567@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/0000000000007d66bc06196e7c66@google.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240527153955.553333-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In commit b5b73b26b3ca ("taprio: Fix allowing too small intervals"), a
comparison of user input against length_to_duration(q, ETH_ZLEN) was
introduced, to avoid RCU stalls due to frequent hrtimers.
The implementation of length_to_duration() depends on q->picos_per_byte
being set for the link speed. The blamed commit in the Fixes: tag has
moved this too late, so the checks introduced above are ineffective.
The q->picos_per_byte is zero at parse_taprio_schedule() ->
parse_sched_list() -> parse_sched_entry() -> fill_sched_entry() time.
Move the taprio_set_picos_per_byte() call as one of the first things in
taprio_change(), before the bulk of the netlink attribute parsing is
done. That's because it is needed there.
Add a selftest to make sure the issue doesn't get reintroduced.
Fixes: 09dbdf28f9f9 ("net/sched: taprio: fix calculation of maximum gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240527153955.553333-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This removes the restriction of needing iif selector in the
forward/input hooks for fib lookups when requested result is
oif/oifname.
Removing this restriction allows "loose" lookups from the forward hooks.
Fixes: be8be04e5ddb ("netfilter: nft_fib: reverse path filter for policy-based routing on iif")
Signed-off-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
syzbot reports:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
[..]
RIP: 0010:nf_tproxy_laddr4+0xb7/0x340 net/ipv4/netfilter/nf_tproxy_ipv4.c:62
Call Trace:
nft_tproxy_eval_v4 net/netfilter/nft_tproxy.c:56 [inline]
nft_tproxy_eval+0xa9a/0x1a00 net/netfilter/nft_tproxy.c:168
__in_dev_get_rcu() can return NULL, so check for this.
Reported-and-tested-by: syzbot+b94a6818504ea90d7661@syzkaller.appspotmail.com
Fixes: cc6eb4338569 ("tproxy: use the interface primary IP address as a default value for --on-ip")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Userspace assumes vlan header is present at a given offset, but vlan
offload allows to store this in metadata fields of the skbuff. Hence
mangling vlan results in a garbled packet. Handle this transparently by
adding a parser to the kernel.
If vlan metadata is present and payload offset is over 12 bytes (source
and destination mac address fields), then subtract vlan header present
in vlan metadata, otherwise mangle vlan metadata based on offset and
length, extracting data from the source register.
This is similar to:
8cfd23e67401 ("netfilter: nft_payload: work around vlan header stripping")
to deal with vlan payload mangling.
Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-28
We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).
The main changes are:
1) Rename skb's mono_delivery_time to tstamp_type for extensibility
and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
from Abhishek Chauhan.
2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
CT zones can be used from XDP/tc BPF netfilter CT helper functions,
from Brad Cowie.
3) Several tweaks to the instruction-set.rst IETF doc to address
the Last Call review comments, from Dave Thaler.
4) Small batch of riscv64 BPF JIT optimizations in order to emit more
compressed instructions to the JITed image for better icache efficiency,
from Xiao Wang.
5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
diffing and forcing more natural type definitions ordering,
from Mykyta Yatsenko.
6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
a syzbot/KCSAN race report for the tx_errors counter,
from Jiang Yunshui.
7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
which started treating const structs as constants and thus breaking
full BTF program name resolution, from Ivan Babrou.
8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
some of the test-internal array sizes, from Geliang Tang.
9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
pahole, from Alan Maguire.
10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
rebuilds in some corner cases, from Artem Savkov.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
bpf, net: Use DEV_STAT_INC()
bpf, docs: Fix instruction.rst indentation
bpf, docs: Clarify call local offset
bpf, docs: Add table captions
bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
bpf, docs: Use RFC 2119 language for ISA requirements
bpf, docs: Move sentence about returning R0 to abi.rst
bpf: constify member bpf_sysctl_kern:: Table
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
riscv, bpf: Use STACK_ALIGN macro for size rounding up
riscv, bpf: Optimize zextw insn with Zba extension
selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
net: Add additional bit to support clockid_t timestamp type
net: Rename mono_delivery_time to tstamp_type for scalabilty
selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
net: netfilter: Make ct zone opts configurable for bpf ct helpers
selftests/bpf: Fix prog numbers in test_sockmap
bpf: Remove unused variable "prev_state"
bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
bpf: Fix order of args in call to bpf_map_kvcalloc
...
====================
Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In commit cdfbabfb2f0c ("net: Work around lockdep limitation in
sockets that use sockets"), it introduces 'af_kern_callback_keys'
to lockdep-init of sk_callback_lock according to 'sk_kern_sock',
it modifies sock_init_data() only, and sk_clone_lock() calls
sk_init_common() to initialize sk_callback_lock too, so the
lockdep-init of sk_callback_lock should be moved to sk_init_common().
Signed-off-by: Gou Hao <gouhao@uniontech.com>
Link: https://lore.kernel.org/r/20240526145718.9542-2-gouhao@uniontech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk_callback_lock has already been initialized in sk_init_common().
Signed-off-by: Gou Hao <gouhao@uniontech.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240526145718.9542-1-gouhao@uniontech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk_psock_get will return NULL if the refcount of psock has gone to 0, which
will happen when the last call of sk_psock_put is done. However,
sk_psock_drop may not have finished yet, so the close callback will still
point to sock_map_close despite psock being NULL.
This can be reproduced with a thread deleting an element from the sock map,
while the second one creates a socket, adds it to the map and closes it.
That will trigger the WARN_ON_ONCE:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Modules linked in:
CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02
RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293
RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000
RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0
RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3
R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840
R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870
FS: 000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0
Call Trace:
<TASK>
unix_release+0x87/0xc0 net/unix/af_unix.c:1048
__sock_release net/socket.c:659 [inline]
sock_close+0xbe/0x240 net/socket.c:1421
__fput+0x42b/0x8a0 fs/file_table.c:422
__do_sys_close fs/open.c:1556 [inline]
__se_sys_close fs/open.c:1541 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1541
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb37d618070
Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070
RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Use sk_psock, which will only check that the pointer is not been set to
NULL yet, which should only happen after the callbacks are restored. If,
then, a reference can still be gotten, we may call sk_psock_stop and cancel
psock->work.
As suggested by Paolo Abeni, reorder the condition so the control flow is
less convoluted.
After that change, the reproducer does not trigger the WARN_ON_ONCE
anymore.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reported-by: syzbot+07a2e4a1a57118ef7355@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=07a2e4a1a57118ef7355
Fixes: aadb2bb83ff7 ("sock_map: Fix a potential use-after-free in sock_map_close()")
Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
Cc: stable@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20240524144702.1178377-1-cascardo@igalia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot/KCSAN reported that races happen when multiple CPUs updating
dev->stats.tx_error concurrently. Adopt SMP safe DEV_STATS_INC() to
update the dev->stats fields.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: yunshui <jiangyunshui@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240523033520.4029314-1-jiangyunshui@kylinos.cn
|
|
Jason commit made checks against ACK sequence less strict
and can be exploited by attackers to establish spoofed flows
with less probes.
Innocent users might use tcp_rmem[1] == 1,000,000,000,
or something more reasonable.
An attacker can use a regular TCP connection to learn the server
initial tp->rcv_wnd, and use it to optimize the attack.
If we make sure that only the announced window (smaller than 65535)
is used for ACK validation, we force an attacker to use
65537 packets to complete the 3WHS (assuming server ISN is unknown)
Fixes: 378979e94e95 ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value")
Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240523130528.60376-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller was able to trigger
kernel BUG at net/core/gro.c:424 !
RIP: 0010:gro_pull_from_frag0 net/core/gro.c:424 [inline]
RIP: 0010:gro_try_pull_from_frag0 net/core/gro.c:446 [inline]
RIP: 0010:dev_gro_receive+0x242f/0x24b0 net/core/gro.c:571
Due to using an incorrect NAPI_GRO_CB(skb)->network_offset.
The referenced commit sets this offset to 0 in skb_gro_reset_offset.
That matches the expected case in dev_gro_receive:
pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
ipv6_gro_receive, inet_gro_receive,
&gro_list->list, skb);
But syzkaller injected an skb with protocol ETH_P_TEB into an ip6gre
device (by writing the IP6GRE encapsulated version to a TAP device).
The result was a first call to eth_gro_receive, and thus an extra
ETH_HLEN in network_offset that should not be there. First issue hit
is when computing offset from network header in ipv6_gro_pull_exthdrs.
Initialize both offsets in the network layer gro_receive.
This pairs with all reads in gro_receive, which use
skb_gro_receive_network_offset().
Fixes: 186b1ea73ad8 ("net: gro: use cb instead of skb->network_header")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
CC: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240523141434.1752483-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cited commit started returning an error when user space requests to dump
the interface's IPv4 addresses and IPv4 is disabled on the interface.
Restore the previous behavior and do not return an error.
Before cited commit:
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff
inet6 fe80::e040:68ff:fe98:d018/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 67
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether e2:40:68:98:d0:18 brd ff:ff:ff:ff:ff:ff
After cited commit:
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 32:2d:69:f2:9c:99 brd ff:ff:ff:ff:ff:ff
inet6 fe80::302d:69ff:fef2:9c99/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 67
# ip address show dev dummy1
RTNETLINK answers: No such device
Dump terminated
With this patch:
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff
inet6 fe80::dc17:56ff:febb:57c0/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 67
# ip address show dev dummy1
10: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 67 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether de:17:56:bb:57:c0 brd ff:ff:ff:ff:ff:ff
I fixed the exact same issue for IPv6 in commit c04f7dfe6ec2 ("ipv6: Fix
address dump when IPv6 is disabled on an interface"), but noted [1] that
I am not doing the change for IPv4 because I am not aware of a way to
disable IPv4 on an interface other than unregistering it. I clearly
missed the above case.
[1] https://lore.kernel.org/netdev/20240321173042.2151756-1-idosch@nvidia.com/
Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
Reported-by: Carolina Jubran <cjubran@nvidia.com>
Reported-by: Yamen Safadi <ysafadi@nvidia.com>
Tested-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240523110257.334315-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-05-27
We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 18 files changed, 583 insertions(+), 55 deletions(-).
The main changes are:
1) Fix broken BPF multi-uprobe PID filtering logic which filtered by thread
while the promise was to filter by process, from Andrii Nakryiko.
2) Fix the recent influx of syzkaller reports to sockmap which triggered
a locking rule violation by performing a map_delete, from Jakub Sitnicki.
3) Fixes to netkit driver in particular on skb->pkt_type override upon pass
verdict, from Daniel Borkmann.
4) Fix an integer overflow in resolve_btfids which can wrongly trigger build
failures, from Friedrich Vock.
5) Follow-up fixes for ARC JIT reported by static analyzers,
from Shahab Vahedi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Cover verifier checks for mutating sockmap/sockhash
Revert "bpf, sockmap: Prevent lock inversion deadlock in map delete elem"
bpf: Allow delete from sockmap/sockhash only if update is allowed
selftests/bpf: Add netkit test for pkt_type
selftests/bpf: Add netkit tests for mac address
netkit: Fix pkt_type override upon netkit pass verdict
netkit: Fix setting mac address in l2 mode
ARC, bpf: Fix issues reported by the static analyzers
selftests/bpf: extend multi-uprobe tests with USDTs
selftests/bpf: extend multi-uprobe tests with child thread case
libbpf: detect broken PID filtering logic for multi-uprobe
bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic
bpf: fix multi-uprobe PID filtering logic
bpf: Fix potential integer overflow in resolve_btfids
MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT
====================
Link: https://lore.kernel.org/r/20240527203551.29712-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit ff91059932401894e6c86341915615c5eb0eca48.
This check is no longer needed. BPF programs attached to tracepoints are
now rejected by the verifier when they attempt to delete from a
sockmap/sockhash maps.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-2-944b372f2101@cloudflare.com
|
|
syzkaller reported data-race of sk->sk_hash in unix_autobind() [0],
and the same ones exist in unix_bind_bsd() and unix_bind_abstract().
The three bind() functions prefetch sk->sk_hash locklessly and
use it later after validating that unix_sk(sk)->addr is NULL under
unix_sk(sk)->bindlock.
The prefetched sk->sk_hash is the hash value of unbound socket set
in unix_create1() and does not change until bind() completes.
There could be a chance that sk->sk_hash changes after the lockless
read. However, in such a case, non-NULL unix_sk(sk)->addr is visible
under unix_sk(sk)->bindlock, and bind() returns -EINVAL without using
the prefetched value.
The KCSAN splat is false-positive, but let's silence it by reading
sk->sk_hash under unix_sk(sk)->bindlock.
[0]:
BUG: KCSAN: data-race in unix_autobind / unix_autobind
write to 0xffff888034a9fb88 of 4 bytes by task 4468 on cpu 0:
__unix_set_addr_hash net/unix/af_unix.c:331 [inline]
unix_autobind+0x47a/0x7d0 net/unix/af_unix.c:1185
unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373
__sys_connect_file+0xd7/0xe0 net/socket.c:2048
__sys_connect+0x114/0x140 net/socket.c:2065
__do_sys_connect net/socket.c:2075 [inline]
__se_sys_connect net/socket.c:2072 [inline]
__x64_sys_connect+0x40/0x50 net/socket.c:2072
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x46/0x4e
read to 0xffff888034a9fb88 of 4 bytes by task 4465 on cpu 1:
unix_autobind+0x28/0x7d0 net/unix/af_unix.c:1134
unix_dgram_connect+0x7e3/0x890 net/unix/af_unix.c:1373
__sys_connect_file+0xd7/0xe0 net/socket.c:2048
__sys_connect+0x114/0x140 net/socket.c:2065
__do_sys_connect net/socket.c:2075 [inline]
__se_sys_connect net/socket.c:2072 [inline]
__x64_sys_connect+0x40/0x50 net/socket.c:2072
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x46/0x4e
value changed: 0x000000e4 -> 0x000001e3
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 6.8.0-12822-gcd51db110a7e #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: afd20b9290e1 ("af_unix: Replace the big lock with small locks.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240522154218.78088-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once unix_sk(sk)->addr is assigned under net->unx.table.locks and
unix_sk(sk)->bindlock, *(unix_sk(sk)->addr) and unix_sk(sk)->path are
fully set up, and unix_sk(sk)->addr is never changed.
unix_getname() and unix_copy_addr() access the two fields locklessly,
and commit ae3b564179bf ("missing barriers in some of unix_sock ->addr
and ->path accesses") added smp_store_release() and smp_load_acquire()
pairs.
In other functions, we still read unix_sk(sk)->addr locklessly to check
if the socket is bound, and KCSAN complains about it. [0]
Given these functions have no dependency for *(unix_sk(sk)->addr) and
unix_sk(sk)->path, READ_ONCE() is enough to annotate the data-race.
Note that it is safe to access unix_sk(sk)->addr locklessly if the socket
is found in the hash table. For example, the lockless read of otheru->addr
in unix_stream_connect() is safe.
Note also that newu->addr there is of the child socket that is still not
accessible from userspace, and smp_store_release() publishes the address
in case the socket is accept()ed and unix_getname() / unix_copy_addr()
is called.
[0]:
BUG: KCSAN: data-race in unix_bind / unix_listen
write (marked) to 0xffff88805f8d1840 of 8 bytes by task 13723 on cpu 0:
__unix_set_addr_hash net/unix/af_unix.c:329 [inline]
unix_bind_bsd net/unix/af_unix.c:1241 [inline]
unix_bind+0x881/0x1000 net/unix/af_unix.c:1319
__sys_bind+0x194/0x1e0 net/socket.c:1847
__do_sys_bind net/socket.c:1858 [inline]
__se_sys_bind net/socket.c:1856 [inline]
__x64_sys_bind+0x40/0x50 net/socket.c:1856
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x46/0x4e
read to 0xffff88805f8d1840 of 8 bytes by task 13724 on cpu 1:
unix_listen+0x72/0x180 net/unix/af_unix.c:734
__sys_listen+0xdc/0x160 net/socket.c:1881
__do_sys_listen net/socket.c:1890 [inline]
__se_sys_listen net/socket.c:1888 [inline]
__x64_sys_listen+0x2e/0x40 net/socket.c:1888
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x46/0x4e
value changed: 0x0000000000000000 -> 0xffff88807b5b1b40
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 13724 Comm: syz-executor.4 Not tainted 6.8.0-12822-gcd51db110a7e #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240522154002.77857-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When running Cilium connectivity test suite with netkit in L2 mode, we
found that compared to tcx a few tests were failing which pushed traffic
into an L7 proxy sitting in host namespace. The problem in particular is
around the invocation of eth_type_trans() in netkit.
In case of tcx, this is run before the tcx ingress is triggered inside
host namespace and thus if the BPF program uses the bpf_skb_change_type()
helper the newly set type is retained. However, in case of netkit, the
late eth_type_trans() invocation overrides the earlier decision from the
BPF program which eventually leads to the test failure.
Instead of eth_type_trans(), split out the relevant parts, meaning, reset
of mac header and call to eth_skb_pkt_type() before the BPF program is run
in order to have the same behavior as with tcx, and refactor a small helper
called eth_skb_pull_mac() which is run in case it's passed up the stack
where the mac header must be pulled. With this all connectivity tests pass.
Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240524163619.26001-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
tstamp_type is now set based on actual clockid_t compressed
into 2 bits.
To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
other clockid_t timestamp.
We now support CLOCK_TAI as part of tstamp_type as part of this
commit with existing support CLOCK_MONOTONIC and CLOCK_REALTIME.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-3-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.
Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.
As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based on enum defined
in this commit.
Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts tstamp_type to distinguish between mono and real time.
Also skb_set_delivery_type_by_clockid is a new function which accepts
clockid to determine the tstamp_type.
In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240509211834.3235191-2-quic_abchauha@quicinc.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Pull NFS client updates from Trond Myklebust:
"Stable fixes:
- nfs: fix undefined behavior in nfs_block_bits()
- NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Bugfixes:
- Fix mixing of the lock/nolock and local_lock mount options
- NFSv4: Fixup smatch warning for ambiguous return
- NFSv3: Fix remount when using the legacy binary mount api
- SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
- SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
- rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Features and cleanups:
- NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
- pNFS/filelayout: S layout segment range in LAYOUTGET
- pNFS: rework pnfs_generic_pg_check_layout to check IO range
- NFSv2: Turn off enabling of NFS v2 by default"
* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix undefined behavior in nfs_block_bits()
pNFS: rework pnfs_generic_pg_check_layout to check IO range
pNFS/filelayout: check layout segment range
pNFS/filelayout: fixup pNfs allocation modes
rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
NFS: Don't enable NFS v2 by default
NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
sunrpc: fix NFSACL RPC retry on soft mount
SUNRPC: fix handling expired GSS context
nfs: keep server info for remounts
NFSv4: Fixup smatch warning for ambiguous return
NFS: make sure lock/nolock overriding local_lock mount option
NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
pNFS/filelayout: Specify the layout segment range in LAYOUTGET
pNFS/filelayout: Remove the whole file layout requirement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Quite smaller than usual. Notably it includes the fix for the unix
regression from the past weeks. The TCP window fix will require some
follow-up, already queued.
Current release - regressions:
- af_unix: fix garbage collection of embryos
Previous releases - regressions:
- af_unix: fix race between GC and receive path
- ipv6: sr: fix missing sk_buff release in seg6_input_core
- tcp: remove 64 KByte limit for initial tp->rcv_wnd value
- eth: r8169: fix rx hangup
- eth: lan966x: remove ptp traps in case the ptp is not enabled
- eth: ixgbe: fix link breakage vs cisco switches
- eth: ice: prevent ethtool from corrupting the channels
Previous releases - always broken:
- openvswitch: set the skbuff pkt_type for proper pmtud support
- tcp: Fix shift-out-of-bounds in dctcp_update_alpha()
Misc:
- a bunch of selftests stabilization patches"
* tag 'net-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (25 commits)
r8169: Fix possible ring buffer corruption on fragmented Tx packets.
idpf: Interpret .set_channels() input differently
ice: Interpret .set_channels() input differently
nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()
net: relax socket state check at accept time.
tcp: remove 64 KByte limit for initial tp->rcv_wnd value
net: ti: icssg_prueth: Fix NULL pointer dereference in prueth_probe()
tls: fix missing memory barrier in tls_init
net: fec: avoid lock evasion when reading pps_enable
Revert "ixgbe: Manual AN-37 for troublesome link partners for X550 SFI"
testing: net-drv: use stats64 for testing
net: mana: Fix the extra HZ in mana_hwc_send_request
net: lan966x: Remove ptp traps in case the ptp is not enabled.
openvswitch: Set the skbuff pkt_type for proper pmtud support.
selftest: af_unix: Make SCM_RIGHTS into OOB data.
af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
tcp: Fix shift-out-of-bounds in dctcp_update_alpha().
selftests/net: use tc rule to filter the na packet
ipv6: sr: fix memleak in seg6_hmac_init_algo
af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing cleanup from Steven Rostedt:
"Remove second argument of __assign_str()
The __assign_str() macro logic of the TRACE_EVENT() macro was
optimized so that it no longer needs the second argument. The
__assign_str() is always matched with __string() field that takes a
field name and the source for that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then
use that value to copy into the ring buffer via the __assign_str().
Before commit c1fa617caeb0 ("tracing: Rework __assign_str() and
__string() to not duplicate getting the string"), the __assign_str()
needed the second argument which would perform the same logic as the
__string() source parameter did. Not only would this add overhead, but
it was error prone as if the __assign_str() source produced something
different, it may not have allocated enough for the string in the ring
buffer (as the __string() source was used to determine how much to
allocate)
Now that the __assign_str() just uses the same string that was used in
__string() it no longer needs the source parameter. It can now be
removed"
* tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/treewide: Remove second parameter of __assign_str()
|
|
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
|
|
When nci_rx_work() receives a zero-length payload packet, it should not
discard the packet and exit the loop. Instead, it should continue
processing subsequent packets.
Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Christoph reported the following splat:
WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
Modules linked in:
CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
FS: 000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
do_accept+0x435/0x620 net/socket.c:1929
__sys_accept4_file net/socket.c:1969 [inline]
__sys_accept4+0x9b/0x110 net/socket.c:1999
__do_sys_accept net/socket.c:2016 [inline]
__se_sys_accept net/socket.c:2013 [inline]
__x64_sys_accept+0x7d/0x90 net/socket.c:2013
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x4315f9
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
</TASK>
The reproducer invokes shutdown() before entering the listener status.
After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
TCP_SYN_RECV sockets"), the above causes the child to reach the accept
syscall in FIN_WAIT1 status.
Eric noted we can relax the existing assertion in __inet_accept()
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Recently, we had some servers upgraded to the latest kernel and noticed
the indicator from the user side showed worse results than before. It is
caused by the limitation of tp->rcv_wnd.
In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
CDN teams would not benefit from this change because they cannot have a
large window to receive a big packet, which will be slowed down especially
in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's
the side effect for the latency/time-sensitive users.
To avoid future confusion, current change doesn't affect the initial
receive window on the wire in a SYN or SYN+ACK packet which are set within
65535 bytes according to RFC 7323 also due to the limit in
__tcp_transmit_skb():
th->window = htons(min(tp->rcv_wnd, 65535U));
In one word, __tcp_transmit_skb() already ensures that constraint is
respected, no matter how large tp->rcv_wnd is. The change doesn't violate
RFC.
Let me provide one example if with or without the patch:
Before:
client --- SYN: rwindow=65535 ---> server
client <--- SYN+ACK: rwindow=65535 ---- server
client --- ACK: rwindow=65536 ---> server
Note: for the last ACK, the calculation is 512 << 7.
After:
client --- SYN: rwindow=65535 ---> server
client <--- SYN+ACK: rwindow=65535 ---- server
client --- ACK: rwindow=175232 ---> server
Note: I use the following command to make it work:
ip route change default via [ip] dev eth0 metric 100 initrwnd 120
For the last ACK, the calculation is 1369 << 7.
When we apply such a patch, having a large rcv_wnd if the user tweak this
knob can help transfer data more rapidly and save some rtts.
Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In tls_init(), a write memory barrier is missing, and store-store
reordering may cause NULL dereference in tls_{setsockopt,getsockopt}.
CPU0 CPU1
----- -----
// In tls_init()
// In tls_ctx_create()
ctx = kzalloc()
ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1)
// In update_sk_prot()
WRITE_ONCE(sk->sk_prot, tls_prots) -(2)
// In sock_common_setsockopt()
READ_ONCE(sk->sk_prot)->setsockopt()
// In tls_{setsockopt,getsockopt}()
ctx->sk_proto->setsockopt() -(3)
In the above scenario, when (1) and (2) are reordered, (3) can observe
the NULL value of ctx->sk_proto, causing NULL dereference.
To fix it, we rely on rcu_assign_pointer() which implies the release
barrier semantic. By moving rcu_assign_pointer() after ctx->sk_proto is
initialized, we can ensure that ctx->sk_proto are visible when
changing sk->sk_prot.
Fixes: d5bee7374b68 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE")
Signed-off-by: Yewon Choi <woni9911@gmail.com>
Signed-off-by: Dae R. Jeong <threeearcat@gmail.com>
Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/
Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In MLO, the deflink BSSID is clearly invalid. Since we fill
the addresses as MLD addresses and translate later, use the
AP address here instead.
This fixes an issue that happens with HW restart, where the
DelBA frame is transmitted, but not processed correctly due
to the wrong BSSID (or even just discarded entirely). As a
result, the BA sessions are kept alive; however, as other
state is reset during HW restart, this then fails (reorder,
etc.) and data doesn't go through until new BA sessions are
established.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240510112601.f4e1effdea29.I98e81f22166b68d4b6211191bcaaf8531b324a77@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The negotiated TTLM data must be reset on disconnect, otherwise
it may end up getting reused on another connection. Fix that.
Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211858.04142e8fe01c.Ia144457e086ebd8ddcfa31bdf5ff210b4b351c22@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's no need to stop works that have already been
stopped during disconnect, so don't.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211034.f8434be19f56.I021afadc538508da3bc8f95c89f424ca62b94bef@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It shouldn't be possible to run this after disconnecting, so
cancel the work earlier.
Fixes: a17a58ad2ff2 ("wifi: mac80211: add support for tearing down negotiated TTLM")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211034.096a10ccebec.I5584a21c27eb9b3e87b9e26380b627114b32ccba@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This work shouldn't run after we're disconnecting. Cancel it earlier
(and then don't cancel it in stop later.)
Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element")
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211034.ac754794279f.Ib9fbb1dab50c6b67f6de9be09a6c452ce89bbd50@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The worker calculates the wrong sdata pointer, so if it ever
runs, it'll crash. Fix that.
Fixes: a17a58ad2ff2 ("wifi: mac80211: add support for tearing down negotiated TTLM")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211853.e6471800c76d.I8b7c2d6984c89a11cd33d1a610e9645fa965f6e1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add flag for Full Bandwidth UL MU-MIMO for EHT. This is utilized
to pass EHT MU-MIMO configurations from user space to driver
in AP mode.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Signed-off-by: Pradeep Kumar Chitrapu <quic_pradeepc@quicinc.com>
Link: https://msgid.link/20240515181327.12855-2-quic_pradeepc@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There are now three places setting up chanreq.ap which always
depends on the mode (EHT being used or not) and override flag.
Refactor that code into a common function with a comment, to
make that clearer.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506215543.5cd6a209e58a.I3be318959d9e2df5dccd2d0938c3d2fcc6688030@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Handle the transmit power envelope (TPE) element during
channel switch, applying it when the channel switch is
done.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506215543.486c33157d18.Idf971ad801b6961c177bdf42cc323fd1a4ca8165@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
During channel switch, track the AP configuration in the
chanreq, so that wider bandwidth OFDMA is taken into
account correctly, since multiple channel contexts may
be needed due to sharing not being possible due to
wider bandwidth OFDMA.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506215543.b2c5a72dac1b.I69f65cb2e75d4a49a174b1aede68bf8ff0a3cab3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Collect the CSA data in ieee80211_link_data_managed and
ieee80211_link_data into a csa sub-struct to clean up a
bit and make adding new things more obvious.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506215543.29f954b1f576.I9a683a9647c33d4dd3011aade6677982428c1082@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If we see a channel switch announcement on one link for
another, handle that case and start the CSA. The driver
can react to this in whatever way it needs. The stack
will have the ability to track it via the RNR/MLE in the
reporting link's beacon if it sees it for inactive links
and adjust everything accordingly.
Note that currently the timings for the CSA aren't set,
the values are only used by the Intel drivers, and they
don't need this for newer devices that support MLO, so
I've left it out for now.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240415112355.4d34b6a31be7.Ie8453979f5805873a8411c99346bcc3810cd6476@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Just like it's not currently possible to start radar
detection while already operating, it shouldn't be
possible to start operating while radar detection is
running. Fix that.
Also, improve the check whether operating (carrier
might not be up if e.g. attempting to join IBSS).
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211158.ae8dca3d0d6c.I7c70a66a5fbdbc63a78fee8a34f31d1995491bc3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
At some point we thought perhaps this could be per link, but
really that didn't happen, and it's confusing. Radar detection
still uses the deflink to allocate the channel, but the work
need not be there. Move it back.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211311.43bd82c6da04.Ib39bec3aa198d137385f081e7e1910dcbde3aa1b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If the TPE (transmit power envelope) is changed, detect and
report that to the driver.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506214536.103dda923f45.I990877e409ab8eade9ed7c172272e0cae57256cf@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Instead of passing the full TPE elements, in all their glory
and mixed up data formats for HE backward compatibility, parse
them fully into the right values, and pass that to the drivers.
Also introduce proper validation already in mac80211, so that
drivers don't need to do it, and parse the EHT portions.
The code now passes the values in the right order according to
the channel used by an interface, which could also be a subset
of the data advertised by the AP, if we couldn't connect with
the full bandwidth (for whatever reason.)
Also add kunit tests for the more complicated bits of it.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Acked-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/20240506214536.2aa839969b60.I265b28209e0b29772b2f125f7f83de44a4da877b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
6 GHz AP power type details are required to set proper tx power
used to send frames.
Update AP power type received in beacon while preparing
for connection instead of after association so the frames
before association can use the correct tx power.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Link: https://msgid.link/20240506214536.310434f55f76.I6aca291ee06265e3f63e0f9024ba19a850b53a33@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's no need to have a lockdep assert and then call
another function, just move everything into one place.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211934.9759564a25f4.I88d43aa459d15c1d6230152e76b7757c2cdd6085@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
These comments are no longer correct, it's a wiphy work now
so it will go away immediately when canceled.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506211422.68bc10efbd8a.If80f43f4c8b9db1f5266f70d93a805f8c7463fe2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add trace events to trace when wiphy works are queued (or
delayed ones scheduled), and other APIs are called. Also
add an event when the worker starts, before acquiring the
mutex, to be able to see potential delays due to locking.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://msgid.link/20240506210002.bf1840a1d22d.I4abba048c1c4017345640219cf1384a0b2288dd3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
They were meant to be split into ops and APIs, but some
ops were added in the wrong place. Fix that.
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240506210002.0b3a86a5d8d7.I5591c03223bdb95597e181de63a2eded424de34c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Setting the AP channel width is meant for use with the normal
20/40/... MHz channel width progression, and switching around
in S1G or narrow channels isn't supported. Disallow that.
Reported-by: syzbot+bc0f5b92cc7091f45fb6@syzkaller.appspotmail.com
Link: https://msgid.link/20240515141600.d4a9590bfe32.I19a32d60097e81b527eafe6b0924f6c5fbb2dc45@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|