Age | Commit message (Collapse) | Author |
|
calls
This effectively reverts commit 05a7f4a8dff1 ("devlink: Break parameter
notification sequence to be before/after unload/load driver").
Cited commit resolved a problem in mlx5 params implementation,
when param notification code accessed memory previously freed
during reload.
Now, when the params can be registered and unregistered when devlink
instance is registered, mlx5 code unregisters the problematic param
during devlink reload. The fix is therefore no longer needed.
Current behavior is a it problematic, as it sends DEL notifications even
in potential case when reload_down() call fails which might confuse
userspace notifications listener.
So move the reload notifications back where they were originally in
between reload_down() and reload_up() calls.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- drop prandom.h includes, by Sven Eckelmann
- fix mailing list address, by Sven Eckelmann
- multicast feature preparation, by Linus Lüssing (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If you call listen() and accept() on an already connect()ed
AF_NETROM socket, accept() can successfully connect.
This is because when the peer socket sends data to sendmsg,
the skb with its own sk stored in the connected socket's
sk->sk_receive_queue is connected, and nr_accept() dequeues
the skb waiting in the sk->sk_receive_queue.
As a result, nr_accept() allocates and returns a sock with
the sk of the parent AF_NETROM socket.
And here use-after-free can happen through complex race conditions:
```
cpu0 cpu1
1. socket_2 = socket(AF_NETROM)
.
.
listen(socket_2)
accepted_socket = accept(socket_2)
2. socket_1 = socket(AF_NETROM)
nr_create() // sk refcount : 1
connect(socket_1)
3. write(accepted_socket)
nr_sendmsg()
nr_output()
nr_kick()
nr_send_iframe()
nr_transmit_buffer()
nr_route_frame()
nr_loopback_queue()
nr_loopback_timer()
nr_rx_frame()
nr_process_rx_frame(sk, skb); // sk : socket_1's sk
nr_state3_machine()
nr_queue_rx_frame()
sock_queue_rcv_skb()
sock_queue_rcv_skb_reason()
__sock_queue_rcv_skb()
__skb_queue_tail(list, skb); // list : socket_1's sk->sk_receive_queue
4. listen(socket_1)
nr_listen()
uaf_socket = accept(socket_1)
nr_accept()
skb_dequeue(&sk->sk_receive_queue);
5. close(accepted_socket)
nr_release()
nr_write_internal(sk, NR_DISCREQ)
nr_transmit_buffer() // NR_DISCREQ
nr_route_frame()
nr_loopback_queue()
nr_loopback_timer()
nr_rx_frame() // sk : socket_1's sk
nr_process_rx_frame() // NR_STATE_3
nr_state3_machine() // NR_DISCREQ
nr_disconnect()
nr_sk(sk)->state = NR_STATE_0;
6. close(socket_1) // sk refcount : 3
nr_release() // NR_STATE_0
sock_put(sk); // sk refcount : 0
sk_free(sk);
close(uaf_socket)
nr_release()
sock_hold(sk); // UAF
```
KASAN report by syzbot:
```
BUG: KASAN: use-after-free in nr_release+0x66/0x460 net/netrom/af_netrom.c:520
Write of size 4 at addr ffff8880235d8080 by task syz-executor564/5128
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x15e/0x461 mm/kasan/report.c:417
kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x141/0x190 mm/kasan/generic.c:189
instrument_atomic_read_write include/linux/instrumented.h:102 [inline]
atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:116 [inline]
__refcount_add include/linux/refcount.h:193 [inline]
__refcount_inc include/linux/refcount.h:250 [inline]
refcount_inc include/linux/refcount.h:267 [inline]
sock_hold include/net/sock.h:775 [inline]
nr_release+0x66/0x460 net/netrom/af_netrom.c:520
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x1c/0x20 net/socket.c:1365
__fput+0x27c/0xa90 fs/file_table.c:320
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xaa8/0x2950 kernel/exit.c:867
do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
get_signal+0x21c3/0x2450 kernel/signal.c:2859
arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f6c19e3c9b9
Code: Unable to access opcode bytes at 0x7f6c19e3c98f.
RSP: 002b:00007fffd4ba2ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: 0000000000000116 RBX: 0000000000000003 RCX: 00007f6c19e3c9b9
RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000006
RBP: 0000000000000003 R08: 000000000000000d R09: 000000000000000d
R10: 0000000000000000 R11: 0000000000000246 R12: 000055555566a2c0
R13: 0000000000000011 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Allocated by task 5128:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:371 [inline]
____kasan_kmalloc mm/kasan/common.c:330 [inline]
__kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slab_common.c:968 [inline]
__kmalloc+0x5a/0xd0 mm/slab_common.c:981
kmalloc include/linux/slab.h:584 [inline]
sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
__sock_create+0x359/0x790 net/socket.c:1515
sock_create net/socket.c:1566 [inline]
__sys_socket_create net/socket.c:1603 [inline]
__sys_socket_create net/socket.c:1588 [inline]
__sys_socket+0x133/0x250 net/socket.c:1636
__do_sys_socket net/socket.c:1649 [inline]
__se_sys_socket net/socket.c:1647 [inline]
__x64_sys_socket+0x73/0xb0 net/socket.c:1647
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 5128:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:177 [inline]
__cache_free mm/slab.c:3394 [inline]
__do_kmem_cache_free mm/slab.c:3580 [inline]
__kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
sk_prot_free net/core/sock.c:2074 [inline]
__sk_destruct+0x5df/0x750 net/core/sock.c:2166
sk_destruct net/core/sock.c:2181 [inline]
__sk_free+0x175/0x460 net/core/sock.c:2192
sk_free+0x7c/0xa0 net/core/sock.c:2203
sock_put include/net/sock.h:1991 [inline]
nr_release+0x39e/0x460 net/netrom/af_netrom.c:554
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x1c/0x20 net/socket.c:1365
__fput+0x27c/0xa90 fs/file_table.c:320
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xaa8/0x2950 kernel/exit.c:867
do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
get_signal+0x21c3/0x2450 kernel/signal.c:2859
arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
```
To fix this issue, nr_listen() returns -EINVAL for sockets that
successfully nr_connect().
Reported-by: syzbot+caa188bdfc1eeafeb418@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
s390x ABI requires the caller to zero- or sign-extend the arguments.
eBPF already deals with zero-extension (by definition of its ABI), but
not with sign-extension.
Add a test to cover that potentially problematic area.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-15-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_tcp_raw_gen_syncookie_ipv{4,6}()
These functions already check that th_len < sizeof(*th), and
propagating the lower bound (th_len > 0) may be challenging
in complex code, e.g. as is the case with xdp_synproxy test on
s390x [1]. Switch to ARG_CONST_SIZE_OR_ZERO in order to make the
verifier accept code where it cannot prove that th_len > 0.
[1] https://lore.kernel.org/bpf/CAEf4Bzb3uiSHtUbgVWmkWuJ5Sw1UZd4c_iuS4QXtUkXmTTtXuQ@mail.gmail.com/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Implement the core hooks in order to provide the softMAC layer support
for sending beacons. Coordinators may be requested to send beacons in a
beacon enabled PAN in order for the other devices around to self
discover the available PANs automatically.
Changing the channels is prohibited while a beacon operation is
ongoing.
The implementation uses a workqueue triggered at a certain interval
depending on the symbol duration for the current channel and the
interval order provided.
Sending beacons in response to a BEACON_REQ frame (ie. answering active
scans) is not yet supported.
This initial patchset has no security support (llsec).
Co-developed-by: David Girault <david.girault@qorvo.com>
Signed-off-by: David Girault <david.girault@qorvo.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20230125102923.135465-3-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
Parse user requests for sending beacons, start sending beacons at a
regular pace. If needed, the pace can be updated with a new request. The
process can also be interrupted at any moment.
The page and channel must be changed beforehands if needed. Interval
orders above 14 are reserved to tell a device it must answer BEACON_REQ
coming from another device as part of an active scan procedure and this
is not yet supported.
A netlink "beacon request" structure is created to list the
requirements.
Mac layers may now implement the ->send_beacons() and
->stop_beacons() hooks.
Co-developed-by: David Girault <david.girault@qorvo.com>
Signed-off-by: David Girault <david.girault@qorvo.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20230125102923.135465-2-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
We may have pending skbs in the receive queue when the sk is being
destroyed; add a destructor to purge the queue.
MCTP doesn't use the error queue, so only the receive_queue is purged.
Fixes: 833ef3b91de6 ("mctp: Populate socket implementation")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230126064551.464468-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Function radix_tree_insert() returns errors if the node hasn't
been initialized and added to the tree.
"kfree(node)" and return value "NULL" of node_get() help
to avoid using unclear node in other calls.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: <stable@vger.kernel.org> # 5.7
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20230125134831.8090-1-n.petrova@fintech.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If you call listen() and accept() on an already connect()ed
rose socket, accept() can successfully connect.
This is because when the peer socket sends data to sendmsg,
the skb with its own sk stored in the connected socket's
sk->sk_receive_queue is connected, and rose_accept() dequeues
the skb waiting in the sk->sk_receive_queue.
This creates a child socket with the sk of the parent
rose socket, which can cause confusion.
Fix rose_listen() to return -EINVAL if the socket has
already been successfully connected, and add lock_sock
to prevent this issue.
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230125105944.GA133314@ubuntu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2023-01-28
We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).
The main changes are:
1) Implement XDP hints via kfuncs with initial support for RX hash and
timestamp metadata kfuncs, from Stanislav Fomichev and
Toke Høiland-Jørgensen.
Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk
2) Extend libbpf's bpf_tracing.h support for tracing arguments of
kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.
3) Significantly reduce the search time for module symbols by livepatch
and BPF, from Jiri Olsa and Zhen Lei.
4) Enable cpumasks to be used as kptrs, which is useful for tracing
programs tracking which tasks end up running on which CPUs
in different time intervals, from David Vernet.
5) Fix several issues in the dynptr processing such as stack slot liveness
propagation, missing checks for PTR_TO_STACK variable offset, etc,
from Kumar Kartikeya Dwivedi.
6) Various performance improvements, fixes, and introduction of more
than just one XDP program to XSK selftests, from Magnus Karlsson.
7) Big batch to BPF samples to reduce deprecated functionality,
from Daniel T. Lee.
8) Enable struct_ops programs to be sleepable in verifier,
from David Vernet.
9) Reduce pr_warn() noise on BTF mismatches when they are expected under
the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.
10) Describe modulo and division by zero behavior of the BPF runtime
in BPF's instruction specification document, from Dave Thaler.
11) Several improvements to libbpf API documentation in libbpf.h,
from Grant Seltzer.
12) Improve resolve_btfids header dependencies related to subcmd and add
proper support for HOSTCC, from Ian Rogers.
13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
helper along with BPF selftests, from Ziyang Xuan.
14) Simplify the parsing logic of structure parameters for BPF trampoline
in the x86-64 JIT compiler, from Pu Lehui.
15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
Rust compilation units with pahole, from Martin Rodriguez Reboredo.
16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
from Kui-Feng Lee.
17) Disable stack protection for BPF objects in bpftool given BPF backends
don't support it, from Holger Hoffstätte.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
selftest/bpf: Make crashes more debuggable in test_progs
libbpf: Add documentation to map pinning API functions
libbpf: Fix malformed documentation formatting
selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
bpf/selftests: Verify struct_ops prog sleepable behavior
bpf: Pass const struct bpf_prog * to .check_member
libbpf: Support sleepable struct_ops.s section
bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
selftests/bpf: Fix vmtest static compilation error
tools/resolve_btfids: Alter how HOSTCC is forced
tools/resolve_btfids: Install subcmd headers
bpf/docs: Document the nocast aliasing behavior of ___init
bpf/docs: Document how nested trusted fields may be defined
bpf/docs: Document cpumask kfuncs in a new file
selftests/bpf: Add selftest suite for cpumask kfuncs
selftests/bpf: Add nested trust selftests suite
bpf: Enable cpumasks to be queried and used as kptrs
bpf: Disallow NULLable pointers for trusted kfuncs
...
====================
Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2023-01-27
We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 10 files changed, 170 insertions(+), 59 deletions(-).
The main changes are:
1) Fix preservation of register's parent/live fields when copying
range-info, from Eduard Zingerman.
2) Fix an off-by-one bug in bpf_mem_cache_idx() to select the right
cache, from Hou Tao.
3) Fix stack overflow from infinite recursion in sock_map_close(),
from Jakub Sitnicki.
4) Fix missing btf_put() in register_btf_id_dtor_kfuncs()'s error path,
from Jiri Olsa.
5) Fix a splat from bpf_setsockopt() via lsm_cgroup/socket_sock_rcv_skb,
from Kui-Feng Lee.
6) Fix bpf_send_signal[_thread]() helpers to hold a reference on the task,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix the kernel crash caused by bpf_setsockopt().
selftests/bpf: Cover listener cloning with progs attached to sockmap
selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests
bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself
bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
selftests/bpf: Verify copy_register_state() preserves parent/live fields
bpf: Fix to preserve reg parent/live fields when copying range info
bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
bpf: Fix off-by-one error in bpf_mem_cache_idx()
====================
Link: https://lore.kernel.org/r/20230127215820.4993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch removes the msleep(4s) during netpoll_setup() if the carrier
appears instantly.
Here are some scenarios where this workaround is counter-productive in
modern ages:
Servers which have BMC communicating over NC-SI via the same NIC as gets
used for netconsole. BMC will keep the PHY up, hence the carrier
appearing instantly.
The link is fibre, SERDES getting sync could happen within 0.1Hz, and
the carrier also appears instantly.
Other than that, if a driver is reporting instant carrier and then
losing it, this is probably a driver bug.
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230125185230.3574681-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
GSO should not merge page pool recycled frames with standard reference
counted frames. Traditionally this didn't occur, at least not often.
However as we start looking at adding support for wireless adapters there
becomes the potential to mix the two due to A-MSDU repartitioning frames in
the receive path. There are possibly other places where this may have
occurred however I suspect they must be few and far between as we have not
seen this issue until now.
Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Reported-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Conflicts:
drivers/net/ethernet/intel/ice/ice_main.c
418e53401e47 ("ice: move devlink port creation/deletion")
643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/
drivers/net/ethernet/engleder/tsnep_main.c
3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/
net/netfilter/nf_conntrack_proto_sctp.c
13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 1d18bb1a4ddd ("devlink: allow registering parameters after
the instance") as the subject implies introduced possibility to register
devlink params even for already registered devlink instance. This is a
bit problematic, as the consistency or params list was originally
secured by the fact it is static during devlink lifetime. So in order to
protect the params list, take devlink instance lock during the params
operations. Introduce unlocked function variants and use them in drivers
in locked context. Put lock assertions to appropriate places.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Put couple of WARN_ONs in devlink_param_driverinit_value_get() function
to clearly indicate, that it is a driver bug if used without reload
support or for non-driverinit param.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
devlink_param_driverinit_value_set() currently returns int with possible
error, but no user is checking it anyway. The only reason for a fail is
a driver bug. So convert the function to return void and put WARN_ONs
on error paths.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a WARN_ON checking the param_item for being NULL when the param
is not inserted in the list. That indicates a driver BUG. Instead of
continuing to work with NULL pointer with its consequences, return.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no user outside the devlink code, so remove the export and make
the functions static. Move them before callers to avoid forward
declarations.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert all SET commands where new common code is applicable.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most ethtool SET callbacks follow the same general structure.
ethnl_parse_header_dev_get()
rtnl_lock()
ethnl_ops_begin()
... do stuff ...
ethtool_notify()
ethnl_ops_complete()
rtnl_unlock()
ethnl_parse_header_dev_put()
This leads to a lot of copy / pasted code an bugs when people
mis-handle the error path.
Add a generic implementation of this pattern with a .set callback
in struct ethnl_request_ops called to "do stuff".
Also add an optional .set_validate which is called before
ethnl_ops_begin() -- a lot of implementations do basic request
capability / sanity checking at that point.
Because we want to avoid generating the notification when
no change happened - adopt a slightly hairy return values:
- 0 means nothing to do (no notification)
- 1 means done / continue
- negative error codes on error
Reuse .hdr_attr from struct ethnl_request_ops, GET and SET
use the same attr spaces in all cases.
Convert pause as an example (and to avoid unused function warnings).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Number of files depend on linux/splice.h getting included
by linux/skbuff.h which soon will no longer be the case.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Number of files depend on linux/sched/clock.h getting included
by linux/skbuff.h which soon will no longer be the case.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This include was added for skb_find_text() but all we need there
is a forward declaration of struct ts_config.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KCSAN reported multiple cpus can update use_time
at the same time.
Adds READ_ONCE()/WRITE_ONCE() annotations.
Note that 32bit arches are not fully protected,
but they will probably no longer be supported/used in 2106.
BUG: KCSAN: data-race in __xfrm_policy_check / __xfrm_policy_check
write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 0:
__xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664
__xfrm_policy_check2 include/net/xfrm.h:1174 [inline]
xfrm_policy_check include/net/xfrm.h:1179 [inline]
xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189
udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703
udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792
udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935
__udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020
udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133
ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439
ip6_input_finish net/ipv6/ip6_input.c:484 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:454 [inline]
ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:302 [inline]
ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5482 [inline]
__netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596
process_backlog+0x23f/0x3b0 net/core/dev.c:5924
__napi_poll+0x65/0x390 net/core/dev.c:6485
napi_poll net/core/dev.c:6552 [inline]
net_rx_action+0x37e/0x730 net/core/dev.c:6663
__do_softirq+0xf2/0x2c7 kernel/softirq.c:571
do_softirq+0xb1/0xf0 kernel/softirq.c:472
__local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396
__raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline]
_raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284
wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184
wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 1:
__xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664
__xfrm_policy_check2 include/net/xfrm.h:1174 [inline]
xfrm_policy_check include/net/xfrm.h:1179 [inline]
xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189
udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703
udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792
udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935
__udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020
udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133
ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439
ip6_input_finish net/ipv6/ip6_input.c:484 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:454 [inline]
ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:302 [inline]
ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5482 [inline]
__netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596
process_backlog+0x23f/0x3b0 net/core/dev.c:5924
__napi_poll+0x65/0x390 net/core/dev.c:6485
napi_poll net/core/dev.c:6552 [inline]
net_rx_action+0x37e/0x730 net/core/dev.c:6663
__do_softirq+0xf2/0x2c7 kernel/softirq.c:571
do_softirq+0xb1/0xf0 kernel/softirq.c:472
__local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396
__raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline]
_raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284
wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184
wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline]
wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x0000000063c62d6f -> 0x0000000063c62d70
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4185 Comm: kworker/1:2 Tainted: G W 6.2.0-rc4-syzkaller-00009-gd532dd102151-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: wg-crypt-wg0 wg_packet_tx_worker
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
For some reason, blamed commit did the right thing in xfrm_policy_timer()
but did not in xfrm_timer_handler()
Fixes: 386c5680e2e8 ("xfrm: use time64_t for in-kernel timestamps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.
Let's use extack to return this information to users.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Almost all validation logic is in the drivers, but they are
missing reliable way to convey failure reason to userspace
applications.
Let's use extack to return this information to users.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Like in all other functions in this file, a single point of exit is used
when extra operations are needed: unlock, decrement refcount, etc.
There is no functional change for the moment but it is better to do the
same here to make sure all cleanups are done in case of intermediate
errors.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Usually, attributes are propagated to subflows as well.
Here, if subflows are created by other ways than the MPTCP path-manager,
it is important to make sure they are in v6 if it is asked by the
userspace.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently the in-kernel PM arbitrary enforces that created subflow's
family must match the main MPTCP socket while the RFC allows mixing
IPv4 and IPv6 subflows.
This patch changes the in-kernel PM logic to create subflows matching
the currently selected source (or destination) address. IPv4 sockets
can pick only IPv4 addresses (and v4 mapped in v6), while IPv6 sockets
not restricted to V6ONLY can pick either IPv4 and IPv6 addresses as
long as the source and destination matches.
A helper, previously introduced is used to ease family matching checks,
taking care of IPv4 vs IPv4-mapped-IPv6 vs IPv6 only addresses.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/269
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There are multiple ICMP rate limiting mechanisms:
* Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec
* v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask
* v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask
However, when ICMP output is limited, there is no way to tell
which limit has been hit or even if the limits are responsible
for the lack of ICMP output.
Add counters for each of the cases above. As we are within
local_bh_disable(), use the __INC stats variant.
Example output:
# nstat -sz "*RateLimit*"
IcmpOutRateLimitGlobal 134 0.0
IcmpOutRateLimitHost 770 0.0
Icmp6OutRateLimitHost 84 0.0
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com>
Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Users who want to share a single public IP address for outgoing connections
between several hosts traditionally reach for SNAT. However, SNAT requires
state keeping on the node(s) performing the NAT.
A stateless alternative exists, where a single IP address used for egress
can be shared between several hosts by partitioning the available ephemeral
port range. In such a setup:
1. Each host gets assigned a disjoint range of ephemeral ports.
2. Applications open connections from the host-assigned port range.
3. Return traffic gets routed to the host based on both, the destination IP
and the destination port.
An application which wants to open an outgoing connection (connect) from a
given port range today can choose between two solutions:
1. Manually pick the source port by bind()'ing to it before connect()'ing
the socket.
This approach has a couple of downsides:
a) Search for a free port has to be implemented in the user-space. If
the chosen 4-tuple happens to be busy, the application needs to retry
from a different local port number.
Detecting if 4-tuple is busy can be either easy (TCP) or hard
(UDP). In TCP case, the application simply has to check if connect()
returned an error (EADDRNOTAVAIL). That is assuming that the local
port sharing was enabled (REUSEADDR) by all the sockets.
# Assume desired local port range is 60_000-60_511
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
s.bind(("192.0.2.1", 60_000))
s.connect(("1.1.1.1", 53))
# Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy
# Application must retry with another local port
In case of UDP, the network stack allows binding more than one socket
to the same 4-tuple, when local port sharing is enabled
(REUSEADDR). Hence detecting the conflict is much harder and involves
querying sock_diag and toggling the REUSEADDR flag [1].
b) For TCP, bind()-ing to a port within the ephemeral port range means
that no connecting sockets, that is those which leave it to the
network stack to find a free local port at connect() time, can use
the this port.
IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port
will be skipped during the free port search at connect() time.
2. Isolate the app in a dedicated netns and use the use the per-netns
ip_local_port_range sysctl to adjust the ephemeral port range bounds.
The per-netns setting affects all sockets, so this approach can be used
only if:
- there is just one egress IP address, or
- the desired egress port range is the same for all egress IP addresses
used by the application.
For TCP, this approach avoids the downsides of (1). Free port search and
4-tuple conflict detection is done by the network stack:
system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'")
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1)
s.bind(("192.0.2.1", 0))
s.connect(("1.1.1.1", 53))
# Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy
For UDP this approach has limited applicability. Setting the
IP_BIND_ADDRESS_NO_PORT socket option does not result in local source
port being shared with other connected UDP sockets.
Hence relying on the network stack to find a free source port, limits the
number of outgoing UDP flows from a single IP address down to the number
of available ephemeral ports.
To put it another way, partitioning the ephemeral port range between hosts
using the existing Linux networking API is cumbersome.
To address this use case, add a new socket option at the SOL_IP level,
named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the
ephemeral port range for each socket individually.
The option can be used only to narrow down the per-netns local port
range. If the per-socket range lies outside of the per-netns range, the
latter takes precedence.
UAPI-wise, the low and high range bounds are passed to the kernel as a pair
of u16 values in host byte order packed into a u32. This avoids pointer
passing.
PORT_LO = 40_000
PORT_HI = 40_511
s = socket(AF_INET, SOCK_STREAM)
v = struct.pack("I", PORT_HI << 16 | PORT_LO)
s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v)
s.bind(("127.0.0.1", 0))
s.getsockname()
# Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511),
# if there is a free port. EADDRINUSE otherwise.
[1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116
Reviewed-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix spelling in net/ Kconfig files.
(reported by codespell)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: coreteam@netfilter.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Resolve an issue when calling sol_tcp_sockopt() on a socket with ktls
enabled. Prior to this patch, sol_tcp_sockopt() would only allow calls
if the function pointer of setsockopt of the socket was set to
tcp_setsockopt(). However, any socket with ktls enabled would have its
function pointer set to tls_setsockopt(). To resolve this issue, the
patch adds a check of the protocol of the linux socket and allows
bpf_setsockopt() to be called if ktls is initialized on the linux
socket. This ensures that calls to sol_tcp_sockopt() will succeed on
sockets with ktls enabled.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230125201608.908230-2-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
In a set of prior changes, we added the ability for struct_ops programs
to be sleepable. This patch enhances the dummy_st_ops selftest suite to
validate this behavior by adding a new sleepable struct_ops entry to
dummy_st_ops.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The .check_member field of struct bpf_struct_ops is currently passed the
member's btf_type via const struct btf_type *t, and a const struct
btf_member *member. This allows the struct_ops implementation to check
whether e.g. an ops is supported, but it would be useful to also enforce
that the struct_ops prog being loaded for that member has other
qualities, like being sleepable (or not). This patch therefore updates
the .check_member() callback to also take a const struct bpf_prog *prog
argument.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Once a socket has been unhashed, we want to prevent it from being
re-used in a sk_key entry as part of a routing operation.
This change marks the sk as SOCK_DEAD on unhash, which prevents addition
into the net's key list.
We need to do this during the key add path, rather than key lookup, as
we release the net keys_lock between those operations.
Fixes: 4a992bbd3650 ("mctp: Implement message fragmentation & reassembly")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we have a race where we look up a sock through a "general"
(ie, not directly associated with the (src,dest,tag) tuple) key, then
drop the key reference while still holding the key's sock.
This change expands the key reference until we've finished using the
sock, and hence the sock reference too.
Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>.
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we delete the key expiry timer (in sk->close) before
unhashing the sk. This means that another thread may find the sk through
its presence on the key list, and re-queue the timer.
This change moves the timer deletion to the unhash, after we have made
the key no longer observable, so the timer cannot be re-queued.
Fixes: 7b14e15ae6f4 ("mctp: Implement a timeout for tags")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we correlate the mctp_sk_key lifetime to the sock lifetime
through the sock hash/unhash operations, but this is pretty tenuous, and
there are cases where we may have a temporary reference to an unhashed
sk.
This change makes the reference more explicit, by adding a hold on the
sock when it's associated with a mctp_sk_key, released on final key
unref.
Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the following call path:
ethnl_default_dumpit
-> ethnl_default_dump_one
-> ctx->ops->prepare_data
-> pause_prepare_data
struct genl_info *info will be passed as NULL, and pause_prepare_data()
dereferences it while getting the extended ack pointer.
To avoid that, just set the extack to NULL if "info" is NULL, since the
netlink extack handling messages know how to deal with that.
The pattern "info ? info->extack : NULL" is present in quite a few other
"prepare_data" implementations, so it's clear that it's a more general
problem to be dealt with at a higher level, but the code should have at
least adhered to the current conventions to avoid the NULL dereference.
Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reported-by: syzbot+9d44aae2720fc40b8474@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the following call path:
ethnl_default_dumpit
-> ethnl_default_dump_one
-> ctx->ops->prepare_data
-> stats_prepare_data
struct genl_info *info will be passed as NULL, and stats_prepare_data()
dereferences it while getting the extended ack pointer.
To avoid that, just set the extack to NULL if "info" is NULL, since the
netlink extack handling messages know how to deal with that.
The pattern "info ? info->extack : NULL" is present in quite a few other
"prepare_data" implementations, so it's clear that it's a more general
problem to be dealt with at a higher level, but the code should have at
least adhered to the current conventions to avoid the NULL dereference.
Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When listen() and accept() are called on an x25 socket
that connect() succeeds, accept() succeeds immediately.
This is because x25_connect() queues the skb to
sk->sk_receive_queue, and x25_accept() dequeues it.
This creates a child socket with the sk of the parent
x25 socket, which can cause confusion.
Fix x25_listen() to return -EINVAL if the socket has
already been successfully connect()ed to avoid this issue.
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct device for ISM devices was part of struct smcd_dev. Move to
struct ism_dev, provide a new API call in struct smcd_ops, and convert
existing SMCD code accordingly.
Furthermore, remove struct smcd_dev from struct ism_dev.
This is the final part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ism module had SMC-D-specific code sprinkled across the entire module.
We are now consolidating the SMC-D-specific parts into the latter parts
of the module, so it becomes more clear what code is intended for use with
ISM, and which parts are glue code for usage in the context of SMC-D.
This is the fourth part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We separate the code implementing the struct smcd_ops API in the ISM
device driver from the functions that may be used by other exploiters of
ISM devices.
Note: We start out small, and don't offer the whole breadth of the ISM
device for public use, as many functions are specific to or likely only
ever used in the context of SMC-D.
This is the third part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register the smc module with the new ism device driver API.
This is the second part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new API that allows other drivers to concurrently access ISM devices.
To do so, we introduce a new API that allows other modules to register for
ISM device usage. Furthermore, we move the GID to struct ism, where it
belongs conceptually, and rename and relocate struct smcd_event to struct
ism_event.
This is the first part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|