path: root/net
2020-09-03  net/smc: fix sock refcounting in case of termination  (Ursula Braun)
When an ISM device is removed, all its linkgroups are terminated, i.e. all the corresponding connections are killed. Connection killing invokes smc_close_active_abort(), which decreases the sock refcount for certain states to simulate passive closing. It also cancels the close worker, and has to give up the sock lock for that timeframe. This opens the door for a passive close worker or a socket close to run in between. In this case both smc_close_active_abort() and the passive close worker or smc_release() might do a sock_put for passive closing. This causes: [ 1323.315943] refcount_t: underflow; use-after-free. [ 1323.316055] WARNING: CPU: 3 PID: 54469 at lib/refcount.c:28 refcount_warn_saturate+0xe8/0x130 [ 1323.316069] Kernel panic - not syncing: panic_on_warn set ... [ 1323.316084] CPU: 3 PID: 54469 Comm: uperf Not tainted 5.9.0-20200826.rc2.git0.46328853ed20.300.fc32.s390x+debug #1 [ 1323.316096] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) [ 1323.316108] Call Trace: [ 1323.316125] [<00000000c0d4aae8>] show_stack+0x90/0xf8 [ 1323.316143] [<00000000c15989b0>] dump_stack+0xa8/0xe8 [ 1323.316158] [<00000000c0d8344e>] panic+0x11e/0x288 [ 1323.316173] [<00000000c0d83144>] __warn+0xac/0x158 [ 1323.316187] [<00000000c1597a7a>] report_bug+0xb2/0x130 [ 1323.316201] [<00000000c0d36424>] monitor_event_exception+0x44/0xc0 [ 1323.316219] [<00000000c195c716>] pgm_check_handler+0x1da/0x238 [ 1323.316234] [<00000000c151844c>] refcount_warn_saturate+0xec/0x130 [ 1323.316280] ([<00000000c1518448>] refcount_warn_saturate+0xe8/0x130) [ 1323.316310] [<000003ff801f2e2a>] smc_release+0x192/0x1c8 [smc] [ 1323.316323] [<00000000c169f1fa>] __sock_release+0x5a/0xe0 [ 1323.316334] [<00000000c169f2ac>] sock_close+0x2c/0x40 [ 1323.316350] [<00000000c1086de0>] __fput+0xb8/0x278 [ 1323.316362] [<00000000c0db1e0e>] task_work_run+0x76/0xb8 [ 1323.316393] [<00000000c0d8ab84>] do_exit+0x26c/0x520 [ 1323.316408] [<00000000c0d8af08>] do_group_exit+0x48/0xc0 [ 1323.316421] [<00000000c0d8afa8>] __s390x_sys_exit_group+0x28/0x38 [ 1323.316433] [<00000000c195c32c>] system_call+0xe0/0x2b4 [ 1323.316446] 1 lock held by uperf/54469: [ 1323.316456] #0: 0000000044125e60 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x44/0xe0 The patch rechecks the sock state in smc_close_active_abort() after smc_close_cancel_work() to avoid a duplicate decrease of the sock refcount for the same purpose. Fixes: 611b63a12732 ("net/smc: cancel tx worker in case of socket aborts") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
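A minimal C sketch of the recheck pattern this fix describes; smc_close_active_abort() and smc_close_cancel_work() are named in the commit message, while the surrounding control flow is an assumption, not the verbatim patch:
----------
/* Sketch only: the sock lock is dropped while the close worker is
 * cancelled, so the state must be rechecked before doing the
 * "passive close" sock_put(), which may already have happened.
 */
smc_close_cancel_work(smc);        /* gives up and re-acquires the lock */
if (sk->sk_state != SMC_CLOSED) {  /* recheck: nobody closed it meanwhile */
	sk->sk_state = SMC_CLOSED;
	sock_put(sk);              /* refcount is decreased exactly once */
}
----------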
2020-09-03  net/smc: reset sndbuf_desc if freed  (Ursula Braun)
When an SMC connection is created and there is a problem creating an RMB or DMB, the previously created send buffer is thrown away as well, including the freeing of its buffer descriptor. Make sure the connection no longer references the freed buffer descriptor; otherwise bugs like this are possible: [71556.835148] ============================================================================= [71556.835168] BUG kmalloc-128 (Tainted: G B OE ): Poison overwritten [71556.835172] ----------------------------------------------------------------------------- [71556.835179] INFO: 0x00000000d20894be-0x00000000aaef63e9 @offset=2724. First byte 0x0 instead of 0x6b [71556.835215] INFO: Allocated in __smc_buf_create+0x184/0x578 [smc] age=0 cpu=5 pid=46726 [71556.835234] ___slab_alloc+0x5a4/0x690 [71556.835239] __slab_alloc.constprop.0+0x70/0xb0 [71556.835243] kmem_cache_alloc_trace+0x38e/0x3f8 [71556.835250] __smc_buf_create+0x184/0x578 [smc] [71556.835257] smc_buf_create+0x2e/0xe8 [smc] [71556.835264] smc_listen_work+0x516/0x6a0 [smc] [71556.835275] process_one_work+0x280/0x478 [71556.835280] worker_thread+0x66/0x368 [71556.835287] kthread+0x17a/0x1a0 [71556.835294] ret_from_fork+0x28/0x2c [71556.835301] INFO: Freed in smc_buf_create+0xd8/0xe8 [smc] age=0 cpu=5 pid=46726 [71556.835307] __slab_free+0x246/0x560 [71556.835311] kfree+0x398/0x3f8 [71556.835318] smc_buf_create+0xd8/0xe8 [smc] [71556.835324] smc_listen_work+0x516/0x6a0 [smc] [71556.835328] process_one_work+0x280/0x478 [71556.835332] worker_thread+0x66/0x368 [71556.835337] kthread+0x17a/0x1a0 [71556.835344] ret_from_fork+0x28/0x2c [71556.835348] INFO: Slab 0x00000000a0744551 objects=51 used=51 fp=0x0000000000000000 flags=0x1ffff00000010200 [71556.835352] INFO: Object 0x00000000563480a1 @offset=2688 fp=0x00000000289567b2 [71556.835359] Redzone 000000006783cde2: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835363] Redzone 00000000e35b876e: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835367] Redzone 0000000023074562: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835372] Redzone 00000000b9564b8c: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835376] Redzone 00000000810c6362: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835380] Redzone 0000000065ef52c3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835384] Redzone 00000000c5dd6984: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835388] Redzone 000000004c480f8f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ [71556.835392] Object 00000000563480a1: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835397] Object 000000009c479d06: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835401] Object 000000006e1dce92: 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b kkkk....kkkkkkkk [71556.835405] Object 00000000227f7cf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835410] Object 000000009a701215: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835414] Object 000000003731ce76: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835418] Object 00000000f7085967: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [71556.835422] Object 0000000007f99927: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. [71556.835427] Redzone 00000000579c4913: bb bb bb bb bb bb bb bb ........
[71556.835431] Padding 00000000305aef82: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835435] Padding 00000000b1cdd722: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835438] Padding 00000000c7568199: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835442] Padding 00000000fad4c4d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ [71556.835451] CPU: 0 PID: 47939 Comm: kworker/0:15 Tainted: G B OE 5.9.0-rc1uschi+ #54 [71556.835456] Hardware name: IBM 3906 M03 703 (LPAR) [71556.835464] Workqueue: events smc_listen_work [smc] [71556.835470] Call Trace: [71556.835478] [<00000000d5eaeb10>] show_stack+0x90/0xf8 [71556.835493] [<00000000d66fc0f8>] dump_stack+0xa8/0xe8 [71556.835499] [<00000000d61a511c>] check_bytes_and_report+0x104/0x130 [71556.835504] [<00000000d61a57b2>] check_object+0x26a/0x2e0 [71556.835509] [<00000000d61a59bc>] alloc_debug_processing+0x194/0x238 [71556.835514] [<00000000d61a8c14>] ___slab_alloc+0x5a4/0x690 [71556.835519] [<00000000d61a9170>] __slab_alloc.constprop.0+0x70/0xb0 [71556.835524] [<00000000d61aaf66>] kmem_cache_alloc_trace+0x38e/0x3f8 [71556.835530] [<000003ff80549bbc>] __smc_buf_create+0x184/0x578 [smc] [71556.835538] [<000003ff8054a396>] smc_buf_create+0x2e/0xe8 [smc] [71556.835545] [<000003ff80540c16>] smc_listen_work+0x516/0x6a0 [smc] [71556.835549] [<00000000d5f0f448>] process_one_work+0x280/0x478 [71556.835554] [<00000000d5f0f6a6>] worker_thread+0x66/0x368 [71556.835559] [<00000000d5f18692>] kthread+0x17a/0x1a0 [71556.835563] [<00000000d6abf3b8>] ret_from_fork+0x28/0x2c [71556.835569] INFO: lockdep is turned off. [71556.835573] FIX kmalloc-128: Restoring 0x00000000d20894be-0x00000000aaef63e9=0x6b [71556.835577] FIX kmalloc-128: Marking all objects used Fixes: fd7f3a746582 ("net/smc: remove freed buffer from list") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
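A sketch of the fix's idea; __smc_buf_create() and conn->sndbuf_desc are taken from the commit text, while the error path and the freeing call shown are simplified assumptions:
----------
/* Sketch: when RMB/DMB creation fails, the already-created send buffer
 * descriptor is freed; clearing the connection's reference prevents
 * later code from touching the freed object.
 */
rc = __smc_buf_create(smc, is_smcd, true);  /* create the RMB/DMB */
if (rc) {
	kfree(conn->sndbuf_desc);   /* real freeing path is more involved */
	conn->sndbuf_desc = NULL;   /* the essence of this fix */
	return rc;
}
----------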
2020-09-03  net/smc: set rx_off for SMCR explicitly  (Ursula Braun)
SMC tries to make use of SMCD first. If a problem shows up, it tries to switch to SMCR. If the SMCD initialization problem shows up only after the SMCD connection has already been initialized, the field rx_off keeps the wrong SMCD value for SMCR, which results in corrupted data at the receiver. This patch adds an explicit (re-)setting of the field rx_off to zero if the connection uses SMCR. Fixes: be244f28d22f ("net/smc: add SMC-D support in data transfer") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  net/smc: fix toleration of fake add_link messages  (Karsten Graul)
Older SMCR implementations had no link failover support and used one link only. Because the handshake protocol requires trying to establish a second link, the old code sent a fake add_link message and declined any server response afterwards. The current code supports multiple links and inspects the received fake add_link message more closely. To tolerate the fake add_link messages, smc_llc_is_local_add_link() needs an improved check of the message to be able to distinguish between locally enqueued and fake add_link messages. And smc_llc_cli_add_link() needs to check whether the provided qp_mtu size is invalid and reject the add_link request in that case. Fixes: c48254fa48e5 ("net/smc: move add link processing for new device into llc layer") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  ip: expose inet sockopts through inet_diag  (Wei Wang)
Expose all existing inet sockopt bits through inet_diag for debugging purposes. Corresponding changes in iproute2 ss will be submitted to output all these values. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  ethtool: fix error handling in ethtool_phys_id  (Edward Cree)
If ops->set_phys_id() returned an error, previously we would only break out of the inner loop, which neither stopped the outer loop nor returned the error to the user (since 'rc' would be overwritten on the next pass through the loop). Thus, rewrite it to use a single loop, so that the break does the right thing. Use u64 for 'count' and 'i' to prevent overflow in case of (unreasonably) large values of id.data and n. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
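A sketch of the single-loop shape described above; the blink interval and constants are illustrative, and the real function's rtnl and LED-restore handling is omitted:
----------
/* Illustrative only: one flat loop over all blink transitions, with
 * u64 counters so huge id.data * n values cannot overflow.
 */
u64 count = (u64)id.data * n;
u64 i;
int rc = 0;

for (i = 0; i < count && !rc; i++) {
	rc = ops->set_phys_id(dev,
			      (i & 1) ? ETHTOOL_ID_OFF : ETHTOOL_ID_ON);
	if (!rc)
		msleep(500);  /* placeholder blink interval */
}
return rc;  /* first error reaches the user instead of being overwritten */
----------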
2020-09-03  l2tp: avoid duplicated code in l2tp_tunnel_closeall  (Tom Parkin)
l2tp_tunnel_closeall is called as a part of tunnel shutdown in order to close all the sessions held by the tunnel. The code it uses to close a session duplicates what l2tp_session_delete does. Rather than duplicating the code, have l2tp_tunnel_closeall call l2tp_session_delete instead. This involves a very minor change to locking in l2tp_tunnel_closeall. Previously, l2tp_tunnel_closeall checked the session "dead" flag while holding tunnel->hlist_lock. This allowed the code to step to the next session in the list without releasing the lock if the current session happened to be in the process of closing already. By calling l2tp_session_delete instead, l2tp_tunnel_closeall must now drop and regain the hlist lock for each session in the tunnel list. Given how unlikely it is for a session to be in the process of closing when the tunnel is closed, this very minor potential loss of efficiency seems worth it to avoid duplicating the session delete code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  l2tp: make magic feather checks more useful  (Tom Parkin)
The l2tp tunnel and session structures contain a "magic feather" field which was originally intended to help trace lifetime bugs in the code. Since the introduction of the shared kernel refcount code in refcount.h, and l2tp's porting to those APIs, we are covered by the refcount code's checks and warnings. Duplicating those checks in the l2tp code isn't useful. However, magic feather checks are still useful to help to detect bugs stemming from misuse/trampling of the sk_user_data pointer in struct sock. The l2tp code makes extensive use of sk_user_data to stash pointers to the tunnel and session structures, and if another subsystem overwrites sk_user_data it's important to detect this. As such, rework l2tp's magic feather checks to focus on validating the tunnel and session data structures when they're extracted from sk_user_data. * Add a new accessor function l2tp_sk_to_tunnel which contains a magic feather check, and is used by l2tp_core and l2tp_ip[6] * Comment l2tp_udp_encap_recv which doesn't use this new accessor function because of the specific nature of the codepath it is called in * Drop l2tp_session_queue_purge's check on the session magic feather: it is called from code which is walking the tunnel session list, and hence doesn't need validation * Drop l2tp_session_free's check on the tunnel magic feather: the intention of this check is covered by refcount.h's reference count sanity checking * Add session magic validation in pppol2tp_ioctl. On failure return -EBADF, which mirrors the approach in pppol2tp_[sg]etsockopt. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
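A sketch of the accessor along the lines the commit describes; l2tp_sk_to_tunnel is named in the message and L2TP_TUNNEL_MAGIC follows l2tp_core's conventions, though the exact implementation here is assumed:
----------
/* Validate the magic feather when extracting the tunnel pointer from
 * sk_user_data: this catches another subsystem trampling the pointer.
 */
static struct l2tp_tunnel *l2tp_sk_to_tunnel(struct sock *sk)
{
	struct l2tp_tunnel *tunnel = sk->sk_user_data;

	if (tunnel && WARN_ON(tunnel->magic != L2TP_TUNNEL_MAGIC))
		tunnel = NULL;  /* sk_user_data was overwritten: fail safe */

	return tunnel;
}
----------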
2020-09-03  l2tp: capture more tx errors in data plane stats  (Tom Parkin)
l2tp_xmit_skb has a number of failure paths which are not reflected in the tunnel and session statistics because the stats are updated by l2tp_xmit_core. Hence any errors occurring before l2tp_xmit_core is called are missing from the statistics. Refactor the transmit path slightly to capture all error paths. l2tp_xmit_skb now leaves all the actual work of transmission to l2tp_xmit_core, and updates the statistics based on l2tp_xmit_core's return code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  l2tp: drop net argument from l2tp_tunnel_create  (Tom Parkin)
The argument is unused, so remove it. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  l2tp: drop data_len argument from l2tp_xmit_core  (Tom Parkin)
The data_len argument passed to l2tp_xmit_core is no longer used, so remove it. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03  l2tp: remove header length param from l2tp_xmit_skb  (Tom Parkin)
All callers pass the session structure's hdr_len field as the header length parameter to l2tp_xmit_skb. Since we're passing a pointer to the session structure to l2tp_xmit_skb anyway, there's not much point breaking the header length out as a separate argument. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-02  tipc: fix shutdown() of connectionless socket  (Tetsuo Handa)
syzbot is reporting hung task at nbd_ioctl() [1], for there are two problems regarding TIPC's connectionless socket's shutdown() operation. ---------- #include <fcntl.h> #include <sys/socket.h> #include <sys/ioctl.h> #include <linux/nbd.h> #include <unistd.h> int main(int argc, char *argv[]) { const int fd = open("/dev/nbd0", 3); alarm(5); ioctl(fd, NBD_SET_SOCK, socket(PF_TIPC, SOCK_DGRAM, 0)); ioctl(fd, NBD_DO_IT, 0); /* To be interrupted by SIGALRM. */ return 0; } ---------- One problem is that wait_for_completion() from flush_workqueue() from nbd_start_device_ioctl() from nbd_ioctl() cannot be completed when nbd_start_device_ioctl() received a signal at wait_event_interruptible(), for tipc_shutdown() from kernel_sock_shutdown(SHUT_RDWR) from nbd_mark_nsock_dead() from sock_shutdown() from nbd_start_device_ioctl() is failing to wake up a WQ thread sleeping at wait_woken() from tipc_wait_for_rcvmsg() from sock_recvmsg() from sock_xmit() from nbd_read_stat() from recv_work() scheduled by nbd_start_device() from nbd_start_device_ioctl(). Fix this problem by always invoking sk->sk_state_change() (like inet_shutdown() does) when tipc_shutdown() is called. The other problem is that tipc_wait_for_rcvmsg() cannot return when tipc_shutdown() is called, for tipc_shutdown() sets sk->sk_shutdown to SEND_SHUTDOWN (despite "how" is SHUT_RDWR) while tipc_wait_for_rcvmsg() needs sk->sk_shutdown set to RCV_SHUTDOWN or SHUTDOWN_MASK. Fix this problem by setting sk->sk_shutdown to SHUTDOWN_MASK (like inet_shutdown() does) when the socket is connectionless. [1] https://syzkaller.appspot.com/bug?id=3fe51d307c1f0a845485cf1798aa059d12bf18b2 Reported-by: syzbot <syzbot+e36f41d207137b5d12f7@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
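A sketch of the two changes in tipc_shutdown(), mirroring what the commit says inet_shutdown() does; the connectionless test shown is an assumption about TIPC's actual condition:
----------
if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_RDM)
	sk->sk_shutdown = SHUTDOWN_MASK;  /* connectionless: both directions,
					     so tipc_wait_for_rcvmsg() sees
					     RCV_SHUTDOWN and returns */
else
	sk->sk_shutdown = SEND_SHUTDOWN;

sk->sk_state_change(sk);  /* always wake sleepers, like inet_shutdown() */
----------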
2020-09-02  ipv6: Fix sysctl max for fib_multipath_hash_policy  (Ido Schimmel)
Cited commit added the possible value of '2', but it cannot be set. Fix it by adjusting the maximum value to '2'. This is consistent with the corresponding IPv4 sysctl. Before: # sysctl -w net.ipv6.fib_multipath_hash_policy=2 sysctl: setting key "net.ipv6.fib_multipath_hash_policy": Invalid argument net.ipv6.fib_multipath_hash_policy = 2 # sysctl net.ipv6.fib_multipath_hash_policy net.ipv6.fib_multipath_hash_policy = 0 After: # sysctl -w net.ipv6.fib_multipath_hash_policy=2 net.ipv6.fib_multipath_hash_policy = 2 # sysctl net.ipv6.fib_multipath_hash_policy net.ipv6.fib_multipath_hash_policy = 2 Fixes: d8f74f0975d8 ("ipv6: Support multipath hashing on inner IP pkts") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
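A sketch of the sysctl table entry after the fix; the table and handler names are assumptions and the .data pointer is elided, the point being .extra2 now admitting the value '2':
----------
static int two = 2;

static struct ctl_table ipv6_fib_table[] = {  /* name is illustrative */
	{
		.procname     = "fib_multipath_hash_policy",
		.maxlen       = sizeof(int),
		.mode         = 0644,
		.proc_handler = proc_dointvec_minmax,  /* assumed handler */
		.extra1       = SYSCTL_ZERO,
		.extra2       = &two,  /* max raised so '2' can be set */
	},
	{ }
};
----------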
2020-09-02  xsk: Fix use-after-free in failed shared_umem bind  (Magnus Karlsson)
Fix use-after-free when a shared umem bind fails. The code incorrectly tried to free the allocated buffer pool both in the bind code and then later also when the socket was released. Fix this by setting the buffer pool pointer to NULL after the bind code has freed the pool, so that the socket release code will not try to free the pool. This is the same solution as the regular, non-shared umem code path has. This was missing from the shared umem path. Fixes: b5aea28dca13 ("xsk: Add shared umem support between queue ids") Reported-by: syzbot+5334f62e4d22804e646a@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599032164-25684-1-git-send-email-magnus.karlsson@intel.com
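A sketch of the error-path shape described above; the helper names follow the xsk buffer pool API of this series but should be read as assumptions, not the literal patch:
----------
err = xp_assign_dev_shared(xs->pool, umem_xs->umem, dev, qid);
if (err) {
	xp_destroy(xs->pool);  /* bind failed: free the pool here ... */
	xs->pool = NULL;       /* ... and tell the release path it is gone,
				  matching the non-shared umem code path */
	goto out_unlock;
}
----------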
2020-09-02  xsk: Fix null check on error return path  (Gustavo A. R. Silva)
Currently, dma_map is being null-checked, when the right object to null-check is dma_map->dma_pages instead. Fix this by null-checking dma_map->dma_pages. Fixes: 921b68692abb ("xsk: Enable sharing of dma mappings") Addresses-Coverity-ID: 1496811 ("Logically dead code") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20200902150750.GA7257@embeddedor
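A sketch of the corrected check; the allocation call shown is illustrative, only the dma_map->dma_pages field comes from the commit text:
----------
dma_map->dma_pages = kvcalloc(nr_pages, sizeof(*dma_map->dma_pages),
			      GFP_KERNEL);
if (!dma_map->dma_pages) {  /* was: if (!dma_map), logically dead code */
	kfree(dma_map);
	return NULL;
}
----------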
2020-09-02  xsk: Fix possible segfault at xskmap entry insertion  (Magnus Karlsson)
Fix a possible segfault when an entry is inserted into the xskmap. This can happen if the socket is in a state where the umem has been set up and the Rx ring created, but it has yet to be bound to a device. In this case the pool has not yet been created and we cannot reference it for the existence of the fill ring. Fix this by removing the whole xsk_is_setup_for_bpf_map function. Once upon a time, it was used to make sure that the Rx and fill rings were set up before the driver could call xsk_rcv, since there are no tests for the existence of these rings in the data path. But these days, we have a state variable that we test instead. When it is XSK_BOUND, everything has been set up correctly and the socket has been bound. So there is no reason to have the xsk_is_setup_for_bpf_map function anymore. Fixes: 7361f9c3d719 ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+febe51d44243fbc564ee@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599037569-26690-1-git-send-email-magnus.karlsson@intel.com
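A sketch of the state test that replaces the removed helper; XSK_BOUND and the state variable are described in the commit, while the surrounding map-update context and errno choice are assumptions:
----------
/* In the xskmap update path: only fully bound sockets may be inserted,
 * since XSK_BOUND is set only after umem, rings, pool and device exist.
 */
if (READ_ONCE(xs->state) != XSK_BOUND)
	return -EINVAL;  /* not fully set up: refuse the xskmap entry */
----------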
2020-09-02  xsk: Fix possible segfault in xsk umem diagnostics  (Magnus Karlsson)
Fix a possible segfault in the xsk diagnostics code when dumping information about the umem. This can happen when a umem has been created, but the socket has not been bound yet. In this case, the xsk buffer pool does not exist yet and we cannot dump the information that was moved from the umem to the buffer pool. Fix this by testing for the existence of the buffer pool and, if it is not there, not dumping any of that information. Fixes: c2d3d6a47462 ("xsk: Move queue_id, dev and need_wakeup to buffer pool") Fixes: 7361f9c3d719 ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+3f04d36b7336f7868066@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1599036743-26454-1-git-send-email-magnus.karlsson@intel.com
2020-09-01  net: openvswitch: fixes crash if nf_conncount_init() fails  (Eelco Chaudron)
If nf_conncount_init() fails, the already-dispatched work is currently not canceled, causing problems when the timer fires. Fix this by not scheduling the work until all initialization is successful. Fixes: a65878d6f00b ("net: openvswitch: fixes potential deadlock in dp cleanup code") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
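A sketch of the init-ordering idea; the identifiers below are assumptions, not the literal openvswitch code:
----------
err = ovs_ct_init(net);  /* fallible step that calls nf_conncount_init() */
if (err)
	return err;      /* nothing scheduled yet, so nothing to cancel */

/* Arm the deferred work only once every init step has succeeded. */
schedule_delayed_work(&ovs_net->masks_rebalance,
		      msecs_to_jiffies(DP_MASKS_REBALANCE_INTERVAL));
return 0;
----------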
2020-09-01  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  (David S. Miller)
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-09-01 The following pull-request contains BPF updates for your *net-next* tree. There are two small conflicts when pulling, resolve as follows: 1) Merge conflict in tools/lib/bpf/libbpf.c between 88a82120282b ("libbpf: Factor out common ELF operations and improve logging") in bpf-next and 1e891e513e16 ("libbpf: Fix map index used in error message") in net-next. Resolve by taking the hunk in bpf-next: [...] scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx); data = elf_sec_data(obj, scn); if (!scn || !data) { pr_warn("elf: failed to get %s map definitions for %s\n", MAPS_ELF_SEC, obj->path); return -EINVAL; } [...] 2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between 9647c57b11e5 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better performance") in bpf-next and e20f0dbf204f ("net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like: [...] xdp_set_data_meta_invalid(xdp); xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); net_prefetch(xdp->data); [...] We've added 133 non-merge commits during the last 14 day(s) which contain a total of 246 files changed, 13832 insertions(+), 3105 deletions(-). The main changes are: 1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper for tracing to reliably access user memory, from Alexei Starovoitov. 2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau. 3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa. 4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson. 5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko. 6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh. 7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer. 8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song. 9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant. 10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee. 11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua. 12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski. 13) Various smaller cleanups and improvements all over the place. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01  net/tls: Implement getsockopt SOL_TLS TLS_RX  (Yutaro Hayakawa)
Implement the getsockopt SOL_TLS TLS_RX which is currently missing. The primary use case is to use it in conjunction with TCP_REPAIR to checkpoint/restore the TLS record layer state. TLS connection state usually lives in the user space library, so we can easily extract it from there, but when the TLS connection is delegated to kTLS this is no longer the case. We need a way to extract the TLS state from the kernel for both the TX and RX sides. The new TLS_RX getsockopt copies the crypto_info to the user in the same way as TLS_TX does. We have described use cases in our research work in the Netdev 0x14 Transport Workshop [1]. Also, there is a TLS implementation called tlse [2] which supports TLS connection migration. It supports kTLS, and its code shows that it expects future support for this option. [1] https://speakerdeck.com/yutarohayakawa/prism-proxies-without-the-pain [2] https://github.com/eduardsui/tlse Signed-off-by: Yutaro Hayakawa <yhayakawa3720@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
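A userspace sketch of the new getsockopt; SOL_TLS, TLS_RX and struct tls12_crypto_info_aes_gcm_128 come from <linux/tls.h>, and the fragment assumes fd is a socket with kTLS RX already enabled:
----------
#include <linux/tls.h>
#include <sys/socket.h>

struct tls12_crypto_info_aes_gcm_128 info = { 0 };
socklen_t len = sizeof(info);

if (getsockopt(fd, SOL_TLS, TLS_RX, &info, &len) == 0) {
	/* info now carries key, IV, salt and rec_seq for the RX
	 * direction, e.g. for checkpoint/restore with TCP_REPAIR. */
}
----------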
2020-09-01  pktgen: fix error message with wrong function name  (Leesoo Ahn)
The error message printed when kthread_create_on_node() fails names the wrong function, kernel_thread. Fixes: 94dcf29a11b3 ("kthread: use kthread_create_on_node()") Signed-off-by: Leesoo Ahn <dev@ooseel.net> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01  net: openvswitch: remove unused keep_flows  (Tonghao Zhang)
keep_flows was introduced by [1], where it was used as a flag controlling whether flows should be deleted. When rehashing or expanding the table instance, we will not flush the flows. It is no longer used, so remove it. [1] - https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8 Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01  net: openvswitch: refactor flow free function  (Tonghao Zhang)
Decrease table->count and ufid_count unconditionally, because the only case where they are not used for counting is when flushing the flows. To simplify the code, remove the "count" argument of table_instance_flow_free. To avoid a bug when deleting flows in the future, add WARN_ON in the flush flows function. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01  net: openvswitch: improve the coding style  (Tonghao Zhang)
This does not change the logic; it just improves the coding style. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01  Bluetooth: Clear suspend tasks on unregister  (Abhishek Pandit-Subedi)
While unregistering, make sure to clear the suspend tasks before cancelling the work. Otherwise, if the unregister is called during resume from suspend, this will unnecessarily add 2s to the resume time. Fixes: 4e8c36c3b0d73d ("Bluetooth: Fix suspend notifier race") Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-08-31  ipvs: remove dependency on ip6_tables  (Yaroslav Bolyukin)
This dependency was added because ipv6_find_hdr used to live in iptables-specific code; it is no longer required. Fixes: f8f626754ebe ("ipv6: Move ipv6_find_hdr() out of Netfilter code.") Fixes: 63dca2c0b0e7 ("ipvs: Fix faulty IPv6 extension header handling in IPVS") Signed-off-by: Yaroslav Bolyukin <iam@lach.pw> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-31  net: ipv4: remove unused arg exact_dif in compute_score  (Miaohe Lin)
The exact_dif argument is no longer used, so remove it. With it gone, inet_exact_dif_match() is no longer needed either, so remove it too. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: ipv6: remove unused arg exact_dif in compute_score  (Miaohe Lin)
The exact_dif argument is no longer used, so remove it. With it gone, inet6_exact_dif_match() is no longer needed either, so remove it too. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  tipc: Remove unused macro TIPC_NACK_INTV  (YueHaibing)
There are no in-tree callers any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  tipc: Remove unused macro TIPC_FWD_MSG  (YueHaibing)
There are no in-tree callers any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  mptcp: Remove unused macro MPTCP_SAME_STATE  (YueHaibing)
There are no in-tree callers any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: clean up codestyle  (Miaohe Lin)
This is a pure codestyle cleanup patch. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: Use helper macro IP_MAX_MTU in __ip_append_data()  (Miaohe Lin)
What 0xFFFF means here is actually the maximum MTU of an IP packet. Use the helper macro IP_MAX_MTU here. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  openvswitch: using ip6_fragment in ipv6_stub  (wenxu)
Use ipv6_stub->ipv6_fragment to avoid the netfilter dependency. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  ipv6: add ipv6_fragment hook in ipv6_stub  (wenxu)
Add ipv6_fragment to ipv6_stub to avoid calling netfilter when accessing ip6_fragment. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  xsk: Add shared umem support between devices  (Magnus Karlsson)
Add support to share a umem between different devices. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same device. Note that when sharing a umem between devices, just as in the case of sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket (with two setsockopts, one for each ring) before you do the bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics of the rings can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-13-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Add shared umem support between queue ids  (Magnus Karlsson)
Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you do the bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Enable sharing of dma mappings  (Magnus Karlsson)
Enable the sharing of dma mappings by moving them out of the buffer pool. Instead we put each dma-mapped umem region in a list in the umem structure. If dma has already been mapped for this umem and device, it is not mapped again and the existing dma mappings are reused. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move addrs from buffer pool to umem  (Magnus Karlsson)
Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move xsk_tx_list and its lock to buffer pool  (Magnus Karlsson)
Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This is so that we can share the umem between multiple HW queues in a later commit. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move queue_id, dev and need_wakeup to buffer pool  (Magnus Karlsson)
Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This is so that we can share the umem between multiple HW queues in a later commit. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. need_wakeup is also something that is set on a per-napi level, so there is usually one per device and queue id. So move this to the buffer pool too. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move fill and completion rings to buffer pool  (Magnus Karlsson)
Move the fill and completion rings from the umem to the buffer pool. This is so that we can share the umem between multiple HW queue ids in a later commit. In this case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id, this is a natural place for them, and one umem structure can be shared between these buffer pools. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Create and free buffer pool independently from umem  (Magnus Karlsson)
Create and free the buffer pool independently from the umem. Move these operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This is so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also eradicate the back pointer from the umem to the buffer pool, as this will not work when we introduce the possibility to have multiple buffer pools per umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfaces  (Magnus Karlsson)
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly this is about replacing 'umem' in the function names with 'xsk_buff', and having them take a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h, both so that it is clearer what they do and for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem  (Magnus Karlsson)
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This is in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. Later commits will add other entities to it: entities that differ between queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-08-31  netlink: policy: correct validation type check  (Johannes Berg)
In the policy export for binary attributes I erroneously used a != NLA_VALIDATE_NONE comparison instead of checking for the two possible values, which meant that if a validation function pointer ended up aliasing the min/max as negatives, we'd hit a warning in nla_get_range_unsigned(). Fix this to correctly check for only the two types that should be handled here, i.e. range with or without warn-too-long. Reported-by: syzbot+353df1490da781637624@syzkaller.appspotmail.com Fixes: 8aa26c575fb3 ("netlink: make NLA_BINARY validation more flexible") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
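A sketch of the corrected test in the policy-export code; the enum values and nla_get_range_unsigned() exist in the netlink policy code, while the surrounding context here is assumed:
----------
/* Only these two validation types really carry an unsigned range; a
 * validation function pointer may alias negative min/max values and
 * must not be interpreted as one.
 */
if (pt->validation_type == NLA_VALIDATE_RANGE ||
    pt->validation_type == NLA_VALIDATE_RANGE_WARN_TOO_LONG)
	nla_get_range_unsigned(pt, &range);
----------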
2020-08-31  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf  (David S. Miller)
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Do not delete clash entries on reply, let them expire instead, from Florian Westphal. 2) Do not report EAGAIN to nfnetlink, otherwise this enters a busy loop. Update nfnetlink_unicast() to translate EAGAIN to ENOBUFS. 3) Remove repeated words in code comments, from Randy Dunlap. 4) Several patches for the flowtable selftests, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-30  tipc: fix using smp_processor_id() in preemptible  (Tuong Lien)
'this_cpu_ptr()' is used to obtain the AEAD key's TFM on the current CPU for encryption, but the execution can be preemptible since it is actually user-space context, so 'BUG: using smp_processor_id() in preemptible' has been observed. We fix the issue by using the 'get/put_cpu_ptr()' API, which includes a 'preempt_disable()', instead. Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
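A sketch of the before/after pattern; the per-CPU field name is modeled on the TIPC crypto code but should be treated as an assumption:
----------
/* Before (preemptible context, so the CPU may change under us): */
tfm = *this_cpu_ptr(aead->tfm_entry);

/* After: get_cpu_ptr() pairs this_cpu_ptr() with preempt_disable(),
 * so the task cannot migrate while the per-CPU TFM is in use.
 */
tfm = *get_cpu_ptr(aead->tfm_entry);
/* ... perform the AEAD encryption with tfm ... */
put_cpu_ptr(aead->tfm_entry);  /* preempt_enable() */
----------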
2020-08-29  netfilter: nft_socket: add wildcard support  (Balazs Scheidler)
Add NFT_SOCKET_WILDCARD to match wildcard socket listeners. Signed-off-by: Balazs Scheidler <bazsi77@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>