Age | Commit message (Collapse) | Author |
|
Fixed function sk_msg_clone() to prevent overflow of 'dst' while adding
pages in scatterlist entries. The overflow of 'dst' causes crash in kernel
tls module while doing record encryption.
Crash fixed by this patch.
[ 78.796119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 78.804900] Mem abort info:
[ 78.807683] ESR = 0x96000004
[ 78.810744] Exception class = DABT (current EL), IL = 32 bits
[ 78.816677] SET = 0, FnV = 0
[ 78.819727] EA = 0, S1PTW = 0
[ 78.822873] Data abort info:
[ 78.825759] ISV = 0, ISS = 0x00000004
[ 78.829600] CM = 0, WnR = 0
[ 78.832576] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000bf8ee311
[ 78.839195] [0000000000000008] pgd=0000000000000000
[ 78.844081] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 78.849642] Modules linked in: tls xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables xt_CHECKSUM cpve cpufreq_conservative lm90 ina2xx crct10dif_ce
[ 78.865377] CPU: 0 PID: 6007 Comm: openssl Not tainted 4.20.0-rc6-01647-g754d5da63145-dirty #107
[ 78.874149] Hardware name: LS1043A RDB Board (DT)
[ 78.878844] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 78.883632] pc : scatterwalk_copychunks+0x164/0x1c8
[ 78.888500] lr : scatterwalk_copychunks+0x160/0x1c8
[ 78.893366] sp : ffff00001d04b600
[ 78.896668] x29: ffff00001d04b600 x28: ffff80006814c680
[ 78.901970] x27: 0000000000000000 x26: ffff80006c8de786
[ 78.907272] x25: ffff00001d04b760 x24: 000000000000001a
[ 78.912573] x23: 0000000000000006 x22: ffff80006814e440
[ 78.917874] x21: 0000000000000100 x20: 0000000000000000
[ 78.923175] x19: 000081ffffffffff x18: 0000000000000400
[ 78.928476] x17: 0000000000000008 x16: 0000000000000000
[ 78.933778] x15: 0000000000000100 x14: 0000000000000001
[ 78.939079] x13: 0000000000001080 x12: 0000000000000020
[ 78.944381] x11: 0000000000001080 x10: 00000000ffff0002
[ 78.949683] x9 : ffff80006814c248 x8 : 00000000ffff0000
[ 78.954985] x7 : ffff80006814c318 x6 : ffff80006c8de786
[ 78.960286] x5 : 0000000000000f80 x4 : ffff80006c8de000
[ 78.965588] x3 : 0000000000000000 x2 : 0000000000001086
[ 78.970889] x1 : ffff7e0001b74e02 x0 : 0000000000000000
[ 78.976192] Process openssl (pid: 6007, stack limit = 0x00000000291367f9)
[ 78.982968] Call trace:
[ 78.985406] scatterwalk_copychunks+0x164/0x1c8
[ 78.989927] skcipher_walk_next+0x28c/0x448
[ 78.994099] skcipher_walk_done+0xfc/0x258
[ 78.998187] gcm_encrypt+0x434/0x4c0
[ 79.001758] tls_push_record+0x354/0xa58 [tls]
[ 79.006194] bpf_exec_tx_verdict+0x1e4/0x3e8 [tls]
[ 79.010978] tls_sw_sendmsg+0x650/0x780 [tls]
[ 79.015326] inet_sendmsg+0x2c/0xf8
[ 79.018806] sock_sendmsg+0x18/0x30
[ 79.022284] __sys_sendto+0x104/0x138
[ 79.025935] __arm64_sys_sendto+0x24/0x30
[ 79.029936] el0_svc_common+0x60/0xe8
[ 79.033588] el0_svc_handler+0x2c/0x80
[ 79.037327] el0_svc+0x8/0xc
[ 79.040200] Code: 6b01005f 54fff788 940169b1 f9000320 (b9400801)
[ 79.046283] ---[ end trace 74db007d069c1cf7 ]---
Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When dumping proxy entries the dump request has NTF_PROXY set in
ndm_flags. strict mode checking needs to be updated to allow this
flag.
Fixes: 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict data checking")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add napi_disable routine in gro_cells_destroy since starting from
commit c42858eaf492 ("gro_cells: remove spinlock protecting receive
queues") gro_cell_poll and gro_cells_destroy can run concurrently on
napi_skbs list producing a kernel Oops if the tunnel interface is
removed while gro_cell_poll is running. The following Oops has been
triggered removing a vxlan device while the interface is receiving
traffic
[ 5628.948853] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 5628.949981] PGD 0 P4D 0
[ 5628.950308] Oops: 0002 [#1] SMP PTI
[ 5628.950748] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.20.0-rc6+ #41
[ 5628.952940] RIP: 0010:gro_cell_poll+0x49/0x80
[ 5628.955615] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202
[ 5628.956250] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000
[ 5628.957102] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150
[ 5628.957940] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
[ 5628.958803] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040
[ 5628.959661] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040
[ 5628.960682] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
[ 5628.961616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5628.962359] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0
[ 5628.963188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5628.964034] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5628.964871] Call Trace:
[ 5628.965179] net_rx_action+0xf0/0x380
[ 5628.965637] __do_softirq+0xc7/0x431
[ 5628.966510] run_ksoftirqd+0x24/0x30
[ 5628.966957] smpboot_thread_fn+0xc5/0x160
[ 5628.967436] kthread+0x113/0x130
[ 5628.968283] ret_from_fork+0x3a/0x50
[ 5628.968721] Modules linked in:
[ 5628.969099] CR2: 0000000000000008
[ 5628.969510] ---[ end trace 9d9dedc7181661fe ]---
[ 5628.970073] RIP: 0010:gro_cell_poll+0x49/0x80
[ 5628.972965] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202
[ 5628.973611] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000
[ 5628.974504] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150
[ 5628.975462] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
[ 5628.976413] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040
[ 5628.977375] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040
[ 5628.978296] FS: 0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
[ 5628.979327] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5628.980044] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0
[ 5628.980929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5628.981736] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5628.982409] Kernel panic - not syncing: Fatal exception in interrupt
[ 5628.983307] Kernel Offset: disabled
Fixes: c42858eaf492 ("gro_cells: remove spinlock protecting receive queues")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael and Sandipan report:
Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
and is adjusted again at init time if MODULES_VADDR is defined.
For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
the compile-time default at boot-time, which is 0x9c400000 when
using 64K page size. This overflows the signed 32-bit bpf_jit_limit
value:
root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
-1673527296
and can cause various unexpected failures throughout the network
stack. In one case `strace dhclient eth0` reported:
setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
16) = -1 ENOTSUPP (Unknown error 524)
and similar failures can be seen with tools like tcpdump. This doesn't
always reproduce however, and I'm not sure why. The more consistent
failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
host would time out on systemd/netplan configuring a virtio-net NIC
with no noticeable errors in the logs.
Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().
Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.
Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.
Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We want to make sure that the following condition holds:
0 <= nhoff <= thoff <= skb->len
BPF program can set out-of-bounds nhoff and thoff, which is dangerous, see
recent commit d0c081b49137 ("flow_dissector: properly cap thoff field")'.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We are returning thoff from the flow dissector, not the nhoff. Pass
thoff along with nhoff to the bpf program (initially thoff == nhoff)
and expect flow dissector amend/return thoff, not nhoff.
This avoids confusion, when by the time bpf flow dissector exits,
nhoff == thoff, which doesn't make much sense.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2018-12-05
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix bpf uapi pointers for 32-bit architectures, from Daniel.
2) improve verifer ability to handle progs with a lot of branches, from Alexei.
3) strict btf checks, from Yonghong.
4) bpf_sk_lookup api cleanup, from Joe.
5) other misc fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
list_del() leaves the skb->next pointer poisoned, which can then lead to
a crash in e.g. OVS forwarding. For example, setting up an OVS VXLAN
forwarding bridge on sfc as per:
========
$ ovs-vsctl show
5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
Bridge "br0"
Port "br0"
Interface "br0"
type: internal
Port "enp6s0f0"
Interface "enp6s0f0"
Port "vxlan0"
Interface "vxlan0"
type: vxlan
options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
ovs_version: "2.5.0"
========
(where 10.0.0.5 is an address on enp6s0f1)
and sending traffic across it will lead to the following panic:
========
general protection fault: 0000 [#1] SMP PTI
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
RIP: 0010:dev_hard_start_xmit+0x38/0x200
Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
FS: 0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
Call Trace:
<IRQ>
__dev_queue_xmit+0x623/0x870
? masked_flow_lookup+0xf7/0x220 [openvswitch]
? ep_poll_callback+0x101/0x310
do_execute_actions+0xaba/0xaf0 [openvswitch]
? __wake_up_common+0x8a/0x150
? __wake_up_common_lock+0x87/0xc0
? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
ovs_execute_actions+0x47/0x120 [openvswitch]
ovs_dp_process_packet+0x7d/0x110 [openvswitch]
ovs_vport_receive+0x6e/0xd0 [openvswitch]
? dst_alloc+0x64/0x90
? rt_dst_alloc+0x50/0xd0
? ip_route_input_slow+0x19a/0x9a0
? __udp_enqueue_schedule_skb+0x198/0x1b0
? __udp4_lib_rcv+0x856/0xa30
? __udp4_lib_rcv+0x856/0xa30
? cpumask_next_and+0x19/0x20
? find_busiest_group+0x12d/0xcd0
netdev_frame_hook+0xce/0x150 [openvswitch]
__netif_receive_skb_core+0x205/0xae0
__netif_receive_skb_list_core+0x11e/0x220
netif_receive_skb_list+0x203/0x460
? __efx_rx_packet+0x335/0x5e0 [sfc]
efx_poll+0x182/0x320 [sfc]
net_rx_action+0x294/0x3c0
__do_softirq+0xca/0x297
irq_exit+0xa6/0xb0
do_IRQ+0x54/0xd0
common_interrupt+0xf/0xf
</IRQ>
========
So, in all listified-receive handling, instead pull skbs off the lists with
skb_list_del_init().
Fixes: 9af86f933894 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
Fixes: 7da517a3bc52 ("net: core: Another step of skb receive list processing")
Fixes: a4ca8b7df73c ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
kmsan was able to trigger a kernel-infoleak using a gre device [1]
nlmsg_populate_fdb_fill() has a hard coded assumption
that dev->addr_len is ETH_ALEN, as normally guaranteed
for ARPHRD_ETHER devices.
A similar issue was fixed recently in commit da71577545a5
("rtnetlink: Disallow FDB configuration for non-Ethernet device")
[1]
BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:143 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
CPU: 0 PID: 6697 Comm: syz-executor310 Not tainted 4.20.0-rc3+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x32d/0x480 lib/dump_stack.c:113
kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
copyout lib/iov_iter.c:143 [inline]
_copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
copy_to_iter include/linux/uio.h:143 [inline]
skb_copy_datagram_iter+0x4e2/0x1070 net/core/datagram.c:431
skb_copy_datagram_msg include/linux/skbuff.h:3316 [inline]
netlink_recvmsg+0x6f9/0x19d0 net/netlink/af_netlink.c:1975
sock_recvmsg_nosec net/socket.c:794 [inline]
sock_recvmsg+0x1d1/0x230 net/socket.c:801
___sys_recvmsg+0x444/0xae0 net/socket.c:2278
__sys_recvmsg net/socket.c:2327 [inline]
__do_sys_recvmsg net/socket.c:2337 [inline]
__se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
__x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441119
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffc7f008a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441119
RDX: 0000000000000040 RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000100 R09: 0000000000000100
R10: 0000000000000100 R11: 0000000000000207 R12: 0000000000402080
R13: 0000000000402110 R14: 0000000000000000 R15: 0000000000000000
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
kmsan_save_stack mm/kmsan/kmsan.c:261 [inline]
kmsan_internal_chain_origin+0x13d/0x240 mm/kmsan/kmsan.c:469
kmsan_memcpy_memmove_metadata+0x1a9/0xf70 mm/kmsan/kmsan.c:344
kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:362
__msan_memcpy+0x61/0x70 mm/kmsan/kmsan_instr.c:162
__nla_put lib/nlattr.c:744 [inline]
nla_put+0x20a/0x2d0 lib/nlattr.c:802
nlmsg_populate_fdb_fill+0x444/0x810 net/core/rtnetlink.c:3466
nlmsg_populate_fdb net/core/rtnetlink.c:3775 [inline]
ndo_dflt_fdb_dump+0x73a/0x960 net/core/rtnetlink.c:3807
rtnl_fdb_dump+0x1318/0x1cb0 net/core/rtnetlink.c:3979
netlink_dump+0xc79/0x1c90 net/netlink/af_netlink.c:2244
__netlink_dump_start+0x10c4/0x11d0 net/netlink/af_netlink.c:2352
netlink_dump_start include/linux/netlink.h:216 [inline]
rtnetlink_rcv_msg+0x141b/0x1540 net/core/rtnetlink.c:4910
netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
__sys_sendmsg net/socket.c:2154 [inline]
__do_sys_sendmsg net/socket.c:2163 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2161
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
kmsan_internal_poison_shadow+0x6d/0x130 mm/kmsan/kmsan.c:170
kmsan_kmalloc+0xa1/0x100 mm/kmsan/kmsan_hooks.c:186
__kmalloc+0x14c/0x4d0 mm/slub.c:3825
kmalloc include/linux/slab.h:551 [inline]
__hw_addr_create_ex net/core/dev_addr_lists.c:34 [inline]
__hw_addr_add_ex net/core/dev_addr_lists.c:80 [inline]
__dev_mc_add+0x357/0x8a0 net/core/dev_addr_lists.c:670
dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
ip_mc_filter_add net/ipv4/igmp.c:1128 [inline]
igmp_group_added+0x4d4/0xb80 net/ipv4/igmp.c:1311
__ip_mc_inc_group+0xea9/0xf70 net/ipv4/igmp.c:1444
ip_mc_inc_group net/ipv4/igmp.c:1453 [inline]
ip_mc_up+0x1c3/0x400 net/ipv4/igmp.c:1775
inetdev_event+0x1d03/0x1d80 net/ipv4/devinet.c:1522
notifier_call_chain kernel/notifier.c:93 [inline]
__raw_notifier_call_chain kernel/notifier.c:394 [inline]
raw_notifier_call_chain+0x13d/0x240 kernel/notifier.c:401
__dev_notify_flags+0x3da/0x860 net/core/dev.c:1733
dev_change_flags+0x1ac/0x230 net/core/dev.c:7569
do_setlink+0x165f/0x5ea0 net/core/rtnetlink.c:2492
rtnl_newlink+0x2ad7/0x35a0 net/core/rtnetlink.c:3111
rtnetlink_rcv_msg+0x1148/0x1540 net/core/rtnetlink.c:4947
netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
__sys_sendmsg net/socket.c:2154 [inline]
__do_sys_sendmsg net/socket.c:2163 [inline]
__se_sys_sendmsg+0x305/0x460 net/socket.c:2161
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Bytes 36-37 of 105 are uninitialized
Memory access of size 105 starts at ffff88819686c000
Data copied to user address 0000000020000380
Fixes: d83b06036048 ("net: add fdb generic dump routine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netif_napi_add() could report an error like this below due to it allows
to pass a format string for wildcarding before calling
dev_get_valid_name(),
"netif_napi_add() called with weight 256 on device eth%d"
For example, hns_enet_drv module does this.
hns_nic_try_get_ae
hns_nic_init_ring_data
netif_napi_add
register_netdev
dev_get_valid_name
Hence, make it a bit more human-readable by using netdev_err_once()
instead.
Signed-off-by: Qian Cai <cai@gmx.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Ahern and Nicolas Dichtel report that the handling of the netns id
0 is incorrect for the BPF socket lookup helpers: rather than finding
the netns with id 0, it is resolving to the current netns. This renders
the netns_id 0 inaccessible.
To fix this, adjust the API for the netns to treat all negative s32
values as a lookup in the current netns (including u64 values which when
truncated to s32 become negative), while any values with a positive
value in the signed 32-bit integer space would result in a lookup for a
socket in the netns corresponding to that id. As before, if the netns
with that ID does not exist, no socket will be found. Any netns outside
of these ranges will fail to find a corresponding socket, as those
values are reserved for future usage.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Joey Pabalinas <joeypabalinas@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, pointer offsets in three BPF context structures are
broken in two scenarios: i) 32 bit compiled applications running
on 64 bit kernels, and ii) LLVM compiled BPF programs running
on 32 bit kernels. The latter is due to BPF target machine being
strictly 64 bit. So in each of the cases the offsets will mismatch
in verifier when checking / rewriting context access. Fix this by
providing a helper macro __bpf_md_ptr() that will enforce padding
up to 64 bit and proper alignment, and for context access a macro
bpf_ctx_range_ptr() which will cover full 64 bit member range on
32 bit archs. For flow_keys, we additionally need to force the
size check to sizeof(__u64) as with other pointer types.
Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit 04157469b7b8 ("net: Use static_key for XPS maps") introduced a
static key for XPS, but the increments/decrements don't match.
First, the static key's counter is incremented once for each queue, but
only decremented once for a whole batch of queues, leading to large
unbalances.
Second, the xps_rxqs_needed key is decremented whenever we reset a batch
of queues, whether they had any rxqs mapping or not, so that if we setup
cpu-XPS on em1 and RXQS-XPS on em2, resetting the queues on em1 would
decrement the xps_rxqs_needed key.
This reworks the accounting scheme so that the xps_needed key is
incremented only once for each type of XPS for all the queues on a
device, and the xps_rxqs_needed key is incremented only once for all
queues. This is sufficient to let us retrieve queues via
get_xps_queue().
This patch introduces a new reset_xps_maps(), which reinitializes and
frees the appropriate map (xps_rxqs_map or xps_cpus_map), and drops a
reference to the needed keys:
- both xps_needed and xps_rxqs_needed, in case of rxqs maps,
- only xps_needed, in case of CPU maps.
Now, we also need to call reset_xps_maps() at the end of
__netif_set_xps_queue() when there's no active map left, for example
when writing '00000000,00000000' to all queues' xps_rxqs setting.
Fixes: 04157469b7b8 ("net: Use static_key for XPS maps")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before commit 80d19669ecd3 ("net: Refactor XPS for CPUs and Rx queues"),
netif_reset_xps_queues() did netdev_queue_numa_node_write() for all the
queues being reset. Now, this is only done when the "active" variable in
clean_xps_maps() is false, ie when on all the CPUs, there's no active
XPS mapping left.
Fixes: 80d19669ecd3 ("net: Refactor XPS for CPUs and Rx queues")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-11-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix an off-by-one bug when adjusting subprog start offsets after
patching, from Edward.
2) Fix several bugs such as overflow in size allocation in queue /
stack map creation, from Alexei.
3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp
helper, from Andrey.
4) Fix several bugs in bpftool such as preventing an infinite loop
in get_fdinfo, error handling and man page references, from Quentin.
5) Fix a warning in bpf_trace_printk() that wasn't catching an
invalid format string, from Martynas.
6) Fix a bug in BPF cgroup local storage where non-atomic allocation
was used in atomic context, from Roman.
7) Fix a NULL pointer dereference bug in bpftool from reallocarray()
error handling, from Jakub and Wen.
8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools
include infrastructure so that bpftool compiles on older RHEL7-like
user space which does not ship these headers, from Yonghong.
9) Fix BPF kselftests for user space where to get ping test working
with ping6 and ping -6, from Li.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric noted that with UDP GRO and NAPI timeout, we could keep a single
UDP packet inside the GRO hash forever, if the related NAPI instance
calls napi_gro_complete() at an higher frequency than the NAPI timeout.
Willem noted that even TCP packets could be trapped there, till the
next retransmission.
This patch tries to address the issue, flushing the old packets -
those with a NAPI_GRO_CB age before the current jiffy - before scheduling
the NAPI timeout. The rationale is that such a timeout should be
well below a jiffy and we are not flushing packets eligible for sane GRO.
v1 -> v2:
- clarified the commit message and comment
RFC -> v1:
- added 'Fixes tags', cleaned-up the wording.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a packet is trapped and the corresponding SKB marked as
already-forwarded, it retains this marking even after it is forwarded
across veth links into another bridge. There, since it ingresses the
bridge over veth, which doesn't have offload_fwd_mark, it triggers a
warning in nbp_switchdev_frame_mark().
Then nbp_switchdev_allowed_egress() decides not to allow egress from
this bridge through another veth, because the SKB is already marked, and
the mark (of 0) of course matches. Thus the packet is incorrectly
blocked.
Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
function is called from tunnels and also from veth, and thus catches the
cases where traffic is forwarded between bridges and transformed in a
way that invalidates the marking.
Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
eth_type_trans() assumes initial value for skb->pkt_type
is PACKET_HOST.
This is indeed the value right after a fresh skb allocation.
However, it is possible that GRO merged a packet with a different
value (like PACKET_OTHERHOST in case macvlan is used), so
we need to make sure napi->skb will have pkt_type set back to
PACKET_HOST.
Otherwise, valid packets might be dropped by the stack because
their pkt_type is not PACKET_HOST.
napi_reuse_skb() was added in commit 96e93eab2033 ("gro: Add
internal interfaces for VLAN"), but this bug always has
been there.
Fixes: 96e93eab2033 ("gro: Add internal interfaces for VLAN")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Only first fragment has the sport/dport information,
not the following ones.
If we want consistent hash for all fragments, we need to
ignore ports even for first fragment.
This bug is visible for IPv6 traffic, if incoming fragments
do not have a flow label, since skb_get_hash() will give
different results for first fragment and following ones.
It is also visible if any routing rule wants dissection
and sport or dport.
See commit 5e5d6fed3741 ("ipv6: route: dissect flow
in input path if fib rules need it") for details.
[edumazet] rewrote the changelog completely.
Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lookup functions in sk_lookup have different expectations about byte
order of provided arguments.
Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
expect dport to be in network byte order and do ntohs(dport) internally.
At the same time __inet6_lookup expects dport to be in host byte order
and correspondingly name the argument hnum.
sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
__inet6_lookup with regard to dport. But in __udp6_lib_lookup case it
uses host instead of expected network byte order. It makes result
returned by bpf_sk_lookup_udp for IPv6 incorrect.
The patch fixes byte order of dport passed to __udp6_lib_lookup.
Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f02a
fixes TCPv6 but breaks UDPv6.
Fixes: 5ef0ae84f02a ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
IPPROTO_RAW isn't registred as an inet protocol, so
inet_protos[protocol] is always NULL for it.
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Fixes: bf2ae2e4bf93 ("sock_diag: request _diag module only when the family or proto has been registered")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no reason to discard using source link local address when
remote netconsole IPv6 address is set to be link local one.
The patch allows administrators to use IPv6 netconsole without
explicitly configuring source address:
netconsole=@/,@fe80::5054:ff:fe2f:6012/
Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For non-zero return from dumpit() we should break the loop
in rtnl_dump_all() and return the result. Otherwise, e.g.,
we could get the memory leak in inet6_dump_fib() [1]. The
pointer to the allocated struct fib6_walker there (saved
in cb->args) can be lost, reset on the next iteration.
Fix it by partially restoring the previous behavior before
commit c63586dc9b3e ("net: rtnl_dump_all needs to propagate
error from dumpit function"). The returned error from
dumpit() is still passed further.
[1]:
unreferenced object 0xffff88001322a200 (size 96):
comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s)
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................
18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff ..A6......A6....
backtrace:
[<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220
[<000000007d12709f>] inet6_dump_fib+0x68d/0x940
[<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0
[<00000000d7cd302b>] netlink_dump+0x945/0x11a0
[<000000002f43485f>] __netlink_dump_start+0x55d/0x800
[<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00
[<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420
[<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20
[<00000000691b703b>] netlink_unicast+0x4e3/0x6c0
[<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0
[<0000000096d2aa60>] sock_sendmsg+0xba/0xf0
[<000000008c1b786f>] __sys_sendto+0x1e4/0x330
[<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0
[<00000000071f4d56>] do_syscall_64+0x9f/0x300
[<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0000000057587684>] 0xffffffffffffffff
Fixes: c63586dc9b3e ("net: rtnl_dump_all needs to propagate error from dumpit function")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before calling dev_hard_start_xmit(), upper layers tried
to cook optimal skb list based on BQL budget.
Problem is that GSO packets can end up comsuming more than
the BQL budget.
Breaking the loop is not useful, since requeued packets
are ahead of any packets still in the qdisc.
It is also more expensive, since next TX completion will
push these packets later, while skbs are not in cpu caches.
It is also a behavior difference with TSO packets, that can
break the BQL limit by a large amount.
Note that drivers should use __netdev_tx_sent_queue()
in order to have optimal xmit_more support, and avoid
useless atomic operations as shown in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove kernel-doc warning:
net/core/skbuff.c:4953: warning: Function parameter or member 'skb' not described in 'skb_gso_size_check'
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When an FDB entry is configured, the address is validated to have the
length of an Ethernet address, but the device for which the address is
configured can be of any type.
The above can result in the use of uninitialized memory when the address
is later compared against existing addresses since 'dev->addr_len' is
used and it may be greater than ETH_ALEN, as with ip6tnl devices.
Fix this by making sure that FDB entries are only configured for
Ethernet devices.
BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x14b/0x190 lib/dump_stack.c:113
kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
__msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
memcmp+0x11d/0x180 lib/string.c:863
dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
__sys_sendmsg net/socket.c:2152 [inline]
__do_sys_sendmsg net/socket.c:2161 [inline]
__se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440ee9
Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2718 [inline]
__kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:996 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
__sys_sendmsg net/socket.c:2152 [inline]
__do_sys_sendmsg net/socket.c:2161 [inline]
__se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
v2:
* Make error message more specific (David)
Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Just like with normal GRO processing, we have to initialize
skb->next to NULL when we unlink overflow packets from the
GRO hash lists.
Fixes: d4546c2509b1 ("net: Convert GRO SKB handling to list_head.")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-10-27
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix toctou race in BTF header validation, from Martin and Wenwen.
2) Fix devmap interface comparison in notifier call which was
neglecting netns, from Taehee.
3) Several fixes in various places, for example, correcting direct
packet access and helper function availability, from Daniel.
4) Fix BPF kselftest config fragment to include af_xdp and sockmap,
from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"What better way to start off a weekend than with some networking bug
fixes:
1) net namespace leak in dump filtering code of ipv4 and ipv6, fixed
by David Ahern and Bjørn Mork.
2) Handle bad checksums from hardware when using CHECKSUM_COMPLETE
properly in UDP, from Sean Tranchetti.
3) Remove TCA_OPTIONS from policy validation, it turns out we don't
consistently use nested attributes for this across all packet
schedulers. From David Ahern.
4) Fix SKB corruption in cadence driver, from Tristram Ha.
5) Fix broken WoL handling in r8169 driver, from Heiner Kallweit.
6) Fix OOPS in pneigh_dump_table(), from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
net/neigh: fix NULL deref in pneigh_dump_table()
net: allow traceroute with a specified interface in a vrf
bridge: do not add port to router list when receives query with source 0.0.0.0
net/smc: fix smc_buf_unuse to use the lgr pointer
ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
net/{ipv4,ipv6}: Do not put target net if input nsid is invalid
lan743x: Remove SPI dependency from Microchip group.
drivers: net: remove <net/busy_poll.h> inclusion when not needed
net: phy: genphy_10g_driver: Avoid NULL pointer dereference
r8169: fix broken Wake-on-LAN from S5 (poweroff)
octeontx2-af: Use GFP_ATOMIC under spin lock
net: ethernet: cadence: fix socket buffer corruption problem
net/ipv6: Allow onlink routes to have a device mismatch if it is the default route
net: sched: Remove TCA_OPTIONS from policy
ice: Poll for link status change
ice: Allocate VF interrupts and set queue map
ice: Introduce ice_dev_onetime_setup
net: hns3: Fix for warning uninitialized symbol hw_err_lst3
octeontx2-af: Copy the right amount of memory
net: udp: fix handling of CHECKSUM_COMPLETE packets
...
|
|
pneigh can have NULL device pointer, so we need to make
neigh_master_filtered() and neigh_ifindex_filtered() more robust.
syzbot report :
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 15867 Comm: syz-executor2 Not tainted 4.19.0+ #276
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:179 [inline]
RIP: 0010:list_empty include/linux/list.h:203 [inline]
RIP: 0010:netdev_master_upper_dev_get+0xa1/0x250 net/core/dev.c:6467
RSP: 0018:ffff8801bfaf7220 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffc90005e92000
RDX: 0000000000000016 RSI: ffffffff860b44d9 RDI: 0000000000000005
RBP: ffff8801bfaf72b0 R08: ffff8801c4c84080 R09: fffffbfff139a580
R10: fffffbfff139a580 R11: ffffffff89cd2c07 R12: 1ffff10037f5ee45
R13: 0000000000000000 R14: ffff8801bfaf7288 R15: 00000000000000b0
FS: 00007f65cc68d700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b33a21000 CR3: 00000001c6116000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
neigh_master_filtered net/core/neighbour.c:2367 [inline]
pneigh_dump_table net/core/neighbour.c:2456 [inline]
neigh_dump_info+0x7a9/0x1ce0 net/core/neighbour.c:2577
netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
__netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
netlink_dump_start include/linux/netlink.h:216 [inline]
rtnetlink_rcv_msg+0x809/0xc20 net/core/rtnetlink.c:4898
netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4953
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x5a5/0x760 net/netlink/af_netlink.c:1336
netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:631
sock_write_iter+0x35e/0x5c0 net/socket.c:900
call_write_iter include/linux/fs.h:1808 [inline]
new_sync_write fs/read_write.c:474 [inline]
__vfs_write+0x6b8/0x9f0 fs/read_write.c:487
vfs_write+0x1fc/0x560 fs/read_write.c:549
ksys_write+0x101/0x260 fs/read_write.c:598
__do_sys_write fs/read_write.c:610 [inline]
__se_sys_write fs/read_write.c:607 [inline]
__x64_sys_write+0x73/0xb0 fs/read_write.c:607
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Fixes: 6f52f80e8530 ("net/neigh: Extend dump filter to proxy neighbor dumps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage()
helper function") enabled the bpf_get_local_storage() helper also
for BPF program types where it does not make sense to use them.
They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
even though both program types are not invoked in combination with
cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
bpf_cgroup_storage_set() is set shortly before BPF program invocation.
Later, the helper bpf_get_local_storage() retrieves this prior set
up per-cpu pointer and hands the buffer to the BPF program. The map
argument in there solely retrieves the enum bpf_cgroup_storage_type
from a local storage map associated with the program and based on the
type returns either the global or per-cpu storage. However, there
is no specific association between the program's map and the actual
content in bpf_cgroup_storage[].
Meaning, any BPF program that would have been properly run from the
cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
was performed, and that is later unloaded such that prog / maps are
teared down will cause a use after free if that pointer is retrieved
from programs that are not run through BPF_PROG_RUN_ARRAY() but have
the cgroup local storage helper enabled in their func proto.
Lets just remove it from the two sock_map program types to fix it.
Auditing through the types where this helper is enabled, it appears
that these are the only ones where it was mistakenly allowed.
Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Roman Gushchin <guro@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"All trivial changes - simplification, typo fix and adding
cond_resched() in a netclassid update loop"
* 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup, netclassid: add a preemption point to write_classid
rdmacg: fix a typo in rdmacg documentation
cgroup: Simplify cgroup_ancestor
|
|
Rick reported that the BPF JIT could potentially fill the entire module
space with BPF programs from unprivileged users which would prevent later
attempts to load normal kernel modules or privileged BPF programs, for
example. If JIT was enabled but unsuccessful to generate the image, then
before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
we would always fall back to the BPF interpreter. Nowadays in the case
where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort
with a failure since the BPF interpreter was compiled out.
Add a global limit and enforce it for unprivileged users such that in case
of BPF interpreter compiled out we fail once the limit has been reached
or we fall back to BPF interpreter earlier w/o using module mem if latter
was compiled in. In a next step, fair share among unprivileged users can
be resolved in particular for the case where we would fail hard once limit
is reached.
Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64")
Co-Developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Given this seems to be quite fragile and can easily slip through the
cracks, lets make direct packet write more robust by requiring that
future program types which allow for such write must provide a prologue
callback. In case of XDP and sk_msg it's noop, thus add a generic noop
handler there. The latter starts out with NULL data/data_end unconditionally
when sg pages are shared.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for
BPF_PROG_TYPE_CGROUP_SKB") added support for returning pkt pointers
for direct packet access. Given this program type is allowed for both
unprivileged and privileged users, we shouldn't allow unprivileged
ones to use it, e.g. besides others one reason would be to avoid any
potential speculation on the packet test itself, thus guard this for
root only.
Fixes: b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
incorrect for any packet that has an incorrect checksum value.
udp4/6_csum_init() will both make a call to
__skb_checksum_validate_complete() to initialize/validate the csum
field when receiving a CHECKSUM_COMPLETE packet. When this packet
fails validation, skb->csum will be overwritten with the pseudoheader
checksum so the packet can be fully validated by software, but the
skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
the stack can later warn the user about their hardware spewing bad
checksums. Unfortunately, leaving the SKB in this state can cause
problems later on in the checksum calculation.
Since the the packet is still marked as CHECKSUM_COMPLETE,
udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
from skb->csum instead of adding it, leaving us with a garbage value
in that field. Once we try to copy the packet to userspace in the
udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
to checksum the packet data and add it in the garbage skb->csum value
to perform our final validation check.
Since the value we're validating is not the proper checksum, it's possible
that the folded value could come out to 0, causing us not to drop the
packet. Instead, we believe that the packet was checksummed incorrectly
by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
to warn the user with netdev_rx_csum_fault(skb->dev);
Unfortunately, since this is the UDP path, skb->dev has been overwritten
by skb->dev_scratch and is no longer a valid pointer, so we end up
reading invalid memory.
This patch addresses this problem in two ways:
1) Do not use the dev pointer when calling netdev_rx_csum_fault()
from skb_copy_and_csum_datagram_msg(). Since this gets called
from the UDP path where skb->dev has been overwritten, we have
no way of knowing if the pointer is still valid. Also for the
sake of consistency with the other uses of
netdev_rx_csum_fault(), don't attempt to call it if the
packet was checksummed by software.
2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
If we receive a packet that's CHECKSUM_COMPLETE that fails
verification (i.e. skb->csum_valid == 0), check who performed
the calculation. It's possible that the checksum was done in
software by the network stack earlier (such as Netfilter's
CONNTRACK module), and if that says the checksum is bad,
we can drop the packet immediately instead of waiting until
we try and copy it to userspace. Otherwise, we need to
mark the SKB as CHECKSUM_NONE, since the skb->csum field
no longer contains the full packet checksum after the
call to __skb_checksum_validate_complete().
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Fixes: c84d949057ca ("udp: copy skb->truesize in the first cache line")
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an address, route or netconf dump request is sent for AF_UNSPEC, then
rtnl_dump_all is used to do the dump across all address families. If one
of the dumpit functions fails (e.g., invalid attributes in the dump
request) then rtnl_dump_all needs to propagate that error so the user
gets an appropriate response instead of just getting no data.
Fixes: effe67926624 ("net: Enable kernel side filtering of route dumps")
Fixes: 5fcd266a9f64 ("net/ipv4: Add support for dumping addresses for a specific device")
Fixes: 6371a71f3a3b ("net/ipv6: Add support for dumping addresses for a specific device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have seen a customer complaining about soft lockups on !PREEMPT
kernel config with 4.4 based kernel
[1072141.435366] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [systemd:1]
[1072141.444090] Modules linked in: mpt3sas raid_class binfmt_misc af_packet 8021q garp mrp stp llc xfs libcrc32c bonding iscsi_ibft iscsi_boot_sysfs msr ext4 crc16 jbd2 mbcache cdc_ether usbnet mii joydev hid_generic usbhid intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif mgag200 i2c_algo_bit ttm ipmi_devintf drbg ixgbe drm_kms_helper vxlan ansi_cprng ip6_udp_tunnel drm aesni_intel udp_tunnel aes_x86_64 iTCO_wdt syscopyarea ptp xhci_pci lrw iTCO_vendor_support pps_core gf128mul ehci_pci glue_helper sysfillrect mdio pcspkr sb_edac ablk_helper cryptd ehci_hcd sysimgblt xhci_hcd fb_sys_fops edac_core mei_me lpc_ich ses usbcore enclosure dca mfd_core ipmi_si mei i2c_i801 scsi_transport_sas usb_common ipmi_msghandler shpchp fjes wmi processor button acpi_pad btrfs xor raid6_pq sd_mod crc32c_intel megaraid_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod md_mod autofs4
[1072141.444146] Supported: Yes
[1072141.444149] CPU: 21 PID: 1 Comm: systemd Not tainted 4.4.121-92.80-default #1
[1072141.444150] Hardware name: LENOVO Lenovo System x3650 M5 -[5462P4U]- -[5462P4U]-/01GR451, BIOS -[TCE136H-2.70]- 06/13/2018
[1072141.444151] task: ffff880191bd0040 ti: ffff880191bd4000 task.ti: ffff880191bd4000
[1072141.444153] RIP: 0010:[<ffffffff815229f9>] [<ffffffff815229f9>] update_classid_sock+0x29/0x40
[1072141.444157] RSP: 0018:ffff880191bd7d58 EFLAGS: 00000286
[1072141.444158] RAX: ffff883b177cb7c0 RBX: 0000000000000000 RCX: 0000000000000000
[1072141.444159] RDX: 00000000000009c7 RSI: ffff880191bd7d5c RDI: ffff8822e29bb200
[1072141.444160] RBP: ffff883a72230980 R08: 0000000000000101 R09: 0000000000000000
[1072141.444161] R10: 0000000000000008 R11: f000000000000000 R12: ffffffff815229d0
[1072141.444162] R13: 0000000000000000 R14: ffff881fd0a47ac0 R15: ffff880191bd7f28
[1072141.444163] FS: 00007f3e2f1eb8c0(0000) GS:ffff882000340000(0000) knlGS:0000000000000000
[1072141.444164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1072141.444165] CR2: 00007f3e2f200000 CR3: 0000001ffea4e000 CR4: 00000000001606f0
[1072141.444166] Stack:
[1072141.444166] ffffffa800000246 00000000000009c7 ffffffff8121d583 ffff8818312a05c0
[1072141.444168] ffff8818312a1100 ffff880197c3b280 ffff881861422858 ffffffffffffffea
[1072141.444170] ffffffff81522b1c ffffffff81d0ca20 ffff8817fa17b950 ffff883fdd8121e0
[1072141.444171] Call Trace:
[1072141.444179] [<ffffffff8121d583>] iterate_fd+0x53/0x80
[1072141.444182] [<ffffffff81522b1c>] write_classid+0x4c/0x80
[1072141.444187] [<ffffffff8111328b>] cgroup_file_write+0x9b/0x100
[1072141.444193] [<ffffffff81278bcb>] kernfs_fop_write+0x11b/0x150
[1072141.444198] [<ffffffff81201566>] __vfs_write+0x26/0x100
[1072141.444201] [<ffffffff81201bed>] vfs_write+0x9d/0x190
[1072141.444203] [<ffffffff812028c2>] SyS_write+0x42/0xa0
[1072141.444207] [<ffffffff815f58c3>] entry_SYSCALL_64_fastpath+0x1e/0xca
[1072141.445490] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xca
If a cgroup has many tasks with many open file descriptors then we would
end up in a large loop without any rescheduling point throught the
operation. Add cond_resched once per task.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317.
This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
internal TCP socket for the initial handshake with the remote peer.
Whenever the SMC connection can not be established this TCP socket is
used as a fallback. All socket operations on the SMC socket are then
forwarded to the TCP socket. In case of poll, the file->private_data
pointer references the SMC socket because the TCP socket has no file
assigned. This causes tcp_poll to wait on the wrong socket.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-10-21
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Implement two new kind of BPF maps, that is, queue and stack
map along with new peek, push and pop operations, from Mauricio.
2) Add support for MSG_PEEK flag when redirecting into an ingress
psock sk_msg queue, and add a new helper bpf_msg_push_data() for
insert data into the message, from John.
3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use
direct packet access for __skb_buff, from Song.
4) Use more lightweight barriers for walking perf ring buffer for
libbpf and perf tool as well. Also, various fixes and improvements
from verifier side, from Daniel.
5) Add per-symbol visibility for DSO in libbpf and hide by default
global symbols such as netlink related functions, from Andrey.
6) Two improvements to nfp's BPF offload to check vNIC capabilities
in case prog is shared with multiple vNICs and to protect against
mis-initializing atomic counters, from Jakub.
7) Fix for bpftool to use 4 context mode for the nfp disassembler,
also from Jakub.
8) Fix a return value comparison in test_libbpf.sh and add several
bpftool improvements in bash completion, documentation of bpf fs
restrictions and batch mode summary print, from Quentin.
9) Fix a file resource leak in BPF selftest's load_kallsyms()
helper, from Peng.
10) Fix an unused variable warning in map_lookup_and_delete_elem(),
from Alexei.
11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc,
from Nicolas.
12) Add missing executables to .gitignore in BPF selftests, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Ahern's dump indexing bug fix in 'net' overlapped the
change of the function signature of inet6_fill_ifaddr() in
'net-next'. Trivially resolved.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.
This patch results in unnecessary netlink notification when one
tries to delete a neigh entry already in NUD_FAILED state. Found
this with a buggy app that tries to delete a NUD_FAILED entry
repeatedly. While the notification issue can be fixed with more
checks, adding more complexity here seems unnecessary. Also,
recent tests with other changes in the neighbour code have
shown that the INCOMPLETE and PROBE checks are good enough for
the original issue.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows user to push data into a msg using sk_msg program types.
The format is as follows,
bpf_msg_push_data(msg, offset, len, flags)
this will insert 'len' bytes at offset 'offset'. For example to
prepend 10 bytes at the front of the message the user can,
bpf_msg_push_data(msg, 0, 10, 0);
This will invalidate data bounds so BPF user will have to then recheck
data bounds after calling this. After this the msg size will have been
updated and the user is free to write into the added bytes. We allow
any offset/len as long as it is within the (data, data_end) range.
However, a copy will be required if the ring is full and its possible
for the helper to fail with ENOMEM or EINVAL errors which need to be
handled by the BPF program.
This can be used similar to XDP metadata to pass data between sk_msg
layer and lower layers.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
We've been getting checksum errors involving small UDP packets, usually
59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
has been complaining that HW is providing bad checksums. Turns out the
problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98d1bb
("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
The source of the problem is that when the bytes we are trimming start
at an odd address, as in the case of the 1 padding byte above,
skb_checksum() returns a byte-swapped value. We cannot just combine this
with skb->csum using csum_sub(). We need to use csum_block_sub() here
that takes into account the parity of the start address and handles the
swapping.
Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes a problem introduced by:
commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
When using netconsole on a bond, __netpoll_cleanup can asynchronously
recurse multiple times, each __netpoll_free_async call can result in
more __netpoll_free_async's. This means there is now a race between
cleanup_work queues on multiple netpoll_info's on multiple devices and
the configuration of a new netpoll. For example if a netconsole is set
to enable 0, reconfigured, and enable 1 immediately, this netconsole
will likely not work.
Given the reason for __netpoll_free_async is it can be called when rtnl
is not locked, if it is locked, we should be able to execute
synchronously. It appears to be locked everywhere it's called from.
Generalize the design pattern from the teaming driver for current
callers of __netpoll_free_async.
CC: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before using the psock returned by sk_psock_get() when adding it to a
sockmap we need to ensure it is actually a sockmap based psock.
Previously we were only checking this after incrementing the reference
counter which was an error. This resulted in a slab-out-of-bounds
error when the psock was not actually a sockmap type.
This moves the check up so the reference counter is only used
if it is a sockmap psock.
Eric reported the following KASAN BUG,
BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387
CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120
sk_psock_get include/linux/skmsg.h:379 [inline]
sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178
sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669
sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738
map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.
Two helper functions bpf_compute_and_save_data_end() and
bpf_restore_data_end() are introduced. There are used in
__cgroup_bpf_run_filter_skb(), to compute proper data_end for the
BPF program, and restore original data afterwards.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs.
These maps support peek, pop and push operations that are exposed to eBPF
programs through the new bpf_map[peek/pop/push] helpers. Those operations
are exposed to userspace applications through the already existing
syscalls in the following way:
BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop
BPF_MAP_UPDATE_ELEM -> push
Queue/stack maps are implemented using a buffer, tail and head indexes,
hence BPF_F_NO_PREALLOC is not supported.
As opposite to other maps, queue and stack do not use RCU for protecting
maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE
argument that is a pointer to a memory zone where to save the value of a
map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not
be passed as an extra argument.
Our main motivation for implementing queue/stack maps was to keep track
of a pool of elements, like network ports in a SNAT, however we forsee
other use cases, like for exampling saving last N kernel events in a map
and then analysing from userspace.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.
net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'. Thanks to David
Ahern for the heads up.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 6fe9487892b32cb1c8b8b0d552ed7222a527fe30.
It is causing more serious regressions than the RCU warning
it is fixing.
Signed-off-by: David S. Miller <davem@davemloft.net>
|