2020-01-15  io_uring: ensure workqueue offload grabs ring mutex for poll list  (Jens Axboe)

A previous commit moved the locking for the async sqthread, but didn't take into account that the io-wq workers still need it. We can't use req->in_async for this anymore, as both the sqthread and io-wq workers set it; gate the need for locking on io_wq_current_is_worker() instead.

Fixes: 8a4955ff1cca ("io_uring: sqthread should grab ctx->uring_lock for submissions")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2020-01-15  block: fix an integer overflow in logical block size  (Mikulas Patocka)

Logical block size has type unsigned short, which means it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64), and on these architectures it may be possible to create block devices with a 64k block size. On such a device, mount will fail because it tries to read the superblock using a 2-sector access:

  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

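As a quick editorial illustration of the overflow (not code from the patch): assigning 65536 to a 16-bit unsigned field wraps modulo 2^16 to 0, so a 64k logical block size cannot be represented in the old type.

  #include <stdio.h>

  int main(void)
  {
          unsigned short old_bs = 65536;  /* 0x10000 wraps modulo 2^16 -> 0 */
          unsigned int   new_bs = 65536;  /* the widened type from the patch */

          printf("unsigned short: %u\n", old_bs); /* prints 0 */
          printf("unsigned int:   %u\n", new_bs); /* prints 65536 */
          return 0;
  }
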
2020-01-15  io_uring: clear req->result always before issuing a read/write request  (Bijan Mottahedeh)

req->result is cleared when io_issue_sqe() calls the io_read/write_pre() routines. Those routines, however, are not called when the sqe argument is NULL, which is the case when io_issue_sqe() is called from io_wq_submit_work(). io_issue_sqe() may then examine a stale result if a polled request had previously failed with -EAGAIN:

        if (ctx->flags & IORING_SETUP_IOPOLL) {
                if (req->result == -EAGAIN)
                        return -EAGAIN;
                io_iopoll_req_issued(req);
        }

and in turn cause a subsequently completed request to be re-issued in io_wq_submit_work().

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2020-01-15  scsi: mptfusion: Fix double fetch bug in ioctl  (Dan Carpenter)

Tom Hatskevich reported that we look up "iocp" and then, in the called functions, we do a second copy_from_user() and look it up again. The problem this could cause is:

  drivers/message/fusion/mptctl.c
    674          /* All of these commands require an interrupt or
    675           * are unknown/illegal.
    676           */
    677          if ((ret = mptctl_syscall_down(iocp, nonblock)) != 0)
                                                ^^^^
                 We take this lock.

    678                  return ret;
    679
    680          if (cmd == MPTFWDOWNLOAD)
    681                  ret = mptctl_fw_download(arg);
                                                  ^^^
                 Then the user memory changes and we look up "iocp" again,
                 but a different one, so now we are holding the incorrect
                 lock and have a race condition.

    682          else if (cmd == MPTCOMMAND)
    683                  ret = mptctl_mpt_command(arg);

The security impact of this bug is not as bad as it could have been because these operations are all privileged and root already has enormous destructive power. But it's still worth fixing.

This patch passes the "iocp" pointer to the functions to avoid the second lookup. That deletes 100 lines of code from the driver, so it's a nice cleanup as well.

Link: https://lore.kernel.org/r/20200114123414.GA7957@kadam
Reported-by: Tom Hatskevich <tom2001tom.23@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

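A compressed sketch of the double-fetch pattern and its fix; the names here (struct cmd, struct ioc, mpt_lookup, do_command) are hypothetical stand-ins for illustration, not the driver's actual symbols.

  struct cmd { int ioc_id; /* ... other request fields ... */ };

  int ioctl_buggy(void __user *uarg)
  {
          struct cmd c;

          if (copy_from_user(&c, uarg, sizeof(c)))
                  return -EFAULT;
          mutex_lock(&mpt_lookup(c.ioc_id)->lock); /* fetch #1 picks the lock */

          /* The handler re-reads the id from user memory (fetch #2); if
           * user space changed it in between, we operate on a device we
           * never locked -> race. */
          return do_command(uarg);
  }

  int ioctl_fixed(void __user *uarg)
  {
          struct cmd c;
          struct ioc *iocp;

          if (copy_from_user(&c, uarg, sizeof(c)))
                  return -EFAULT;
          iocp = mpt_lookup(c.ioc_id);    /* single fetch */
          mutex_lock(&iocp->lock);
          return do_command_on(iocp, &c); /* pass the resolved pointer down */
  }
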
2020-01-15  scsi: storvsc: Correctly set number of hardware queues for IDE disk  (Long Li)

Commit 0ed881027690 ("scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue") introduced a regression for disks attached via IDE. For these disks the host VSP offers only one VMBUS channel. Setting multiple queues can overload the VMBUS channel and result in a performance drop for high queue depth workloads on systems with a large number of CPUs. Fix it by leaving the number of hardware queues at 1 (the default value) for IDE disks.

Fixes: 0ed881027690 ("scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue")
Link: https://lore.kernel.org/r/1578960516-108228-1-git-send-email-longli@linuxonhyperv.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2020-01-15  scsi: fnic: fix invalid stack access  (Arnd Bergmann)

gcc -O3 warns that some local variables are not properly initialized:

  drivers/scsi/fnic/vnic_dev.c: In function 'fnic_dev_hang_notify':
  drivers/scsi/fnic/vnic_dev.c:511:16: error: 'a0' is used uninitialized in this function [-Werror=uninitialized]
    vdev->args[0] = *a0;
    ~~~~~~~~~~~~~~^~~~~
  drivers/scsi/fnic/vnic_dev.c:691:6: note: 'a0' was declared here
    u64 a0, a1;
        ^~
  drivers/scsi/fnic/vnic_dev.c:512:16: error: 'a1' is used uninitialized in this function [-Werror=uninitialized]
    vdev->args[1] = *a1;
    ~~~~~~~~~~~~~~^~~~~
  drivers/scsi/fnic/vnic_dev.c:691:10: note: 'a1' was declared here
    u64 a0, a1;
            ^~
  drivers/scsi/fnic/vnic_dev.c: In function 'fnic_dev_mac_addr':
  drivers/scsi/fnic/vnic_dev.c:512:16: error: 'a1' is used uninitialized in this function [-Werror=uninitialized]
    vdev->args[1] = *a1;
    ~~~~~~~~~~~~~~^~~~~
  drivers/scsi/fnic/vnic_dev.c:698:10: note: 'a1' was declared here
    u64 a0, a1;
            ^~

Apparently the code relies on the local variables occupying adjacent memory locations in the same order, but this is of course not guaranteed. Use an array of two u64 variables where needed to make it work correctly. I suspect there is also an endianness bug here, but have not dug in deep enough to be sure.

Fixes: 5df6d737dd4b ("[SCSI] fnic: Add new Cisco PCI-Express FCoE HBA")
Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200107201602.4096790-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2020-01-15  selftests/bpf: Add whitelist/blacklist of test names to test_progs  (Andrii Nakryiko)

Add the ability to specify a list of test name substrings for selecting which tests to run. So now -t accepts a comma-separated list of strings, similarly to how -n accepts a comma-separated list of test numbers. Additionally, add the ability to blacklist tests by name. The blacklist takes precedence over the whitelist. Blacklisting is important for cases where it's known that some tests can't pass (e.g., due to perf hardware events that are not available within a VM). This is going to be used for libbpf testing in Travis CI in its GitHub repo.

Example runs with just a whitelist, and with whitelist + blacklist:

  $ sudo ./test_progs -tattach,core/existence
  #1 attach_probe:OK
  #6 cgroup_attach_autodetach:OK
  #7 cgroup_attach_multi:OK
  #8 cgroup_attach_override:OK
  #9 core_extern:OK
  #10/44 existence:OK
  #10/45 existence___minimal:OK
  #10/46 existence__err_int_sz:OK
  #10/47 existence__err_int_type:OK
  #10/48 existence__err_int_kind:OK
  #10/49 existence__err_arr_kind:OK
  #10/50 existence__err_arr_value_type:OK
  #10/51 existence__err_struct_type:OK
  #10 core_reloc:OK
  #19 flow_dissector_reattach:OK
  #60 tp_attach_query:OK
  Summary: 8/8 PASSED, 0 SKIPPED, 0 FAILED

  $ sudo ./test_progs -tattach,core/existence -bcgroup,flow/arr
  #1 attach_probe:OK
  #9 core_extern:OK
  #10/44 existence:OK
  #10/45 existence___minimal:OK
  #10/46 existence__err_int_sz:OK
  #10/47 existence__err_int_type:OK
  #10/48 existence__err_int_kind:OK
  #10/51 existence__err_struct_type:OK
  #10 core_reloc:OK
  #60 tp_attach_query:OK
  Summary: 4/6 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Julia Kartseva <hex@fb.com>
Link: https://lore.kernel.org/bpf/20200116005549.3644118-1-andriin@fb.com

2020-01-15  riscv: make sure the cores stay looping in .Lsecondary_park  (Greentime Hu)

The code in secondary_park is currently placed in the .init section. The kernel reclaims and clears this code when it finishes booting, which causes the cores parked there to go somewhere unpredictable, so we move this function out of .init to make sure the cores stay looping there.

The instruction "bgeu a0, t0, .Lsecondary_park" may hit a "relocation truncated to fit" error at link time, because the sections are now too far apart for the branch to reach. Use "tail" to jump to .Lsecondary_park instead.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup.patel@sifive.com>
Cc: Andreas Schwab <schwab@suse.de>
Cc: stable@vger.kernel.org
Fixes: 76d2a0493a17d ("RISC-V: Init and Halt Code")
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>

2020-01-16  Merge tag 'amd-drm-fixes-5.5-2020-01-15' of git://people.freedesktop.org/~agd5f/linux into drm-fixes  (Dave Airlie)

amd-drm-fixes-5.5-2020-01-15:
- Update golden settings for renoir
- eDP fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115235141.4149-1-alexander.deucher@amd.com

2020-01-15  Merge branch 'bpftool-improvements'  (Alexei Starovoitov)

Martin Lau says:

====================
When a map is storing a kernel's struct, its map_info->btf_vmlinux_value_type_id is set. The first map type supporting it is BPF_MAP_TYPE_STRUCT_OPS. This series adds support to dump this kind of map with BTF.

The first two patches are bug fixes which are only applicable to bpf-next. Please see individual patches for details.

v3:
- Remove unnecessary #include "libbpf_internal.h" from patch 5

v2:
- Expose bpf_find_kernel_btf() as a LIBBPF_API in patch 3 (Andrii)
- Cache btf_vmlinux in bpftool/map.c (Andrii)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2020-01-15  bpftool: Support dumping a map with btf_vmlinux_value_type_id  (Martin KaFai Lau)

This patch makes bpftool properly dump a map's value when the map's value type is a type of the running kernel's BTF (i.e. map_info.btf_vmlinux_value_type_id is set instead of map_info.btf_value_type_id). The first use case is the BPF_MAP_TYPE_STRUCT_OPS.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200115230044.1103008-1-kafai@fb.com

2020-01-15  bpftool: Add struct_ops map name  (Martin KaFai Lau)

This patch adds a BPF_MAP_TYPE_STRUCT_OPS to "struct_ops" name mapping so that "bpftool map show" can print the "struct_ops" map type properly.

  [root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool map show id 8
  8: struct_ops  name dctcp  flags 0x0
          key 4B  value 256B  max_entries 1  memlock 4096B
          btf_id 7

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200115230037.1102674-1-kafai@fb.com

2020-01-15  libbpf: Expose bpf_find_kernel_btf as a LIBBPF_API  (Martin KaFai Lau)

This patch exposes bpf_find_kernel_btf() as a LIBBPF_API. It will be used in 'bpftool map dump' in a following patch to dump a map with btf_vmlinux_value_type_id set. bpf_find_kernel_btf() is renamed to libbpf_find_kernel_btf() and moved to btf.c. As <linux/kernel.h> is included, some of the max/min type casting needs to be fixed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200115230031.1102305-1-kafai@fb.com

2020-01-15  bpftool: Fix missing BTF output for json during map dump  (Martin KaFai Lau)

The BTF availability check is only done for plain text output, which caused the whole BTF output to go missing when json_output is used. This patch simplifies the logic a little by avoiding passing "int btf" to map_dump(). For plain text output, the btf_wtr is only created when the map has BTF (i.e. info->btf_id != 0). The nullness of "json_writer_t *wtr" in map_dump() alone can decide whether dumping BTF output is needed: as long as wtr is not NULL, map_dump() will print out the BTF-described data whenever a map has BTF available (i.e. info->btf_id != 0), regardless of json or plain-text output. In do_dump(), the "int btf" is also renamed to "int do_plain_btf".

Fixes: 99f9863a0c45 ("bpftool: Match maps by name")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Paul Chaignon <paul.chaignon@orange.com>
Link: https://lore.kernel.org/bpf/20200115230025.1101828-1-kafai@fb.com

2020-01-15  bpftool: Fix a leak of btf object  (Martin KaFai Lau)

When testing whether a map has BTF, maps_have_btf() checks by actually getting a btf_fd from sys_bpf(BPF_BTF_GET_FD_BY_ID). However, it forgot to btf__free() it. At the maps_have_btf() stage, there is no need to test it by really calling sys_bpf(BPF_BTF_GET_FD_BY_ID); testing for a non-zero info.btf_id is good enough. Also, the err_close case is unnecessary, and it causes a double close() because the calling function do_dump() will close() all fds again.

Fixes: 99f9863a0c45 ("bpftool: Match maps by name")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Paul Chaignon <paul.chaignon@orange.com>
Link: https://lore.kernel.org/bpf/20200115230019.1101352-1-kafai@fb.com

2020-01-15  drm/amd/display: Reorder detect_edp_sink_caps before link settings read.  (Mario Kleiner)

read_current_link_settings_on_detect() on eDP 1.4+ may use the edp_supported_link_rates table which is set up by detect_edp_sink_caps(), so that function needs to be called first.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Martin Leung <martin.leung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

2020-01-15  drm/amdgpu: update goldensetting for renoir  (Aaron Liu)

Update the mmSDMA0_UTCL1_WATERMK golden setting for renoir.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2020-01-15  Merge branch 'bpf-sockmap-tls-fixes'  (Daniel Borkmann)

John Fastabend says:

====================
To date our usage of sockmap/tls has been fairly simple: the BPF programs did only well-defined pop, push, pull and apply/cork operations. Now that we have started to push more complex programs into sockmap, we uncovered a series of issues addressed here. Furthermore, an OpenSSL 3.0 release with kTLS support should arrive soon, so it's important to get any remaining issues on BPF and kTLS support resolved.

Additionally, I have a patch under development to allow sockmap to be enabled/disabled at runtime for Cilium endpoints. This allows us to stress the map insert/delete with kTLS more than before, where Cilium only added the socket to the map when it entered the ESTABLISHED state and never touched it from the control path side again, relying on the socket's own close() hook to remove it.

To test, I have a set of test cases in test_sockmap.c that expose these issues. Once we get the fixes here merged and in bpf-next, I'll submit the tests to the bpf-next tree to ensure we don't regress again. I've also run these patches in the Cilium CI with OpenSSL (master branch), which runs tools such as netperf, ab, wrk2, curl, etc. to get a broad set of testing.

I'm aware of two more issues that we are working to resolve in another couple (probably two) patches. First, we see an auth tag corruption in kTLS when sending small 1-byte chunks under stress. I've not pinned this down yet, but guessing, since it shows up under 1B stress tests, it must be some error path being triggered. Second, we need to ensure BPF RX programs are not skipped when the kTLS ULP is loaded; this breaks some of the sockmap selftests when running with kTLS. I'll send a follow-up for this.

v2: I dropped a patch that added a !0 size check in tls_push_record. It originated from a panic I caught a while ago with a trace in the crypto stack, but I cannot reproduce it anymore, so I will dig into that and send another patch later if needed. Anyway, after a bit of thought it would be nicer if tls/crypto/bpf didn't require special-case handling for the !0 size.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2020-01-15  bpf: Sockmap/tls, fix pop data with SK_DROP return code  (John Fastabend)

When the user returns SK_DROP we need to reset the number of copied bytes to indicate to the user that the bytes were dropped and not sent. If we don't reset the copied arg, sendmsg will return as if those bytes were copied, giving the user a positive return value. This works as expected today except in the case where the user also pops bytes. In the pop case the sg.size is reduced, but we don't correctly account for this when the copied bytes are reset. The popped bytes are not accounted for and we return a small positive value, potentially confusing the user.

The reason this happens is a typo where we do the wrong comparison when accounting for pop bytes. In this fix, notice that the if/else is not needed, and that we have a similar problem if we push data, except it's not visible to the user because if delta is larger than sg.size we return a negative value, so it appears as an error regardless.

Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com

2020-01-15  bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining  (John Fastabend)

It's possible, through a set of push, pop, and apply helper calls, to construct an skmsg (which is just a ring of scatterlist elements) with the start value larger than the end value. For example:

      end                start
       v                   v
  |_0_|_1_| ... |_n_|_n+1_|

where end points at 1 and start points at n, so the set of valid elements is {n, n+1, 0, 1}. Currently, because we don't build the correct chain, only {n, n+1} will be sent. This adds a check and an sg_chain call to correctly submit the above to the crypto and tls send path.

Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com

2020-01-15  bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf  (John Fastabend)

It is possible to build a plaintext buffer using the push helper that is larger than the allocated encrypt buffer. When this record is pushed to the crypto layers it can result in a NULL pointer dereference, because the crypto API expects the encrypt buffer to be large enough to fit the plaintext buffer. Kernel splat below.

To resolve this, catch the cases where this can happen and split the buffer into two records to send individually. Unfortunately, there is still one case to handle, where the split creates a zero-sized buffer. In this case we merge the buffers and unmark the split. This happens when apply is zero and the user pushed data beyond the encrypt buffer. This fixes the original case as well, because the split allocated an encrypt buffer larger than the plaintext buffer and the merge simply moves the pointers around, so we now have a reference to the new (larger) encrypt buffer.

Perhaps it's not ideal, but it seems the best solution for a fixes branch and avoids handling two extra cases, (a) apply that needs a split and (b) the non-apply case. These are edge cases anyway, so optimizing them seems unnecessary unless someone wants to later in next branches.

  [  306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
  [...]
  [  306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
  [...]
  [  306.770350] Call Trace:
  [  306.770956]  scatterwalk_map_and_copy+0x6c/0x80
  [  306.772026]  gcm_enc_copy_hash+0x4b/0x50
  [  306.772925]  gcm_hash_crypt_remain_continue+0xef/0x110
  [  306.774138]  gcm_hash_crypt_continue+0xa1/0xb0
  [  306.775103]  ? gcm_hash_crypt_continue+0xa1/0xb0
  [  306.776103]  gcm_hash_assoc_remain_continue+0x94/0xa0
  [  306.777170]  gcm_hash_assoc_continue+0x9d/0xb0
  [  306.778239]  gcm_hash_init_continue+0x8f/0xa0
  [  306.779121]  gcm_hash+0x73/0x80
  [  306.779762]  gcm_encrypt_continue+0x6d/0x80
  [  306.780582]  crypto_gcm_encrypt+0xcb/0xe0
  [  306.781474]  crypto_aead_encrypt+0x1f/0x30
  [  306.782353]  tls_push_record+0x3b9/0xb20 [tls]
  [  306.783314]  ? sk_psock_msg_verdict+0x199/0x300
  [  306.784287]  bpf_exec_tx_verdict+0x3f2/0x680 [tls]
  [  306.785357]  tls_sw_sendmsg+0x4a3/0x6a0 [tls]

test_sockmap test signature to trigger the bug:

  [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):

Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com

2020-01-15  bpf: Sockmap/tls, msg_push_data may leave end mark in place  (John Fastabend)

Leaving an incorrect end mark in place when passing data to the crypto layer will cause the crypto layer to stop processing before all data is encrypted. To fix this, clear the end mark on push data instead of expecting users of the helper to clear the mark value after the fact. This happens when we push data into the middle of an skmsg and have room for it, so we don't do the set of copies that would already clear the end flag.

Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com

2020-01-15  bpf: Sockmap, skmsg helper overestimates push, pull, and pop bounds  (John Fastabend)

In the push, pull, and pop helpers that operate on skmsg objects to make data writable or insert/remove data, we use this bounds check to ensure the specified data is valid:

        /* Bounds checks: start and pop must be inside message */
        if (start >= offset + l || last >= msg->sg.size)
                return -EINVAL;

The problem here is that offset has already included the length of the current element, the 'l' above. So start could be past the end of the scatterlist element in the case where start also points into an offset on the last skmsg element.

To fix this, do the accounting slightly differently: add the length of the previous entry to offset at the start of the iteration, and ensure it's initialized to zero so that the first iteration does nothing.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com

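A sketch of the corrected accounting (simplified from the description above; the iteration helpers are the skmsg ones from include/linux/skmsg.h, but surrounding details are elided): offset now only ever counts bytes *before* the current element, because the previous element's length is added at the top of each iteration and starts at zero.

  u32 i = msg->sg.start;
  u32 offset = 0, l = 0;  /* l = length of the *previous* element, 0 at first */

  do {
          offset += l;                    /* bytes before element i */
          l = sk_msg_elem(msg, i)->length;
          if (start < offset + l)
                  break;                  /* 'start' lies inside element i */
          sk_msg_iter_var_next(i);
  } while (i != msg->sg.end);

  /* The bounds check no longer overestimates: offset excludes element i */
  if (start >= offset + l || last >= msg->sg.size)
          return -EINVAL;
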
2020-01-15  bpf: Sockmap/tls, push write_space updates through ulp updates  (John Fastabend)

When a sockmap sock with TLS enabled is removed, we clean up the bpf/psock state and call tcp_update_ulp() to push updates to the TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the previous op stored in the psock. This may or may not be correct, so to ensure we don't overwrite the TLS write_space hook, pass this field to the ULP and have it fix up the ctx.

This completes a previous fix that pushed the ops through to the ULP but at the time missed doing this for write_space, presumably because the write_space TLS hook was added around the same time.

Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com

2020-01-15  bpf: Sockmap, ensure sock lock held during tear down  (John Fastabend)

The sock_map_free() and sock_hash_free() paths used to delete sockmap and sockhash maps walk the maps and destroy the psock and bpf state associated with the socks in the map. When done, the socks no longer have BPF programs attached and will function normally. This can happen while the socks in the map are still "live", meaning data may be sent/received during the walk.

Currently, though, we don't take the sock lock when the psock and bpf state is removed through this path. Specifically, this means we can be writing into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc. while they are also being called from the networking side. This is not safe; we never used proper READ_ONCE/WRITE_ONCE semantics here if we believed it was safe. Further, it's not clear to me it's even a good idea to try to do this on "live" sockets while the networking side might also be using the socket. Instead of trying to reason about using the socks from both sides, let's recognize that every use case I'm aware of rarely deletes maps; in fact, the kubernetes/Cilium case builds the map at init and never tears it down except on errors. So let's do the simple fix and grab the sock lock.

This patch wraps sock deletes from maps in the sock lock and adds some annotations so we catch any other cases more easily.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com

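The shape of the fix, as a minimal hedged sketch (the real sock_map_free()/sock_hash_free() walk the whole map and carry extra RCU and annotation details not shown here):

  /* For each socket found while walking the map during free: */
  lock_sock(sk);          /* serialize against the data path, which may be
                           * calling the very ops (sendmsg, recvmsg, ...)
                           * that teardown is about to rewrite */
  rcu_read_lock();
  sock_map_unref(sk, link);       /* detach psock/bpf state, restore ops */
  rcu_read_unlock();
  release_sock(sk);
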
2020-01-15  bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop  (John Fastabend)

When a sockmap is freed and a socket in the map has TLS enabled, we tear down the bpf context on the socket, the psock struct and state, and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform the tls stack that it needs to update its saved sock ops, so that when the tls socket is later destroyed it doesn't try to call the now-destroyed psock hooks. This is about keeping stacked ULPs in good shape so they always have the right set of stacked ops.

However, the unhash() hook was recently removed from the TLS side. But the sockmap/bpf side is not doing any extra work to update the unhash op when it is torn down, instead expecting the TLS side to manage it. So both TLS and sockmap believe the other side is managing the op, and in the end no one updates the hook, so it continues to point at tcp_bpf_unhash(). When the unhash hook is called, we call tcp_bpf_unhash(), which detects that the psock has already been destroyed and calls sk->sk_prot->unhash(), which calls tcp_bpf_unhash() yet again, and so on, looping and hanging the core. To fix this, have the sockmap teardown logic fix up the stale pointer.

Fixes: 5d92e631b8be ("net/tls: partially revert fix transition through disconnect with close")
Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com

2020-01-15  PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken  (Alex Deucher)

To account for parts of the chip that are "harvested" (disabled) due to silicon flaws, caches on some AMD GPUs must be initialized before ATS is enabled. ATS is normally enabled by the IOMMU driver before the GPU driver loads, so this cache initialization would have to be done in a quirk, but that's too complex to be practical.

For Navi14 (device ID 0x7340), this initialization is done by the VBIOS, but apparently some boards went to production with an older VBIOS that doesn't do it. Disable ATS for those boards.

Link: https://lore.kernel.org/r/20200114205523.1054271-3-alexander.deucher@amd.com
Bug: https://gitlab.freedesktop.org/drm/amd/issues/1015
See-also: d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
See-also: 9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
[bhelgaas: squash into one patch, simplify slightly, commit log]
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org

2020-01-15  devlink: fix typos in qed documentation  (Jacob Keller)

Review of the recently added documentation file for the qed driver noticed a couple of typos. Fix them now.

Noticed-by: Michal Kalderon <mkalderon@marvell.com>
Fixes: 0f261c3ca09e ("devlink: add a driver-specific file for the qed driver")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  pptp: support sockets bound to an interface  (Ulrich Weber)

Use sk_bound_dev_if for the route lookup, as is already done in most of the other ip_route_output_ports() calls. Since most PPPoA providers use 10.0.0.138 as the default gateway IP, this allows connections to multiple PPTP providers with the same IP address over different interfaces.

Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  Merge branch 'stmmac-Fix-selftests-in-Synopsys-AXS101-board'  (David S. Miller)

Jose Abreu says:

====================
net: stmmac: Fix selftests in Synopsys AXS101 board

Set of fixes for selftests so that they work in the Synopsys AXS101 board. Final output:

  $ ethtool -t eth0
  The test result is PASS
  The test extra info:
   1. MAC Loopback                  0
   2. PHY Loopback                -95
   3. MMC Counters                  0
   4. EEE                         -95
   5. Hash Filter MC                0
   6. Perfect Filter UC             0
   7. MC Filter                     0
   8. UC Filter                     0
   9. Flow Control                -95
  10. RSS                         -95
  11. VLAN Filtering              -95
  12. VLAN Filtering (perf)       -95
  13. Double VLAN Filter          -95
  14. Double VLAN Filter (perf)   -95
  15. Flexible RX Parser          -95
  16. SA Insertion (desc)         -95
  17. SA Replacement (desc)       -95
  18. SA Insertion (reg)          -95
  19. SA Replacement (reg)        -95
  20. VLAN TX Insertion           -95
  21. SVLAN TX Insertion          -95
  22. L3 DA Filtering             -95
  23. L3 SA Filtering             -95
  24. L4 DA TCP Filtering         -95
  25. L4 SA TCP Filtering         -95
  26. L4 DA UDP Filtering         -95
  27. L4 SA UDP Filtering         -95
  28. ARP Offload                 -95
  29. Jumbo Frame                   0
  30. Multichannel Jumbo          -95
  31. Split Header                -95

Description:
1) Fixes the unaligned accesses that caused CPU halt in Synopsys AXS101 boards.
2) Fixes the VLAN tests when filtering failed to work.
3) Fixes the VLAN Perfect tests when filtering is not available in HW.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  net: stmmac: selftests: Guard VLAN Perfect test against non supported HW  (Jose Abreu)

When HW does not support perfect filtering, the feature will not be enabled in the net_device. Add a check for this to prevent failures.

Fixes: 1b2250a04c1f ("net: stmmac: selftests: Add tests for VLAN Perfect Filtering")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  net: stmmac: selftests: Mark as fail when received VLAN ID != expected  (Jose Abreu)

When the VLAN ID does not match the expected one, it means the filter failed in HW. Fix it.

Fixes: 94e18382003c ("net: stmmac: selftests: Add selftest for VLAN TX Offload")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  net: stmmac: selftests: Make it work in Synopsys AXS101 boards  (Jose Abreu)

Synopsys AXS101 boards do not support unaligned memory loads or stores. Change the selftests mechanism to explicitly:
- Not add extra alignment in the TX SKB
- Use the unaligned version of ether_addr_equal()

Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  Merge branch 'bpf-batch-ops'  (Alexei Starovoitov)

Brian Vazquez says:

====================
This patch series introduces batch ops that can be added to bpf maps to lookup/lookup_and_delete/update/delete more than 1 element at a time. This is especially useful when syscall overhead is a problem, and in the case of hmap it provides a reliable way of traversing the map. The implementation includes a generic approach that could potentially be used by any bpf map, and adds it to arraymap; it also includes the specific implementation for hashmaps, which are traversed using buckets instead of keys.

The bpf syscall subcommands introduced are:

  BPF_MAP_LOOKUP_BATCH
  BPF_MAP_LOOKUP_AND_DELETE_BATCH
  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
          __aligned_u64   in_batch;       /* start batch,
                                           * NULL to start from beginning
                                           */
          __aligned_u64   out_batch;      /* output: next start batch */
          __aligned_u64   keys;
          __aligned_u64   values;
          __u32           count;          /* input/output:
                                           * input: # of key/value elements
                                           * output: # of filled elements
                                           */
          __u32           map_fd;
          __u64           elem_flags;
          __u64           flags;
  } batch;

in_batch and out_batch are only used for lookup and lookup_and_delete, since those are the only two operations that attempt to traverse the map. The update/delete batch ops should provide the keys/values that the user wants to modify.

Here are the previous discussions on batch processing:
- https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
- https://lore.kernel.org/bpf/20190829064502.2750303-1-yhs@fb.com/
- https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Changelog since v4:
- Remove unnecessary checks from libbpf API (Andrii Nakryiko)
- Move DECLARE_LIBBPF_OPTS with all var declarations (Andrii Nakryiko)
- Change bucket internal buffer size to 5 entries (Yonghong Song)
- Fix some minor bugs in hashtab batch ops implementation (Yonghong Song)

Changelog since v3:
- Do not use copy_to_user inside atomic region (Yonghong Song)
- Use _opts approach on libbpf APIs (Andrii Nakryiko)
- Drop generic_map_lookup_and_delete_batch support
- Free malloc-ed memory in tests (Yonghong Song)
- Reverse christmas tree (Yonghong Song)
- Add acked labels

Changelog since v2:
- Add generic batch support for lpm_trie and test it (Yonghong Song)
- Use define MAP_LOOKUP_RETRIES for retries (John Fastabend)
- Return errors directly and remove labels (Yonghong Song)
- Insert new API functions into libbpf alphabetically (Yonghong Song)
- Change hlist_nulls_for_each_entry_rcu to hlist_nulls_for_each_entry_safe in htab batch ops (Yonghong Song)

Changelog since v1:
- Fix SOB ordering and remove Co-authored-by tag (Alexei Starovoitov)

Changelog since RFC:
- Change batch to in_batch and out_batch to support more flexible opaque values to iterate the bpf maps.
- Remove update/delete specific batch ops for htab and use the generic implementations instead.
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2020-01-15  net/wan/fsl_ucc_hdlc: fix out of bounds write on array utdm_info  (Colin Ian King)

Array utdm_info is declared as an array of MAX_HDLC_NUM (4) elements, however up to UCC_MAX_NUM (8) elements are potentially written to it. Currently we have an out-of-bounds write on the last 4 elements. Fix this by making utdm_info UCC_MAX_NUM elements in size.

Addresses-Coverity: ("Out-of-bounds write")
Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

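In outline (an editorial sketch; the element type name is an assumption, the constants are from the commit message), the fix is just to dimension the array by the same bound its writers actually use:

  /* Before: writers index up to UCC_MAX_NUM (8), so slots 4..7 are
   * out-of-bounds writes. */
  static struct ucc_tdm_info utdm_info[MAX_HDLC_NUM];   /* 4 elements */

  /* After: sized by the constant the filling loop uses. */
  static struct ucc_tdm_info utdm_info[UCC_MAX_NUM];    /* 8 elements */
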
2020-01-15  Merge tag 'batadv-next-for-davem-20200114' of git://git.open-mesh.org/linux-merge  (David S. Miller)

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- fix typo and kerneldocs, by Sven Eckelmann
- use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer
- silence some endian sparse warnings by adding annotations, by Sven Eckelmann
- Update copyright years to 2020, by Sven Eckelmann
- Disable deprecated sysfs configuration by default, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  drm/dp_mst: Have DP_Tx send one msg at a time  (Wayne Lin)

[Why]
Noticed this while testing MST with the 4-port MST hub from StarTech.com. Sometimes monitors can't light up normally and we get the error message 'sideband msg build failed'. Looking into the AUX transactions, we found that the source sometimes sends out another down request before receiving the down reply of the previous down request. On the other hand, the current code in drm_dp_get_one_sb_msg() doesn't handle the interleaved-replies case. Hence, the source can't build up the message completely and can't light up the monitors.

[How]
For good compatibility, force the source to send out one down request at a time. Add a flag, is_waiting_for_dwn_reply, to determine whether the source can send out a down request immediately or not:
- Check the flag before calling process_single_down_tx_qlock to send out a msg
- Set the flag when successfully sending out a down request
- Clear the flag when successfully building up a down reply
- Clear the flag when errors occur while sending out a down request
- Clear the flag when errors occur while building up a down reply
- Clear the flag when a timeout occurs while waiting for a down reply
- Use drm_dp_mst_kick_tx() to try to send another down request in the queue at the end of drm_dp_mst_wait_tx_reply() (attempt to send out messages in the queue when errors occur)

Cc: Lyude Paul <lyude@redhat.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113093649.11755-1-Wayne.Lin@amd.com

2020-01-15  selftests/bpf: Add batch ops testing to array bpf map  (Brian Vazquez)

Tested bpf_map_lookup_batch() and bpf_map_update_batch() functionality.

  $ ./test_maps
  ...
  test_array_map_batch_ops:PASS
  ...

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-10-brianvv@google.com

2020-01-15  selftests/bpf: Add batch ops testing for htab and htab_percpu map  (Yonghong Song)

Tested bpf_map_lookup_batch(), bpf_map_lookup_and_delete_batch(), bpf_map_update_batch(), and bpf_map_delete_batch() functionality.

  $ ./test_maps
  ...
  test_htab_map_batch_ops:PASS
  test_htab_percpu_map_batch_ops:PASS
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-9-brianvv@google.com

2020-01-15  libbpf: Add libbpf support to batch ops  (Yonghong Song)

Added four libbpf API functions to support map batch operations:
  . int bpf_map_delete_batch( ... )
  . int bpf_map_lookup_batch( ... )
  . int bpf_map_lookup_and_delete_batch( ... )
  . int bpf_map_update_batch( ... )

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-8-brianvv@google.com

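A hedged usage sketch of the new lookup API (the map layout, buffer sizes, and the exact error convention are illustrative assumptions, not from the patch): iterate a map in batches, feeding out_batch back in as in_batch until ENOENT signals that iteration is complete.

  #include <errno.h>
  #include <stdio.h>
  #include <bpf/bpf.h>
  #include <bpf/libbpf.h>

  /* Assumes a map with 4-byte keys and 8-byte values. */
  static int dump_map(int map_fd)
  {
          DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts);
          __u32 keys[128], out_batch, count;
          __u64 vals[128];
          void *in = NULL;        /* NULL: start from the beginning */
          int err;

          do {
                  count = 128;    /* in: capacity; out: # actually filled */
                  err = bpf_map_lookup_batch(map_fd, in, &out_batch,
                                             keys, vals, &count, &opts);
                  for (__u32 i = 0; i < count; i++)
                          printf("key %u -> val %llu\n", keys[i],
                                 (unsigned long long)vals[i]);
                  in = &out_batch;        /* resume where this batch stopped */
          } while (!err);

          /* ENOENT just means "no more entries"; libbpf of this era sets
           * errno, newer versions may return -ENOENT directly. */
          return errno == ENOENT ? 0 : err;
  }
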
2020-01-15  tools/bpf: Sync uapi header bpf.h  (Yonghong Song)

Sync the uapi header include/uapi/linux/bpf.h to tools/include/uapi/linux/bpf.h.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-7-brianvv@google.com

2020-01-15  bpf: Add batch ops to all htab bpf map  (Yonghong Song)

htab can't use the generic batch support due to some problematic behaviours inherent to the data structure: while iterating the bpf map, a concurrent program might delete the next entry that the batch was about to use, in which case there's no easy way to retrieve the next entry. The issue has been discussed multiple times (see [1] and [2]). The only way htab can be traversed without this problem is by making sure the map is traversed in whole buckets.

This commit implements those strict requirements for htab. The implementation follows the same interaction as the generic support, with some exceptions:
- If the keys/values buffers are not big enough to traverse a bucket, ENOSPC will be returned.
- out_batch contains the value of the next bucket in the iteration, not the next key; this is transparent to the user, since the user should never use out_batch for anything other than bpf batch syscalls.

This commit implements BPF_MAP_LOOKUP_BATCH and adds support for the new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for the update/delete batch ops it is possible to use the generic implementations.

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com

2020-01-15  bpf: Add lookup and update batch ops to arraymap  (Brian Vazquez)

This adds the generic batch ops functionality to bpf arraymap; note that since deletion is not a valid operation for arraymap, only the lookup and update batch ops are added.

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200115184308.162644-5-brianvv@google.com

2020-01-15  bpf: Add generic support for update and delete batch ops  (Brian Vazquez)

This commit adds generic support for update and delete batch ops that can be used by almost all bpf maps. These commands share the same UAPI attr that the lookup and lookup_and_delete batch ops use, and the syscall commands are:

  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The main difference between update/delete and lookup batch ops is that for update/delete the keys/values must be specified by userspace, and because of that, neither in_batch nor out_batch is used.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com

2020-01-15  bpf: Add generic support for lookup batch op  (Brian Vazquez)

This commit introduces generic support for bpf_map_lookup_batch. This implementation can be used by almost all bpf maps, since its core relies on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is:

  BPF_MAP_LOOKUP_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
          __aligned_u64   in_batch;       /* start batch,
                                           * NULL to start from beginning
                                           */
          __aligned_u64   out_batch;      /* output: next start batch */
          __aligned_u64   keys;
          __aligned_u64   values;
          __u32           count;          /* input/output:
                                           * input: # of key/value elements
                                           * output: # of filled elements
                                           */
          __u32           map_fd;
          __u64           elem_flags;
          __u64           flags;
  } batch;

in_batch/out_batch are opaque values used to communicate between user and kernel space; both must be key_size in length. To start iterating from the beginning, in_batch must be NULL; count is the number of key/value elements to retrieve. Note that the 'keys' buffer must be key_size * count bytes and the 'values' buffer must be value_size * count bytes, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved.

Note that 'count' is an input/output variable and it can contain a lower value after a call. If there are no more entries to retrieve, ENOENT will be returned; in that case count might still be > 0 if some values were copied before the entries ran out. Note that if the return code is an error other than -EFAULT, count indicates the number of elements successfully processed.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com

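In outline, the generic kernel-side loop looks roughly like this: a simplified pseudocode sketch of the mechanism described above, with a hypothetical function name; the real implementation adds bucket locking, BPF_F_LOCK handling, and full uaccess error handling.

  /* Copy up to 'count' key/value pairs to user space, starting after
   * the key in in_batch, and report where to resume via out_batch. */
  static int lookup_batch_sketch(struct bpf_map *map, void *prev_key,
                                 void *key, void __user *ukeys,
                                 void __user *uvalues, u32 count,
                                 u32 *filled)
  {
          void *value;
          u32 cp;

          for (cp = 0; cp < count; cp++) {
                  if (map->ops->map_get_next_key(map, prev_key, key))
                          break;          /* map exhausted */
                  value = map->ops->map_lookup_elem(map, key);
                  if (copy_to_user(ukeys + cp * map->key_size,
                                   key, map->key_size) ||
                      copy_to_user(uvalues + cp * map->value_size,
                                   value, map->value_size))
                          return -EFAULT;
                  prev_key = key;
          }
          *filled = cp;           /* written back to attr->batch.count */
          /* 'key' doubles as the resume cookie copied to out_batch */
          return cp < count ? -ENOENT : 0;
  }
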
2020-01-15  bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions  (Brian Vazquez)

This commit moves reusable code from map_lookup_elem and map_update_elem into helpers to avoid code duplication in kernel/bpf/syscall.c.

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200115184308.162644-2-brianvv@google.com

2020-01-15  Merge tag 'batadv-net-for-davem-20200114' of git://git.open-mesh.org/linux-merge  (David S. Miller)

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:
- Fix DAT candidate selection on little endian systems, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  bpf: Fix incorrect verifier simulation of ARSH under ALU32  (Daniel Borkmann)

Anatoly has been fuzzing with the kBdysch harness and reported a hang in one of the outcomes:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (85) call bpf_get_socket_cookie#46
  1: R0_w=invP(id=0) R10=fp0
  1: (57) r0 &= 808464432
  2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  2: (14) w0 -= 810299440
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  3: (c4) w0 s>>= 1
  4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  4: (76) if w0 s>= 0x30303030 goto pc+216
  221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  221: (95) exit
  processed 6 insns (limit 1000000) [...]

Taking a closer look, the program was xlated as follows:

  # ./bpftool p d x i 12
    0: (85) call bpf_get_socket_cookie#7800896
    1: (bf) r6 = r0
    2: (57) r6 &= 808464432
    3: (14) w6 -= 810299440
    4: (c4) w6 s>>= 1
    5: (76) if w6 s>= 0x30303030 goto pc+216
    6: (05) goto pc-1
    7: (05) goto pc-1
    8: (05) goto pc-1
  [...]
  220: (05) goto pc-1
  221: (05) goto pc-1
  222: (95) exit

Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix precision tracking for unbounded scalars"): the fall-through branch in instruction 5 is considered never taken, given the conclusion from the min/max bounds tracking in w6, and therefore the dead-code sanitation rewrites it as goto pc-1. However, real-life input disagrees with the verification analysis, since a soft-lockup was observed.

The bug sits in the analysis of ARSH. The definition is that we shift the target register value right by K bits through shifting in copies of its sign bit. In adjust_scalar_min_max_vals(), we first coerce the register into 32-bit mode, and the same happens after simulating the operation. However, when simulating the actual ARSH, we don't take the mode into account and act as if it's always 64 bit, even though the location of the sign bit is different:

  dst_reg->smin_value >>= umin_val;
  dst_reg->smax_value >>= umin_val;
  dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);

Consider an unknown R0 where bpf_get_socket_cookie() (or others) would for example return 0xffff. With the above ARSH simulation, we'd see the following results:

  [...]
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (57) r0 &= 808464432
    -> R0_runtime = 0x3030
  3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  3: (14) w0 -= 810299440
    -> R0_runtime = 0xcfb40000
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
                                (0xffffffff)
  4: (c4) w0 s>>= 1
    -> R0_runtime = 0xe7da0000
  5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
                                (0x67c00000)           (0x7ffbfff8)
  [...]

In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011 0100 0000 0000 0000 0000'; the result after the shift is 0xe7da0000, which is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly retained in 32-bit mode. In insn 4, the umax was 0xffffffff and changed into 0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000', which means the simulation didn't retain the sign bit. With the above logic, the updates happen on the 64-bit min/max bounds, and given we coerced the register, the sign bits of the bounds are cleared as well; meaning, we need to force the simulation into s32 space for 32-bit ALU mode.

Verification after the fix below. We're first analyzing the fall-through branch on the 32-bit signed >= test, eventually leading to rejection of the program in this specific case:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r2 = 808464432
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (bf) r6 = r0
  3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
  3: (57) r6 &= 808464432
  4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  4: (14) w6 -= 810299440
  5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  5: (c4) w6 s>>= 1
  6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
                                (0x67c00000)           (0xfffbfff8)
  6: (76) if w6 s>= 0x30303030 goto pc+216
  7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
  7: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 8 insns (limit 1000000) [...]

Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net

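A self-contained C illustration of the distinction the fix accounts for (an editorial sketch, not verifier code): an arithmetic right shift in 32-bit mode replicates bit 31, while the same value zero-extended to 64 bits is positive and shifts in zeros, which is what the verifier wrongly simulated for ALU32.

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint32_t v = 0xcfb40000;        /* runtime value from the example */

          /* 32-bit arsh: bit 31 is the sign bit and gets replicated
           * (implementation-defined in ISO C, but arithmetic on gcc/clang),
           * matching the real execution -> 0xe7da0000. */
          int32_t s32 = (int32_t)v >> 1;

          /* Zero-extended to 64 bits the value is positive, so a 64-bit
           * arsh shifts in zeros -> 0x67da0000; this is the mode the
           * verifier wrongly assumed for the ALU32 instruction. */
          int64_t s64 = (int64_t)(uint64_t)v >> 1;

          printf("32-bit arsh: 0x%08x\n", (uint32_t)s32);  /* 0xe7da0000 */
          printf("64-bit arsh: 0x%08x\n", (uint32_t)s64);  /* 0x67da0000 */
          return 0;
  }
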
2020-01-15  hv_netvsc: Fix memory leak when removing rndis device  (Mohammed Gamal)

kmemleak detects the following memory leak when hot removing a network device:

  unreferenced object 0xffff888083f63600 (size 256):
    comm "kworker/0:1", pid 12, jiffies 4294831717 (age 1113.676s)
    hex dump (first 32 bytes):
      00 40 c7 33 80 88 ff ff 00 00 00 00 10 00 00 00  .@.3............
      00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    backtrace:
      [<00000000d4a8f5be>] rndis_filter_device_add+0x117/0x11c0 [hv_netvsc]
      [<000000009c02d75b>] netvsc_probe+0x5e7/0xbf0 [hv_netvsc]
      [<00000000ddafce23>] vmbus_probe+0x74/0x170 [hv_vmbus]
      [<00000000046e64f1>] really_probe+0x22f/0xb50
      [<000000005cc35eb7>] driver_probe_device+0x25e/0x370
      [<0000000043c642b2>] bus_for_each_drv+0x11f/0x1b0
      [<000000005e3d09f0>] __device_attach+0x1c6/0x2f0
      [<00000000a72c362f>] bus_probe_device+0x1a6/0x260
      [<0000000008478399>] device_add+0x10a3/0x18e0
      [<00000000cf07b48c>] vmbus_device_register+0xe7/0x1e0 [hv_vmbus]
      [<00000000d46cf032>] vmbus_add_channel_work+0x8ab/0x1770 [hv_vmbus]
      [<000000002c94bb64>] process_one_work+0x919/0x17d0
      [<0000000096de6781>] worker_thread+0x87/0xb40
      [<00000000fbe7397e>] kthread+0x333/0x3f0
      [<000000004f844269>] ret_from_fork+0x3a/0x50

rndis_filter_device_add() allocates an instance of struct rndis_device which never gets deallocated, because rndis_filter_device_remove() sets net_device->extension, which points to the rndis_device struct, to NULL, leaving the rndis_device dangling. Since net_device->extension is eventually freed in free_netvsc_device(), refrain from setting it to NULL inside rndis_filter_device_remove().

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2020-01-15  Bluetooth: hci_qca: Retry btsoc initialize when it fails  (Rocky Liao)

This patch adds a retry of the btsoc initialization when it fails. There are reports that the btsoc initialization may fail on some platforms, but the repro ratio is very low. The symptom is that the firmware download fails because the UART write timed out. The failure may be caused by the UART, the platform HW, or the btsoc itself, but it's very difficult to root cause given how rarely it reproduces. Adding a retry to the btsoc initialization can work around most of the failures and make Bluetooth finally work.

Signed-off-by: Rocky Liao <rjliao@codeaurora.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>