linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-06-17	io_uring: add memory barrier to synchronize io_kiocb's result and ↵	Xiaoguang Wang
	iopoll_completed In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll completed are two independent store operations, to ensure that once iopoll_completed is ture and then req->result must been perceived by the cpu executing io_do_iopoll(), proper memory barrier should be used. And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is, we'll need to issue this io request using io-wq again. In order to just issue a single smp_rmb() on the completion side, move the re-submit work to io_iopoll_complete(). Cc: stable@vger.kernel.org Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> [axboe: don't set ->iopoll_completed for -EAGAIN retry] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17	io_uring: don't fail links for EAGAIN error in IOPOLL mode	Xiaoguang Wang
	In IOPOLL mode, for EAGAIN error, we'll try to submit io request again using io-wq, so don't fail rest of links if this io request has links. Cc: stable@vger.kernel.org Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17	Merge tag 'dma-mapping-5.8-3' of git://git.infradead.org/users/hch/dma-mapping	Linus Torvalds
	Pull dma-mapping fixes from Christoph Hellwig: "Fixes for the SEV atomic pool (Geert Uytterhoeven and David Rientjes)" * tag 'dma-mapping-5.8-3' of git://git.infradead.org/users/hch/dma-mapping: dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL dma-pool: fix too large DMA pools on medium memory size systems
2020-06-17	maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault	Christoph Hellwig
	Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17	maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault	Christoph Hellwig
	Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17	bpf: Document optval > PAGE_SIZE behavior for sockopt hooks	Stanislav Fomichev
	Extend existing doc with more details about requiring ctx->optlen = 0 for handling optval > PAGE_SIZE. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200617010416.93086-3-sdf@google.com
2020-06-17	selftests/bpf: Make sure optvals > PAGE_SIZE are bypassed	Stanislav Fomichev
	We are relying on the fact, that we can pass > sizeof(int) optvals to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes). In the BPF program we check that we can only touch PAGE_SIZE bytes, but the real optlen is PAGE_SIZE * 2. In both cases, we override it to some predefined value and trim the optlen. Also, let's modify exiting IP_TOS usecase to test optlen=0 case where BPF program just bypasses the data as is. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200617010416.93086-2-sdf@google.com
2020-06-17	bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE	Stanislav Fomichev
	Attaching to these hooks can break iptables because its optval is usually quite big, or at least bigger than the current PAGE_SIZE limit. David also mentioned some SCTP options can be big (around 256k). For such optvals we expose only the first PAGE_SIZE bytes to the BPF program. BPF program has two options: 1. Set ctx->optlen to 0 to indicate that the BPF's optval should be ignored and the kernel should use original userspace value. 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE. v5: * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov) * update the docs accordingly v4: * use temporary buffer to avoid optval == optval_end == NULL; this removes the corner case in the verifier that might assume non-zero PTR_TO_PACKET/PTR_TO_PACKET_END. v3: * don't increase the limit, bypass the argument v2: * proper comments formatting (Jakub Kicinski) Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
2020-06-17	devmap: Use bpf_map_area_alloc() for allocating hash buckets	Toke Høiland-Jørgensen
	Syzkaller discovered that creating a hash of type devmap_hash with a large number of entries can hit the memory allocator limit for allocating contiguous memory regions. There's really no reason to use kmalloc_array() directly in the devmap code, so just switch it to the existing bpf_map_area_alloc() function that is used elsewhere. Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
2020-06-17	xdp: Handle frame_sz in xdp_convert_zc_to_xdp_frame()	Hangbin Liu
	In commit 34cc0b338a61 we only handled the frame_sz in convert_to_xdp_frame(). This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame(). Fixes: 34cc0b338a61 ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
2020-06-17	tools headers UAPI: Sync linux/fs.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick the changes from: b383a73f2b83 ("fs/ext4: Introduce DAX inode flag") And silence this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h' diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h It causes various beautifiers for things like fspick, fsmount, etc (see below) to get rebuilt, but this specific change doesn't make 'perf trace' be capable of decoding anything new, as we still don't decode what comes from ioctls, just its cmds. Details about the update: $ cp include/uapi/linux/fs.h tools/include/uapi/linux/fs.h $ git diff diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h index 379a612f8f1d..f44eb0a04afd 100644 --- a/tools/include/uapi/linux/fs.h +++ b/tools/include/uapi/linux/fs.h @@ -262,6 +262,7 @@ struct fsxattr { #define FS_EA_INODE_FL 0x00200000 /* Inode used for large EA / #define FS_EOFBLOCKS_FL 0x00400000 / Reserved for ext4 / #define FS_NOCOW_FL 0x00800000 / Do not cow file / +#define FS_DAX_FL 0x02000000 / Inode is DAX / #define FS_INLINE_DATA_FL 0x10000000 / Reserved for ext4 / #define FS_PROJINHERIT_FL 0x20000000 / Create with parents projid / #define FS_CASEFOLD_FL 0x40000000 / Folder is case insensitive */ $ m make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j8' parallel build INSTALL GTK UI CC /tmp/build/perf/builtin-trace.o DESCEND plugins CC /tmp/build/perf/trace/beauty/fsmount.o CC /tmp/build/perf/trace/beauty/fspick.o CC /tmp/build/perf/trace/beauty/mount_flags.o CC /tmp/build/perf/trace/beauty/move_mount.o CC /tmp/build/perf/trace/beauty/renameat.o CC /tmp/build/perf/trace/beauty/sync_file_range.o INSTALL trace_plugins LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf <SNIP> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17	tools include UAPI: Sync linux/vhost.h with the kernel sources	Arnaldo Carvalho de Melo
	To get the changes in: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa") Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h This automatically picks the new ioctl introduced in the above patch, making tools such as 'perf trace' aware of them and possibly allowing to use the strings in filters, etc: # perf trace -e ioctl --pid 7951 <SNIP> 0.178 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 0.194 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 0.209 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 0.224 (249.413 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.660 ( 0.011 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.675 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.686 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.697 ( 0.008 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.709 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.720 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.730 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.740 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.752 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.762 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.772 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.782 (120.138 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 370.201 ( 0.039 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e1420) = 0 370.254 ( 0.052 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 370.575 ( 0.365 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 370.973 ( 0.028 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 371.015 ( 0.037 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 371.071 ( 0.009 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e14b0) = 0 <SNIP> # Details about the update: $ diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h --- tools/include/uapi/linux/vhost.h 2020-04-16 13:19:12.056763843 -0300 +++ include/uapi/linux/vhost.h 2020-06-17 10:04:20.532056428 -0300 @@ -15,6 +15,8 @@ #include <linux/types.h> #include <linux/ioctl.h> +#define VHOST_FILE_UNBIND -1 + /* ioctls / #define VHOST_VIRTIO 0xAF @@ -140,4 +142,6 @@ / Get the max ring size. / #define VHOST_VDPA_GET_VRING_NUM _IOR(VHOST_VIRTIO, 0x76, __u16) +/ Set event fd for config interrupt/ +#define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int) #endif $ $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2020-06-17 10:15:35.123275966 -0300 +++ after 2020-06-17 10:15:51.812482117 -0300 @@ -27,6 +27,7 @@ [0x72] = "VDPA_SET_STATUS", [0x74] = "VDPA_SET_CONFIG", [0x75] = "VDPA_SET_VRING_ENABLE", + [0x77] = "VDPA_SET_CONFIG_CALL", }; static const char vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", $ This causes these parts to get rebuilt: CC /tmp/build/perf/trace/beauty/ioctl.o INSTALL trace_plugins LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zhu Lingshan <lingshan.zhu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17	tools arch x86: Sync the msr-index.h copy with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the changes in: 7e5b3c267d25 ("x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation") Addressing these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h With this one will be able to use these new AMD MSRs in filters, by name, e.g.: # perf trace -e msr:* --filter "msr==IA32_MCU_OPT_CTRL" ^C# Using -v we can see how it sets up the tracepoint filters, converting from the string in the filter to the numeric value: # perf trace -v -e msr:* --filter "msr==IA32_MCU_OPT_CTRL" Using CPUID GenuineIntel-6-8E-A 0x123 New filter for msr:read_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344) 0x123 New filter for msr:write_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344) 0x123 New filter for msr:rdpmc: (msr==0x123) && (common_pid != 335 && common_pid != 30344) mmap size 528384B ^C# The updating process shows how this affects tooling in more detail: $ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h --- tools/arch/x86/include/asm/msr-index.h 2020-06-03 10:36:09.959910238 -0300 +++ arch/x86/include/asm/msr-index.h 2020-06-17 10:04:20.235052901 -0300 @@ -128,6 +128,10 @@ #define TSX_CTRL_RTM_DISABLE BIT(0) /* Disable RTM feature / #define TSX_CTRL_CPUID_CLEAR BIT(1) / Disable TSX enumeration / +/ SRBDS support */ +#define MSR_IA32_MCU_OPT_CTRL 0x00000123 +#define RNGDS_MITG_DIS BIT(0) + #define MSR_IA32_SYSENTER_CS 0x00000174 #define MSR_IA32_SYSENTER_ESP 0x00000175 #define MSR_IA32_SYSENTER_EIP 0x00000176 $ set -o vi $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-06-17 10:05:49.653114752 -0300 +++ after 2020-06-17 10:06:01.777258731 -0300 @@ -51,6 +51,7 @@ [0x0000011e] = "IA32_BBL_CR_CTL3", [0x00000120] = "IDT_MCR_CTRL", [0x00000122] = "IA32_TSX_CTRL", + [0x00000123] = "IA32_MCU_OPT_CTRL", [0x00000140] = "MISC_FEATURES_ENABLES", [0x00000174] = "IA32_SYSENTER_CS", [0x00000175] = "IA32_SYSENTER_ESP", $ The related change to cpu-features.h affects this: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o This shouldn't be affecting that 'perf bench' entry: $ find tools/perf/ -type f \| xargs grep SRBDS $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Gross <mgross@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17	Merge remote-tracking branch 'torvalds/master' into perf/urgent	Arnaldo Carvalho de Melo
	To get some newer headers that got out of sync with the copies in tools/ so that we can try to have the tools/perf/ build clean for v5.8 with fewer pull requests. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17	perf script: Initialize zstd_data	Milian Wolff
	Fixes segmentation fault when trying to interpret zstd-compressed data with perf script: ``` $ perf record -z ls ... [ perf record: Captured and wrote 0,010 MB perf.data, compressed (original 0,001 MB, ratio is 2,190) ] $ memcheck perf script ... ==67911== Invalid read of size 4 ==67911== at 0x5568188: ZSTD_decompressStream (in /usr/lib/libzstd.so.1.4.5) ==67911== by 0x6E726B: zstd_decompress_stream (zstd.c:100) ==67911== by 0x65729C: perf_session__process_compressed_event (session.c:72) ==67911== by 0x6598E8: perf_session__process_user_event (session.c:1583) ==67911== by 0x65BA59: reader__process_events (session.c:2177) ==67911== by 0x65BA59: __perf_session__process_events (session.c:2234) ==67911== by 0x65BA59: perf_session__process_events (session.c:2267) ==67911== by 0x5A7397: __cmd_script (builtin-script.c:2447) ==67911== by 0x5A7397: cmd_script (builtin-script.c:3840) ==67911== by 0x5FE9D2: run_builtin (perf.c:312) ==67911== by 0x711627: handle_internal_command (perf.c:364) ==67911== by 0x711627: run_argv (perf.c:408) ==67911== by 0x711627: main (perf.c:538) ==67911== Address 0x71d8 is not stack'd, malloc'd or (recently) free'd ``` Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> LPU-Reference: 20200612230333.72140-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17	regmap: Fix memory leak from regmap_register_patch	Charles Keepax
	When a register patch is registered the reg_sequence is copied but the memory allocated is never freed. Add a kfree in regmap_exit to clean it up. Fixes: 22f0d90a3482 ("regmap: Support register patch sets") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://lore.kernel.org/r/20200617152129.19655-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-17	tools, bpftool: Add ringbuf map type to map command docs	Tobias Klauser
	Commit c34a06c56df7 ("tools/bpftool: Add ringbuf map to a list of known map types") added the symbolic "ringbuf" name. Document it in the bpftool map command docs and usage as well. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616113303.8123-1-tklauser@distanz.ch
2020-06-17	bpf: bpf_probe_read_kernel_str() has to return amount of data read on success	Andrii Nakryiko
	During recent refactorings, bpf_probe_read_kernel_str() started returning 0 on success, instead of amount of data successfully read. This majorly breaks applications relying on bpf_probe_read_kernel_str() and bpf_probe_read_str() and their results. Fix this by returning actual number of bytes read. Fixes: 8d92db5c04d1 ("bpf: rework the compat kernel probe handling") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616050432.1902042-1-andriin@fb.com
2020-06-17	ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems	Kai-Heng Feng
	There are two more HP systems control mute LED from HDA codec and need to expose micmute led class so SoF can control micmute LED. Add quirks to support them. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200617102906.16156-2-kai.heng.feng@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-06-17	blktrace: Avoid sparse warnings when assigning q->blk_trace	Jan Kara
	Mostly for historical reasons, q->blk_trace is assigned through xchg() and cmpxchg() atomic operations. Although this is correct, sparse complains about this because it violates rcu annotations since commit c780e86dd48e ("blktrace: Protect q->blk_trace with RCU") which started to use rcu for accessing q->blk_trace. Furthermore there's no real need for atomic operations anymore since all changes to q->blk_trace happen under q->blk_trace_mutex and since it also makes more sense to check if q->blk_trace is set with the mutex held earlier. So let's just replace xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and rcu_assign_pointer(). This makes the code more efficient and sparse happy. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17	blktrace: break out of blktrace setup on concurrent calls	Luis Chamberlain
	We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda. We check for concurrent setup only at the very end of the blktrace setup though. If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. However when one tries to kill the first one one will see things like this: The kernel will show these: ``` debugfs: File 'dropped' in directory 'nvme1n1' already present! debugfs: File 'msg' in directory 'nvme1n1' already present! debugfs: File 'trace0' in directory 'nvme1n1' already present! `` And userspace just sees this error message for the second call: ``` blktrace /dev/nvme1n1 BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error ``` The first userspace process #1 will also claim that the files were taken underneath their nose as well. The files are taken away form the first process given that when the second blktrace fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if go-happy process #1 is waiting for blktrace data, we have been asked to take teardown the blktrace. This can easily be reproduced with break-blktrace [0] run_0005.sh test. Just break out early if we know we're already going to fail, this will prevent trying to create the files all over again, which we know still exist. [0] https://github.com/mcgrof/break-blktrace Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17	powerpc/syscalls: Use the number when building SPU syscall table	Michael Ellerman
	Currently the macro that inserts entries into the SPU syscall table doesn't actually use the "nr" (syscall number) parameter. This does work, but it relies on the exact right number of syscall entries being emitted in order for the syscal numbers to line up with the array entries. If for example we had two entries with the same syscall number we wouldn't get an error, it would just cause all subsequent syscalls to be off by one in the spu_syscall_table. So instead change the macro to assign to the specific entry of the array, meaning any numbering overlap will be caught by the compiler. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200616135617.2937252-1-mpe@ellerman.id.au
2020-06-17	powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()	Mike Rapoport
	The pte_update() implementation for PPC_8xx unfolds page table from the PGD level to access a PMD entry. Since 8xx has only 2-level page table this can be simplified with pmd_off() shortcut. Replace explicit unfolding with pmd_off() and drop defines of pgd_index() and pgd_offset() that are no longer needed. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200615092229.23142-1-rppt@kernel.org
2020-06-17	spi: stm32-qspi: Fix error path in case of -EPROBE_DEFER	Patrice Chotard
	In case of -EPROBE_DEFER, stm32_qspi_release() was called in any case which unregistered driver from pm_runtime framework even if it has not been registered yet to it. This leads to: stm32-qspi 58003000.spi: can't setup spi0.0, status -13 spi_master spi0: spi_device register error /soc/spi@58003000/mx66l51235l@0 spi_master spi0: Failed to create SPI device for /soc/spi@58003000/mx66l51235l@0 stm32-qspi 58003000.spi: can't setup spi0.1, status -13 spi_master spi0: spi_device register error /soc/spi@58003000/mx66l51235l@1 spi_master spi0: Failed to create SPI device for /soc/spi@58003000/mx66l51235l@1 On v5.7 kernel,this issue was not "visible", qspi driver was probed successfully. Fixes: 9d282c17b023 ("spi: stm32-qspi: Add pm_runtime support") Signed-off-by: Patrice Chotard <patrice.chotard@st.com> Link: https://lore.kernel.org/r/20200616113035.4514-1-patrice.chotard@st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-17	regulator: mt6358: Remove BROKEN dependency	Axel Lin
	The MFD part is merged into v5.8-rc1, thus remove BROKEN dependency. Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20200616135030.1163660-1-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-17	Merge tag 'v5.8-rc1' into regulator-5.8	Mark Brown
	Linux 5.8-rc1
2020-06-17	arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support	Will Deacon
	Unfortunately, most versions of clang that support BTI are capable of miscompiling the kernel when converting a switch statement into a jump table. As an example, attempting to spawn a KVM guest results in a panic: [ 56.253312] Kernel panic - not syncing: bad mode [ 56.253834] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1 #2 [ 56.254225] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 56.254712] Call trace: [ 56.254952] dump_backtrace+0x0/0x1d4 [ 56.255305] show_stack+0x1c/0x28 [ 56.255647] dump_stack+0xc4/0x128 [ 56.255905] panic+0x16c/0x35c [ 56.256146] bad_el0_sync+0x0/0x58 [ 56.256403] el1_sync_handler+0xb4/0xe0 [ 56.256674] el1_sync+0x7c/0x100 [ 56.256928] kvm_vm_ioctl_check_extension_generic+0x74/0x98 [ 56.257286] __arm64_sys_ioctl+0x94/0xcc [ 56.257569] el0_svc_common+0x9c/0x150 [ 56.257836] do_el0_svc+0x84/0x90 [ 56.258083] el0_sync_handler+0xf8/0x298 [ 56.258361] el0_sync+0x158/0x180 This is because the switch in kvm_vm_ioctl_check_extension_generic() is executed as an indirect branch to tail-call through a jump table: ffff800010032dc8: 3869694c ldrb w12, [x10, x9] ffff800010032dcc: 8b0c096b add x11, x11, x12, lsl #2 ffff800010032dd0: d61f0160 br x11 However, where the target case uses the stack, the landing pad is elided due to the presence of a paciasp instruction: ffff800010032e14: d503233f paciasp ffff800010032e18: a9bf7bfd stp x29, x30, [sp, #-16]! ffff800010032e1c: 910003fd mov x29, sp ffff800010032e20: aa0803e0 mov x0, x8 ffff800010032e24: 940017c0 bl ffff800010038d24 <kvm_vm_ioctl_check_extension> ffff800010032e28: 93407c00 sxtw x0, w0 ffff800010032e2c: a8c17bfd ldp x29, x30, [sp], #16 ffff800010032e30: d50323bf autiasp ffff800010032e34: d65f03c0 ret Unfortunately, this results in a fatal exception because paciasp is compatible only with branch-and-link (call) instructions and not simple indirect branches. A fix is being merged into Clang 10.0.1 so that a 'bti j' instruction is emitted as an explicit landing pad in this situation. Make in-kernel BTI depend on that compiler version when building with clang. Cc: Tom Stellard <tstellar@redhat.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20200615105524.GA2694@willie-the-truck Link: https://lore.kernel.org/r/20200616183630.2445-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-06-17	ALSA: usb-audio: Fix potential use-after-free of streams	Takashi Iwai
	With the recent full-duplex support of implicit feedback streams, an endpoint can be still running after closing the capture stream as long as the playback stream with the sync-endpoint is running. In such a state, the URBs are still be handled and they may call retire_data_urb callback, which tries to transfer the data from the PCM buffer. Since the PCM stream gets closed, this may lead to use-after-free. This patch adds the proper clearance of the callback at stopping the capture stream for addressing the possible UAF above. Fixes: 10ce77e4817f ("ALSA: usb-audio: Add duplex sound support for USB devices using implicit feedback") Link: https://lore.kernel.org/r/20200616120921.12249-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-06-16	overflow.h: Add flex_array_size() helper	Gustavo A. R. Silva
	Add flex_array_size() helper for the calculation of the size, in bytes, of a flexible array member contained within an enclosing structure. Example of usage: struct something { size_t count; struct foo items[]; }; struct something *instance; instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL); instance->count = count; memcpy(instance->items, src, flex_array_size(instance, items, instance->count)); The helper returns SIZE_MAX on overflow instead of wrapping around. Additionally replaces parameter "n" with "count" in struct_size() helper for greater clarity and unification. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200609012233.GA3371@embeddedor Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-17	scripts: Fix typo in headers_install.sh	Masanari Iida
	This patch fixes a spelling typo in scripts/headers_install.sh Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-17	kconfig: unify cc-option and as-option	Masahiro Yamada
	cc-option and as-option are almost the same; both pass the flag to $(CC). The main difference is the cc-option stops before the assemble stage (-S option) whereas as-option stops after (-c option). I chose -S because it is slightly faster, but $(cc-option,-gz=zlib) returns a wrong result (https://lkml.org/lkml/2020/6/9/1529). It has been fixed by commit 7b16994437c7 ("Makefile: Improve compressed debug info support detection"), but the assembler should always be invoked for more reliable compiler option tests. However, you cannot simply replace -S with -c because the following code in lib/Kconfig.debug would break: depends on $(cc-option,-gsplit-dwarf) The combination of -c and -gsplit-dwarf does not accept /dev/null as output. $ cat /dev/null \| gcc -gsplit-dwarf -S -x c - -o /dev/null $ echo $? 0 $ cat /dev/null \| gcc -gsplit-dwarf -c -x c - -o /dev/null objcopy: Warning: '/dev/null' is not an ordinary file $ echo $? 1 $ cat /dev/null \| gcc -gsplit-dwarf -c -x c - -o tmp.o $ echo $? 0 There is another flag that creates an separate file based on the object file path: $ cat /dev/null \| gcc -ftest-coverage -c -x c - -o /dev/null <stdin>:1: error: cannot open /dev/null.gcno So, we cannot use /dev/null to sink the output. Align the cc-option implementation with scripts/Kbuild.include. With -c option used in cc-option, as-option is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Will Deacon <will@kernel.org>
2020-06-16	tools/bootconfig: Add testcase for show-command and quotes test	Masami Hiramatsu
	Add testcases for the return value of the command to show bootconfig in initrd, and double/single quotes selecting. Link: http://lkml.kernel.org/r/159230247428.65555.2109472942519215104.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig	Masami Hiramatsu
	Fix bootconfig to return 0 if succeeded to show the bootconfig in initrd. Without this fix, "bootconfig INITRD" command returns !0 even if the command succeeded to show the bootconfig. Link: http://lkml.kernel.org/r/159230246566.65555.11891772258543514487.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	tools/bootconfig: Fix to use correct quotes for value	Masami Hiramatsu
	Fix bootconfig tool to select double or single quotes correctly according to the value. If a bootconfig value includes a double quote character, we must use single-quotes to quote that value. Link: http://lkml.kernel.org/r/159230245697.65555.12444299015852932304.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	proc/bootconfig: Fix to use correct quotes for value	Masami Hiramatsu
	Fix /proc/bootconfig to select double or single quotes corrctly according to the value. If a bootconfig value includes a double quote character, we must use single-quotes to quote that value. This modifies if() condition and blocks for avoiding double-quote in value check in 2 places. Anyway, since xbc_array_for_each_value() can handle the array which has a single node correctly. Thus, if (vnode && xbc_node_is_array(vnode)) { xbc_array_for_each_value(vnode) /* vnode->next != NULL / ... } else { snprintf(val); / val is an empty string if !vnode / } is equivalent to if (vnode) { xbc_array_for_each_value(vnode) / vnode->next can be NULL / ... } else { snprintf(""); / value is always empty */ } Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	tracing: Remove unused event variable in tracing_iter_reset	YangHui
	We do not use the event variable, just remove it. Signed-off-by: YangHui <yanghui.def@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	tracing/probe: Fix memleak in fetch_op_data operations	Vamshi K Sthambamkadi
	kmemleak report: [<57dcc2ca>] __kmalloc_track_caller+0x139/0x2b0 [<f1c45d0f>] kstrndup+0x37/0x80 [<f9761eb0>] parse_probe_arg.isra.7+0x3cc/0x630 [<055bf2ba>] traceprobe_parse_probe_arg+0x2f5/0x810 [<655a7766>] trace_kprobe_create+0x2ca/0x950 [<4fc6a02a>] create_or_delete_trace_kprobe+0xf/0x30 [<6d1c8a52>] trace_run_command+0x67/0x80 [<be812cc0>] trace_parse_run_command+0xa7/0x140 [<aecfe401>] probes_write+0x10/0x20 [<2027641c>] __vfs_write+0x30/0x1e0 [<6a4aeee1>] vfs_write+0x96/0x1b0 [<3517fb7d>] ksys_write+0x53/0xc0 [<dad91db7>] __ia32_sys_write+0x15/0x20 [<da347f64>] do_syscall_32_irqs_on+0x3d/0x260 [<fd0b7e7d>] do_fast_syscall_32+0x39/0xb0 [<ea5ae810>] entry_SYSENTER_32+0xaf/0x102 Post parse_probe_arg(), the FETCH_OP_DATA operation type is overwritten to FETCH_OP_ST_STRING, as a result memory is never freed since traceprobe_free_probe_arg() iterates only over SYMBOL and DATA op types Setup fetch string operation correctly after fetch_op_data operation. Link: https://lkml.kernel.org/r/20200615143034.GA1734@cosmos Cc: stable@vger.kernel.org Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	trace: Fix typo in allocate_ftrace_ops()'s comment	Wei Yang
	No functional change, just correct the word. Link: https://lkml.kernel.org/r/20200610033251.31713-1-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	tracing: Make ftrace packed events have align of 1	Steven Rostedt (VMware)
	When using trace-cmd on 5.6-rt for the function graph tracer, the output was corrupted. It gave output like this: funcgraph_entry: func=0xffffffff depth=38982 funcgraph_entry: func=0x1ffffffff depth=16044 funcgraph_exit: func=0xffffffff overrun=0x92539aaf00000000 calltime=0x92539c9900000072 rettime=0x100000072 depth=11084 funcgraph_exit: func=0xffffffff overrun=0x9253946e00000000 calltime=0x92539e2100000072 rettime=0x72 depth=26033702 funcgraph_entry: func=0xffffffff depth=85798 funcgraph_entry: func=0x1ffffffff depth=12044 The reason was because the tracefs/events/ftrace/funcgraph_entry/exit format file was incorrect. The -rt kernel adds more common fields to the trace events. Namely, common_migrate_disable and common_preempt_lazy_count. Each is one byte in size. This changes the alignment of the normal payload. Most events are aligned normally, but the function and function graph events are defined with a "PACKED" macro, that packs their payload. As the offsets displayed in the format files are now calculated by an aligned field, the aligned field for function and function graph events should be 1, not their normal alignment. With aligning of the funcgraph_entry event, the format file has: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned char common_migrate_disable; offset:8; size:1; signed:0; field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0; field:unsigned long func; offset:16; size:8; signed:0; field:int depth; offset:24; size:4; signed:1; But the actual alignment is: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned char common_migrate_disable; offset:8; size:1; signed:0; field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0; field:unsigned long func; offset:12; size:8; signed:0; field:int depth; offset:20; size:4; signed:1; Link: https://lkml.kernel.org/r/20200609220041.2a3b527f@oasis.local.home Cc: stable@vger.kernel.org Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	sample-trace-array: Remove trace_array 'sample-instance'	Kefeng Wang
	Remove trace_array 'sample-instance' if kthread_run fails in sample_trace_array_init(). Link: https://lkml.kernel.org/r/20200609135200.2206726-1-wangkefeng.wang@huawei.com Cc: stable@vger.kernel.org Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Reviewed-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	sample-trace-array: Fix sleeping function called from invalid context	Kefeng Wang
	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5 1 lock held by swapper/5/0: #0: ffff80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, at: call_timer_fn+0x8/0x3e0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x20/0x30 dump_stack+0xe4/0x150 ___might_sleep+0x160/0x200 __might_sleep+0x58/0x90 __mutex_lock+0x64/0x948 mutex_lock_nested+0x3c/0x58 __ftrace_set_clr_event+0x44/0x88 trace_array_set_clr_event+0x24/0x38 mytimer_handler+0x34/0x40 [sample_trace_array] mutex_lock() will be called in interrupt context, using workqueue to fix it. Link: https://lkml.kernel.org/r/20200610011244.2209486-1-wangkefeng.wang@huawei.com Cc: stable@vger.kernel.org Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Reviewed-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	kretprobe: Prevent triggering kretprobe from within kprobe_flush_task	Jiri Olsa
	Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave. My test was also able to trigger lockdep output: ============================================ WARNING: possible recursive locking detected 5.6.0-rc6+ #6 Not tainted -------------------------------------------- sched-messaging/2767 is trying to acquire lock: ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0 but task is already holding lock: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(kretprobe_table_locks[i].lock)); lock(&(kretprobe_table_locks[i].lock)); * DEADLOCK * May be due to missing lock nesting notation 1 lock held by sched-messaging/2767: #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 stack backtrace: CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6 Call Trace: dump_stack+0x96/0xe0 __lock_acquire.cold.57+0x173/0x2b7 ? native_queued_spin_lock_slowpath+0x42b/0x9e0 ? lockdep_hardirqs_on+0x590/0x590 ? __lock_acquire+0xf63/0x4030 lock_acquire+0x15a/0x3d0 ? kretprobe_hash_lock+0x52/0xa0 _raw_spin_lock_irqsave+0x36/0x70 ? kretprobe_hash_lock+0x52/0xa0 kretprobe_hash_lock+0x52/0xa0 trampoline_handler+0xf8/0x940 ? kprobe_fault_handler+0x380/0x380 ? find_held_lock+0x3a/0x1c0 kretprobe_trampoline+0x25/0x50 ? lock_acquired+0x392/0xbc0 ? _raw_spin_lock_irqsave+0x50/0x70 ? __get_valid_kprobe+0x1f0/0x1f0 ? _raw_spin_unlock_irqrestore+0x3b/0x40 ? finish_task_switch+0x4b9/0x6d0 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 The code within the kretprobe handler checks for probe reentrancy, so we won't trigger any _raw_spin_lock_irqsave probe in there. The problem is in outside kprobe_flush_task, where we call: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave where _raw_spin_lock_irqsave triggers the kretprobe and installs kretprobe_trampoline handler on _raw_spin_lock_irqsave return. The kretprobe_trampoline handler is then executed with already locked kretprobe_table_locks, and first thing it does is to lock kretprobe_table_locks ;-) the whole lockup path like: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed ---> kretprobe_table_locks locked kretprobe_trampoline trampoline_handler kretprobe_hash_lock(current, &head, &flags); <--- deadlock Adding kprobe_busy_begin/end helpers that mark code with fake probe installed to prevent triggering of another kprobe within this code. Using these helpers in kprobe_flush_task, so the probe recursion protection check is hit and the probe is never set to prevent above lockup. Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2 Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	kprobes: Remove redundant arch_disarm_kprobe() call	Masami Hiramatsu
	Fix to remove redundant arch_disarm_kprobe() call in force_unoptimize_kprobe(). This arch_disarm_kprobe() will be invoked if the kprobe is optimized but disabled, but that means the kprobe (optprobe) is unused (and unoptimized) state. In that case, unoptimize_kprobe() puts it in freeing_list and kprobe_optimizer (do_unoptimize_kprobes()) automatically disarm it. Thus this arch_disarm_kprobe() is redundant. Link: http://lkml.kernel.org/r/158927058719.27680.17183632908465341189.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex	Masami Hiramatsu
	In kprobe_optimizer() kick_kprobe_optimizer() is called without kprobe_mutex, but this can race with other caller which is protected by kprobe_mutex. To fix that, expand kprobe_mutex protected area to protect kick_kprobe_optimizer() call. Link: http://lkml.kernel.org/r/158927057586.27680.5036330063955940456.stgit@devnote2 Fixes: cd7ebe2298ff ("kprobes: Use text_poke_smp_batch for optimizing") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ziqian SUN <zsun@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	kprobes: Use non RCU traversal APIs on kprobe_tables if possible	Masami Hiramatsu
	Current kprobes uses RCU traversal APIs on kprobe_tables even if it is safe because kprobe_mutex is locked. Make those traversals to non-RCU APIs where the kprobe_mutex is locked. Link: http://lkml.kernel.org/r/158927056452.27680.9710575332163005121.stgit@devnote2 Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	kprobes: Suppress the suspicious RCU warning on kprobes	Masami Hiramatsu
	Anders reported that the lockdep warns that suspicious RCU list usage in register_kprobe() (detected by CONFIG_PROVE_RCU_LIST.) This is because get_kprobe() access kprobe_table[] by hlist_for_each_entry_rcu() without rcu_read_lock. If we call get_kprobe() from the breakpoint handler context, it is run with preempt disabled, so this is not a problem. But in other cases, instead of rcu_read_lock(), we locks kprobe_mutex so that the kprobe_table[] is not updated. So, current code is safe, but still not good from the view point of RCU. Joel suggested that we can silent that warning by passing lockdep_is_held() to the last argument of hlist_for_each_entry_rcu(). Add lockdep_is_held(&kprobe_mutex) at the end of the hlist_for_each_entry_rcu() to suppress the warning. Link: http://lkml.kernel.org/r/158927055350.27680.10261450713467997503.stgit@devnote2 Reported-by: Anders Roxell <anders.roxell@linaro.org> Suggested-by: Joel Fernandes <joel@joelfernandes.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16	recordmcount: support >64k sections	Sami Tolvanen
	When compiling a kernel with Clang and LTO, we need to run recordmcount on vmlinux.o with a large number of sections, which currently fails as the program doesn't understand extended section indexes. This change adds support for processing binaries with >64k sections. Link: https://lkml.kernel.org/r/20200424193046.160744-1-samitolvanen@google.com Link: https://lore.kernel.org/lkml/CAK7LNARbZhoaA=Nnuw0=gBrkuKbr_4Ng_Ei57uafujZf7Xazgw@mail.gmail.com/ Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Matt Helsley <mhelsley@vmware.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-17	kbuild: improve cc-option to clean up all temporary files	Masahiro Yamada
	When cc-option and friends evaluate compiler flags, the temporary file $$TMP is created as an output object, and automatically cleaned up. The actual file path of $$TMP is .<pid>.tmp, here <pid> is the process ID of $(shell ...) invoked from cc-option. (Please note $$$$ is the escape sequence of $$). Such garbage files are cleaned up in most cases, but some compiler flags create additional output files. For example, -gsplit-dwarf creates a .dwo file. When CONFIG_DEBUG_INFO_SPLIT=y, you will see a bunch of .<pid>.dwo files left in the top of build directories. You may not notice them unless you do 'ls -a', but the garbage files will increase every time you run 'make'. This commit changes the temporary object path to .tmp_<pid>/tmp, and removes .tmp_<pid> directory when exiting. Separate build artifacts such as *.dwo will be cleaned up all together because their file paths are usually determined based on the base name of the object. Another example is -ftest-coverage, which outputs the coverage data into <base-name-of-object>.gcno Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-16	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Don't get per-cpu pointer with preemption enabled in nft_set_pipapo, fix from Stefano Brivio. 2) Fix memory leak in ctnetlink, from Pablo Neira Ayuso. 3) Multiple definitions of MPTCP_PM_MAX_ADDR, from Geliang Tang. 4) Accidently disabling NAPI in non-error paths of macb_open(), from Charles Keepax. 5) Fix races between alx_stop and alx_remove, from Zekun Shen. 6) We forget to re-enable SRIOV during resume in bnxt_en driver, from Michael Chan. 7) Fix memory leak in ipv6_mc_destroy_dev(), from Wang Hai. 8) rxtx stats use wrong index in mvpp2 driver, from Sven Auhagen. 9) Fix memory leak in mptcp_subflow_create_socket error path, from Wei Yongjun. 10) We should not adjust the TCP window advertised when sending dup acks in non-SACK mode, because it won't be counted as a dup by the sender if the window size changes. From Eric Dumazet. 11) Destroy the right number of queues during remove in mvpp2 driver, from Sven Auhagen. 12) Various WOL and PM fixes to e1000 driver, from Chen Yu, Vaibhav Gupta, and Arnd Bergmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) e1000e: fix unused-function warning e1000: use generic power management e1000e: Do not wake up the system via WOL if device wakeup is disabled lan743x: add MODULE_DEVICE_TABLE for module loading alias mlxsw: spectrum: Adjust headroom buffers for 8x ports bareudp: Fixed configuration to avoid having garbage values mvpp2: remove module bugfix tcp: grow window for OOO packets only for SACK flows mptcp: fix memory leak in mptcp_subflow_create_socket() netfilter: flowtable: Make nf_flow_table_offload_add/del_cb inline net/sched: act_ct: Make tcf_ct_flow_table_restore_skb inline net: dsa: sja1105: fix PTP timestamping with large tc-taprio cycles mvpp2: ethtool rxtx stats fix MAINTAINERS: switch to my private email for Renesas Ethernet drivers rocker: fix incorrect error handling in dma_rings_init test_objagg: Fix potential memory leak in error handling net: ethernet: mtk-star-emac: simplify interrupt handling mld: fix memory leak in ipv6_mc_destroy_dev() bnxt_en: Return from timer if interface is not in open state. bnxt_en: Fix AER reset logic on 57500 chips. ...
2020-06-16	Merge tag 'afs-fixes-20200616' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "I've managed to get xfstests kind of working with afs. Here are a set of patches that fix most of the bugs found. There are a number of primary issues: - Incorrect handling of mtime and non-handling of ctime. It might be argued, that the latter isn't a bug since the AFS protocol doesn't support ctime, but I should probably still update it locally. - Shared-write mmap, truncate and writeback bugs. This includes not changing i_size under the callback lock, overwriting local i_size with the reply from the server after a partial writeback, not limiting the writeback from an mmapped page to EOF. - Checks for an abort code indicating that the primary vnode in an operation was deleted by a third-party are done in the wrong place. - Silly rename bugs. This includes an incomplete conversion to the new operation handling, duplicate nlink handling, nlink changing not being done inside the callback lock and insufficient handling of third-party conflicting directory changes. And some secondary ones: - The UAEOVERFLOW abort code should map to EOVERFLOW not EREMOTEIO. - Remove a couple of unused or incompletely used bits. - Remove a couple of redundant success checks. These seem to fix all the data-corruption bugs found by ./check -afs -g quick along with the obvious silly rename bugs and time bugs. There are still some test failures, but they seem to fall into two classes: firstly, the authentication/security model is different to the standard UNIX model and permission is arbitrated by the server and cached locally; and secondly, there are a number of features that AFS does not support (such as mknod). But in these cases, the tests themselves need to be adapted or skipped. Using the in-kernel afs client with xfstests also found a bug in the AuriStor AFS server that has been fixed for a future release" * tag 'afs-fixes-20200616' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix silly rename afs: afs_vnode_commit_status() doesn't need to check the RPC error afs: Fix use of afs_check_for_remote_deletion() afs: Remove afs_operation::abort_code afs: Fix yfs_fs_fetch_status() to honour vnode selector afs: Remove yfs_fs_fetch_file_status() as it's not used afs: Fix the mapping of the UAEOVERFLOW abort code afs: Fix truncation issues and mmap writeback size afs: Concoct ctimes afs: Fix EOF corruption afs: afs_write_end() should change i_size under the right lock afs: Fix non-setting of mtime when writing into mmap