summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-06-17maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofaultChristoph Hellwig
Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17bpf: Document optval > PAGE_SIZE behavior for sockopt hooksStanislav Fomichev
Extend existing doc with more details about requiring ctx->optlen = 0 for handling optval > PAGE_SIZE. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200617010416.93086-3-sdf@google.com
2020-06-17selftests/bpf: Make sure optvals > PAGE_SIZE are bypassedStanislav Fomichev
We are relying on the fact, that we can pass > sizeof(int) optvals to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes). In the BPF program we check that we can only touch PAGE_SIZE bytes, but the real optlen is PAGE_SIZE * 2. In both cases, we override it to some predefined value and trim the optlen. Also, let's modify exiting IP_TOS usecase to test optlen=0 case where BPF program just bypasses the data as is. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200617010416.93086-2-sdf@google.com
2020-06-17bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZEStanislav Fomichev
Attaching to these hooks can break iptables because its optval is usually quite big, or at least bigger than the current PAGE_SIZE limit. David also mentioned some SCTP options can be big (around 256k). For such optvals we expose only the first PAGE_SIZE bytes to the BPF program. BPF program has two options: 1. Set ctx->optlen to 0 to indicate that the BPF's optval should be ignored and the kernel should use original userspace value. 2. Set ctx->optlen to something that's smaller than the PAGE_SIZE. v5: * use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov) * update the docs accordingly v4: * use temporary buffer to avoid optval == optval_end == NULL; this removes the corner case in the verifier that might assume non-zero PTR_TO_PACKET/PTR_TO_PACKET_END. v3: * don't increase the limit, bypass the argument v2: * proper comments formatting (Jakub Kicinski) Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: David Laight <David.Laight@ACULAB.COM> Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
2020-06-17devmap: Use bpf_map_area_alloc() for allocating hash bucketsToke Høiland-Jørgensen
Syzkaller discovered that creating a hash of type devmap_hash with a large number of entries can hit the memory allocator limit for allocating contiguous memory regions. There's really no reason to use kmalloc_array() directly in the devmap code, so just switch it to the existing bpf_map_area_alloc() function that is used elsewhere. Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
2020-06-17xdp: Handle frame_sz in xdp_convert_zc_to_xdp_frame()Hangbin Liu
In commit 34cc0b338a61 we only handled the frame_sz in convert_to_xdp_frame(). This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame(). Fixes: 34cc0b338a61 ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
2020-06-17dm ioctl: use struct_size() helper in retrieve_deps()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct dm_target_deps { ... __u64 dev[0]; /* out */ }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-17dm writecache: skip writecache_wait when using pmem modeHuaisheng Ye
The array bio_in_progress is only used with ssd mode. So skip writecache_wait_for_ios in writecache_discard when pmem mode. Signed-off-by: Huaisheng Ye <yehs1@lenovo.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-17dm writecache: correct uncommitted_block when discarding uncommitted entryHuaisheng Ye
When uncommitted entry has been discarded, correct wc->uncommitted_block for getting the exact number. Fixes: 48debafe4f2fe ("dm: add writecache target") Cc: stable@vger.kernel.org Signed-off-by: Huaisheng Ye <yehs1@lenovo.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-17dm zoned: assign max_io_len correctlyHou Tao
The unit of max_io_len is sector instead of byte (spotted through code review), so fix it. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-17tools headers UAPI: Sync linux/fs.h with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: b383a73f2b83 ("fs/ext4: Introduce DAX inode flag") And silence this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h' diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h It causes various beautifiers for things like fspick, fsmount, etc (see below) to get rebuilt, but this specific change doesn't make 'perf trace' be capable of decoding anything new, as we still don't decode what comes from ioctls, just its cmds. Details about the update: $ cp include/uapi/linux/fs.h tools/include/uapi/linux/fs.h $ git diff diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h index 379a612f8f1d..f44eb0a04afd 100644 --- a/tools/include/uapi/linux/fs.h +++ b/tools/include/uapi/linux/fs.h @@ -262,6 +262,7 @@ struct fsxattr { #define FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */ #define FS_EOFBLOCKS_FL 0x00400000 /* Reserved for ext4 */ #define FS_NOCOW_FL 0x00800000 /* Do not cow file */ +#define FS_DAX_FL 0x02000000 /* Inode is DAX */ #define FS_INLINE_DATA_FL 0x10000000 /* Reserved for ext4 */ #define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */ #define FS_CASEFOLD_FL 0x40000000 /* Folder is case insensitive */ $ m make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j8' parallel build INSTALL GTK UI CC /tmp/build/perf/builtin-trace.o DESCEND plugins CC /tmp/build/perf/trace/beauty/fsmount.o CC /tmp/build/perf/trace/beauty/fspick.o CC /tmp/build/perf/trace/beauty/mount_flags.o CC /tmp/build/perf/trace/beauty/move_mount.o CC /tmp/build/perf/trace/beauty/renameat.o CC /tmp/build/perf/trace/beauty/sync_file_range.o INSTALL trace_plugins LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf <SNIP> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17tools include UAPI: Sync linux/vhost.h with the kernel sourcesArnaldo Carvalho de Melo
To get the changes in: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa") Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h This automatically picks the new ioctl introduced in the above patch, making tools such as 'perf trace' aware of them and possibly allowing to use the strings in filters, etc: # perf trace -e ioctl --pid 7951 <SNIP> 0.178 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 0.194 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 0.209 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 0.224 (249.413 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.660 ( 0.011 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.675 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.686 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.697 ( 0.008 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.709 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.720 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.730 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.740 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.752 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.762 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.772 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 249.782 (120.138 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 370.201 ( 0.039 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e1420) = 0 370.254 ( 0.052 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 370.575 ( 0.365 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 370.973 ( 0.028 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 371.015 ( 0.037 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0 371.071 ( 0.009 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e14b0) = 0 <SNIP> # Details about the update: $ diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h --- tools/include/uapi/linux/vhost.h 2020-04-16 13:19:12.056763843 -0300 +++ include/uapi/linux/vhost.h 2020-06-17 10:04:20.532056428 -0300 @@ -15,6 +15,8 @@ #include <linux/types.h> #include <linux/ioctl.h> +#define VHOST_FILE_UNBIND -1 + /* ioctls */ #define VHOST_VIRTIO 0xAF @@ -140,4 +142,6 @@ /* Get the max ring size. */ #define VHOST_VDPA_GET_VRING_NUM _IOR(VHOST_VIRTIO, 0x76, __u16) +/* Set event fd for config interrupt*/ +#define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int) #endif $ $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2020-06-17 10:15:35.123275966 -0300 +++ after 2020-06-17 10:15:51.812482117 -0300 @@ -27,6 +27,7 @@ [0x72] = "VDPA_SET_STATUS", [0x74] = "VDPA_SET_CONFIG", [0x75] = "VDPA_SET_VRING_ENABLE", + [0x77] = "VDPA_SET_CONFIG_CALL", }; static const char *vhost_virtio_ioctl_read_cmds[] = { [0x00] = "GET_FEATURES", $ This causes these parts to get rebuilt: CC /tmp/build/perf/trace/beauty/ioctl.o INSTALL trace_plugins LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zhu Lingshan <lingshan.zhu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: 7e5b3c267d25 ("x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation") Addressing these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h With this one will be able to use these new AMD MSRs in filters, by name, e.g.: # perf trace -e msr:* --filter "msr==IA32_MCU_OPT_CTRL" ^C# Using -v we can see how it sets up the tracepoint filters, converting from the string in the filter to the numeric value: # perf trace -v -e msr:* --filter "msr==IA32_MCU_OPT_CTRL" Using CPUID GenuineIntel-6-8E-A 0x123 New filter for msr:read_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344) 0x123 New filter for msr:write_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344) 0x123 New filter for msr:rdpmc: (msr==0x123) && (common_pid != 335 && common_pid != 30344) mmap size 528384B ^C# The updating process shows how this affects tooling in more detail: $ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h --- tools/arch/x86/include/asm/msr-index.h 2020-06-03 10:36:09.959910238 -0300 +++ arch/x86/include/asm/msr-index.h 2020-06-17 10:04:20.235052901 -0300 @@ -128,6 +128,10 @@ #define TSX_CTRL_RTM_DISABLE BIT(0) /* Disable RTM feature */ #define TSX_CTRL_CPUID_CLEAR BIT(1) /* Disable TSX enumeration */ +/* SRBDS support */ +#define MSR_IA32_MCU_OPT_CTRL 0x00000123 +#define RNGDS_MITG_DIS BIT(0) + #define MSR_IA32_SYSENTER_CS 0x00000174 #define MSR_IA32_SYSENTER_ESP 0x00000175 #define MSR_IA32_SYSENTER_EIP 0x00000176 $ set -o vi $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-06-17 10:05:49.653114752 -0300 +++ after 2020-06-17 10:06:01.777258731 -0300 @@ -51,6 +51,7 @@ [0x0000011e] = "IA32_BBL_CR_CTL3", [0x00000120] = "IDT_MCR_CTRL", [0x00000122] = "IA32_TSX_CTRL", + [0x00000123] = "IA32_MCU_OPT_CTRL", [0x00000140] = "MISC_FEATURES_ENABLES", [0x00000174] = "IA32_SYSENTER_CS", [0x00000175] = "IA32_SYSENTER_ESP", $ The related change to cpu-features.h affects this: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o This shouldn't be affecting that 'perf bench' entry: $ find tools/perf/ -type f | xargs grep SRBDS $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Gross <mgross@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17Merge remote-tracking branch 'torvalds/master' into perf/urgentArnaldo Carvalho de Melo
To get some newer headers that got out of sync with the copies in tools/ so that we can try to have the tools/perf/ build clean for v5.8 with fewer pull requests. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17perf script: Initialize zstd_dataMilian Wolff
Fixes segmentation fault when trying to interpret zstd-compressed data with perf script: ``` $ perf record -z ls ... [ perf record: Captured and wrote 0,010 MB perf.data, compressed (original 0,001 MB, ratio is 2,190) ] $ memcheck perf script ... ==67911== Invalid read of size 4 ==67911== at 0x5568188: ZSTD_decompressStream (in /usr/lib/libzstd.so.1.4.5) ==67911== by 0x6E726B: zstd_decompress_stream (zstd.c:100) ==67911== by 0x65729C: perf_session__process_compressed_event (session.c:72) ==67911== by 0x6598E8: perf_session__process_user_event (session.c:1583) ==67911== by 0x65BA59: reader__process_events (session.c:2177) ==67911== by 0x65BA59: __perf_session__process_events (session.c:2234) ==67911== by 0x65BA59: perf_session__process_events (session.c:2267) ==67911== by 0x5A7397: __cmd_script (builtin-script.c:2447) ==67911== by 0x5A7397: cmd_script (builtin-script.c:3840) ==67911== by 0x5FE9D2: run_builtin (perf.c:312) ==67911== by 0x711627: handle_internal_command (perf.c:364) ==67911== by 0x711627: run_argv (perf.c:408) ==67911== by 0x711627: main (perf.c:538) ==67911== Address 0x71d8 is not stack'd, malloc'd or (recently) free'd ``` Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> LPU-Reference: 20200612230333.72140-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-17dm zoned: fix uninitialized pointer dereferenceDamien Le Moal
Make sure that the local variable rzone in dmz_do_reclaim() is always initialized before being used for printing debug messages. Fixes: f97809aec589 ("dm zoned: per-device reclaim") Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-17regmap: Fix memory leak from regmap_register_patchCharles Keepax
When a register patch is registered the reg_sequence is copied but the memory allocated is never freed. Add a kfree in regmap_exit to clean it up. Fixes: 22f0d90a3482 ("regmap: Support register patch sets") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://lore.kernel.org/r/20200617152129.19655-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-17tools, bpftool: Add ringbuf map type to map command docsTobias Klauser
Commit c34a06c56df7 ("tools/bpftool: Add ringbuf map to a list of known map types") added the symbolic "ringbuf" name. Document it in the bpftool map command docs and usage as well. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616113303.8123-1-tklauser@distanz.ch
2020-06-17bpf: bpf_probe_read_kernel_str() has to return amount of data read on successAndrii Nakryiko
During recent refactorings, bpf_probe_read_kernel_str() started returning 0 on success, instead of amount of data successfully read. This majorly breaks applications relying on bpf_probe_read_kernel_str() and bpf_probe_read_str() and their results. Fix this by returning actual number of bytes read. Fixes: 8d92db5c04d1 ("bpf: rework the compat kernel probe handling") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200616050432.1902042-1-andriin@fb.com
2020-06-17ALSA: hda/realtek: Add mute LED and micmute LED support for HP systemsKai-Heng Feng
There are two more HP systems control mute LED from HDA codec and need to expose micmute led class so SoF can control micmute LED. Add quirks to support them. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200617102906.16156-2-kai.heng.feng@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-06-17blktrace: Avoid sparse warnings when assigning q->blk_traceJan Kara
Mostly for historical reasons, q->blk_trace is assigned through xchg() and cmpxchg() atomic operations. Although this is correct, sparse complains about this because it violates rcu annotations since commit c780e86dd48e ("blktrace: Protect q->blk_trace with RCU") which started to use rcu for accessing q->blk_trace. Furthermore there's no real need for atomic operations anymore since all changes to q->blk_trace happen under q->blk_trace_mutex and since it also makes more sense to check if q->blk_trace is set with the mutex held earlier. So let's just replace xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and rcu_assign_pointer(). This makes the code more efficient and sparse happy. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17blktrace: break out of blktrace setup on concurrent callsLuis Chamberlain
We use one blktrace per request_queue, that means one per the entire disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1, or just two calls on /dev/vda. We check for concurrent setup only at the very end of the blktrace setup though. If we try to run two concurrent blktraces on the same block device the second one will fail, and the first one seems to go on. However when one tries to kill the first one one will see things like this: The kernel will show these: ``` debugfs: File 'dropped' in directory 'nvme1n1' already present! debugfs: File 'msg' in directory 'nvme1n1' already present! debugfs: File 'trace0' in directory 'nvme1n1' already present! `` And userspace just sees this error message for the second call: ``` blktrace /dev/nvme1n1 BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error ``` The first userspace process #1 will also claim that the files were taken underneath their nose as well. The files are taken away form the first process given that when the second blktrace fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN. This means that even if go-happy process #1 is waiting for blktrace data, we *have* been asked to take teardown the blktrace. This can easily be reproduced with break-blktrace [0] run_0005.sh test. Just break out early if we know we're already going to fail, this will prevent trying to create the files all over again, which we know still exist. [0] https://github.com/mcgrof/break-blktrace Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17powerpc/syscalls: Use the number when building SPU syscall tableMichael Ellerman
Currently the macro that inserts entries into the SPU syscall table doesn't actually use the "nr" (syscall number) parameter. This does work, but it relies on the exact right number of syscall entries being emitted in order for the syscal numbers to line up with the array entries. If for example we had two entries with the same syscall number we wouldn't get an error, it would just cause all subsequent syscalls to be off by one in the spu_syscall_table. So instead change the macro to assign to the specific entry of the array, meaning any numbering overlap will be caught by the compiler. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200616135617.2937252-1-mpe@ellerman.id.au
2020-06-17powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()Mike Rapoport
The pte_update() implementation for PPC_8xx unfolds page table from the PGD level to access a PMD entry. Since 8xx has only 2-level page table this can be simplified with pmd_off() shortcut. Replace explicit unfolding with pmd_off() and drop defines of pgd_index() and pgd_offset() that are no longer needed. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200615092229.23142-1-rppt@kernel.org
2020-06-17spi: stm32-qspi: Fix error path in case of -EPROBE_DEFERPatrice Chotard
In case of -EPROBE_DEFER, stm32_qspi_release() was called in any case which unregistered driver from pm_runtime framework even if it has not been registered yet to it. This leads to: stm32-qspi 58003000.spi: can't setup spi0.0, status -13 spi_master spi0: spi_device register error /soc/spi@58003000/mx66l51235l@0 spi_master spi0: Failed to create SPI device for /soc/spi@58003000/mx66l51235l@0 stm32-qspi 58003000.spi: can't setup spi0.1, status -13 spi_master spi0: spi_device register error /soc/spi@58003000/mx66l51235l@1 spi_master spi0: Failed to create SPI device for /soc/spi@58003000/mx66l51235l@1 On v5.7 kernel,this issue was not "visible", qspi driver was probed successfully. Fixes: 9d282c17b023 ("spi: stm32-qspi: Add pm_runtime support") Signed-off-by: Patrice Chotard <patrice.chotard@st.com> Link: https://lore.kernel.org/r/20200616113035.4514-1-patrice.chotard@st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-17regulator: mt6358: Remove BROKEN dependencyAxel Lin
The MFD part is merged into v5.8-rc1, thus remove BROKEN dependency. Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20200616135030.1163660-1-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-06-17Merge tag 'v5.8-rc1' into regulator-5.8Mark Brown
Linux 5.8-rc1
2020-06-17arm64: bti: Require clang >= 10.0.1 for in-kernel BTI supportWill Deacon
Unfortunately, most versions of clang that support BTI are capable of miscompiling the kernel when converting a switch statement into a jump table. As an example, attempting to spawn a KVM guest results in a panic: [ 56.253312] Kernel panic - not syncing: bad mode [ 56.253834] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1 #2 [ 56.254225] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 56.254712] Call trace: [ 56.254952] dump_backtrace+0x0/0x1d4 [ 56.255305] show_stack+0x1c/0x28 [ 56.255647] dump_stack+0xc4/0x128 [ 56.255905] panic+0x16c/0x35c [ 56.256146] bad_el0_sync+0x0/0x58 [ 56.256403] el1_sync_handler+0xb4/0xe0 [ 56.256674] el1_sync+0x7c/0x100 [ 56.256928] kvm_vm_ioctl_check_extension_generic+0x74/0x98 [ 56.257286] __arm64_sys_ioctl+0x94/0xcc [ 56.257569] el0_svc_common+0x9c/0x150 [ 56.257836] do_el0_svc+0x84/0x90 [ 56.258083] el0_sync_handler+0xf8/0x298 [ 56.258361] el0_sync+0x158/0x180 This is because the switch in kvm_vm_ioctl_check_extension_generic() is executed as an indirect branch to tail-call through a jump table: ffff800010032dc8: 3869694c ldrb w12, [x10, x9] ffff800010032dcc: 8b0c096b add x11, x11, x12, lsl #2 ffff800010032dd0: d61f0160 br x11 However, where the target case uses the stack, the landing pad is elided due to the presence of a paciasp instruction: ffff800010032e14: d503233f paciasp ffff800010032e18: a9bf7bfd stp x29, x30, [sp, #-16]! ffff800010032e1c: 910003fd mov x29, sp ffff800010032e20: aa0803e0 mov x0, x8 ffff800010032e24: 940017c0 bl ffff800010038d24 <kvm_vm_ioctl_check_extension> ffff800010032e28: 93407c00 sxtw x0, w0 ffff800010032e2c: a8c17bfd ldp x29, x30, [sp], #16 ffff800010032e30: d50323bf autiasp ffff800010032e34: d65f03c0 ret Unfortunately, this results in a fatal exception because paciasp is compatible only with branch-and-link (call) instructions and not simple indirect branches. A fix is being merged into Clang 10.0.1 so that a 'bti j' instruction is emitted as an explicit landing pad in this situation. Make in-kernel BTI depend on that compiler version when building with clang. Cc: Tom Stellard <tstellar@redhat.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20200615105524.GA2694@willie-the-truck Link: https://lore.kernel.org/r/20200616183630.2445-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-06-17ALSA: usb-audio: Fix potential use-after-free of streamsTakashi Iwai
With the recent full-duplex support of implicit feedback streams, an endpoint can be still running after closing the capture stream as long as the playback stream with the sync-endpoint is running. In such a state, the URBs are still be handled and they may call retire_data_urb callback, which tries to transfer the data from the PCM buffer. Since the PCM stream gets closed, this may lead to use-after-free. This patch adds the proper clearance of the callback at stopping the capture stream for addressing the possible UAF above. Fixes: 10ce77e4817f ("ALSA: usb-audio: Add duplex sound support for USB devices using implicit feedback") Link: https://lore.kernel.org/r/20200616120921.12249-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-06-17dma-direct: check return value when encrypting or decrypting memoryDavid Rientjes
__change_page_attr() can fail which will cause set_memory_encrypted() and set_memory_decrypted() to return non-zero. If the device requires unencrypted DMA memory and decryption fails, simply free the memory and fail. If attempting to re-encrypt in the failure path and that encryption fails, there is no alternative other than to leak the memory. Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-17dma-direct: re-encrypt memory if dma_direct_alloc_pages() failsDavid Rientjes
If arch_dma_set_uncached() fails after memory has been decrypted, it needs to be re-encrypted before freeing. Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more general") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-17dma-direct: always align allocation size in dma_direct_alloc_pages()David Rientjes
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted() works at page granularity. It's necessary to page align the allocation size in dma_direct_alloc_pages() for consistent behavior. This also fixes an issue when arch_dma_prep_coherent() is called on an unaligned allocation size for dma_alloc_need_uncached() when CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED is enabled. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-17dma-direct: mark __dma_direct_alloc_pages staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-06-17dma-direct: re-enable mmap for !CONFIG_MMUChristoph Hellwig
nommu configfs can trivially map the coherent allocations to user space, as no actual page table setup is required and the kernel and the user space programs share the same address space. Fixes: 62fcee9a3bd7 ("dma-mapping: remove CONFIG_ARCH_NO_COHERENT_DMA_MMAP") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: dillon min <dillon.minfei@gmail.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: dillon min <dillon.minfei@gmail.com>
2020-06-16overflow.h: Add flex_array_size() helperGustavo A. R. Silva
Add flex_array_size() helper for the calculation of the size, in bytes, of a flexible array member contained within an enclosing structure. Example of usage: struct something { size_t count; struct foo items[]; }; struct something *instance; instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL); instance->count = count; memcpy(instance->items, src, flex_array_size(instance, items, instance->count)); The helper returns SIZE_MAX on overflow instead of wrapping around. Additionally replaces parameter "n" with "count" in struct_size() helper for greater clarity and unification. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20200609012233.GA3371@embeddedor Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-17scripts: Fix typo in headers_install.shMasanari Iida
This patch fixes a spelling typo in scripts/headers_install.sh Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-17kconfig: unify cc-option and as-optionMasahiro Yamada
cc-option and as-option are almost the same; both pass the flag to $(CC). The main difference is the cc-option stops before the assemble stage (-S option) whereas as-option stops after (-c option). I chose -S because it is slightly faster, but $(cc-option,-gz=zlib) returns a wrong result (https://lkml.org/lkml/2020/6/9/1529). It has been fixed by commit 7b16994437c7 ("Makefile: Improve compressed debug info support detection"), but the assembler should always be invoked for more reliable compiler option tests. However, you cannot simply replace -S with -c because the following code in lib/Kconfig.debug would break: depends on $(cc-option,-gsplit-dwarf) The combination of -c and -gsplit-dwarf does not accept /dev/null as output. $ cat /dev/null | gcc -gsplit-dwarf -S -x c - -o /dev/null $ echo $? 0 $ cat /dev/null | gcc -gsplit-dwarf -c -x c - -o /dev/null objcopy: Warning: '/dev/null' is not an ordinary file $ echo $? 1 $ cat /dev/null | gcc -gsplit-dwarf -c -x c - -o tmp.o $ echo $? 0 There is another flag that creates an separate file based on the object file path: $ cat /dev/null | gcc -ftest-coverage -c -x c - -o /dev/null <stdin>:1: error: cannot open /dev/null.gcno So, we cannot use /dev/null to sink the output. Align the cc-option implementation with scripts/Kbuild.include. With -c option used in cc-option, as-option is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Will Deacon <will@kernel.org>
2020-06-16tools/bootconfig: Add testcase for show-command and quotes testMasami Hiramatsu
Add testcases for the return value of the command to show bootconfig in initrd, and double/single quotes selecting. Link: http://lkml.kernel.org/r/159230247428.65555.2109472942519215104.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16tools/bootconfig: Fix to return 0 if succeeded to show the bootconfigMasami Hiramatsu
Fix bootconfig to return 0 if succeeded to show the bootconfig in initrd. Without this fix, "bootconfig INITRD" command returns !0 even if the command succeeded to show the bootconfig. Link: http://lkml.kernel.org/r/159230246566.65555.11891772258543514487.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16tools/bootconfig: Fix to use correct quotes for valueMasami Hiramatsu
Fix bootconfig tool to select double or single quotes correctly according to the value. If a bootconfig value includes a double quote character, we must use single-quotes to quote that value. Link: http://lkml.kernel.org/r/159230245697.65555.12444299015852932304.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16proc/bootconfig: Fix to use correct quotes for valueMasami Hiramatsu
Fix /proc/bootconfig to select double or single quotes corrctly according to the value. If a bootconfig value includes a double quote character, we must use single-quotes to quote that value. This modifies if() condition and blocks for avoiding double-quote in value check in 2 places. Anyway, since xbc_array_for_each_value() can handle the array which has a single node correctly. Thus, if (vnode && xbc_node_is_array(vnode)) { xbc_array_for_each_value(vnode) /* vnode->next != NULL */ ... } else { snprintf(val); /* val is an empty string if !vnode */ } is equivalent to if (vnode) { xbc_array_for_each_value(vnode) /* vnode->next can be NULL */ ... } else { snprintf(""); /* value is always empty */ } Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16tracing: Remove unused event variable in tracing_iter_resetYangHui
We do not use the event variable, just remove it. Signed-off-by: YangHui <yanghui.def@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16tracing/probe: Fix memleak in fetch_op_data operationsVamshi K Sthambamkadi
kmemleak report: [<57dcc2ca>] __kmalloc_track_caller+0x139/0x2b0 [<f1c45d0f>] kstrndup+0x37/0x80 [<f9761eb0>] parse_probe_arg.isra.7+0x3cc/0x630 [<055bf2ba>] traceprobe_parse_probe_arg+0x2f5/0x810 [<655a7766>] trace_kprobe_create+0x2ca/0x950 [<4fc6a02a>] create_or_delete_trace_kprobe+0xf/0x30 [<6d1c8a52>] trace_run_command+0x67/0x80 [<be812cc0>] trace_parse_run_command+0xa7/0x140 [<aecfe401>] probes_write+0x10/0x20 [<2027641c>] __vfs_write+0x30/0x1e0 [<6a4aeee1>] vfs_write+0x96/0x1b0 [<3517fb7d>] ksys_write+0x53/0xc0 [<dad91db7>] __ia32_sys_write+0x15/0x20 [<da347f64>] do_syscall_32_irqs_on+0x3d/0x260 [<fd0b7e7d>] do_fast_syscall_32+0x39/0xb0 [<ea5ae810>] entry_SYSENTER_32+0xaf/0x102 Post parse_probe_arg(), the FETCH_OP_DATA operation type is overwritten to FETCH_OP_ST_STRING, as a result memory is never freed since traceprobe_free_probe_arg() iterates only over SYMBOL and DATA op types Setup fetch string operation correctly after fetch_op_data operation. Link: https://lkml.kernel.org/r/20200615143034.GA1734@cosmos Cc: stable@vger.kernel.org Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16trace: Fix typo in allocate_ftrace_ops()'s commentWei Yang
No functional change, just correct the word. Link: https://lkml.kernel.org/r/20200610033251.31713-1-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16tracing: Make ftrace packed events have align of 1Steven Rostedt (VMware)
When using trace-cmd on 5.6-rt for the function graph tracer, the output was corrupted. It gave output like this: funcgraph_entry: func=0xffffffff depth=38982 funcgraph_entry: func=0x1ffffffff depth=16044 funcgraph_exit: func=0xffffffff overrun=0x92539aaf00000000 calltime=0x92539c9900000072 rettime=0x100000072 depth=11084 funcgraph_exit: func=0xffffffff overrun=0x9253946e00000000 calltime=0x92539e2100000072 rettime=0x72 depth=26033702 funcgraph_entry: func=0xffffffff depth=85798 funcgraph_entry: func=0x1ffffffff depth=12044 The reason was because the tracefs/events/ftrace/funcgraph_entry/exit format file was incorrect. The -rt kernel adds more common fields to the trace events. Namely, common_migrate_disable and common_preempt_lazy_count. Each is one byte in size. This changes the alignment of the normal payload. Most events are aligned normally, but the function and function graph events are defined with a "PACKED" macro, that packs their payload. As the offsets displayed in the format files are now calculated by an aligned field, the aligned field for function and function graph events should be 1, not their normal alignment. With aligning of the funcgraph_entry event, the format file has: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned char common_migrate_disable; offset:8; size:1; signed:0; field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0; field:unsigned long func; offset:16; size:8; signed:0; field:int depth; offset:24; size:4; signed:1; But the actual alignment is: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned char common_migrate_disable; offset:8; size:1; signed:0; field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0; field:unsigned long func; offset:12; size:8; signed:0; field:int depth; offset:20; size:4; signed:1; Link: https://lkml.kernel.org/r/20200609220041.2a3b527f@oasis.local.home Cc: stable@vger.kernel.org Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16sample-trace-array: Remove trace_array 'sample-instance'Kefeng Wang
Remove trace_array 'sample-instance' if kthread_run fails in sample_trace_array_init(). Link: https://lkml.kernel.org/r/20200609135200.2206726-1-wangkefeng.wang@huawei.com Cc: stable@vger.kernel.org Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Reviewed-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16sample-trace-array: Fix sleeping function called from invalid contextKefeng Wang
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5 1 lock held by swapper/5/0: #0: ffff80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, at: call_timer_fn+0x8/0x3e0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x20/0x30 dump_stack+0xe4/0x150 ___might_sleep+0x160/0x200 __might_sleep+0x58/0x90 __mutex_lock+0x64/0x948 mutex_lock_nested+0x3c/0x58 __ftrace_set_clr_event+0x44/0x88 trace_array_set_clr_event+0x24/0x38 mytimer_handler+0x34/0x40 [sample_trace_array] mutex_lock() will be called in interrupt context, using workqueue to fix it. Link: https://lkml.kernel.org/r/20200610011244.2209486-1-wangkefeng.wang@huawei.com Cc: stable@vger.kernel.org Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Reviewed-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16kretprobe: Prevent triggering kretprobe from within kprobe_flush_taskJiri Olsa
Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave. My test was also able to trigger lockdep output: ============================================ WARNING: possible recursive locking detected 5.6.0-rc6+ #6 Not tainted -------------------------------------------- sched-messaging/2767 is trying to acquire lock: ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0 but task is already holding lock: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(kretprobe_table_locks[i].lock)); lock(&(kretprobe_table_locks[i].lock)); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by sched-messaging/2767: #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 stack backtrace: CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6 Call Trace: dump_stack+0x96/0xe0 __lock_acquire.cold.57+0x173/0x2b7 ? native_queued_spin_lock_slowpath+0x42b/0x9e0 ? lockdep_hardirqs_on+0x590/0x590 ? __lock_acquire+0xf63/0x4030 lock_acquire+0x15a/0x3d0 ? kretprobe_hash_lock+0x52/0xa0 _raw_spin_lock_irqsave+0x36/0x70 ? kretprobe_hash_lock+0x52/0xa0 kretprobe_hash_lock+0x52/0xa0 trampoline_handler+0xf8/0x940 ? kprobe_fault_handler+0x380/0x380 ? find_held_lock+0x3a/0x1c0 kretprobe_trampoline+0x25/0x50 ? lock_acquired+0x392/0xbc0 ? _raw_spin_lock_irqsave+0x50/0x70 ? __get_valid_kprobe+0x1f0/0x1f0 ? _raw_spin_unlock_irqrestore+0x3b/0x40 ? finish_task_switch+0x4b9/0x6d0 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 The code within the kretprobe handler checks for probe reentrancy, so we won't trigger any _raw_spin_lock_irqsave probe in there. The problem is in outside kprobe_flush_task, where we call: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave where _raw_spin_lock_irqsave triggers the kretprobe and installs kretprobe_trampoline handler on _raw_spin_lock_irqsave return. The kretprobe_trampoline handler is then executed with already locked kretprobe_table_locks, and first thing it does is to lock kretprobe_table_locks ;-) the whole lockup path like: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed ---> kretprobe_table_locks locked kretprobe_trampoline trampoline_handler kretprobe_hash_lock(current, &head, &flags); <--- deadlock Adding kprobe_busy_begin/end helpers that mark code with fake probe installed to prevent triggering of another kprobe within this code. Using these helpers in kprobe_flush_task, so the probe recursion protection check is hit and the probe is never set to prevent above lockup. Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2 Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16kprobes: Remove redundant arch_disarm_kprobe() callMasami Hiramatsu
Fix to remove redundant arch_disarm_kprobe() call in force_unoptimize_kprobe(). This arch_disarm_kprobe() will be invoked if the kprobe is optimized but disabled, but that means the kprobe (optprobe) is unused (and unoptimized) state. In that case, unoptimize_kprobe() puts it in freeing_list and kprobe_optimizer (do_unoptimize_kprobes()) automatically disarm it. Thus this arch_disarm_kprobe() is redundant. Link: http://lkml.kernel.org/r/158927058719.27680.17183632908465341189.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>