Age | Commit message (Collapse) | Author |
|
Better describe what these functions do.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Better describe what these functions do.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Extend existing doc with more details about requiring ctx->optlen = 0
for handling optval > PAGE_SIZE.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200617010416.93086-3-sdf@google.com
|
|
We are relying on the fact, that we can pass > sizeof(int) optvals
to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes).
In the BPF program we check that we can only touch PAGE_SIZE bytes,
but the real optlen is PAGE_SIZE * 2. In both cases, we override it to
some predefined value and trim the optlen.
Also, let's modify exiting IP_TOS usecase to test optlen=0 case
where BPF program just bypasses the data as is.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200617010416.93086-2-sdf@google.com
|
|
Attaching to these hooks can break iptables because its optval is
usually quite big, or at least bigger than the current PAGE_SIZE limit.
David also mentioned some SCTP options can be big (around 256k).
For such optvals we expose only the first PAGE_SIZE bytes to
the BPF program. BPF program has two options:
1. Set ctx->optlen to 0 to indicate that the BPF's optval
should be ignored and the kernel should use original userspace
value.
2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.
v5:
* use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
* update the docs accordingly
v4:
* use temporary buffer to avoid optval == optval_end == NULL;
this removes the corner case in the verifier that might assume
non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.
v3:
* don't increase the limit, bypass the argument
v2:
* proper comments formatting (Jakub Kicinski)
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Link: https://lore.kernel.org/bpf/20200617010416.93086-1-sdf@google.com
|
|
Syzkaller discovered that creating a hash of type devmap_hash with a large
number of entries can hit the memory allocator limit for allocating
contiguous memory regions. There's really no reason to use kmalloc_array()
directly in the devmap code, so just switch it to the existing
bpf_map_area_alloc() function that is used elsewhere.
Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com
|
|
In commit 34cc0b338a61 we only handled the frame_sz in convert_to_xdp_frame().
This patch will also handle frame_sz in xdp_convert_zc_to_xdp_frame().
Fixes: 34cc0b338a61 ("xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200616103518.2963410-1-liuhangbin@gmail.com
|
|
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct dm_target_deps {
...
__u64 dev[0]; /* out */
};
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The array bio_in_progress is only used with ssd mode. So skip
writecache_wait_for_ios in writecache_discard when pmem mode.
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
When uncommitted entry has been discarded, correct wc->uncommitted_block
for getting the exact number.
Fixes: 48debafe4f2fe ("dm: add writecache target")
Cc: stable@vger.kernel.org
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The unit of max_io_len is sector instead of byte (spotted through
code review), so fix it.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
To pick the changes from:
b383a73f2b83 ("fs/ext4: Introduce DAX inode flag")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
It causes various beautifiers for things like fspick, fsmount, etc (see
below) to get rebuilt, but this specific change doesn't make 'perf
trace' be capable of decoding anything new, as we still don't decode
what comes from ioctls, just its cmds.
Details about the update:
$ cp include/uapi/linux/fs.h tools/include/uapi/linux/fs.h
$ git diff
diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h
index 379a612f8f1d..f44eb0a04afd 100644
--- a/tools/include/uapi/linux/fs.h
+++ b/tools/include/uapi/linux/fs.h
@@ -262,6 +262,7 @@ struct fsxattr {
#define FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define FS_EOFBLOCKS_FL 0x00400000 /* Reserved for ext4 */
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
+#define FS_DAX_FL 0x02000000 /* Inode is DAX */
#define FS_INLINE_DATA_FL 0x10000000 /* Reserved for ext4 */
#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define FS_CASEFOLD_FL 0x40000000 /* Folder is case insensitive */
$ m
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j8' parallel build
INSTALL GTK UI
CC /tmp/build/perf/builtin-trace.o
DESCEND plugins
CC /tmp/build/perf/trace/beauty/fsmount.o
CC /tmp/build/perf/trace/beauty/fspick.o
CC /tmp/build/perf/trace/beauty/mount_flags.o
CC /tmp/build/perf/trace/beauty/move_mount.o
CC /tmp/build/perf/trace/beauty/renameat.o
CC /tmp/build/perf/trace/beauty/sync_file_range.o
INSTALL trace_plugins
LD /tmp/build/perf/trace/beauty/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
<SNIP>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To get the changes in:
776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa")
Silencing this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
This automatically picks the new ioctl introduced in the above patch,
making tools such as 'perf trace' aware of them and possibly allowing to
use the strings in filters, etc:
# perf trace -e ioctl --pid 7951
<SNIP>
0.178 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
0.194 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
0.209 ( 0.010 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
0.224 (249.413 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.660 ( 0.011 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.675 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.686 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.697 ( 0.008 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.709 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.720 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.730 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.740 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.752 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.762 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.772 ( 0.007 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
249.782 (120.138 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
370.201 ( 0.039 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e1420) = 0
370.254 ( 0.052 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
370.575 ( 0.365 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
370.973 ( 0.028 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
371.015 ( 0.037 ms): CPU 0/KVM/8023 ioctl(fd: 14, cmd: KVM_RUN) = 0
371.071 ( 0.009 ms): CPU 0/KVM/8023 ioctl(fd: 12, cmd: KVM_IRQ_LINE_STATUS, arg: 0x7f744f9e14b0) = 0
<SNIP>
#
Details about the update:
$ diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
--- tools/include/uapi/linux/vhost.h 2020-04-16 13:19:12.056763843 -0300
+++ include/uapi/linux/vhost.h 2020-06-17 10:04:20.532056428 -0300
@@ -15,6 +15,8 @@
#include <linux/types.h>
#include <linux/ioctl.h>
+#define VHOST_FILE_UNBIND -1
+
/* ioctls */
#define VHOST_VIRTIO 0xAF
@@ -140,4 +142,6 @@
/* Get the max ring size. */
#define VHOST_VDPA_GET_VRING_NUM _IOR(VHOST_VIRTIO, 0x76, __u16)
+/* Set event fd for config interrupt*/
+#define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int)
#endif
$
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
$ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
$ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
$ diff -u before after
--- before 2020-06-17 10:15:35.123275966 -0300
+++ after 2020-06-17 10:15:51.812482117 -0300
@@ -27,6 +27,7 @@
[0x72] = "VDPA_SET_STATUS",
[0x74] = "VDPA_SET_CONFIG",
[0x75] = "VDPA_SET_VRING_ENABLE",
+ [0x77] = "VDPA_SET_CONFIG_CALL",
};
static const char *vhost_virtio_ioctl_read_cmds[] = {
[0x00] = "GET_FEATURES",
$
This causes these parts to get rebuilt:
CC /tmp/build/perf/trace/beauty/ioctl.o
INSTALL trace_plugins
LD /tmp/build/perf/trace/beauty/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up the changes in:
7e5b3c267d25 ("x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation")
Addressing these tools/perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
With this one will be able to use these new AMD MSRs in filters, by
name, e.g.:
# perf trace -e msr:* --filter "msr==IA32_MCU_OPT_CTRL"
^C#
Using -v we can see how it sets up the tracepoint filters, converting
from the string in the filter to the numeric value:
# perf trace -v -e msr:* --filter "msr==IA32_MCU_OPT_CTRL"
Using CPUID GenuineIntel-6-8E-A
0x123
New filter for msr:read_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
0x123
New filter for msr:write_msr: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
0x123
New filter for msr:rdpmc: (msr==0x123) && (common_pid != 335 && common_pid != 30344)
mmap size 528384B
^C#
The updating process shows how this affects tooling in more detail:
$ diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
--- tools/arch/x86/include/asm/msr-index.h 2020-06-03 10:36:09.959910238 -0300
+++ arch/x86/include/asm/msr-index.h 2020-06-17 10:04:20.235052901 -0300
@@ -128,6 +128,10 @@
#define TSX_CTRL_RTM_DISABLE BIT(0) /* Disable RTM feature */
#define TSX_CTRL_CPUID_CLEAR BIT(1) /* Disable TSX enumeration */
+/* SRBDS support */
+#define MSR_IA32_MCU_OPT_CTRL 0x00000123
+#define RNGDS_MITG_DIS BIT(0)
+
#define MSR_IA32_SYSENTER_CS 0x00000174
#define MSR_IA32_SYSENTER_ESP 0x00000175
#define MSR_IA32_SYSENTER_EIP 0x00000176
$ set -o vi
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2020-06-17 10:05:49.653114752 -0300
+++ after 2020-06-17 10:06:01.777258731 -0300
@@ -51,6 +51,7 @@
[0x0000011e] = "IA32_BBL_CR_CTL3",
[0x00000120] = "IDT_MCR_CTRL",
[0x00000122] = "IA32_TSX_CTRL",
+ [0x00000123] = "IA32_MCU_OPT_CTRL",
[0x00000140] = "MISC_FEATURES_ENABLES",
[0x00000174] = "IA32_SYSENTER_CS",
[0x00000175] = "IA32_SYSENTER_ESP",
$
The related change to cpu-features.h affects this:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
This shouldn't be affecting that 'perf bench' entry:
$ find tools/perf/ -type f | xargs grep SRBDS
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Gross <mgross@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To get some newer headers that got out of sync with the copies in tools/
so that we can try to have the tools/perf/ build clean for v5.8 with
fewer pull requests.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Fixes segmentation fault when trying to interpret zstd-compressed data
with perf script:
```
$ perf record -z ls
...
[ perf record: Captured and wrote 0,010 MB perf.data, compressed (original 0,001 MB, ratio is 2,190) ]
$ memcheck perf script
...
==67911== Invalid read of size 4
==67911== at 0x5568188: ZSTD_decompressStream (in /usr/lib/libzstd.so.1.4.5)
==67911== by 0x6E726B: zstd_decompress_stream (zstd.c:100)
==67911== by 0x65729C: perf_session__process_compressed_event (session.c:72)
==67911== by 0x6598E8: perf_session__process_user_event (session.c:1583)
==67911== by 0x65BA59: reader__process_events (session.c:2177)
==67911== by 0x65BA59: __perf_session__process_events (session.c:2234)
==67911== by 0x65BA59: perf_session__process_events (session.c:2267)
==67911== by 0x5A7397: __cmd_script (builtin-script.c:2447)
==67911== by 0x5A7397: cmd_script (builtin-script.c:3840)
==67911== by 0x5FE9D2: run_builtin (perf.c:312)
==67911== by 0x711627: handle_internal_command (perf.c:364)
==67911== by 0x711627: run_argv (perf.c:408)
==67911== by 0x711627: main (perf.c:538)
==67911== Address 0x71d8 is not stack'd, malloc'd or (recently) free'd
```
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LPU-Reference: 20200612230333.72140-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Make sure that the local variable rzone in dmz_do_reclaim() is always
initialized before being used for printing debug messages.
Fixes: f97809aec589 ("dm zoned: per-device reclaim")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
When a register patch is registered the reg_sequence is copied but the
memory allocated is never freed. Add a kfree in regmap_exit to clean it
up.
Fixes: 22f0d90a3482 ("regmap: Support register patch sets")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20200617152129.19655-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Commit c34a06c56df7 ("tools/bpftool: Add ringbuf map to a list of known
map types") added the symbolic "ringbuf" name. Document it in the bpftool
map command docs and usage as well.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200616113303.8123-1-tklauser@distanz.ch
|
|
During recent refactorings, bpf_probe_read_kernel_str() started returning 0 on
success, instead of amount of data successfully read. This majorly breaks
applications relying on bpf_probe_read_kernel_str() and bpf_probe_read_str()
and their results. Fix this by returning actual number of bytes read.
Fixes: 8d92db5c04d1 ("bpf: rework the compat kernel probe handling")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200616050432.1902042-1-andriin@fb.com
|
|
There are two more HP systems control mute LED from HDA codec and need
to expose micmute led class so SoF can control micmute LED.
Add quirks to support them.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200617102906.16156-2-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Mostly for historical reasons, q->blk_trace is assigned through xchg()
and cmpxchg() atomic operations. Although this is correct, sparse
complains about this because it violates rcu annotations since commit
c780e86dd48e ("blktrace: Protect q->blk_trace with RCU") which started
to use rcu for accessing q->blk_trace. Furthermore there's no real need
for atomic operations anymore since all changes to q->blk_trace happen
under q->blk_trace_mutex and since it also makes more sense to check if
q->blk_trace is set with the mutex held earlier.
So let's just replace xchg() with rcu_replace_pointer() and cmpxchg()
with explicit check and rcu_assign_pointer(). This makes the code more
efficient and sparse happy.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We use one blktrace per request_queue, that means one per the entire
disk. So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
or just two calls on /dev/vda.
We check for concurrent setup only at the very end of the blktrace setup though.
If we try to run two concurrent blktraces on the same block device the
second one will fail, and the first one seems to go on. However when
one tries to kill the first one one will see things like this:
The kernel will show these:
```
debugfs: File 'dropped' in directory 'nvme1n1' already present!
debugfs: File 'msg' in directory 'nvme1n1' already present!
debugfs: File 'trace0' in directory 'nvme1n1' already present!
``
And userspace just sees this error message for the second call:
```
blktrace /dev/nvme1n1
BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
```
The first userspace process #1 will also claim that the files
were taken underneath their nose as well. The files are taken
away form the first process given that when the second blktrace
fails, it will follow up with a BLKTRACESTOP and BLKTRACETEARDOWN.
This means that even if go-happy process #1 is waiting for blktrace
data, we *have* been asked to take teardown the blktrace.
This can easily be reproduced with break-blktrace [0] run_0005.sh test.
Just break out early if we know we're already going to fail, this will
prevent trying to create the files all over again, which we know still
exist.
[0] https://github.com/mcgrof/break-blktrace
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently the macro that inserts entries into the SPU syscall table
doesn't actually use the "nr" (syscall number) parameter.
This does work, but it relies on the exact right number of syscall
entries being emitted in order for the syscal numbers to line up with
the array entries. If for example we had two entries with the same
syscall number we wouldn't get an error, it would just cause all
subsequent syscalls to be off by one in the spu_syscall_table.
So instead change the macro to assign to the specific entry of the
array, meaning any numbering overlap will be caught by the compiler.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200616135617.2937252-1-mpe@ellerman.id.au
|
|
The pte_update() implementation for PPC_8xx unfolds page table from the PGD
level to access a PMD entry. Since 8xx has only 2-level page table this can
be simplified with pmd_off() shortcut.
Replace explicit unfolding with pmd_off() and drop defines of pgd_index()
and pgd_offset() that are no longer needed.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200615092229.23142-1-rppt@kernel.org
|
|
In case of -EPROBE_DEFER, stm32_qspi_release() was called
in any case which unregistered driver from pm_runtime framework
even if it has not been registered yet to it. This leads to:
stm32-qspi 58003000.spi: can't setup spi0.0, status -13
spi_master spi0: spi_device register error /soc/spi@58003000/mx66l51235l@0
spi_master spi0: Failed to create SPI device for /soc/spi@58003000/mx66l51235l@0
stm32-qspi 58003000.spi: can't setup spi0.1, status -13
spi_master spi0: spi_device register error /soc/spi@58003000/mx66l51235l@1
spi_master spi0: Failed to create SPI device for /soc/spi@58003000/mx66l51235l@1
On v5.7 kernel,this issue was not "visible", qspi driver was probed
successfully.
Fixes: 9d282c17b023 ("spi: stm32-qspi: Add pm_runtime support")
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Link: https://lore.kernel.org/r/20200616113035.4514-1-patrice.chotard@st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The MFD part is merged into v5.8-rc1, thus remove BROKEN dependency.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Link: https://lore.kernel.org/r/20200616135030.1163660-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Linux 5.8-rc1
|
|
Unfortunately, most versions of clang that support BTI are capable of
miscompiling the kernel when converting a switch statement into a jump
table. As an example, attempting to spawn a KVM guest results in a panic:
[ 56.253312] Kernel panic - not syncing: bad mode
[ 56.253834] CPU: 0 PID: 279 Comm: lkvm Not tainted 5.8.0-rc1 #2
[ 56.254225] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 56.254712] Call trace:
[ 56.254952] dump_backtrace+0x0/0x1d4
[ 56.255305] show_stack+0x1c/0x28
[ 56.255647] dump_stack+0xc4/0x128
[ 56.255905] panic+0x16c/0x35c
[ 56.256146] bad_el0_sync+0x0/0x58
[ 56.256403] el1_sync_handler+0xb4/0xe0
[ 56.256674] el1_sync+0x7c/0x100
[ 56.256928] kvm_vm_ioctl_check_extension_generic+0x74/0x98
[ 56.257286] __arm64_sys_ioctl+0x94/0xcc
[ 56.257569] el0_svc_common+0x9c/0x150
[ 56.257836] do_el0_svc+0x84/0x90
[ 56.258083] el0_sync_handler+0xf8/0x298
[ 56.258361] el0_sync+0x158/0x180
This is because the switch in kvm_vm_ioctl_check_extension_generic()
is executed as an indirect branch to tail-call through a jump table:
ffff800010032dc8: 3869694c ldrb w12, [x10, x9]
ffff800010032dcc: 8b0c096b add x11, x11, x12, lsl #2
ffff800010032dd0: d61f0160 br x11
However, where the target case uses the stack, the landing pad is elided
due to the presence of a paciasp instruction:
ffff800010032e14: d503233f paciasp
ffff800010032e18: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff800010032e1c: 910003fd mov x29, sp
ffff800010032e20: aa0803e0 mov x0, x8
ffff800010032e24: 940017c0 bl ffff800010038d24 <kvm_vm_ioctl_check_extension>
ffff800010032e28: 93407c00 sxtw x0, w0
ffff800010032e2c: a8c17bfd ldp x29, x30, [sp], #16
ffff800010032e30: d50323bf autiasp
ffff800010032e34: d65f03c0 ret
Unfortunately, this results in a fatal exception because paciasp is
compatible only with branch-and-link (call) instructions and not simple
indirect branches.
A fix is being merged into Clang 10.0.1 so that a 'bti j' instruction is
emitted as an explicit landing pad in this situation. Make in-kernel
BTI depend on that compiler version when building with clang.
Cc: Tom Stellard <tstellar@redhat.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20200615105524.GA2694@willie-the-truck
Link: https://lore.kernel.org/r/20200616183630.2445-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
With the recent full-duplex support of implicit feedback streams, an
endpoint can be still running after closing the capture stream as long
as the playback stream with the sync-endpoint is running. In such a
state, the URBs are still be handled and they may call retire_data_urb
callback, which tries to transfer the data from the PCM buffer. Since
the PCM stream gets closed, this may lead to use-after-free.
This patch adds the proper clearance of the callback at stopping the
capture stream for addressing the possible UAF above.
Fixes: 10ce77e4817f ("ALSA: usb-audio: Add duplex sound support for USB devices using implicit feedback")
Link: https://lore.kernel.org/r/20200616120921.12249-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
__change_page_attr() can fail which will cause set_memory_encrypted() and
set_memory_decrypted() to return non-zero.
If the device requires unencrypted DMA memory and decryption fails, simply
free the memory and fail.
If attempting to re-encrypt in the failure path and that encryption fails,
there is no alternative other than to leak the memory.
Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code")
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
If arch_dma_set_uncached() fails after memory has been decrypted, it needs
to be re-encrypted before freeing.
Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more general")
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted()
works at page granularity. It's necessary to page align the allocation
size in dma_direct_alloc_pages() for consistent behavior.
This also fixes an issue when arch_dma_prep_coherent() is called on an
unaligned allocation size for dma_alloc_need_uncached() when
CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED
is enabled.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
nommu configfs can trivially map the coherent allocations to user space,
as no actual page table setup is required and the kernel and the user
space programs share the same address space.
Fixes: 62fcee9a3bd7 ("dma-mapping: remove CONFIG_ARCH_NO_COHERENT_DMA_MMAP")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: dillon min <dillon.minfei@gmail.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: dillon min <dillon.minfei@gmail.com>
|
|
Add flex_array_size() helper for the calculation of the size, in bytes,
of a flexible array member contained within an enclosing structure.
Example of usage:
struct something {
size_t count;
struct foo items[];
};
struct something *instance;
instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
instance->count = count;
memcpy(instance->items, src, flex_array_size(instance, items, instance->count));
The helper returns SIZE_MAX on overflow instead of wrapping around.
Additionally replaces parameter "n" with "count" in struct_size() helper
for greater clarity and unification.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20200609012233.GA3371@embeddedor
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
This patch fixes a spelling typo in scripts/headers_install.sh
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
cc-option and as-option are almost the same; both pass the flag to
$(CC). The main difference is the cc-option stops before the assemble
stage (-S option) whereas as-option stops after (-c option).
I chose -S because it is slightly faster, but $(cc-option,-gz=zlib)
returns a wrong result (https://lkml.org/lkml/2020/6/9/1529).
It has been fixed by commit 7b16994437c7 ("Makefile: Improve compressed
debug info support detection"), but the assembler should always be
invoked for more reliable compiler option tests.
However, you cannot simply replace -S with -c because the following
code in lib/Kconfig.debug would break:
depends on $(cc-option,-gsplit-dwarf)
The combination of -c and -gsplit-dwarf does not accept /dev/null as
output.
$ cat /dev/null | gcc -gsplit-dwarf -S -x c - -o /dev/null
$ echo $?
0
$ cat /dev/null | gcc -gsplit-dwarf -c -x c - -o /dev/null
objcopy: Warning: '/dev/null' is not an ordinary file
$ echo $?
1
$ cat /dev/null | gcc -gsplit-dwarf -c -x c - -o tmp.o
$ echo $?
0
There is another flag that creates an separate file based on the
object file path:
$ cat /dev/null | gcc -ftest-coverage -c -x c - -o /dev/null
<stdin>:1: error: cannot open /dev/null.gcno
So, we cannot use /dev/null to sink the output.
Align the cc-option implementation with scripts/Kbuild.include.
With -c option used in cc-option, as-option is unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
|
|
Add testcases for the return value of the command to show
bootconfig in initrd, and double/single quotes selecting.
Link: http://lkml.kernel.org/r/159230247428.65555.2109472942519215104.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix bootconfig to return 0 if succeeded to show the bootconfig
in initrd. Without this fix, "bootconfig INITRD" command
returns !0 even if the command succeeded to show the bootconfig.
Link: http://lkml.kernel.org/r/159230246566.65555.11891772258543514487.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix bootconfig tool to select double or single quotes
correctly according to the value.
If a bootconfig value includes a double quote character,
we must use single-quotes to quote that value.
Link: http://lkml.kernel.org/r/159230245697.65555.12444299015852932304.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix /proc/bootconfig to select double or single quotes
corrctly according to the value.
If a bootconfig value includes a double quote character,
we must use single-quotes to quote that value.
This modifies if() condition and blocks for avoiding
double-quote in value check in 2 places. Anyway, since
xbc_array_for_each_value() can handle the array which
has a single node correctly.
Thus,
if (vnode && xbc_node_is_array(vnode)) {
xbc_array_for_each_value(vnode) /* vnode->next != NULL */
...
} else {
snprintf(val); /* val is an empty string if !vnode */
}
is equivalent to
if (vnode) {
xbc_array_for_each_value(vnode) /* vnode->next can be NULL */
...
} else {
snprintf(""); /* value is always empty */
}
Link: http://lkml.kernel.org/r/159230244786.65555.3763894451251622488.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: c1a3c36017d4 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
We do not use the event variable, just remove it.
Signed-off-by: YangHui <yanghui.def@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
kmemleak report:
[<57dcc2ca>] __kmalloc_track_caller+0x139/0x2b0
[<f1c45d0f>] kstrndup+0x37/0x80
[<f9761eb0>] parse_probe_arg.isra.7+0x3cc/0x630
[<055bf2ba>] traceprobe_parse_probe_arg+0x2f5/0x810
[<655a7766>] trace_kprobe_create+0x2ca/0x950
[<4fc6a02a>] create_or_delete_trace_kprobe+0xf/0x30
[<6d1c8a52>] trace_run_command+0x67/0x80
[<be812cc0>] trace_parse_run_command+0xa7/0x140
[<aecfe401>] probes_write+0x10/0x20
[<2027641c>] __vfs_write+0x30/0x1e0
[<6a4aeee1>] vfs_write+0x96/0x1b0
[<3517fb7d>] ksys_write+0x53/0xc0
[<dad91db7>] __ia32_sys_write+0x15/0x20
[<da347f64>] do_syscall_32_irqs_on+0x3d/0x260
[<fd0b7e7d>] do_fast_syscall_32+0x39/0xb0
[<ea5ae810>] entry_SYSENTER_32+0xaf/0x102
Post parse_probe_arg(), the FETCH_OP_DATA operation type is overwritten
to FETCH_OP_ST_STRING, as a result memory is never freed since
traceprobe_free_probe_arg() iterates only over SYMBOL and DATA op types
Setup fetch string operation correctly after fetch_op_data operation.
Link: https://lkml.kernel.org/r/20200615143034.GA1734@cosmos
Cc: stable@vger.kernel.org
Fixes: a42e3c4de964 ("tracing/probe: Add immediate string parameter support")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
No functional change, just correct the word.
Link: https://lkml.kernel.org/r/20200610033251.31713-1-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When using trace-cmd on 5.6-rt for the function graph tracer, the output was
corrupted. It gave output like this:
funcgraph_entry: func=0xffffffff depth=38982
funcgraph_entry: func=0x1ffffffff depth=16044
funcgraph_exit: func=0xffffffff overrun=0x92539aaf00000000 calltime=0x92539c9900000072 rettime=0x100000072 depth=11084
funcgraph_exit: func=0xffffffff overrun=0x9253946e00000000 calltime=0x92539e2100000072 rettime=0x72 depth=26033702
funcgraph_entry: func=0xffffffff depth=85798
funcgraph_entry: func=0x1ffffffff depth=12044
The reason was because the tracefs/events/ftrace/funcgraph_entry/exit format
file was incorrect. The -rt kernel adds more common fields to the trace
events. Namely, common_migrate_disable and common_preempt_lazy_count. Each
is one byte in size. This changes the alignment of the normal payload. Most
events are aligned normally, but the function and function graph events are
defined with a "PACKED" macro, that packs their payload. As the offsets
displayed in the format files are now calculated by an aligned field, the
aligned field for function and function graph events should be 1, not their
normal alignment.
With aligning of the funcgraph_entry event, the format file has:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;
field:unsigned long func; offset:16; size:8; signed:0;
field:int depth; offset:24; size:4; signed:1;
But the actual alignment is:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned char common_migrate_disable; offset:8; size:1; signed:0;
field:unsigned char common_preempt_lazy_count; offset:9; size:1; signed:0;
field:unsigned long func; offset:12; size:8; signed:0;
field:int depth; offset:20; size:4; signed:1;
Link: https://lkml.kernel.org/r/20200609220041.2a3b527f@oasis.local.home
Cc: stable@vger.kernel.org
Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Remove trace_array 'sample-instance' if kthread_run fails
in sample_trace_array_init().
Link: https://lkml.kernel.org/r/20200609135200.2206726-1-wangkefeng.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.")
Reviewed-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5
1 lock held by swapper/5/0:
#0: ffff80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, at: call_timer_fn+0x8/0x3e0
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x1a0
show_stack+0x20/0x30
dump_stack+0xe4/0x150
___might_sleep+0x160/0x200
__might_sleep+0x58/0x90
__mutex_lock+0x64/0x948
mutex_lock_nested+0x3c/0x58
__ftrace_set_clr_event+0x44/0x88
trace_array_set_clr_event+0x24/0x38
mytimer_handler+0x34/0x40 [sample_trace_array]
mutex_lock() will be called in interrupt context, using workqueue to fix it.
Link: https://lkml.kernel.org/r/20200610011244.2209486-1-wangkefeng.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 89ed42495ef4 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.")
Reviewed-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:
============================================
WARNING: possible recursive locking detected
5.6.0-rc6+ #6 Not tainted
--------------------------------------------
sched-messaging/2767 is trying to acquire lock:
ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
but task is already holding lock:
ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(kretprobe_table_locks[i].lock));
lock(&(kretprobe_table_locks[i].lock));
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by sched-messaging/2767:
#0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
stack backtrace:
CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
Call Trace:
dump_stack+0x96/0xe0
__lock_acquire.cold.57+0x173/0x2b7
? native_queued_spin_lock_slowpath+0x42b/0x9e0
? lockdep_hardirqs_on+0x590/0x590
? __lock_acquire+0xf63/0x4030
lock_acquire+0x15a/0x3d0
? kretprobe_hash_lock+0x52/0xa0
_raw_spin_lock_irqsave+0x36/0x70
? kretprobe_hash_lock+0x52/0xa0
kretprobe_hash_lock+0x52/0xa0
trampoline_handler+0xf8/0x940
? kprobe_fault_handler+0x380/0x380
? find_held_lock+0x3a/0x1c0
kretprobe_trampoline+0x25/0x50
? lock_acquired+0x392/0xbc0
? _raw_spin_lock_irqsave+0x50/0x70
? __get_valid_kprobe+0x1f0/0x1f0
? _raw_spin_unlock_irqrestore+0x3b/0x40
? finish_task_switch+0x4b9/0x6d0
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.
The problem is in outside kprobe_flush_task, where we call:
kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave
where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:
kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
---> kretprobe_table_locks locked
kretprobe_trampoline
trampoline_handler
kretprobe_hash_lock(current, &head, &flags); <--- deadlock
Adding kprobe_busy_begin/end helpers that mark code with fake
probe installed to prevent triggering of another kprobe within
this code.
Using these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set to prevent
above lockup.
Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2
Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix to remove redundant arch_disarm_kprobe() call in
force_unoptimize_kprobe(). This arch_disarm_kprobe()
will be invoked if the kprobe is optimized but disabled,
but that means the kprobe (optprobe) is unused (and
unoptimized) state.
In that case, unoptimize_kprobe() puts it in freeing_list
and kprobe_optimizer (do_unoptimize_kprobes()) automatically
disarm it. Thus this arch_disarm_kprobe() is redundant.
Link: http://lkml.kernel.org/r/158927058719.27680.17183632908465341189.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|