summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-01perf uprobe: Skip prologue if program compiled without optimizationRavi Bangoria
The function prologue prepares stack and registers before executing function logic. When target program is compiled without optimization, function parameter information is only valid after the prologue. When we probe entrypc of the function, and try to record a function parameter, it contains a garbage value. For example: $ vim test.c #include <stdio.h> void foo(int i) { printf("i: %d\n", i); } int main() { foo(42); return 0; } $ gcc -g test.c -o test $ objdump -dl test | less foo(): /home/ravi/test.c:4 400536: 55 push %rbp 400537: 48 89 e5 mov %rsp,%rbp 40053a: 48 83 ec 10 sub -bashx10,%rsp 40053e: 89 7d fc mov %edi,-0x4(%rbp) /home/ravi/test.c:5 400541: 8b 45 fc mov -0x4(%rbp),%eax ... ... main(): /home/ravi/test.c:9 400558: 55 push %rbp 400559: 48 89 e5 mov %rsp,%rbp /home/ravi/test.c:10 40055c: bf 2a 00 00 00 mov -bashx2a,%edi 400561: e8 d0 ff ff ff callq 400536 <foo> $ perf probe -x ./test 'foo i' $ cat /sys/kernel/debug/tracing/uprobe_events p:probe_test/foo /home/ravi/test:0x0000000000000536 i=-12(%sp):s32 $ perf record -e probe_test:foo ./test $ perf script test 5778 [001] 4918.562027: probe_test:foo: (400536) i=0 Here variable 'i' is passed via stack which is pushed on stack at 0x40053e. But we are probing at 0x400536. To resolve this issues, we need to probe on next instruction after prologue. gdb and systemtap also does same thing. I've implemented this patch based on approach systemtap has used. After applying patch: $ perf probe -x ./test 'foo i' $ cat /sys/kernel/debug/tracing/uprobe_events p:probe_test/foo /home/ravi/test:0x0000000000000541 i=-4(%bp):s32 $ perf record -e probe_test:foo ./test $ perf script test 6300 [001] 5877.879327: probe_test:foo: (400541) i=42 No need to skip prologue for optimized case since debug info is correct for each instructions for -O2 -g. For more details please visit: https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6 Changes in v2: - Skipping prologue only when any ARG is either C variable, $params or $vars. - Probe on line(:1) may not be always possible. Recommend only address to force probe on function entry. Committer notes: Testing it with 'perf trace': # perf probe -x ./test foo i Added new event: probe_test:foo (on foo in /home/acme/c/test with i) You can now use it in all perf tools, such as: perf record -e probe_test:foo -aR sleep 1 # cat /sys/kernel/debug/tracing/uprobe_events p:probe_test/foo /home/acme/c/test:0x0000000000000526 i=-12(%sp):s32 # trace --no-sys --event probe_*:* ./test i: 42 0.000 probe_test:foo:(400526) i=0) # After the patch: # perf probe -d *:* Removed event: probe_test:foo # perf probe -x ./test foo i Target program is compiled without optimization. Skipping prologue. Probe on address 0x400526 to force probing at the function entry. Added new event: probe_test:foo (on foo in /home/acme/c/test with i) You can now use it in all perf tools, such as: perf record -e probe_test:foo -aR sleep 1 # cat /sys/kernel/debug/tracing/uprobe_events p:probe_test/foo /home/acme/c/test:0x0000000000000531 i=-4(%bp):s32 # trace --no-sys --event probe_*:* ./test i: 42 0.000 probe_test:foo:(400531) i=42) # Reported-by: Michael Petlan <mpetlan@redhat.com> Report-Link: https://www.mail-archive.com/linux-perf-users@vger.kernel.org/msg02348.html Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1299021 Link: http://lkml.kernel.org/r/1470214725-5023-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com [ Rename 'die' to 'cu_die' to avoid shadowing a die() definition on at least centos 5, Debian 7 and ubuntu:12.04.5] [ Use PRIx64 instead of lx to format a Dwarf_Addr, aka long long unsigned int, fixing the build on 32-bit systems ] [ dwarf_getsrclines() expects a size_t * argument ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf probe: Add helper function to check if probe with variableRavi Bangoria
Introduce helper function instead of inline code and replace hardcoded strings "$vars" and "$params" with their corresponding macros. perf_probe_with_var() is not declared as static since it will be called from different file in subsequent patch. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1470214725-5023-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf symbols: Fixup symbol sizes before picking best onesArnaldo Carvalho de Melo
When we call symbol__fixup_duplicate() we use algorithms to pick the "best" symbols for cases where there are various functions/aliases to an address, and those check zero size symbols, which, before calling symbol__fixup_end() are _all_ symbols in a just parsed kallsyms file. So first fixup the end, then fixup the duplicates. Found while trying to figure out why 'perf test vmlinux' failed, see the output of 'perf test -v vmlinux' to see cases where the symbols picked as best for vmlinux don't match the ones picked for kallsyms. Cc: Anton Blanchard <anton@samba.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 694bf407b061 ("perf symbols: Add some heuristics for choosing the best duplicate symbol") Link: http://lkml.kernel.org/n/tip-rxqvdgr0mqjdxee0kf8i2ufn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf symbols: Check symbol_conf.allow_aliases for kallsyms loading tooArnaldo Carvalho de Melo
We can allow aliases to be kept, but we were checking this just when loading vmlinux files, be consistent, do it for any symbol table loading code that calls symbol__fixup_duplicate() by making this function check .allow_aliases instead. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 680d926a8cb0 ("perf symbols: Allow symbol alias when loading map for symbol name") Link: http://lkml.kernel.org/n/tip-z0avp0s6cfjckc4xj3pdfjdz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf test vmlinux: Tolerate symbol aliasesArnaldo Carvalho de Melo
The algorithms used to prune aliases in symbols__fixup_duplicate() uses information available on ELF symtabs that are not present on /proc/kallsyms, so it picks different aliases as "best" for vmlinux and kallsyms. We could probably improve a bit this by having a list of aliases for the "best" symbols picked, instead of throwing this info, but that is left for when we find a real need. With this, 'perf test vmlinux' passes: # perf test -F 1 1: vmlinux symtab matches kallsyms: Ok # When we ask for verbose mode, we can see those warning: # perf test -F -v 1 1: vmlinux symtab matches kallsyms: --- start --- Looking at the vmlinux_path (8 entries long) Using /lib/modules/4.8.0-rc4+/build/vmlinux for symbols WARN: 0xffffffffb7001000: diff name v: xen_hypercall_set_trap_table k: hypercall_page WARN: 0xffffffffb7077970: diff end addr for aesni_gcm_dec v: 0xffffffffb707a2f2 k: 0xffffffffb7077a02 WARN: 0xffffffffb707a300: diff end addr for aesni_gcm_enc v: 0xffffffffb707cc03 k: 0xffffffffb707a392 WARN: 0xffffffffb707f950: diff end addr for aesni_gcm_enc_avx_gen2 v: 0xffffffffb7084ef6 k: 0xffffffffb707f9c3 WARN: 0xffffffffb7084f00: diff end addr for aesni_gcm_dec_avx_gen2 v: 0xffffffffb708a691 k: 0xffffffffb7084f73 WARN: 0xffffffffb708aa10: diff end addr for aesni_gcm_enc_avx_gen4 v: 0xffffffffb708f844 k: 0xffffffffb708aa83 WARN: 0xffffffffb708f850: diff end addr for aesni_gcm_dec_avx_gen4 v: 0xffffffffb709486f k: 0xffffffffb708f8c3 WARN: 0xffffffffb71a6e50: diff name v: perf_pmu_commit_txn.part.98 k: perf_pmu_cancel_txn.part.97 WARN: 0xffffffffb752e480: diff name v: wakeup_expire_count_show.part.5 k: wakeup_active_count_show.part.7 WARN: 0xffffffffb76e8d00: diff name v: phys_switch_id_show.part.11 k: phys_port_name_show.part.12 WARN: Maps only in vmlinux: ffffffffb7d7d000-ffffffffb7eeaac8 117d000 [kernel].init.text ffffffffb7eeaac8-ffffffffc03ad000 12eaac8 [kernel].exit.text ---- end ---- vmlinux symtab matches kallsyms: Ok # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-6v5w1k8rpx4ggczlkw730vt0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf test vmlinux: Avoid printing headers for empty listsArnaldo Carvalho de Melo
Before: # perf test -F -v 1 1: vmlinux symtab matches kallsyms: --- start --- <SNIP> WARN: Maps only in vmlinux: ffffffffb7d7d000-ffffffffb7eeaac8 117d000 [kernel].init.text ffffffffb7eeaac8-ffffffffc03ad000 12eaac8 [kernel].exit.text WARN: Maps in vmlinux with a different name in kallsyms: WARN: Maps only in kallsyms: ---- end ---- vmlinux symtab matches kallsyms: Ok # The two last WARN lines are now suppressed, since there are no such cases detected. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-9ww8uvzl682ykaw8ht1tozlr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf test vmlinux: Clarify which -v lines are errors or warningArnaldo Carvalho de Melo
When the 'perf test -v vmlinux' test fails, it is not clear which of the lines are errors or warnings, clarify that adding ERR/WARN prefixes: # perf test -F -v 1 1: vmlinux symtab matches kallsyms : --- start --- Looking at the vmlinux_path (8 entries long) Using /lib/modules/4.8.0-rc4+/build/vmlinux for symbols ERR : 0xffffffffb7001000: diff name v: xen_hypercall_set_trap_table k: hypercall_page WARN: 0xffffffffb7077970: diff end addr for aesni_gcm_dec v: 0xffffffffb707a2f2 k: 0xffffffffb7077a02 WARN: 0xffffffffb707a300: diff end addr for aesni_gcm_enc v: 0xffffffffb707cc03 k: 0xffffffffb707a392 WARN: 0xffffffffb707f950: diff end addr for aesni_gcm_enc_avx_gen2 v: 0xffffffffb7084ef6 k: 0xffffffffb707f9c3 WARN: 0xffffffffb7084f00: diff end addr for aesni_gcm_dec_avx_gen2 v: 0xffffffffb708a691 k: 0xffffffffb7084f73 WARN: 0xffffffffb708aa10: diff end addr for aesni_gcm_enc_avx_gen4 v: 0xffffffffb708f844 k: 0xffffffffb708aa83 WARN: 0xffffffffb708f850: diff end addr for aesni_gcm_dec_avx_gen4 v: 0xffffffffb709486f k: 0xffffffffb708f8c3 ERR : 0xffffffffb71a6e50: diff name v: perf_pmu_commit_txn.part.98 k: perf_pmu_cancel_txn.part.97 ERR : 0xffffffffb752e480: diff name v: wakeup_expire_count_show.part.5 k: wakeup_active_count_show.part.7 ERR : 0xffffffffb76e8d00: diff name v: phys_switch_id_show.part.11 k: phys_port_name_show.part.12 WARN: Maps only in vmlinux: ffffffffb7d7d000-ffffffffb7eeaac8 117d000 [kernel].init.text ffffffffb7eeaac8-ffffffffc03ad000 12eaac8 [kernel].exit.text WARN: Maps in vmlinux with a different name in kallsyms: WARN: Maps only in kallsyms: ---- end ---- vmlinux symtab matches kallsyms: FAILED! # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-n5ml8m7y9x8kzvxt09ipku88@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf probe: Ignore vmlinux Build-id when offline vmlinux givenMasami Hiramatsu
Ignore vmlinux build-id when user gives offline vmlinux if the command does not affect running kernel. perf-probe has several actions some of them does not change the running kernel, like --lines, --vars, and --funcs. e.g. ----- $ ./perf probe -k ./vmlinux-arm -V do_sys_open:14 Available variables at do_sys_open:14 @<do_sys_open+202> char* filename int dfd int fd int flags struct filename* tmp struct open_flags op umode_t mode ----- Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/147222347320.5088.2582658035296667520.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf probe: Support probing on offline cross-arch binaryMasami Hiramatsu
Support probing on offline cross-architecture binary by adding getting the target machine arch from ELF and choose correct register string for the machine. Here is an example: ----- $ perf probe --vmlinux=./vmlinux-arm --definition 'do_sys_open $params' p:probe/do_sys_open do_sys_open+0 dfd=%r5:s32 filename=%r1:u32 flags=%r6:s32 mode=%r3:u16 ----- Here, we can get probe/do_sys_open from above and append it to to the target machine's tracing/kprobe_events file in the tracefs mountput, usually /sys/kernel/debug/tracing/kprobe_events (or /sys/kernel/tracing/kprobe_events). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/147214229717.23638.6440579792548044658.stgit@devbox [ Add definition for EM_AARCH64 to fix the build on at least centos 6, debian 7 & ubuntu 12.04.5 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01btrfs: fix one bug that process may endlessly wait for ticket in ↵Wang Xiaoguang
wait_reserve_ticket() If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true, btrfs_async_reclaim_metadata_space() will not reclaim metadata space, just return directly and also forget to wake up process which are waiting for their tickets, so these processes will wait endlessly. Fstests case generic/172 with mount option "-o compress=lzo" have revealed this bug in my test machine. Here if we have tickets to handle, we must handle them first. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-01Btrfs: fix endless loop in balancing block groupsLiu Bo
Qgroup function may overwrite the saved error 'err' with 0 in case quota is not enabled, and this ends up with a endless loop in balance because we keep going back to balance the same block group. It really should use 'ret' instead. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-01Btrfs: kill invalid ASSERT() in process_all_refs()Josef Bacik
Suppose you have the following tree in snap1 on a file system mounted with -o inode_cache so that inode numbers are recycled └── [ 258] a └── [ 257] b and then you remove b, rename a to c, and then re-create b in c so you have the following tree └── [ 258] c └── [ 257] b and then you try to do an incremental send you will hit ASSERT(pending_move == 0); in process_all_refs(). This is because we assume that any recycling of inodes will not have a pending change in our path, which isn't the case. This is the case for the DELETE side, since we want to remove the old file using the old path, but on the create side we could have a pending move and need to do the normal pending rename dance. So remove this ASSERT() and put a comment about why we ignore pending_move. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-01perf probe: Ignore vmlinux buildid if offline kernel is givenMasami Hiramatsu
Ignore the buildid of running kernel when both of --definition and --vmlinux is given because that kernel should be off-line. This also skips post-processing of kprobe event for relocating symbol and checking blacklist, because it can not be done on off-line kernel. E.g. without this fix perf shows an error as below ---- $ perf probe --vmlinux=./vmlinux-arm --definition do_sys_open ./vmlinux-arm with build id 7a1f76dd56e9c4da707cd3d6333f50748141434b not found, continuing without symbols Failed to find symbol do_sys_open in kernel Error: Failed to add events. ---- with this fix, we can get the definition ---- $ perf probe --vmlinux=./vmlinux-arm --definition do_sys_open p:probe/do_sys_open do_sys_open+0 ---- Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/147214228193.23638.12581984840822162131.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf probe: Show trace event definitionMasami Hiramatsu
Add --definition/-D option for showing the trace-event definition in stdout. This can be useful in debugging or combined with a shell script. e.g. ---- # perf probe --definition 'do_sys_open $params' p:probe/do_sys_open _text+2261728 dfd=%di:s32 filename=%si:u64 flags=%dx:s32 mode=%cx:u16 ---- Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/147214226712.23638.2240534040014013658.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf config: Show default report configuration in example and docsMilian Wolff
Signed-off-by: Milian Wolff <milian.wolff@kdab.com> LPU-Reference: 20160830134106.21240-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf symbols: Demangle symbols for synthesized @plt entries.Milian Wolff
The symbols in the synthesized @plt entries where not demangled before, i.e. we could end up with entries such as: $ perf report Samples: 7K of event 'cycles:ppp', Event count (approx.): 6223833141 Children Self Command Shared Object Symbol - 93.63% 28.89% lab_mandelbrot lab_mandelbrot [.] main - 73.81% main - 33.57% hypot 27.76% __hypot_finite 15.97% __muldc3 2.90% __muldc3@plt 2.40% _ZNK6QImage6heightEv@plt + 2.14% QColor::rgb 1.94% _ZNK6QImage5widthEv@plt 1.92% cabs@plt This patch remedies this issue by also applying demangling to the synthesized symbols. The output for the above is now: $ perf report Samples: 7K of event 'cycles:ppp', Event count (approx.): 6223833141 Children Self Command Shared Object Symbol - 93.63% 28.89% lab_mandelbrot lab_mandelbrot [.] main - 73.81% main - 33.57% hypot 27.76% __hypot_finite 15.97% __muldc3 2.90% __muldc3@plt 2.40% QImage::height() const@plt + 2.14% QColor::rgb 1.94% QImage::width() const@plt 1.92% cabs@plt Signed-off-by: Milian Wolff <milian.wolff@kdab.com> LPU-Reference: 20160830114102.30863-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01perf probe: Do not use map_load filters for functionArnaldo Carvalho de Melo
It is simpler to just do the loop, no need for globals and the last user of such facility disappears. Testing: # perf probe -F [a-z]*recvmsg aead_recvmsg compat_SyS_recvmsg compat_sys_recvmsg hash_recvmsg inet_recvmsg kernel_recvmsg netlink_recvmsg packet_recvmsg ping_recvmsg raw_recvmsg rawv6_recvmsg rng_recvmsg security_socket_recvmsg selinux_socket_recvmsg skcipher_recvmsg sock_common_recvmsg sock_no_recvmsg sock_recvmsg sys_recvmsg tcp_recvmsg udp_recvmsg udpv6_recvmsg unix_dgram_recvmsg unix_seqpacket_recvmsg unix_stream_recvmsg # Without filters: # perf probe -F | tail -5 zswap_pool_create zswap_pool_current zswap_update_total_size zswap_writeback_entry zswap_zpool_param_set # # perf probe -F | wc -l 33311 # Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160831130427.GA13095@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-01ovl: update docMiklos Szeredi
Some of the documented quirks no longer apply. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: listxattr: use strnlen()Miklos Szeredi
Be defensive about what underlying fs provides us in the returned xattr list buffer. If it's not properly null terminated, bail out with a warning insead of BUG. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-09-01ovl: Switch to generic_getxattrAndreas Gruenbacher
Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use those same handlers for iop->getxattr as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: copyattr after setting POSIX ACLMiklos Szeredi
Setting POSIX acl may also modify the file mode, so need to copy that up to the overlay inode. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: Switch to generic_removexattrAndreas Gruenbacher
Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches from iop->setxattr from ovl_setxattr to generic_setxattr, so switch from ovl_removexattr to generic_removexattr as well. As far as permission checking goes, the same rules should apply in either case. While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that this is not an iop->setxattr implementation and remove the unused inode argument. Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the order of handlers in ovl_xattr_handlers. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: Get rid of ovl_xattr_noacl_handlers arrayAndreas Gruenbacher
Use an ordinary #ifdef to conditionally include the POSIX ACL handlers in ovl_xattr_handlers, like the other filesystems do. Flag the code that is now only used conditionally with __maybe_unused. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: Fix OVL_XATTR_PREFIXAndreas Gruenbacher
Make sure ovl_own_xattr_handler only matches attribute names starting with "overlay.", not "overlayXXX". Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: fix spelling mistake: "directries" -> "directories"Colin Ian King
Trivial fix to spelling mistake in pr_err message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: don't cache acl on overlay layerMiklos Szeredi
Some operations (setxattr/chmod) can make the cached acl stale. We either need to clear overlay's acl cache for the affected inode or prevent acl caching on the overlay altogether. Preventing caching has the following advantages: - no double caching, less memory used - overlay cache doesn't go stale when fs clears it's own cache Possible disadvantage is performance loss. If that becomes a problem get_acl() can be optimized for overlayfs. This patch disables caching by pre setting i_*acl to a value that - has bit 0 set, so is_uncached_acl() will return true - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it The constant -3 was chosen for this purpose. Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: use cached acl on underlying layerMiklos Szeredi
Instead of calling ->get_acl() directly, use get_acl() to get the cached value. We will have the acl cached on the underlying inode anyway, because we do permission checking on the both the overlay and the underlying fs. So, since we already have double caching, this improves performance without any cost. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01ovl: proper cleanup of workdirMiklos Szeredi
When mounting overlayfs it needs a clean "work" directory under the supplied workdir. Previously the mount code removed this directory if it already existed and created a new one. If the removal failed (e.g. directory was not empty) then it fell back to a read-only mount not using the workdir. While this has never been reported, it is possible to get a non-empty "work" dir from a previous mount of overlayfs in case of crash in the middle of an operation using the work directory. In this case the left over state should be discarded and the overlay filesystem will be consistent, guaranteed by the atomicity of operations on moving to/from the workdir to the upper layer. This patch implements cleaning out any files left in workdir. It is implemented using real recursion for simplicity, but the depth is limited to 2, because the worst case is that of a directory containing whiteouts under "work". Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-09-01ovl: remove posix_acl_default from workdirMiklos Szeredi
Clear out posix acl xattrs on workdir and also reset the mode after creation so that an inherited sgid bit is cleared. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-09-01ovl: handle umask and posix_acl_default correctly on creationMiklos Szeredi
Setting MS_POSIXACL in sb->s_flags has the side effect of passing mode to create functions without masking against umask. Another problem when creating over a whiteout is that the default posix acl is not inherited from the parent dir (because the real parent dir at the time of creation is the work directory). Fix these problems by: a) If upper fs does not have MS_POSIXACL, then mask mode with umask. b) If creating over a whiteout, call posix_acl_create() to get the inherited acls. After creation (but before moving to the final destination) set these acls on the created file. posix_acl_create() also updates the file creation mode as appropriate. Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-09-01Merge branch 'msm-fixes-4.8' of git://people.freedesktop.org/~robclark/linux ↵Dave Airlie
into drm-fixes copy from user fixes. * 'msm-fixes-4.8' of git://people.freedesktop.org/~robclark/linux: drm/msm: protect against faults from copy_from_user() in submit ioctl drm/msm: fix use of copy_from_user() while holding spinlock
2016-08-31audit: fix exe_file access in audit_exe_compareMateusz Guzik
Prior to the change the function would blindly deference mm, exe_file and exe_file->f_inode, each of which could have been NULL or freed. Use get_task_exe_file to safely obtain stable exe_file. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Richard Guy Briggs <rgb@redhat.com> Cc: <stable@vger.kernel.org> # 4.3.x Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-08-31mm: introduce get_task_exe_fileMateusz Guzik
For more convenient access if one has a pointer to the task. As a minor nit take advantage of the fact that only task lock + rcu are needed to safely grab ->exe_file. This saves mm refcount dance. Use the helper in proc_exe_link. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Richard Guy Briggs <rgb@redhat.com> Cc: <stable@vger.kernel.org> # 4.3.x Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-09-01Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Fixes for 4.8: - 2 CI S4 fixes - error handling fix * 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: record error code when ring test failed drm/amd/amdgpu: compute ring test fail during S4 on CI drm/amd/amdgpu: sdma resume fail during S4 on CI
2016-08-31drm/amdgpu: record error code when ring test failedChunming Zhou
Otherwise we may miss errors. Signed-off-by: Chunming Zhou <David1.Zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-08-31drm/amd/amdgpu: compute ring test fail during S4 on CIjimqu
unhalt Instrction Fetch Unit after all rings are inited. Signed-off-by: JimQu <Jim.Qu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-08-31drm/amd/amdgpu: sdma resume fail during S4 on CIjimqu
SDMA could be fail in the thaw() and restore() processes, do software reset if each SDMA engine is busy. Signed-off-by: JimQu <Jim.Qu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2016-08-31Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Kconfig problem that prevented mxc-rnga from being enabled - bogus key sizes in qat aes-xts - buggy aes-xts code in vmx" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: vmx - fix null dereference in p8_aes_xts_crypt crypto: qat - fix aes-xts key sizes hwrng: mxc-rnga - Fix Kconfig dependency
2016-08-31binfmt_elf: switch to new creds when switching to new mmLinus Torvalds
We used to delay switching to the new credentials until after we had mapped the executable (and possible elf interpreter). That was kind of odd to begin with, since the new executable will actually then _run_ with the new creds, but whatever. The bigger problem was that we also want to make sure that we turn off prof events and tracing before we start mapping the new executable state. So while this is a cleanup, it's also a fix for a possible information leak. Reported-by: Robert Święcki <robert@swiecki.net> Tested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-31serial: 8250: added acces i/o products quad and octal serial cardsJimi Damon
Added devices ids for acces i/o products quad and octal serial cards that make use of existing Pericom PI7C9X7954 and PI7C9X7958 configurations . Signed-off-by: Jimi Damon <jdamon@accesio.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31serial: 8250_mid: fix divide error bug if baud rate is 0Andy Shevchenko
Since the commit c1a67b48f6a5 ("serial: 8250_pci: replace switch-case by formula for Intel MID"), the 8250 driver crashes in the byt_set_termios() function with a divide error. This is caused by the fact that a baud rate of 0 (B0) is not handled properly. Fix it by falling back to B9600 in this case. Reported-by: "Mendez Salinas, Fernando" <fernando.mendez.salinas@intel.com> Fixes: c1a67b48f6a5 ("serial: 8250_pci: replace switch-case by formula for Intel MID") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31Revert "tty/serial/8250: use mctrl_gpio helpers"Andy Shevchenko
Serial console is broken in v4.8-rcX. Mika and I independently bisected down to commit 4ef03d328769 ("tty/serial/8250: use mctrl_gpio helpers"). Since neither author nor anyone else didn't propose a solution we better revert it for now. This reverts commit 4ef03d328769eddbfeca1f1c958fdb181a69c341. Link: https://lkml.kernel.org/r/20160809130229.GN1729@lahna.fi.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31sysfs: correctly handle read offset on PREALLOC attrsKonstantin Khlebnikov
Attributes declared with __ATTR_PREALLOC use sysfs_kf_read() which returns zero bytes for non-zero offset. This breaks script checkarray in mdadm tool in debian where /bin/sh is 'dash' because its builtin 'read' reads only one byte at a time. Script gets 'i' instead of 'idle' when reads current action from /sys/block/$dev/md/sync_action and as a result does nothing. This patch adds trivial implementation of partial read: generate whole string and move required part into buffer head. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 4ef67a8c95f3 ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.") Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950 Cc: Stable <stable@vger.kernel.org> # v3.19+ Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31documentation: drivers/core/of: fix name of of_node symlinkMartin Fuzzey
commit 5590f3196b29 ("drivers/core/of: Add symlink to device-tree from devices with an OF node") added a symlink called "of_node" to sysfs however the documentation describes it as "of_path". Fix the documentation to match what the code actually does. Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31kernfs: don't depend on d_find_any_alias() when generating notificationsTejun Heo
kernfs_notify_workfn() sends out file modified events for the scheduled kernfs_nodes. Because the modifications aren't from userland, it doesn't have the matching file struct at hand and can't use fsnotify_modify(). Instead, it looked up the inode and then used d_find_any_alias() to find the dentry and used fsnotify_parent() and fsnotify() directly to generate notifications. The assumption was that the relevant dentries would have been pinned if there are listeners, which isn't true as inotify doesn't pin dentries at all and watching the parent doesn't pin the child dentries even for dnotify. This led to, for example, inotify watchers not getting notifications if the system is under memory pressure and the matching dentries got reclaimed. It can also be triggered through /proc/sys/vm/drop_caches or a remount attempt which involves shrinking dcache. fsnotify_parent() only uses the dentry to access the parent inode, which kernfs can do easily. Update kernfs_notify_workfn() so that it uses fsnotify() directly for both the parent and target inodes without going through d_find_any_alias(). While at it, supply the target file name to fsnotify() from kernfs_node->name. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Evgeny Vereshchagin <evvers@ya.ru> Fixes: d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events too") Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31thunderbolt: Don't declare Falcon Ridge unsupportedLukas Wunner
Falcon Ridge 4C has been supported by the driver from the beginning, Falcon Ridge 2C support was just added. Don't irritate users with a warning declaring the opposite. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Andreas Noever <andreas.noever@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31thunderbolt: Add support for INTEL_FALCON_RIDGE_2C controller.Xavier Gnata
From: Xavier Gnata <xavier.gnata@gmail.com> Add support to INTEL_FALCON_RIDGE_2C controller and corresponding quirk to support suspend/resume. Tested against 4.7 master on a MacBook Air 11" 2015. Signed-off-by: Andreas Noever <andreas.noever@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31thunderbolt: Fix resume quirk for Falcon Ridge 4C.Andreas Noever
The quirk 'quirk_apple_wait_for_thunderbolt' did not fire on Falcon Ridge 4C controllers with subdevice/subvendor set to zero. This lead to lost pci devices on system resume. Older thunderbolt controllers (pre Falcon Ridge) used the same device id for bridges and for the controller. On Apple hardware the subvendor- & subdevice-ids were set for the controller, but not for bridges. So that is what was used to differentiate between the two. Starting with Falcon Ridge bridges and controllers received different device ids. Additionally on some MacBookPro models (but not all) the subvendor/subdevice was zeroed. Starting with a42fb351c (thunderbolt: Allow loading of module on recent Apple MacBooks with thunderbolt 2 controller) the thunderbolt driver binds to all Falcon Ridge 4C controllers (irregardless of subvendor/subdevice). The corresponding quirk was not updated. This commit changes the quirk to check the device class instead of its subvendor-/subdeviceids. This works for all generations of Thunderbolt controllers. Signed-off-by: Andreas Noever <andreas.noever@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31lkdtm: Mark lkdtm_rodata_do_nothing() notraceMichael Ellerman
lkdtm_rodata_do_nothing() is an empty function which is generated in order to test the non-executability of rodata. Currently if function tracing is enabled then an mcount callsite will be generated for lkdtm_rodata_do_nothing(), and it will appear in the list of available functions for function tracing (available_filter_functions). Given it's purpose purely as a test function, it seems preferable for lkdtm_rodata_do_nothing() to be marked notrace, so it doesn't appear as traceable. This also avoids triggering a linker bug on powerpc: https://sourceware.org/bugzilla/show_bug.cgi?id=20428 When the linker sees code that needs to generate a call stub, eg. a branch to mcount(), it assumes the section is executable and dereferences a NULL pointer leading to a linker segfault. Marking lkdtm_rodata_do_nothing() notrace avoids triggering the bug because the function contains no other function calls. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-31drm/nouveau/acpi: use DSM if bridge does not support D3coldPeter Wu
Even if PR3 support is available on the bridge, it will not be used if the PCI layer considers it unavailable (i.e. on all laptops from 2013 and 2014). Ensure that this condition is checked to allow a fallback to the Optimus DSM for device poweroff. Initially I wanted to call pci_d3cold_enable before checking bridge_d3 (in case the user changed d3cold_allowed), but that is such an unlikely case and likely fragile anyway. The current patch is suggested by Mika in http://www.spinics.net/lists/linux-pci/msg52599.html Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>