Age | Commit message (Collapse) | Author |
|
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages. However, it is
possible for busy pages to become available between the calls to these
two routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were
not allocated and they are leaked.
Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space. In this case, tlb->fullmm is true. Some archs like arm64
doesn't flush TLB when tlb->fullmm is true:
commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
Which causes leaking of tlb entries.
Will clarifies his patch:
"Basically, we tag each address space with an ASID (PCID on x86) which
is resident in the TLB. This means we can elide TLB invalidation when
pulling down a full mm because we won't ever assign that ASID to
another mm without doing TLB invalidation elsewhere (which actually
just nukes the whole TLB).
I think that means that we could potentially not fault on a kernel
uaccess, because we could hit in the TLB"
There could be a window between complete_signal() sending IPI to other
cores and all threads sharing this mm are really kicked off from cores.
In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to
flush TLB then frees pages. However, due to the above problem, the TLB
entries are not really flushed on arm64. Other threads are possible to
access these pages through TLB entries. Moreover, a copy_to_user() can
also write to these pages without generating page fault, causes
use-after-free bugs.
This patch gathers each vma instead of gathering full vm space. In this
case tlb->fullmm is not true. The behavior of oom reaper become similar
to munmapping before do_exit, which should be safe for all archs.
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Bob Liu <liubo95@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
drain_all_pages backs off when called from a kworker context since
commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue
context") because the original IPI based pcp draining has been replaced
by a WQ based one and the check wanted to prevent from recursion and
inter workers dependencies. This has made some sense at the time
because the system WQ has been used and one worker holding the lock
could be blocked while waiting for new workers to emerge which can be a
problem under OOM conditions.
Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into
single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a
rescuer so we shouldn't depend on any other WQ activity to make a
forward progress so calling drain_all_pages from a worker context is
safe as long as this doesn't happen from mm_percpu_wq itself which is
not the case because all workers are required to _not_ depend on any MM
locks.
Why is this a problem in the first place? ACPI driven memory hot-remove
(acpi_device_hotplug) is executed from the worker context. We end up
calling __offline_pages to free all the pages and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job
otherwise we can have dangling pages on pcp lists and fail the offline
operation (__test_page_isolated_in_pageblock would see a page with 0 ref
count but without PageBuddy set).
Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction so it works
as expected.
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We meet this compile warning, which caused by missing bpf.h in xdp.h.
In file included from ./include/trace/events/xdp.h:10:0,
from ./include/linux/bpf_trace.h:6,
from drivers/net/ethernet/intel/i40e/i40e_txrx.c:29:
./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration
const struct bpf_map *map, u32 map_index),
^
./include/linux/tracepoint.h:187:34: note: in definition of macro ‘__DECLARE_TRACE’
static inline void trace_##name(proto) \
^~~~~
./include/linux/tracepoint.h:352:24: note: in expansion of macro ‘PARAMS’
__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
^~~~~~
./include/linux/tracepoint.h:477:2: note: in expansion of macro ‘DECLARE_TRACE’
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
^~~~~~~~~~~~~
./include/linux/tracepoint.h:477:22: note: in expansion of macro ‘PARAMS’
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
^~~~~~
./include/trace/events/xdp.h:89:1: note: in expansion of macro ‘DEFINE_EVENT’
DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
^~~~~~~~~~~~
./include/trace/events/xdp.h:90:2: note: in expansion of macro ‘TP_PROTO’
TP_PROTO(const struct net_device *dev,
^~~~~~~~
./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration
const struct bpf_map *map, u32 map_index),
^
./include/linux/tracepoint.h:203:38: note: in definition of macro ‘__DECLARE_TRACE’
register_trace_##name(void (*probe)(data_proto), void *data) \
^~~~~~~~~~
./include/linux/tracepoint.h:354:4: note: in expansion of macro ‘PARAMS’
PARAMS(void *__data, proto), \
^~~~~~
Reported-by: Huang Daode <huangdaode@hisilicon.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Fixes: 8d3b778ff544 ("xdp: tracepoint xdp_redirect also need a map argument")
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Quentin Monnet says:
====================
First commit in this series fixes a crash that occurs when incorrect
arguments are passed to bpftool after the `--json` option. It comes from
the usage() function trying to use the JSON writer, although the latter
has not been created yet at that point.
Other patches add destruction of the writer in case the program exits in
usage(), fix error messages handling when an unrecognized option is
encountered, remove a spurious new-line character in an error message.
Last patches are related to the Makefiles. They fix the installation
directory prefix and .PHONY targets.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
In the Makefile, targets install, doc and doc-install should be added to
.PHONY. Let's fix this.
Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Programs and documentation not managed by package manager are generally
installed under /usr/local/, instead of the user's home directory. In
particular, `man` is generally able to find manual pages under
`/usr/local/share/man`.
bpftool generally follows perf's example, and perf installs to home
directory. However bpftool requires root credentials, so it seems
sensible to follow the more common convention of installing files under
/usr/local instead. So, make /usr/local the default prefix for
installing the binary with `make install`, and the documentation with
`make doc-install`. Also, create /usr/local/sbin if it does not exist.
Note that the bash-completion file, however, is still installed under
/usr/share/bash-completion/completions, as the default setup for bash
does not attempt to load completion files under /usr/local/.
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The end-of-line character inside the string would break JSON compliance.
Remove it, `p_err()` already adds a '\n' character for plain output
anyway.
Fixes: 9a5ab8bf1d6d ("tools: bpftool: turn err() and info() macros into functions")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If `getopt_long()` meets an unknown option, it prints its own error
message to standard error output. While this does not strictly break
JSON output, it is the only case bpftool prints something to standard
error output if JSON output is required. All other errors are printed on
standard output as JSON objects, so that an external program does not
have to parse stderr.
This is changed by setting the global variable `opterr` to 0.
Furthermore, p_err() is used to reproduce the error message in a more
JSON-friendly way, so that users still get to know what the erroneous
option is.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The writer is cleaned at the end of the main function, but not if the
program exits sooner in usage(). Let's keep it clean and destroy the
writer before exiting.
Destruction and actual call to exit() are moved to another function so
that clean exit can also be performed without printing usage() hints.
Fixes: d35efba99d92 ("tools: bpftool: introduce --json and --pretty options")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If bad or unrecognised parameters are specified after JSON output is
requested, `usage()` will try to output null JSON object before the
writer is created.
To prevent this, create the writer as soon as the `--json` option is
parsed.
Fixes: 004b45c0e51a ("tools: bpftool: provide JSON output for all possible commands")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Pull nfsd fixes from Bruce Fields:
"I screwed up my merge window pull request; I only sent half of what I
meant to.
There were no new features, just bugfixes of various importance and
some very minor cleanup, so I think it's all still appropriate for
-rc2.
Highlights:
- Fixes from Trond for some races in the NFSv4 state code.
- Fix from Naofumi Honda for a typo in the blocked lock notificiation
code
- Fixes from Vasily Averin for some problems starting and stopping
lockd especially in network namespaces"
* tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
lockd: fix "list_add double add" caused by legacy signal interface
nlm_shutdown_hosts_net() cleanup
race of nfsd inetaddr notifiers vs nn->nfsd_serv change
race of lockd inetaddr notifiers vs nlmsvc_rqst change
SUNRPC: make cache_detail structures const
NFSD: make cache_detail structures const
sunrpc: make the function arg as const
nfsd: check for use of the closed special stateid
nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
lockd: lost rollback of set_grace_period() in lockd_down_net()
lockd: added cleanup checks in exit_net hook
grace: replace BUG_ON by WARN_ONCE in exit_net hook
nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
lockd: remove net pointer from messages
nfsd: remove net pointer from debug messages
nfsd: Fix races with check_stateid_generation()
nfsd: Ensure we check stateid validity in the seqid operation checks
nfsd: Fix race in lock stateid creation
nfsd4: move find_lock_stateid
nfsd: Ensure we don't recognise lock stateids after freeing them
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We've collected some fixes in since the pre-merge window freeze.
There's technically only one regression fix for 4.15, but the rest
seems important and candidates for stable.
- fix missing flush bio puts in error cases (is serious, but rarely
happens)
- fix reporting stat::st_blocks for buffered append writes
- fix space cache invalidation
- fix out of bound memory access when setting zlib level
- fix potential memory corruption when fsync fails in the middle
- fix crash in integrity checker
- incremetnal send fix, path mixup for certain unlink/rename
combination
- pass flags to writeback so compressed writes can be throttled
properly
- error handling fixes"
* tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: incremental send, fix wrong unlink path after renaming file
btrfs: tree-checker: Fix false panic for sanity test
Btrfs: fix list_add corruption and soft lockups in fsync
btrfs: Fix wild memory access in compression level parser
btrfs: fix deadlock when writing out space cache
btrfs: clear space cache inode generation always
Btrfs: fix reported number of inode blocks after buffered append writes
Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
Btrfs: bail out gracefully rather than BUG_ON
btrfs: dev_alloc_list is not protected by RCU, use normal list_del
btrfs: add missing device::flush_bio puts
btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
Btrfs: add write_flags for compression bio
|
|
Pull Microblaze fix from Michal Simek:
"Add missing header to mmu_context_mm.h"
* tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: add missing include to mmu_context_mm.h
|
|
Pull sparc fix from David Miller:
"Sparc T4 and later cpu bootup regression fix"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix boot on T4 and later.
|
|
Add my name to the list.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Print file names of files that differ. For example, instead of:
Warning: Intel PT: x86 instruction decoder differs from kernel
print:
Warning: Intel PT: x86 instruction decoder header at 'tools/perf/util/intel-pt-decoder/inat.h' differs from latest version at 'arch/x86/include/asm/inat.h'
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/1511253326-22308-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The PERF_RECORD_USER_ events are synthesized by the tool to assist in
processing the PERF_RECORD_ ones generated by the kernel, the printing
of that information doesn't come with a perf_sample structure, so, when
dumping the event fields using 'perf report -D' there were columns that
end up not being printed.
To tidy up a bit this, fake a perf_sample structure with zeroes to have
the missing columns printed and avoid the occasional surprise with that.
Before:
0 0x45b8 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffc12ec000(0x4000) @ 0]: x /lib/modules/4.14.0+/kernel/fs/nls/nls_utf8.ko
0x4620 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27820
0x4648 [0x18]: PERF_RECORD_CPU_MAP: 0-3
0 0x4660 [0x28]: PERF_RECORD_COMM: perf:27820/27820
0x4a58 [0x8]: PERF_RECORD_FINISHED_ROUND
447723433020976 0x4688 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 27820/27820: 0xffffffff8f1b6d7a period: 1 addr: 0
After:
$ perf report -D | grep PERF_RECORD_ | head
0 0xe8 [0x20]: PERF_RECORD_TIME_CONV: unhandled!
0 0x108 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 32555
0 0x130 [0x18]: PERF_RECORD_CPU_MAP: 0-3
0 0x148 [0x28]: PERF_RECORD_COMM: perf:32555/32555
0 0x4e8 [0x8]: PERF_RECORD_FINISHED_ROUND
448743409421205 0x170 [0x28]: PERF_RECORD_COMM exec: sleep:32555/32555
448743409431883 0x198 [0x68]: PERF_RECORD_MMAP2 32555/32555: [0x55e11d75a000(0x208000) @ 0 fd:00 3147174 2566255743]: r-xp /usr/bin/sleep
448743409443873 0x200 [0x70]: PERF_RECORD_MMAP2 32555/32555: [0x7f0ced316000(0x229000) @ 0 fd:00 3151761 2566238119]: r-xp /usr/lib64/ld-2.25.so
448743409454790 0x270 [0x60]: PERF_RECORD_MMAP2 32555/32555: [0x7ffe84f6d000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
448743409479500 0x2d0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4002): 32555/32555: 0xffffffff8f84c7e7 period: 1 addr: 0
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9aefcab0de47 ("perf session: Consolidate the dump code")
Link: https://lkml.kernel.org/n/tip-todcu15x0cwgppkh1gi6uhru@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a tip for Node.js USDT(User-Level Statically Defined Tracing) probes
in tips.txt
Signed-off-by: Hansuk Hong <flavono123@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20171123160546.9722-1-flavono123@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add support for computing 'perf stat' style metrics in 'perf script'.
When using leader sampling we can get metrics for each sampling period
by computing formulas over the values of the different group members.
This allows things like fine grained IPC tracking through sampling, much
more fine grained than with 'perf stat'.
The metric is still averaged over the sampling period, it is not just
for the sampling point.
This patch adds a new metric output field for 'perf script' that uses
the existing 'perf stat' metrics infrastructure to compute any metrics
supported by 'perf stat'.
For example to sample IPC:
$ perf record -e '{ref-cycles,cycles,instructions}:S' -a sleep 1
$ perf script -F metric,ip,sym,time,cpu,comm
...
alsa-sink-ALC32 [000] 42815.856074: 7fd65937d6cc [unknown]
alsa-sink-ALC32 [000] 42815.856074: 7fd65937d6cc [unknown]
alsa-sink-ALC32 [000] 42815.856074: 7fd65937d6cc [unknown]
alsa-sink-ALC32 [000] 42815.856074: metric: 0.13 insn per cycle
swapper [000] 42815.857961: ffffffff81655df0 __schedule
swapper [000] 42815.857961: ffffffff81655df0 __schedule
swapper [000] 42815.857961: ffffffff81655df0 __schedule
swapper [000] 42815.857961: metric: 0.23 insn per cycle
qemu-system-x86 [000] 42815.858130: ffffffff8165ad0e _raw_spin_unlock_irqrestore
qemu-system-x86 [000] 42815.858130: ffffffff8165ad0e _raw_spin_unlock_irqrestore
qemu-system-x86 [000] 42815.858130: ffffffff8165ad0e _raw_spin_unlock_irqrestore
qemu-system-x86 [000] 42815.858130: metric: 0.46 insn per cycle
:4972 [000] 42815.858312: ffffffffa080e5f2 vmx_vcpu_run
:4972 [000] 42815.858312: ffffffffa080e5f2 vmx_vcpu_run
:4972 [000] 42815.858312: ffffffffa080e5f2 vmx_vcpu_run
:4972 [000] 42815.858312: metric: 0.45 insn per cycle
TopDown:
This requires disabling SMT if you have it enabled, because SMT would
require sampling per core, which is not supported.
$ perf record -e '{ref-cycles,topdown-fetch-bubbles,\
topdown-recovery-bubbles,\
topdown-slots-retired,topdown-total-slots,\
topdown-slots-issued}:S' -a sleep 1
$ perf script --header -I -F cpu,ip,sym,event,metric,period
...
[000] 121108 ref-cycles: ffffffff8165222e copy_user_enhanced_fast_string
[000] 190350 topdown-fetch-bubbles: ffffffff8165222e copy_user_enhanced_fast_string
[000] 2055 topdown-recovery-bubbles: ffffffff8165222e copy_user_enhanced_fast_string
[000] 148729 topdown-slots-retired: ffffffff8165222e copy_user_enhanced_fast_string
[000] 144324 topdown-total-slots: ffffffff8165222e copy_user_enhanced_fast_string
[000] 160852 topdown-slots-issued: ffffffff8165222e copy_user_enhanced_fast_string
[000] metric: 33.0% frontend bound
[000] metric: 3.5% bad speculation
[000] metric: 25.8% retiring
[000] metric: 37.7% backend bound
[000] 112112 ref-cycles: ffffffff8165aec8 _raw_spin_lock_irqsave
[000] 357222 topdown-fetch-bubbles: ffffffff8165aec8 _raw_spin_lock_irqsave
[000] 3325 topdown-recovery-bubbles: ffffffff8165aec8 _raw_spin_lock_irqsave
[000] 323553 topdown-slots-retired: ffffffff8165aec8 _raw_spin_lock_irqsave
[000] 270507 topdown-total-slots: ffffffff8165aec8 _raw_spin_lock_irqsave
[000] 341226 topdown-slots-issued: ffffffff8165aec8 _raw_spin_lock_irqsave
[000] metric: 33.0% frontend bound
[000] metric: 2.9% bad speculation
[000] metric: 29.9% retiring
[000] metric: 34.2% backend bound
...
v2:
Use evsel->priv for new fields
Port to new base line, support fp output.
Handle stats in ->stats, not ->priv
Minor cleanups
Extra explanation about the use of the term 'averaging', from Andi in the
thread in the Link: tag below:
<quote Andi>
The current samples contains the sum of event counts for a sampling period.
EventA-1 EventA-2 EventA-3 EventA-4
EventB-1 EventB-2 EventC-3
gap with no events overflow
|-----------------------------------------------------------------|
period-start period-end
^ ^
| |
previous sample current sample
So EventA = 4 and EventB = 3 at the sample point
I generate a metric, let's say EventA / EventB. It applies to the whole period.
But the metric is over a longer time which does not have the same behavior. For
example the gap above doesn't have any events, while they are clustered at the
beginning and end of the sample period.
But we're summing everything together. The metric doesn't know that the gap is
different than the busy period.
That's what I'm trying to express with averaging.
</quote>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171117214300.32746-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Synthesize the per attr thread maps and cpu maps in 'perf record'.
This allows code from 'perf stat' called from 'perf script' to access
this information.
Committer testing:
Please see the PERF_RECORD_THREAD_MAP and PERF_RECORD_CPU_MAP records,
added by this patch:
$ perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
$ perf report -D | grep PERF_RECORD_ | head
0xe8 [0x20]: PERF_RECORD_TIME_CONV: unhandled!
0x108 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 23568
0x130 [0x18]: PERF_RECORD_CPU_MAP: 0-3
0 0x148 [0x28]: PERF_RECORD_COMM: perf:23568/23568
0x570 [0x8]: PERF_RECORD_FINISHED_ROUND
445342677837144 0x170 [0x28]: PERF_RECORD_COMM exec: sleep:23568/23568
445342677847339 0x198 [0x68]: PERF_RECORD_MMAP2 23568/23568: [0x564c943a4000(0x208000) @ 0 fd:00 3147174 2566255743]: r-xp /usr/bin/sleep
445342677862450 0x200 [0x70]: PERF_RECORD_MMAP2 23568/23568: [0x7f25968a8000(0x229000) @ 0 fd:00 3151761 2566238119]: r-xp /usr/lib64/ld-2.25.so
445342677873174 0x270 [0x60]: PERF_RECORD_MMAP2 23568/23568: [0x7ffc98176000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
445342677891928 0x2d0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4002): 23568/23568: 0xffffffff8f84c7e7 period: 1 addr: 0
$
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20171117214300.32746-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move the code to synthesize event updates for scale/unit/cpus to a
common utility file, and use it both from stat and record.
This allows to access scale and other extra qualifiers from perf script.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171117214300.32746-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The s390x CPU sampling and measurement facilities do not support perf
events of type PERF_TYPE_BREAKPOINT. The test cases are executed and
fail with -ENOENT due to missing hardware support.
Disable the execution of both test cases based on a
platform check. This is the same approach as done for
PowerPC.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
LPU-Reference: 20171123074623.20817-1-tmricht@linux.vnet.ibm.com
Link: https://lkml.kernel.org/n/tip-uqvoy6a1tsu8jddo5jjg4h85@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
kernel
Remove this from check-headers.sh:
opts="--ignore-blank-lines --ignore-space-change"
as the easiest policy is to just follow the upstream UAPI header version 100%.
Pure space-only changes are comparatively rare.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/20171121084111.y6p5zwqso2cbms5s@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Pull networking fixes from David Miller:
1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones
missed one spot. From Zhu Yanjun.
2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.
3) Fix checksum offloading in thunderx driver, from Sunil Goutham.
4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.
5) Fix use after free of packet headers in TIPC, from Jon Maloy.
6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.
7) Tunneling fixes in mlxsw driver, from Petr Machata.
8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike
Maloney.
9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric
Dumazet.
10) Fix regression in sch_sfq.c due to one of the timer_setup()
conversions. From Paolo Abeni.
11) SCTP does list_for_each_entry() using wrong struct member, fix from
Xin Long.
12) Don't use big endian netlink attribute read for
IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin
Long.
13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing
adding filters there. From Jiri Pirko.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
ethernet: dwmac-stm32: Fix copyright
net: via: via-rhine: use %p to format void * address instead of %x
net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
myri10ge: Update MAINTAINERS
net: sched: cbq: create block for q->link.block
atm: suni: remove extraneous space to fix indentation
atm: lanai: use %p to format kernel addresses instead of %x
VSOCK: Don't set sk_state to TCP_CLOSE before testing it
atm: fore200e: use %pK to format kernel addresses instead of %x
ambassador: fix incorrect indentation of assignment statement
vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
sctp: use right member as the param of list_for_each_entry
sch_sfq: fix null pointer dereference at timer expiration
cls_bpf: don't decrement net's refcount when offload fails
net/packet: fix a race in packet_bind() and packet_notifier()
packet: fix crash in fanout_demux_rollover()
sctp: remove extern from stream sched
sctp: force the params with right types for sctp csum apis
sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
...
|
|
If we don't put the NG4fls.o object into the same part of
the link as the generic sparc64 objects for fls() and __fls()
then the relocation in the branch we use for patching will
not fit.
Move NG4fls.o into lib-y to fix this problem.
Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above")
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
|
|
VMIDs 8-16 in Kaveri were reserved for use by the amdkfd driver.
Because we removed amdkfd support from radeon, those VMIDs are now
used by radeon and are initialized by radeon.
This patch removes the function that initialized those VMIDs for amdkfd
use.
This initialization overridden the radeon initialization and caused GPU
faults and GUI crashed.
Fixes: f4fa88ab28ab ("drm/radeon: deprecate and remove KFD interface")
Rported-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts "drm/ttm: Fix configuration error around populate_and_map()
functions".
This fix has gone into the wrong direction. Those helpers should be
available even when neither CONFIG_INTEL_IOMMU nor CONFIG_SWIOTLB are
set.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead, just fall back on the new '%p' behavior which hashes the
pointer.
Otherwise, '%pK' - that was intended to mark a pointer as restricted -
just ends up leaking pointers that a normal '%p' wouldn't leak. Which
just make the whole thing pointless.
I suspect we should actually get rid of '%pK' entirely, and make it just
work as '%p' regardless, but this is the minimal obvious fix. People
who actually use 'kptr_restrict' should weigh in on which behavior they
want.
Cc: Tobin Harding <me@tobin.cc>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
gcc 4.4.4 is too old to have full C11 anonymous union support, so
the current initialiser fails to compile.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(compile-)Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The conditional kallsym hex printing used a special fixed-width '%lx'
output (KALLSYM_FMT) in preparation for the hashing of %p, but that
series ended up adding a %px specifier to help with the conversions.
Use it, and avoid the "print pointer as an unsigned long" code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull printk pointer hashing update from Tobin Harding:
"Here is the patch set that implements hashing of printk specifier %p.
First we have two clean up patches then we do the hashing. Hashing is
done via the SipHash algorithm. The next patch adds printk specifier
%px for printing pointers when we _really_ want to see the address i.e
%px is functionally equivalent to %lx. Final patch in the set fixes
KASAN since we break it by hashing %p.
For the record here is the justification for the series:
Currently there exist approximately 14 000 places in the Kernel
where addresses are being printed using an unadorned %p. This
potentially leaks sensitive information about the Kernel layout in
memory. Many of these calls are stale, instead of fixing every call
we hash the address by default before printing. We then add %px to
provide a way to print the actual address. Although this is
achievable using %lx, using %px will assist us if we ever want to
change pointer printing behaviour. %px is more uniquely grep'able
(there are already >50 000 uses of %lx).
The added advantage of hashing %p is that security is now opt-out,
if you _really_ want the address you have to work a little harder
and use %px.
This will of course break some users, forcing code printing needed
addresses to be updated"
[ I do expect this to be an annoyance, and a number of %px users to be
added for debuggability. But nobody is willing to audit existing %p
users for information leaks, and a number of places really only use
the pointer as an object identifier rather than really 'I need the
address'.
IOW - sorry for the inconvenience, but it's the least inconvenient of
the options. - Linus ]
* tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux:
kasan: use %px to print addresses instead of %p
vsprintf: add printk specifier %px
printk: hash addresses printed with %p
vsprintf: refactor %pK code out of pointer()
docs: correct documentation for %pK
|
|
Since it is perfectly legal to run the kernel at EL1, it is not
actually an error if HYP mode is not available when attempting to
initialize KVM, given that KVM support cannot be built as a module.
So demote the kvm_err() to kvm_info(), which prevents the error from
appearing on an otherwise 'quiet' console.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The timer optimization patches inadvertendly changed the logic to always
load the timer state as if we have a vgic, even if we don't have a vgic.
Fix this by doing the usual irqchip_in_kernel() check and call the
appropriate load function.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
There is a fast-path of MMIO emulation inside hyp mode. The handling
of single-step is broadly the same as kvm_arm_handle_step_debug()
except we just setup ESR/HSR so handle_exit() does the correct thing
as we exit.
For the case of an emulated illegal access causing an SError we will
exit via the ARM_EXCEPTION_EL1_SERROR path in handle_exit(). We behave
as we would during a real SError and clear the DBG_SPSR_SS bit for the
emulated instruction.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
When an SError arrives during single-step both the SError and debug
exceptions may be pending when the step is completed, and the
architecture doesn't define the ordering of the two. This means that we
can observe en SError even though we've just completed a step, without
receiving a debug exception. In that case the DBG_SPSR_SS bit will have
flipped as the instruction executed. After handling the abort in
handle_exit() we test to see if the bit is clear and we were
single-stepping before deciding if we need to exit to user space.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
This reverts commit 152e93af3cfe2d29d8136cc0a02a8612507136ee.
It was a nice cleanup in theory, but as Nicolai Stange points out, we do
need to make the page dirty for the copy-on-write case even when we
didn't end up making it writable, since the dirty bit is what we use to
check that we've gone through a COW cycle.
Reported-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull NVMe fixes from Christoph:
"A few more nvme updates for 4.15. A single small PCIe fix, and a number
of patches for RDMA that are a little larger than what I'd like to see
for -rc2, but they fix important issues seen in the wild."
|
|
register_shrinker() might return -ENOMEM error since Linux 3.12.
Call panic() as with other failure checks in this function if
register_shrinker() failed.
Fixes: 1d3d4437eae1 ("vmscan: per-node deferred work")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Jan Kara <jack@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The system state of KVM when using userspace emulation is not complete
until we return into KVM_RUN. To handle mmio related updates we wait
until they have been committed and then schedule our KVM_EXIT_DEBUG.
The kvm_arm_handle_step_debug() helper tells us if we need to return
and sets up the exit_reason for us.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
If we are using guest debug to single-step the guest, we need to ensure
that we exit after emulating the instruction. This only affects
instructions completely emulated by the kernel. For instructions
emulated in userspace, we need to exit and return to complete the
emulation.
The kvm_arm_handle_step_debug() helper sets up the necessary exit
state if needed.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
After emulating instructions we may want return to user-space to handle
single-step debugging. Introduce a helper function, which, if
single-step is enabled, sets the run structure for return and returns
true.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
VTTBR address. It seems to currently be off by one, thereby only
allowing up to 39-bit addresses (instead of 40-bit) and also
insufficiently checking the alignment. This patch fixes it.
This patch is the 32bit pendent of Kristina's arm64 fix, and
she deserves the actual kudos for pinpointing that one.
Fixes: f7ed45be3ba52 ("KVM: ARM: World-switch implementation")
Cc: <stable@vger.kernel.org> # 3.9
Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
VTTBR address. It seems to currently be off by one, thereby only
allowing up to 47-bit addresses (instead of 48-bit) and also
insufficiently checking the alignment. This patch fixes it.
As an example, with 4k pages, before this patch we have:
PHYS_MASK_SHIFT = 48
VTTBR_X = 37 - 24 = 13
VTTBR_BADDR_SHIFT = 13 - 1 = 12
VTTBR_BADDR_MASK = ((1 << 35) - 1) << 12 = 0x00007ffffffff000
Which is wrong, because the mask doesn't allow bit 47 of the VTTBR
address to be set, and only requires the address to be 12-bit (4k)
aligned, while it actually needs to be 13-bit (8k) aligned because we
concatenate two 4k tables.
With this patch, the mask becomes 0x0000ffffffffe000, which is what we
want.
Fixes: 0369f6a34b9f ("arm64: KVM: EL2 register definitions")
Cc: <stable@vger.kernel.org> # 3.11.x
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Before performing an unmap, let's check that what we have was
really mapped the first place.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
We miss a test against NULL after allocation.
Fixes: 6d03a68f8054 ("KVM: arm64: vgic-its: Turn device_id validation into generic ID validation")
Cc: stable@vger.kernel.org # 4.8
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
Cc: stable@vger.kernel.org # 4.8
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes: 280771252c1ba ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES")
Cc: stable@vger.kernel.org # 4.12
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Using the size of the structure we're allocating is a good idea
and avoids any surprise... In this case, we're happilly confusing
kvm_kernel_irq_routing_entry and kvm_irq_routing_entry...
Fixes: 95b110ab9a09 ("KVM: arm/arm64: Enable irqchip routing")
Cc: stable@vger.kernel.org # 4.8
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|