summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-18nvme-rdma: implement polling queue mapSagi Grimberg
When passed with nr_poll_queues setup additional queues with cq polling context IB_POLL_DIRECT (no interrupts) and make sure to set QUEUE_FLAG_POLL on the connect_q. In addition add the third queue mapping for polling queues. nvmf connect on this queue is polled for like all other requests so make nvmf_connect_io_queue poll for polling queues. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-fabrics: allow user to pass in nr_poll_queuesSagi Grimberg
This argument will specify how many polling I/O queues to connect when creating the controller. These I/O queues will host I/O that is set with REQ_HIPRI. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-fabrics: allow nvmf_connect_io_queue to pollSagi Grimberg
Preparation for polling support for fabrics. Polling support means that our completion queues are not generating any interrupts which means we need to poll for the nvmf io queue connect as well. Reviewed by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-core: optionally poll sync commandsSagi Grimberg
Pass poll bool to indicate that we need it to poll. This prepares us for polling support in nvmf since connect is an I/O that will be queued and has to be polled in order to complete. If poll is passed, we call nvme_execute_rq_polled which sends the requests and polls for its completion. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18block: make request_to_qc_t publicSagi Grimberg
block consumers will need it for polling requests that are sent with blk_execute_rq_nowait. Also, get rid of blk_tag_to_qc_t and open-code it instead. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"Colin Ian King
There is a spelling mistake in a dev_info message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-tcp: fix endianess annotationsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18nvmet-tcp: fix endianess annotationsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18nvme-pci: refactor nvme_poll_irqdisable to make sparse happyChristoph Hellwig
By duplicating the nvme_process_cq in both branches we keep the sparse lock context checking happy, so do it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18nvme-pci: only set nr_maps to 2 if poll queues are supportedChristoph Hellwig
The block layer now enables polling support on a queue if nr_maps includes the poll map, so we should only set that if we actually support poll queues. Fixes: 6544d229bf ("block: enable polling by default if a poll map is initalized") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2018-12-18nvmet: use a macro for default error locationChaitanya Kulkarni
This patch defines a new macro NVMET_NO_ERROR_LOC to represent the default error location value in the nvme-error-log-page. This is a pure cleanup patch and it does not change any functionality. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvmet: fix comparison of a u16 with -1Colin Ian King
Currently the u16 req->error_loc is being compared to -1 which will always be false. Fix this by casting -1 to u16 to fix this. Detected by clang: warning: result of comparison of constant -1 with expression of type 'u16' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare] Fixes: 76574f37bf4c ("nvmet: add interface to update error-log page") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2018-12-18 1) Add xfrm policy selftest scripts. From Florian Westphal. 2) Split inexact policies into four different search list classes and use the rbtree infrastructure to store/lookup the policies. This is to improve the policy lookup performance after the flowcache removal. Patches from Florian Westphal. 3) Various coding style fixes, from Colin Ian King. 4) Fix policy lookup logic after adding the inexact policy search tree infrastructure. From Florian Westphal. 5) Remove a useless remove BUG_ON from xfrm6_dst_ifdown. From Li RongQing. 6) Use the correct policy direction for lookups on hash rebuilding. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18gfs2: take jdata unstuff into account in do_growBob Peterson
Before this patch, function do_grow would not reserve enough journal blocks in the transaction to unstuff jdata files while growing them. This patch adds the logic to add one more block if the file to grow is jdata. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-18SUNRPC: Remove xprt_connect_status()Trond Myklebust
Over the years, xprt_connect_status() has been superseded by call_connect_status(), which now handles all the errors that xprt_connect_status() does and more. Since the latter converts all errors that it doesn't recognise to EIO, then it is time for it to be retired. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2018-12-18SUNRPC: Fix a race with XPRT_CONNECTINGTrond Myklebust
Ensure that we clear XPRT_CONNECTING before releasing the XPRT_LOCK so that we don't have races between the (asynchronous) socket setup code and tasks in xprt_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2018-12-18SUNRPC: Fix disconnection racesTrond Myklebust
When the socket is closed, we need to call xprt_disconnect_done() in order to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks. However, we also want to ensure that we don't wake them up before the socket is closed, since that would cause thundering herd issues with everyone piling up to retransmit before the TCP shutdown dance has completed. Only the task that holds XPRT_LOCKED needs to wake up early in order to allow the close to complete. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Scott Mayhew <smayhew@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2018-12-18irqchip/stm32: protect configuration registers with hwspinlockBenjamin Gaignard
If a hwspinlock is defined in device tree use it to protect configuration registers. Do not request for hwspinlock during the exti driver init since the hwspinlock driver is not probed yet at that stage and the exti driver does not support deferred probe. Instead of this, postpone the hwspinlock request at the first time the hwspinlock is actually needed. Use the hwspin_trylock_raw() API which is the most appropriated here Indeed: - hwspin_lock_() calls are under spin_lock protection (chip_data->rlock or gc->lock). - the _timeout() API relies on jiffies count which won't work if IRQs are disabled which is the case here (a large part of the IRQ setup is done atomically (see irq/manage.c)) As a consequence implement the retry/timeout lock from here. And since all of this is done atomically, reduce the timeout delay to 1 ms. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com> Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18dt-bindings: interrupt-controller: stm32: Document hwlock propertiesBenjamin Gaignard
Add hwlocks as optional property Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18irqchip: Add driver for imx-irqsteer controllerLucas Stach
The irqsteer block is a interrupt multiplexer/remapper found on the i.MX8 line of SoCs. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18dt-bindings/irq: Add binding for Freescale IRQSTEER multiplexerLucas Stach
This adds the DT binding for the Freescale IRQSTEER interrupt multiplexer found in the i.MX8 familiy SoCs. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18Merge branch 'turbostat' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat utility updates for v4.21 from Len Brown: "A couple of random fixes that were sitting in the queue." * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: consolidate duplicate model numbers tools/power turbostat: fix goldmont C-state limit decoding tools/power turbostat: reduce debug output tools/power turbosat: fix AMD APIC-id output
2018-12-18aio: abstract out io_event filler helperJens Axboe
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18aio: split out iocb copy from io_submit_one()Jens Axboe
In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18aio: use iocb_put() instead of open coding itJens Axboe
Replace the percpu_ref_put() + kmem_cache_free() with a call to iocb_put() instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18aio: only use blk plugs for > 2 depth submissionsJens Axboe
Plugging is meant to optimize submission of a string of IOs, if we don't have more than 2 being submitted, don't bother setting up a plug. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18aio: don't zero entire aio_kiocb aio_get_req()Jens Axboe
It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18aio: separate out ring reservation from req allocationChristoph Hellwig
This is in preparation for certain types of IO not needing a ring reserveration. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18aio: use assigned completion handlerJens Axboe
We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-18Merge branch 'for-4.21/block' into for-4.21/aioJens Axboe
* for-4.21/block: (351 commits) blk-mq: enable IO poll if .nr_queues of type poll > 0 blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() blk-mq: skip zero-queue maps in blk_mq_map_swqueue block: fix blk-iolatency accounting underflow blk-mq: fix dispatch from sw queue block: mq-deadline: Fix write completion handling nvme-pci: don't share queue maps blk-mq: only dispatch to non-defauly queue maps if they have queues blk-mq: export hctx->type in debugfs instead of sysfs blk-mq: fix allocation for queue mapping table blk-wbt: export internal state via debugfs blk-mq-debugfs: support rq_qos block: update sysfs documentation block: loop: check error using IS_ERR instead of IS_ERR_OR_NULL in loop_add() aoe: add __exit annotation block: clear REQ_HIPRI if polling is not supported blk-mq: replace and kill blk_mq_request_issue_directly blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests blk-mq: refactor the code of issue request directly block: remove the bio_integrity_advance export ...
2018-12-18perf trace: Allow suppressing the syscall argument namesArnaldo Carvalho de Melo
To show just the values: Default: # trace -e open*,close,*sleep sleep 1 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC ) = 3 close(fd: 3 ) = 0 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC ) = 3 close(fd: 3 ) = 0 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 close(fd: 3 ) = 0 nanosleep(rqtp: 0x7ffc0c4ea0d0, rmtp: 0 ) = 0 close(fd: 1 ) = 0 close(fd: 2 ) = 0 # Remove it: # perf config trace.show_arg_names=no # trace -e open*,close,*sleep sleep 1 openat(CWD, /etc/ld.so.cache, CLOEXEC ) = 3 close(3 ) = 0 openat(CWD, /lib64/libc.so.6, CLOEXEC ) = 3 close(3 ) = 0 openat(CWD, /usr/lib/locale/locale-archive, CLOEXEC ) = 3 close(3 ) = 0 nanosleep(0x7ffced3a8c40, 0 ) = 0 close(1 ) = 0 close(2 ) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ta9tbdwgodpw719sr2bjm8eb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Allow configuring if the syscall start timestamp should be printedArnaldo Carvalho de Melo
# trace -e open*,close,*sleep sleep 1 0.000 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC ) = 3 0.016 close(fd: 3 ) = 0 0.024 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC ) = 3 0.074 close(fd: 3 ) = 0 0.235 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 0.251 close(fd: 3 ) = 0 0.285 nanosleep(rqtp: 0x7ffd68e6d620, rmtp: 0 ) = 0 1000.386 close(fd: 1 ) = 0 1000.395 close(fd: 2 ) = 0 # # perf config trace.show_timestamp=no # trace -e open*,close,*sleep sleep 1 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC ) = 3 close(fd: 3 ) = 0 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC ) = 3 close(fd: 3 ) = 0 openat(dfd: CWD, filename: , flags: CLOEXEC ) = 3 close(fd: 3 ) = 0 nanosleep(rqtp: 0x7fffa79c38e0, rmtp: 0 ) = 0 close(fd: 1 ) = 0 close(fd: 2 ) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-mjjnicy48367jah6ls4k0nk8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Allow configuring default for perf_event_attr.inheritArnaldo Carvalho de Melo
I.f. if children should inherit the parent perf_event configuration, i.e. if we should trace children as well or just the parent. The default is to follow children, to disable this and have a behaviour similar to strace, set this config option or use the --no_inherit 'perf trace' option. E.g.: Default: # perf config trace.no_inherit # trace -e clone,*sleep time sleep 1 0.000 time/21107 clone(clone_flags: CHILD_CLEARTID|CHILD_SETTID|0x11, newsp: 0, child_tidptr: 0x7f7b8f9ae810) = 21108 (time) ? time/21108 ... [continued]: clone() 0.691 sleep/21108 nanosleep(rqtp: 0x7ffed01d0540, rmtp: 0 ) = 0 0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1988maxresident)k 0inputs+0outputs (0major+76minor)pagefaults 0swaps # Disable it: # trace -e clone,*sleep time sleep 1 0.000 clone(clone_flags: CHILD_CLEARTID|CHILD_SETTID|0x11, newsp: 0, child_tidptr: 0x7ff41e100810) = 21414 (time) 0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1964maxresident)k 0inputs+0outputs (0major+76minor)pagefaults 0swaps # Notice that since there is just one thread, the "comm/TID" column is suppressed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-thd8s16pagyza71ufi5vjlan@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf config: Show the configuration when no arguments are providedArnaldo Carvalho de Melo
More convenient thah having to recall what letter is about showing/listing/dumping the configuration, i.e. no arguments means -l/--list: # perf config llvm.dump-obj=true trace.default_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o trace.show_zeros=yes trace.show_duration=no # perf config -l llvm.dump-obj=true trace.default_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o trace.show_zeros=yes trace.show_duration=no # perf config -h Usage: perf config [<file-option>] [options] [section.name[=value] ...] -l, --list show current config variables --system use system config file --user use user config file # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-z2n63avz6tliqb5gmu4l1dti@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Allow configuring if the syscall duration should be printedArnaldo Carvalho de Melo
# perf config trace.show_duration=no # perf config -l | grep trace trace.default_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o trace.show_zeros=true trace.show_duration=no # trace -e *sleep sleep 1 0.000 sleep/8729 nanosleep(rqtp: 0x7ffcb0b4c940, rmtp: 0) = 0 # perf config trace.show_duration=yes # trace -e *sleep sleep 1 0.000 (1000.212 ms): sleep/8735 nanosleep(rqtp: 0x7ffca15fa770, rmtp: 0) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-2c7h1m8fhzb9puxtj9nlevi8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Allow configuring if zeroed syscall args should be printedArnaldo Carvalho de Melo
The default so far, since we show argument names followed by its values, was to make the output more compact by suppressing most zeroed args. Make this configurable so that users can choose what best suit their needs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-q0gxws02ygodh94o0hzim5xd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Allow specifying a set of events to add in perfconfigArnaldo Carvalho de Melo
To add augmented_raw_syscalls to the events speficied by the user, or be the only one if no events were specified by the user, one can add this to perfconfig: # cat ~/.perfconfig [trace] add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o # I.e. pre-compile the augmented_raw_syscalls.c BPF program and make it always load, this way: # perf trace -e open* cat /etc/passwd > /dev/null 0.000 ( 0.013 ms): cat/31557 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3 0.035 ( 0.007 ms): cat/31557 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3 0.353 ( 0.009 ms): cat/31557 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3 0.424 ( 0.006 ms): cat/31557 openat(dfd: CWD, filename: /etc/passwd) = 3 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-0lgj7vh64hg3ce44gsmvj7ud@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf augmented_raw_syscalls: Do not include stdio.hArnaldo Carvalho de Melo
We're not using that puts() thing, and thus we don't need to define the __bpf_stdout__ map, reducing the setup time. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-3452xgatncpil7v22minkwbo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Generate branch sample for exception packetLeo Yan
The exception packet appears as one element with 'elem_type' == OCSD_GEN_TRC_ELEM_EXCEPTION or OCSD_GEN_TRC_ELEM_EXCEPTION_RET, which is present for exception entry and exit respectively. The decoder sets the packet fields 'packet->exc' and 'packet->exc_ret' to indicate the exception packets; but exception packets don't have a dedicated sample type and shares the same sample type CS_ETM_RANGE with normal instruction packets. As a result, the exception packets are taken as normal instruction packets and this introduces confusion in mixing different packet types. Furthermore, these instruction range packets will be processed for branch samples only when 'packet->last_instr_taken_branch' is true, otherwise they will be omitted, this can introduce a mess for exception and exception returning due to not having the complete address range info for context switching. To process exception packets properly, this patch introduces two new sample types: CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET; these two types of packets will be handled by cs_etm__exception(). The function cs_etm__exception() forces setting the previous CS_ETM_RANGE packet flag 'prev_packet->last_instr_taken_branch' to true, this matches well with the program flow when the exception is trapped from user space to kernel space, no matter if the most recent flow has branch taken or not; this is also safe for returning to user space after exception handling. After exception packets have their own sample type, the packet fields 'packet->exc' and 'packet->exc_ret' aren't needed anymore, so remove them. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-9-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Treat EO_TRACE element as trace discontinuityLeo Yan
If the decoder outputs an EO_TRACE element, it means the end of the trace buffer; this is a discontinuity and in this case the end of trace data needs to be saved. This patch generates a CS_ETM_DISCONTINUITY packet for the EO_TRACE element hereby flushing the end of trace data in cs-etm.c. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-8-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Treat NO_SYNC element as trace discontinuityLeo Yan
The CoreSight tracer driver might insert barrier packets between different buffers, thus the decoder can spot the boundaries based on the barrier packet; it is possible for the decoder to hit a barrier packet and emit a NO_SYNC element, then the decoder will find a periodic synchronisation point inside that next trace block that starts the trace again but does not have the TRACE_ON element as indicator - usually because this trace block has wrapped the buffer so we have lost the original point when the trace was enabled. In the first case it causes the insertion of a OCSD_GEN_TRC_ELEM_NO_SYNC in the middle of the tracing stream, but as we were not handling the NO_SYNC element properly this ends up making users miss the discontinuity indications. Though OCSD_GEN_TRC_ELEM_NO_SYNC is different from CS_ETM_TRACE_ON when output from the decoder, both indicate that the trace data is discontinuous; this patch treats OCSD_GEN_TRC_ELEM_NO_SYNC as a trace discontinuity and generates a CS_ETM_DISCONTINUITY packet for it, so cs-etm can handle the discontinuity for this case, finally it saves the last trace data for the previous trace block and restart samples for the new block. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-7-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Rename CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITYLeo Yan
TRACE_ON element is used at the beginning of trace, it also can be appeared in the middle of trace data to indicate discontinuity; for example, it's possible to see multiple TRACE_ON elements in the trace stream if the trace is being limited by address range filtering. Furthermore, except TRACE_ON element is for discontinuity, NO_SYNC and EO_TRACE also can be used to indicate discontinuity, though they are used for different scenarios for which the trace is interrupted. This patch renames sample type CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITY, firstly the new name describes more closely the purpose of the packet; secondly this is a preparation for other output elements which also cause the trace discontinuity thus they can share the same one packet type. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-6-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Refactor enumeration cs_etm_sample_typeLeo Yan
The values in enumeration cs_etm_sample_type are defined with setting bit N for each packet type, this is not suggested in the usual case. This patch refactor cs_etm_sample_type by converting from bit shifting values to continuous numbers. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-5-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Remove unused 'trace_on' in cs_etm_decoderLeo Yan
cs_etm_decoder::trace_on is being assigned when TRACE_ON or NO_SYNC element is coming, but it is never used hence it is redundant and can be removed. So let's remove 'trace_on' field from cs_etm_decoder struct. Suggested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-4-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Avoid stale branch samples when flush packetLeo Yan
At the end of trace buffer handling, function cs_etm__flush() is invoked to flush any remaining branch stack entries. As a side effect, it also generates branch sample, because the 'etmq->packet' doesn't contains any new coming packet but point to one stale packet after packets swapping, so it wrongly makes synthesize branch samples with stale packet info. We could review below detailed flow which causes issue: Packet1: start_addr=0xffff000008b1fbf0 end_addr=0xffff000008b1fbfc Packet2: start_addr=0xffff000008b1fb5c end_addr=0xffff000008b1fb6c step 1: cs_etm__sample(): sample: ip=(0xffff000008b1fbfc-4) addr=0xffff000008b1fb5c step 2: flush packet in cs_etm__run_decoder(): cs_etm__run_decoder() `-> err = cs_etm__flush(etmq, false); sample: ip=(0xffff000008b1fb6c-4) addr=0xffff000008b1fbf0 Packet1 and packet2 are two continuous packets, when packet2 is the new coming packet, cs_etm__sample() generates branch sample for these two packets and use [packet1::end_addr - 4 => packet2::start_addr] as branch jump flow, thus we can see the first generated branch sample in step 1. At the end of cs_etm__sample() it swaps packets so 'etm->prev_packet'= packet2 and 'etm->packet'=packet1, so far it's okay for branch sample. If packet2 is the last one packet in trace buffer, even there have no any new coming packet, cs_etm__run_decoder() invokes cs_etm__flush() to flush branch stack entries as expected, but it also generates branch samples by taking 'etm->packet' as a new coming packet, thus the branch jump flow is as [packet2::end_addr - 4 => packet1::start_addr]; this is the second sample which is generated in step 2. So actually the second sample is a stale sample and we should not generate it. This patch introduces a new function cs_etm__end_block(), at the end of trace block this function is invoked to only flush branch stack entries and thus can avoid to generate branch sample for stale packet. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-3-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf cs-etm: Correct packets swapping in cs_etm__flush()Leo Yan
The structure cs_etm_queue uses 'prev_packet' to point to previous packet, this can be used to combine with new coming packet to generate samples. In function cs_etm__flush() it swaps packets only when the flag 'etm->synth_opts.last_branch' is true, this means that it will not swap packets if without option '--itrace=il' to generate last branch entries; thus for this case the 'prev_packet' doesn't point to the correct previous packet and the stale packet still will be used to generate sequential sample. Thus if dump trace with 'perf script' command we can see the incorrect flow with the stale packet's address info. This patch corrects packets swapping in cs_etm__flush(); except using the flag 'etm->synth_opts.last_branch' it also checks the another flag 'etm->sample_branches', if any flag is true then it swaps packets so can save correct content to 'prev_packet'. Finally this can fix the wrong program flow dumping issue. The patch has a minor refactoring to use 'etm->synth_opts.last_branch' instead of 'etmq->etm->synth_opts.last_branch' for condition checking, this is consistent with that is done in cs_etm__sample(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Walker <robert.walker@arm.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1544513908-16805-2-git-send-email-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Switch to using a struct for the aumented_raw_syscalls syscalls ↵Arnaldo Carvalho de Melo
map values We'll start adding more perf-syscall stuff, so lets do this prep step so that the next ones are just about adding more fields. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-vac4sn1ns1vj4y07lzj7y4b8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf augmented_syscalls: Switch to using a struct for the syscalls map valuesArnaldo Carvalho de Melo
We'll start adding more perf-syscall stuff, so lets do this prep step so that the next ones are just about adding more fields. Run it with the .c file once to cache the .o file: # trace --filter-pids 2834,2199 -e openat,augmented_raw_syscalls.c LLVM: dumping augmented_raw_syscalls.o 0.000 ( 0.021 ms): tmux: server/4952 openat(dfd: CWD, filename: /proc/5691/cmdline ) = 11 349.807 ( 0.040 ms): DNS Res~er #39/11082 openat(dfd: CWD, filename: /etc/hosts, flags: CLOEXEC ) = 44 4988.759 ( 0.052 ms): gsd-color/2431 openat(dfd: CWD, filename: /etc/localtime ) = 18 4988.976 ( 0.029 ms): gsd-color/2431 openat(dfd: CWD, filename: /etc/localtime ) = 18 ^C[root@quaco bpf]# From now on, we can use just the newly built .o file, skipping the compilation step for a faster startup: # trace --filter-pids 2834,2199 -e openat,augmented_raw_syscalls.o 0.000 ( 0.046 ms): DNS Res~er #39/11088 openat(dfd: CWD, filename: /etc/hosts, flags: CLOEXEC ) = 44 1946.408 ( 0.190 ms): systemd/1 openat(dfd: CWD, filename: /proc/1071/cgroup, flags: CLOEXEC ) = 20 1946.792 ( 0.215 ms): systemd/1 openat(dfd: CWD, filename: /proc/954/cgroup, flags: CLOEXEC ) = 20 ^C# Now on to do the same in the builtin-trace.c side of things. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-k8mwu04l8es29rje5loq9vg7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf bpf: Move perf_event_output() from stdio.h to bpf.hArnaldo Carvalho de Melo
So that we don't always carry that __bpf_output__ map, leaving that to the scripts wanting to use that facility. 'perf trace' will be changed to look if that map is present and only setup the bpf-output events if so. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-azwys8irxqx9053vpajr0k5h@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18perf trace: Implement syscall filtering in augmented_syscallsArnaldo Carvalho de Melo
Just another map, this time an BPF_MAP_TYPE_ARRAY, stating with one bool per syscall, stating if it should be filtered or not. So, with a pre-built augmented_raw_syscalls.o file, we use: # perf trace -e open*,augmented_raw_syscalls.o 0.000 ( 0.016 ms): DNS Res~er #37/29652 openat(dfd: CWD, filename: /etc/hosts, flags: CLOEXEC ) = 138 187.039 ( 0.048 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /etc/fstab, flags: CLOEXEC ) = 11 187.348 ( 0.041 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11 188.793 ( 0.036 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11 189.803 ( 0.029 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11 190.774 ( 0.027 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11 284.620 ( 0.149 ms): DataStorage/3076 openat(dfd: CWD, filename: /home/acme/.mozilla/firefox/ina67tev.default/SiteSecurityServiceState.txt, flags: CREAT|TRUNC|WRONLY, mode: IRUGO|IWUSR|IWGRP) = 167 ^C# What is it that this gsd-housekeeping thingy needs to open /proc/self/mountinfo four times periodically? :-) This map will be extended to tell per-syscall parameters, i.e. how many bytes to copy per arg, using the function signature to get the types and then the size of those types, via BTF. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-cy222g9ucvnym3raqvxp0hpg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>