summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-04-01net: sched: introduce and use qstats read helpersPaolo Abeni
Classful qdiscs can't access directly the child qdiscs backlog length: if such qdisc is NOLOCK, per CPU values should be accounted instead. Most qdiscs no not respect the above. As a result, qstats fetching for most classful qdisc is currently incorrect: if the child qdisc is NOLOCK, it always reports 0 len backlog. This change introduces a pair of helpers to safely fetch both backlog and qlen and use them in stats class dumping functions, fixing the above issue and cleaning a bit the code. DRR needs also to access the child qdisc queue length, so it needs custom handling. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01cpufreq/intel_pstate: Load only on Intel hardwareBorislav Petkov
This driver is Intel-only so loading on anything which is not Intel is pointless. Prevent it from doing so. While at it, correct the "not supported" print statement to say CPU "model" which is what that test does. Fixes: 076b862c7e44 (cpufreq: intel_pstate: Add reasons for failure and debug messages) Suggested-by: Erwan Velu <e.velu@criteo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-01net/sched: fix ->get helper of the matchall clsNicolas Dichtel
It returned always NULL, thus it was never possible to get the filter. Example: $ ip link add foo type dummy $ ip link add bar type dummy $ tc qdisc add dev foo clsact $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \ matchall action mirred ingress mirror dev bar Before the patch: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall Error: Specified filter handle not found. We have an error talking to the kernel After: $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2 not_in_hw action order 1: mirred (Ingress Mirror to device bar) pipe index 1 ref 1 bind 1 CC: Yotam Gigi <yotamg@mellanox.com> CC: Jiri Pirko <jiri@mellanox.com> Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01signal: don't silently convert SI_USER signals to non-current pidfdJann Horn
The current sys_pidfd_send_signal() silently turns signals with explicit SI_USER context that are sent to non-current tasks into signals with kernel-generated siginfo. This is unlike do_rt_sigqueueinfo(), which returns -EPERM in this case. If a user actually wants to send a signal with kernel-provided siginfo, they can do that with pidfd_send_signal(pidfd, sig, NULL, 0); so allowing this case is unnecessary. Instead of silently replacing the siginfo, just bail out with an error; this is consistent with other interfaces and avoids special-casing behavior based on security checks. Fixes: 3eb39f47934f ("signal: add pidfd_send_signal() syscall") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-04-01dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errorsIlya Dryomov
Some devices don't use blk_integrity but still want stable pages because they do their own checksumming. Examples include rbd and iSCSI when data digests are negotiated. Stacking DM (and thus LVM) on top of these devices results in sporadic checksum errors. Set BDI_CAP_STABLE_WRITES if any underlying device has it set. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm: revert 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * ↵Mikulas Patocka
PAGE_SIZE") The limit was already incorporated to dm-crypt with commit 4e870e948fba ("dm crypt: fix error with too large bios"), so we don't need to apply it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is wrong anyway because the variable ti->max_io_len it is supposed to be in the units of 512-byte sectors not in bytes. Reduction of the limit to 1048576 sectors could even cause data corruption in rare cases - suppose that we have a dm-striped device with stripe size 768MiB. The target will call dm_set_target_max_io_len with the value 1572864. The buggy code would reduce it to 1048576. Now, the dm-core will errorneously split the bios on 1048576-sector boundary insetad of 1572864-sector boundary and pass these stripe-crossing bios to the striped target. Cc: stable@vger.kernel.org # v4.16+ Fixes: 8f50e358153d ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm init: fix const confusion for dm_allowed_targets arrayAndi Kleen
A non const pointer to const cannot be marked initconst. Mark the array actually const. Fixes: 6bbc923dfcf5 dm: add support to directly boot to a mapped device Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm integrity: make dm_integrity_init and dm_integrity_exit staticYueHaibing
Fix sparse warnings: drivers/md/dm-integrity.c:3619:12: warning: symbol 'dm_integrity_init' was not declared. Should it be static? drivers/md/dm-integrity.c:3638:6: warning: symbol 'dm_integrity_exit' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01dm integrity: change memcmp to strncmp in dm_integrity_ctrMikulas Patocka
If the string opt_string is small, the function memcmp can access bytes that are beyond the terminating nul character. In theory, it could cause segfault, if opt_string were located just below some unmapped memory. Change from memcmp to strncmp so that we don't read bytes beyond the end of the string. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-01cifs: a smb2_validate_and_copy_iov failure does not mean the handle is invalid.Ronnie Sahlberg
It only means that we do not have a valid cached value for the file_all_info structure. CC: Stable <stable@vger.kernel.org> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-04-01SMB3: Allow persistent handle timeout to be configurable on mountSteve French
Reconnecting after server or network failure can be improved (to maintain availability and protect data integrity) by allowing the client to choose the default persistent (or resilient) handle timeout in some use cases. Today we default to 0 which lets the server pick the default timeout (usually 120 seconds) but this can be problematic for some workloads. Add the new mount parameter to cifs.ko for SMB3 mounts "handletimeout" which enables the user to override the default handle timeout for persistent (mount option "persistenthandles") or resilient handles (mount option "resilienthandles"). Maximum allowed is 16 minutes (960000 ms). Units for the timeout are expressed in milliseconds. See section 2.2.14.2.12 and 2.2.31.3 of the MS-SMB2 protocol specification for more information. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org>
2019-04-01smb3: Fix enumerating snapshots to AzureSteve French
Some servers (see MS-SMB2 protocol specification section 3.3.5.15.1) expect that the FSCTL enumerate snapshots is done twice, with the first query having EXACTLY the minimum size response buffer requested (16 bytes) which refreshes the snapshot list (otherwise that and subsequent queries get an empty list returned). So had to add code to set the maximum response size differently for the first snapshot query (which gets the size needed for the second query which contains the actual list of snapshots). Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> # 4.19+
2019-04-01cifs: fix kref underflow in close_shroot()Ronnie Sahlberg
Fix a bug where we used to not initialize the cached fid structure at all in open_shroot() if the open was successful but we did not get a lease. This would leave the structure uninitialized and later when we close the handle we would in close_shroot() try to kref_put() an uninitialized refcount. Fix this by always initializing this structure if the open was successful but only do the extra get() if we got a lease. This extra get() is only used to hold the structure until we get a lease break from the server at which point we will kref_put() it during lease processing. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org>
2019-04-01i40e: add tracking of AF_XDP ZC state for each queue pairBjörn Töpel
In commit f3fef2b6e1cc ("i40e: Remove umem from VSI") a regression was introduced; When the VSI was reset, the setup code would try to enable AF_XDP ZC unconditionally (as long as there was a umem placed in the netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a certain queue pair has been "zero-copy enabled" via the ndo_bpf. The bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if XDP is enabled, the corresponding qid in the bitmap is set and the umem is non-NULL. Fixes: f3fef2b6e1cc ("i40e: Remove umem from VSI") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-04-01perf vendor events intel: Update Silvermont to v14Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update GoldmontPlus to v1.01Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update Goldmont to v13Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update Bonnell to V4Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update KnightsLanding events to v9Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update Haswell events to v28Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update IvyBridge events to v21Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update SandyBridge events to v16Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update JakeTown events to v20Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update IvyTown events to v20Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update HaswellX events to v20Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update BroadwellX events to v14Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update SkylakeX events to v1.12Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update Skylake events to v42Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update Broadwell-DE events to v7Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update Broadwell events to v23Andi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf vendor events intel: Update metrics from TMAM 3.5Andi Kleen
Update all the Intel JSON metrics from Ahmad Yasin's TMAM 3.5 for Intel big core from Sandy Bridge to Cascade Lake. This has many improvements and new metircs - New TopDownL1_SMT group that provides a per SMT thread version of --topdown that does not require -a anymore. The drawback is increased multiplexing though since L1 TopDown does not fit into 4 generic counters anymore. - Added SMT aware versions of other metrics - Split SMT aware metrics into separate metrics to avoid unnecessary event collections - New metrics for better branch analysis: Estimated Branch_Mispredict_Costs, Instructions per taken Branch, Branch Instructions per Taken Branch, etc. - Instruction mix metrics: Instructions per load, Instructions per store, Instructions per Branch, Instructions per Call - New Cache metrics: Bandwidth to L1/L2/L3 caches. L1/L2/L3 misses per kilo instructions. memory level parallelism - New memory controller metrics: Normalized memory bandwidth in interval mode, Average memory latency, Average number of parallel read requests, - 3DXP persistent memory metrics for Cascade Lake: 3dxp read latency, 3dxp read/write bandwidth - Some other useful metrics like Instruction Level Parallelism, - Various other improvements. Not all metrics are available on all CPUs. Skylake has best coverage. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf record: Implement --mmap-flush=<number> optionAlexey Budankov
Implement a --mmap-flush option that specifies minimal number of bytes that is extracted from mmaped kernel buffer to store into a trace. The default option value is 1 byte what means every time trace writing thread finds some new data in the mmaped buffer the data is extracted, possibly compressed and written to a trace. $ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc $ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc The option is independent from -z setting, doesn't vary with compression level and can serve two purposes. The first purpose is to increase the compression ratio of a trace data. Larger data chunks are compressed more effectively so the implemented option allows specifying data chunk size to compress. Also at some cases executing more write syscalls with smaller data size can take longer than executing less write syscalls with bigger data size due to syscall overhead so extracting bigger data chunks specified by the option value could additionally decrease runtime overhead. The second purpose is to avoid self monitoring live-lock issue in system wide (-a) profiling mode. Profiling in system wide mode with compression (-a -z) can additionally induce data into the kernel buffers along with the data from monitored processes. If performance data rate and volume from the monitored processes is high then trace streaming and compression activity in the tool is also high. High tool process activity can lead to subtle live-lock effect when compression of single new byte from some of mmaped kernel buffer leads to generation of the next single byte at some mmaped buffer. So perf tool process ends up in endless self monitoring. Implemented synch parameter is the mean to force data move independently from the specified flush threshold value. Despite the provided flush value the tool needs capability to unconditionally drain memory buffers, at least in the end of the collection. Committer testing: Running with the default value, i.e. as soon as there is something to read go on consuming, we first write the synthesized events, small chunks of about 128 bytes: # perf trace -m 2048 --call-graph dwarf -e write -- perf record <SNIP> 101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) process_synthesized_event (/home/acme/bin/perf) perf_tool__process_synth_event (inlined) perf_event__synthesize_mmap_events (/home/acme/bin/perf) Then we move to reading the mmap buffers consuming the events put there by the kernel perf infrastructure: 107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) record__pushfn (/home/acme/bin/perf) perf_mmap__push (/home/acme/bin/perf) record__mmap_read_evlist (inlined) record__mmap_read_all (inlined) __cmd_record (inlined) cmd_record (/home/acme/bin/perf) 12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984 <SNIP same backtrace as in the 107.561 timestamp> 12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816 <SNIP same backtrace as in the 107.561 timestamp> 12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832 <SNIP same backtrace as in the 107.561 timestamp> If we limit it to write only when more than 16MB are available for reading, it throttles that to a quarter of the --mmap-pages set for 'perf record', which by default get to 528384 bytes, found out using 'record -v': mmap flush: 132096 mmap size 528384B With that in place all the writes coming from record__mmap_read_evlist(), i.e. from the mmap buffers setup by the kernel perf infrastructure were at least 132096 bytes long. Trying with a bigger mmap size: perf trace -e write perf record -v -m 2048 --mmap-flush 16M 74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888 74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256 74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232 74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032 74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520 74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688 74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760 Was again limited to a quarter of the mmap size: mmap flush: 2098176 mmap size 8392704B A warning about that would be good to have but can be added later, something like: "max flush is a quarter of the mmap size, if wanting to bump the mmap flush further, bump the mmap size as well using -m/--mmap-pages" Also rename the 'sync' parameters to 'synch' to keep tools/perf building with older glibcs: cc1: warnings being treated as errors builtin-record.c: In function 'record__mmap_read_evlist': builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here builtin-record.c: In function 'record__mmap_read_all': builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools build: Implement libzstd feature check, LIBZSTD_DIR and NO_LIBZSTD definesAlexey Budankov
Implement libzstd feature check, NO_LIBZSTD and LIBZSTD_DIR defines to override Zstd library sources or disable the feature from the command line: $ make -C tools/perf LIBZSTD_DIR=/path/to/zstd/sources/ clean all $ make -C tools/perf NO_LIBZSTD=1 clean all Auto detection feature status is reported just before compilation starts. If your system has some version of the zstd library preinstalled then the build system finds and uses it during the build. If you still prefer to compile with some other version of zstd library you have capability to refer the compilation to that version using LIBZSTD_DIR define. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/9b4cd8b0-10a3-1f1e-8d6b-5922a7ca216b@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Rename input arguments and local variables of ↵Tzvetomir Stoyanov
libtraceevent from pevent to tep "pevent" to "tep" renaming of: - all "pevent" input arguments of libtraceevent internal functions. - all local "pevent" variables of libtraceevent. This makes the implementation consistent with the chosen naming convention, tep (trace event parser), and will avoid any confusion with the old pevent name Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-5-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164344.944953447@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf tools, tools lib traceevent: Rename "pevent" member of struct ↵Tzvetomir Stoyanov
tep_event_filter to "tep" The member "pevent" of the struct tep_event_filter is renamed to "tep". This makes the struct consistent with the chosen naming convention: tep (trace event parser), instead of the old pevent. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-4-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164344.785896189@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf tools, tools lib traceevent: Rename "pevent" member of struct tep_event ↵Tzvetomir Stoyanov
to "tep" The member "pevent" of the struct tep_event is renamed to "tep". This makes the struct consistent with the chosen naming convention: tep (trace event parser), instead of the old pevent. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-3-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164344.627724996@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Rename input arguments of libtraceevent APIs from ↵Tzvetomir Stoyanov
pevent to tep Input arguments of libtraceevent APIs are renamed from "struct tep_handle *pevent" to "struct tep_handle *tep". This makes the API consistent with the chosen naming convention: tep (trace event parser), instead of the old pevent. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-2-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164344.465573837@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools tools, tools lib traceevent: Make traceevent APIs more consistentTzvetomir Stoyanov
Rename some traceevent APIs for consistency: tep_pid_is_registered() to tep_is_pid_registered() tep_file_bigendian() to tep_is_file_bigendian() to make the names and return values consistent with other tep_is_... APIs tep_data_lat_fmt() to tep_data_latency_format() to make the name more descriptive tep_host_bigendian() to tep_is_bigendian() tep_set_host_bigendian() to tep_set_local_bigendian() tep_is_host_bigendian() to tep_is_local_bigendian() "host" can be confused with VMs, and "local" is about the local machine. All tep_is_..._bigendian(struct tep_handle *tep) APIs return the saved data in the tep handle, while tep_is_bigendian() returns the running machine's endianness. All tep_is_... functions are modified to return bool value, instead of int. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20190327141946.4353-2-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164344.288624897@goodmis.org [ Removed some extra parenthesis around return statements ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Remove call to exit() from tep_filter_add_filter_str()Tzvetomir Stoyanov
This patch removes call to exit() from tep_filter_add_filter_str(). A library function should not force the application to exit. In the current implementation tep_filter_add_filter_str() calls exit() when a special "test_filters" mode is set, used only for debugging purposes. When this mode is set and a filter is added - its string is printed to the console and exit() is called. This patch changes the logic - when in "test_filters" mode, the filter string is still printed, but the exit() is not called. It is up to the application to track when "test_filters" mode is set and to call exit, if needed. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20190326154328.28718-9-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164344.121717482@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Remove tep filter trivial APIsTzvetomir Stoyanov
This patch removes trivial filter tep APIs: enum tep_filter_trivial_type tep_filter_event_has_trivial() tep_update_trivial() tep_filter_clear_trivial() Trivial filters is an optimization, used only in the first version of KernelShark. The API is deprecated, the next KernelShark release does not use it. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20190326154328.28718-4-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164343.968458918@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Removed unneeded !! and return parenthesisSteven Rostedt (VMware)
As return is not a function we do not need parenthesis around the return value. Also, a function returning bool does not need to add !!. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: http://lkml.kernel.org/r/20190401164343.817886725@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Implement new traceevent APIs for accessing struct ↵Tzvetomir Stoyanov
tep_handler fields As struct tep_handler definition is not exposed as part of libtraceevent API, its fields cannot be accessed directly by the library users. This patch implements new APIs, which can be used to access the struct tep_handler fields: tep_get_event() - retrieves an event pointer at a specific index tep_get_first_event() - is modified to use tep_get_event() tep_clear_flag() - clears a tep handle flag tep_test_flag() - test if a given flag is set tep_get_header_timestamp_size() - returns the size of the timestamp stored in the header. tep_get_cpus() - returns the number of CPUs tep_is_old_format() - returns true if data was created by an older kernel with the old data format tep_set_print_raw() - have the output print in the raw format tep_set_test_filters() - debugging utility for testing tep filters Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190325145017.30246-4-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164343.679629539@goodmis.org [ Renamed some newly added "pevent" to "tep" ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Coding style fixesTzvetomir Stoyanov
Fixed few coding style problems in event-parse-api.c Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190325145017.30246-3-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164343.537086316@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Change description of few APIsTzvetomir Stoyanov
APIs descriptions should describe the purpose of the function, its parameters and return value. While working on man pages implementation, I noticed mismatches in the descriptions of few APIs. This patch changes the description of these APIs, making them consistent with the man pages: - tep_print_num_field() - tep_print_func_field() - tep_get_header_page_size() - tep_get_long_size() - tep_set_long_size() - tep_get_page_size() - tep_set_page_size() Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20190325145017.30246-2-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164343.396759247@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Add more debugging to see various internal ring buffer ↵Steven Rostedt (Red Hat)
entries When trace-cmd report --debug is set, show the internal ring buffer entries like time-extends and padding. This requires adding new kbuffer API to retrieve these items. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: http://lkml.kernel.org/r/20190401164343.257591565@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Implement a new API, tep_list_events_copy()Tzvetomir Stoyanov
Existing API tep_list_events() is not thread safe, it uses the internal array sort_events to keep cache of the sorted events and reuses it. This patch implements a new API, tep_list_events_copy(), which allocates new sorted array each time it is called. It could be used when a sorted events functionality is needed in thread safe use cases. It is up to the caller to free the array. Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/linux-trace-devel/20181218133013.31094-1-tstoyanov@vmware.com Link: http://lkml.kernel.org/r/20190401164343.117437443@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Add mono clocks to be parsed in secondsSteven Rostedt (VMware)
The mono clocks can display in seconds instead of whole numbers: trace-cmd-521 [001] 99176715281005: sched_waking: comm=kworker/u16:2 pid=32118 prio=120 target_cpu=002 trace-cmd-521 [001] 99176715286349: sched_wake_idle_without_ipi: cpu=2 trace-cmd-521 [001] 99176715288047: sched_wakeup: kworker/u16:2:32118 [120] success=1 CPU:002 trace-cmd-521 [001] 99176715290022: sched_waking: comm=trace-cmd pid=523 prio=120 target_cpu=000 trace-cmd-521 [001] 99176715292332: sched_wake_idle_without_ipi: cpu=0 trace-cmd-521 [001] 99176715292855: sched_wakeup: trace-cmd:523 [120] success=1 CPU:000 trace-cmd-521 [001] 99176715300697: sched_stat_runtime: comm=trace-cmd pid=521 runtime=80233 [ns] vruntime=66705540554 [ns Break it up in seconds: trace-cmd-521 [001] 99176.715281: sched_waking: comm=kworker/u16:2 pid=32118 prio=120 target_cpu=002 trace-cmd-521 [001] 99176.715286: sched_wake_idle_without_ipi: cpu=2 trace-cmd-521 [001] 99176.715288: sched_wakeup: kworker/u16:2:32118 [120] success=1 CPU:002 trace-cmd-521 [001] 99176.715290: sched_waking: comm=trace-cmd pid=523 prio=120 target_cpu=000 trace-cmd-521 [001] 99176.715292: sched_wake_idle_without_ipi: cpu=0 trace-cmd-521 [001] 99176.715293: sched_wakeup: trace-cmd:523 [120] success=1 CPU:000 trace-cmd-521 [001] 99176.715301: sched_stat_runtime: comm=trace-cmd pid=521 runtime=80233 [ns] vruntime=66705540554 [ns] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: http://lkml.kernel.org/r/20190401164342.976554023@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01tools lib traceevent: Handle trace_printk() "%px"Steven Rostedt (VMware)
With security updates, %p in the kernel is hashed to protect true kernel locations. But trace_printk() is not allowed in production systems, and when a pointer is used, there are many times that the actual address is needed. "%px" produces the real address. But libtraceevent does not know how to handle that extension. Add it. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com> Link: http://lkml.kernel.org/r/20190401164342.837312153@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf list: Output tool eventsAndi Kleen
Add support in 'perf list' to output tool internal events, currently only 'duration_time'. Committer testing: $ perf list dur* List of pre-defined events (to be used in -e): duration_time [Tool event] Metric Groups: $ perf list sw List of pre-defined events (to be used in -e): alignment-faults [Software event] bpf-output [Software event] context-switches OR cs [Software event] cpu-clock [Software event] cpu-migrations OR migrations [Software event] dummy [Software event] emulation-faults [Software event] major-faults [Software event] minor-faults [Software event] page-faults OR faults [Software event] task-clock [Software event] duration_time [Tool event] $ perf list | grep duration duration_time [Tool event] [L1D miss outstandings duration in cycles] page walk duration are excluded in Skylake] load. EPT page walk duration are excluded in Skylake] page walk duration are excluded in Skylake] store. EPT page walk duration are excluded in Skylake] (instruction fetch) request. EPT page walk duration are excluded in instruction fetch request. EPT page walk duration are excluded in $ Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190326221823.11518-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf evsel: Support printing evsel name for 'duration_time'Andi Kleen
Implement printing the correct name for duration_time Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190326221823.11518-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>