Age | Commit message (Collapse) | Author |
|
bond_3ad_bind_slave() calls ad_initialize_port() and then immediately
assigns correct values making some of that initialization unnecessary.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch breaks the rich assignments into it's own statements
and removes some duplicate code where admin-key, & oper-key are
updated.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: 79b16aadea32cce ("udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The socket parameter might legally be NULL, thus sock_net is sometimes
causing a NULL pointer dereference. Using net_device pointer in dst_entry
is more reliable.
Fixes: b6a7719aedd7e5c ("ipv4: hash net ptr into fragmentation bucket selection")
Reported-by: Rick Jones <rick.jones2@hp.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit d63e2e1f3df904bf6bd150bdafb42ddbb3257ea8.
David Ahern reported that d63e2e1f3df9 breaks booting on an 8-socket T5
sparc system. He also verified that the system boots with d63e2e1f3df9
reverted. Yinghai has some fixes, but they need a little more polishing
than we can do before v4.0.
Link: http://lkml.kernel.org/r/5514391F.2030300@oracle.com # report
Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org # patches
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.19+
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)
- Improve 'perf sched replay' on high CPU core count machines (Yunlong Song)
- Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
events (Arnaldo Carvalho de Melo)
- Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
- Respect -i option 'in perf kmem' (Jiri Olsa)
Infrastructure changes:
- Honor operator priority in libtraceevent (Namhyung Kim)
- Merge all perf_event_attr print functions (Peter Zijlstra)
- Check kmaps access to make code more robust (Wang Nan)
- Fix inverted logic in perf_mmap__empty() (He Kuang)
- Fix ARM 32 'perf probe' building error (Wang Nan)
- Fix perf_event_attr tests (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add compatible string definitions and supported pin functions.
Signed-off-by: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Booting a v3.18 or newer Xen domU kernel with PCI devices passed through
results in an oops (this is a 32-bit 3.13.11 dom0 with a 64-bit 4.4.0
hypervisor and 32-bit domU):
BUG: unable to handle kernel paging request at 0030303e
IP: [<c06ed0e6>] acpi_ns_validate_handle+0x12/0x1a
Call Trace:
[<c06eda4d>] ? acpi_evaluate_object+0x31/0x1fc
[<c06b78e1>] ? pci_get_hp_params+0x111/0x4e0
[<c0407bc7>] ? xen_force_evtchn_callback+0x17/0x30
[<c04085fb>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<c0699d34>] ? pci_device_add+0x24/0x450
Don't look for ACPI configuration information if ACPI has been disabled.
I don't think this is the best fix, because we can boot plain Linux (no
Xen) with "acpi=off", and we don't need this check in pci_get_hp_params().
There should be a better fix that would make Xen domU work the same way.
The domU kernel has ACPI support but it has no AML. There should be a way
to initialize the ACPI data structures so things fail gracefully rather
than oopsing. This is an interim fix to address the regression.
Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96301
Reported-by: Michael D Labriola <mlabriol@gdeb.com>
Tested-by: Michael D Labriola <mlabriol@gdeb.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.18+
|
|
Add a enum_map file in the tracing directory to see what enums have been
saved to convert in the print fmt files.
As this requires the enum mapping to be persistent in memory, it is only
created if the new config option CONFIG_TRACE_ENUM_MAP_FILE is enabled.
This is for debugging and will increase the persistent memory footprint
of the kernel.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The enums used in tracepoints for __print_symbolic() do not have their
values shown in the tracepoint format files and this makes it difficult
for user space tools to convert the binary values to the strings they
are to represent.
Add TRACE_DEFINE_ENUM(x) macros to export the enum names to their values
to make the tracing output from user space tools more robust.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Add an userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add a new "dynset" expression for dynamic set updates.
A new set op ->update() is added which, for non existant elements,
invokes an initialization callback and inserts the new element.
For both new or existing elements the extenstion pointer is returned
to the caller to optionally perform timer updates or other actions.
Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently a set binding is assumed to be related to a lookup and, in
case of maps, a data load.
In order to use bindings for set updates, the loop detection checks
must be restricted to map operations only. Add a flags member to the
binding struct to hold the set "action" flags such as NFT_SET_MAP,
and perform loop detection based on these.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use atomic operations for the element count to avoid races with async
updates.
To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for seperately and
are treated as being already removed. This means for the duration of
a netlink transaction, the limit might be exceeded by the amount of
elements deleted. Set implementations must be prepared to handle this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The NFT_SET_TIMEOUT flag is ignore in nft_select_set_ops, which may
lead to selection of a set implementation that doesn't actually
support timeouts.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Enums used by tracepoints for __print_symbolic() are shown in the
tracepoint format files with just their names and not their values.
This makes it difficult for user space tools to know how to convert the
binary data into their string representations.
By adding the use of TRACE_DEFINE_ENUM(), the enum names will be mapped
to their values and shown in the tracing file system to let tools
convert the data as necessary.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Replace occurences of the pci api by appropriate call to the dma api.
A simplified version of the semantic patch that finds this problem is as
follows: (http://coccinelle.lip6.fr)
@deprecated@
idexpression id;
position p;
@@
(
pci_dma_supported@p ( id, ...)
|
pci_alloc_consistent@p ( id, ...)
)
@bad1@
idexpression id;
position deprecated.p;
@@
...when != &id->dev
when != pci_get_drvdata ( id )
when != pci_enable_device ( id )
(
pci_dma_supported@p ( id, ...)
|
pci_alloc_consistent@p ( id, ...)
)
@depends on !bad1@
idexpression id;
expression direction;
position deprecated.p;
@@
(
- pci_dma_supported@p ( id,
+ dma_supported ( &id->dev,
...
+ , GFP_ATOMIC
)
|
- pci_alloc_consistent@p ( id,
+ dma_alloc_coherent ( &id->dev,
...
+ , GFP_ATOMIC
)
)
Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
nf_bridge_info->mask is used for several things, for example to
remember if skb->pkt_type was set to OTHER_HOST.
For a bridge, OTHER_HOST is expected case. For ip forward its a non-starter
though -- routing expects PACKET_HOST.
Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook
invocation and then un-does it after hook traversal.
This information is irrelevant outside of br_netfilter.
After this change, ->mask now only contains flags that need to be
known outside of br_netfilter in fast-path.
Future patch changes mask into a 2bit state field in sk_buff, so that
we can remove skb->nf_bridge pointer for good and consider all remaining
places that access nf_bridge info content a not-so fastpath.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
->mask is a bit info field that mixes various use cases.
In particular, we have flags that are mutually exlusive, and flags that
are only used within br_netfilter while others need to be exposed to
other parts of the kernel.
Remove BRNF_8021Q/PPPoE flags. They're mutually exclusive and only
needed within br_netfilter context.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Don't access skb->nf_bridge directly, this pointer will be removed soon.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Avoid skb->nf_bridge accesses where possible.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
right now we store this in the nf_bridge_info struct, accessible
via skb->nf_bridge. This patch prepares removal of this pointer from skb:
Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out
device (or ifindexes).
Followup patches to netfilter will then allow nf_bridge_info to be
obtained by a call into the br_netfilter core, rather than keeping a
pointer to it in sk_buff.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
br_netfilter maintains an extra state, nf_bridge_info, which is attached
to skb via skb->nf_bridge pointer.
Amongst other things we use skb->nf_bridge->data to store the original
mac header for every processed skb.
This is required for ip refragmentation when using conntrack
on top of bridge, because ip_fragment doesn't copy it from original skb.
However there is no need anymore to do this unconditionally.
Move this to the one place where its needed -- when br_netfilter calls
ip_fragment().
Also switch to percpu storage for this so we can handle fragmenting
without accessing nf_bridge meta data.
Only user left is neigh resolution when DNAT is detected, to hold
the original source mac address (neigh resolution builds new mac header
using bridge mac), so rename ->data and reduce its size to whats needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
match
Currently in xt_socket, we take advantage of early demuxed sockets
since commit 00028aa37098 ("netfilter: xt_socket: use IP early demux")
in order to avoid a second socket lookup in the fast path, but we
only make partial use of this:
We still unnecessarily parse headers, extract proto, {s,d}addr and
{s,d}ports from the skb data, accessing possible conntrack information,
etc even though we were not even calling into the socket lookup via
xt_socket_get_sock_{v4,v6}() due to skb->sk hit, meaning those cycles
can be spared.
After this patch, we only proceed the slower, manual lookup path
when we have a skb->sk miss, thus time to match verdict for early
demuxed sockets will improve further, which might be i.e. interesting
for use cases such as mentioned in 681f130f39e1 ("netfilter: xt_socket:
add XT_SOCKET_NOWILDCARD flag").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently, the driver uses handle_simple_irq for all IRQ types and hard
codes the acknowledge for different IRQ types into the handler. It is
better to use the IRQ core as intended and let it handle the differences
between the various types of IRQ. For example the current system does
not work for threaded level triggered IRQs as these need to be masked
until the threaded handler has run.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
quotad periodically syncs in-memory quotas to the ondisk quota file
and sets the QDF_REFRESH flag so that a subsequent read of a synced
quota is re-read from disk.
gfs2_quota_lock() checks for this flag and sets a 'force' bit to
force re-read from disk if requested. However, there is a race
condition here. It is possible for gfs2_quota_lock() to find the
QDF_REFRESH flag unset (i.e force=0) and quotad comes in immediately
after and syncs the relevant quota and sets the QDF_REFRESH flag.
gfs2_quota_lock() resumes with force=0 and uses the stale in-memory
quota usage values that result in miscalculations.
This patch fixes this race by moving the check for the QDF_REFRESH
flag check further out into the gfs2_quota_lock() process, i.e, in
do_glock(), under the protection of the quota glock.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Reported by 0-day test infrastructure.
Fixes: ecaa1f6a0154 ("vfio-pci: Add VGA arbiter client")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Adding 'I' event modifier to have complete set of modifiers for
perf_event_attr:exclude_* bits.
Any event specified with 'I' modifier will have the
perf_event_attr:exclude_idle bit set.
$ perf record -e cycles:I -vv ls 2>&1 | grep exclude_idle
exclude_hv 0 exclude_idle 1
Adding automated tests.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: William Cohen <wcohen@redhat.com>
Link: http://lkml.kernel.org/r/1428441919-23099-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
report__warn_kptr_restrict() calls map__kmap(kernel_map) before checking
kernel_map againest NULL.
Which is dangerous, since map__kmap() will return a invalid and not NULL
address.
It will trigger a warning message in map__kmap() after the patch "perf:
kmaps: enforce usage of kmaps to protect futher bugs." was applied.
This patch fixes it by adding the missing checking.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1428490772-135393-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Following commit:
1a5941312414 perf: Add wakeup watermark control to the AUX area
enlarged perf_event_attr, but did not updated attr tests.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: http://lkml.kernel.org/n/20150407171715.GA22603@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit 9b118acae310f57baee770b5db402500d8695e50 ("perf probe: Fix to
handle aliased symbols in glibc") uses an absolute format '%lx' to
print u64 argument, which causes compiling error on ARM 32.
This patch replaces it with PRIx64.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1428459274-138470-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The enums used in the tracepoints for __print_symbolic() have their
names shown in the tracepoint format files. User space tools do not know
how to convert those names into their values to be able to convert the
binary data.
Use TRACE_DEFINE_ENUM() to export the enum names to their values for
userspace to do the parsing correctly.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The enums used in tracepoints with __print_symbolic() have their
names shown in the tracepoint format files and not their values.
This makes it difficult for user space tools to convert the binary
data to the strings as user space does not know what those enums
are about.
By having them use TRACE_DEFINE_ENUM(), the names of the enums will
be mapped to the values and shown to user space.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The enums used by the softirq mapping is what is shown in the output
of the __print_symbolic() and not their values, that are needed
to map them to their strings. Export them to userspace with the
TRACE_DEFINE_ENUM() macro so that user space tools can map the enums
with their values.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The tracepoints that use __print_symbolic() use enums as the value
to convert to strings. Unfortunately, the format files for these
tracepoints show the enum name and not their value. This causes some
userspace tools not to know how to convert __print_symbolic() to
their strings.
Add TRACE_DEFINE_ENUM() macros to export the enums used to userspace
to let those tools know what those enum values are.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The tracepoints in the 9p code use a lot of enums for the __print_symbolic()
function. These enums are shown in the tracepoint format files, and user
space tools such as trace-cmd does not have the information to parse it.
Add helper macros to export the enums with TRACE_DEFINE_ENUM().
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Have the enums used in __print_symbolic() by the trace_tlb_flush()
tracepoint exported to userpace such that they can be parsed by
userspace tools.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Dave Hansen <dave@sr71.net>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Document the use of TRACE_DEFINE_ENUM() by adding enums to the
trace-event-sample.h and using this macro to convert them in the format
files.
Also update the comments and sho the use of __print_symbolic() and
__print_flags() as well as adding comments abount __print_array().
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM()
will have those enums converted into their values in the tracepoint
print fmt strings.
Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.
For example, the tracepoint trace_tlb_flush() has:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
Which maps the enum values to the strings they represent. But perf and
trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.
With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values:
By adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
$ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
[...]
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })
The above is what userspace expects to see, and tools do not need to
be modified to parse them.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Add documentation about TRACE_SYSTEM needing to be alpha-numeric or with
underscores, and that if it is not, then the use of TRACE_SYSTEM_VAR is
required to make something that is.
An example of this is shown in samples/trace_events/trace-events-sample.h
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Normally the compiler will use the same pointer for a string throughout
the file. But there's no guarantee of that happening. Later changes will
require that all events have the same pointer to the system string.
Name the system string and have all events point to it.
Testing this, it did not increases the size of the text, except for the
notes section, which should not harm the real size any.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Every tracing file must have its own TRACE_SYSTEM defined.
The brcmsmac tracepoint header broke this and added in the middle
of the file:
#undef TRACE_SYSTEM
#define TRACE_SYSTEM brcmsmac
#undef TRACE_SYSTEM
#define TRACE_SYSTEM brcmsmac_tx
#undef TRACE_SYSTEM
#define TRACE_SYSTEM brcmsmac_msg
Unfortunately, this broke new code in the ftrace infrastructure.
Moving each of these TRACE_SYSTEMs into their own trace file with
just one TRACE_SYSTEM per file fixes the issue.
Link: http://lkml.kernel.org/r/5524D99C.1050902@broadcom.com
Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Every tracing file must have its own TRACE_SYSTEM defined.
The iwlwifi tracepoint header broke this and added in the middle
of the file:
#undef TRACE_SYSTEM
#define TRACE_SYSTEM iwlwifi_io
#undef TRACE_SYSTEM
#define TRACE_SYSTEM iwlwifi_ucode
#undef TRACE_SYSTEM
#define TRACE_SYSTEM iwlwifi_msg
#undef TRACE_SYSTEM
#define TRACE_SYSTEM iwlwifi_data
#undef TRACE_SYSTEM
#define TRACE_SYSTEM iwlwifi
Unfortunately, this broke new code in the ftrace infrastructure.
Moving each of these TRACE_SYSTEMs into their own trace file with
just one TRACE_SYSTEM per file fixes the issue.
Link: http://lkml.kernel.org/r/1428479094.2809.3.camel@sipsolutions.net
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Currently there's 3 (that I found) different and incomplete
implementations of printing perf_event_attr.
This is quite silly. Merge the lot.
While this patch does not retain the exact form all printing that I
found is debug output and thus it should not be critical.
Also, I cannot find a single print_event_desc() caller.
Pre:
$ perf record -vv -e cycles -- sleep 1
------------------------------------------------------------
perf_event_attr:
type 0
size 104
config 0
sample_period 4000
sample_freq 4000
sample_type 0x107
read_format 0
disabled 1 inherit 1
pinned 0 exclusive 0
exclude_user 0 exclude_kernel 0
exclude_hv 0 exclude_idle 0
mmap 1 comm 1
mmap2 1 comm_exec 1
freq 1 inherit_stat 0
enable_on_exec 1 task 1
watermark 0 precise_ip 0
mmap_data 0 sample_id_all 1
exclude_host 0 exclude_guest 1
excl.callchain_kern 0 excl.callchain_user 0
wakeup_events 0
wakeup_watermark 0
bp_type 0
bp_addr 0
config1 0
bp_len 0
config2 0
branch_sample_type 0
sample_regs_user 0
sample_stack_user 0
sample_regs_intr 0
------------------------------------------------------------
$ perf evlist -vv
cycles: sample_freq=4000, size: 104, sample_type: IP|TID|TIME|PERIOD,
disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1,
freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1
Post:
$ ./perf record -vv -e cycles -- sleep 1
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
------------------------------------------------------------
$ ./perf evlist -vv
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq:
1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1,
mmap2: 1, comm_exec: 1
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150407091150.644238729@infradead.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Teach perf-record about the new perf_event_attr::{use_clockid, clockid}
fields. Add a simple parameter to set the clock (if any) to be used for
the events to be recorded into the data file.
Since we store the entire perf_event_attr in the EVENT_DESC section we
also already store the used clockid in the data file.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yunlong Song <yunlong.song@huawei.com>
Link: http://lkml.kernel.org/r/20150407154851.GR23123@twins.programming.kicks-ass.net
[ Conditionally define CLOCK_BOOTTIME, at least rhel6 doesn't have it - dsahern
Ditto for CLOCK_MONOTONIC_RAW, sles11sp2 doesn't have it - yunlong.song ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instead of the default value 10
Since sched->replay_repeat is set to 10 as default, the sched->run_avg,
sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use
10 to calculate their value.
However, the replay_repeat can be changed to other value by using -r
option, so the calculation above should use replay_repeat to achieve
more accurate results instead of the default value 10.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Enable to use perf.data when it is not owned by current user or root.
Example:
$ ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
$ sudo id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
$ sudo perf sched replay -f
run measurement overhead: 98 nsecs
sleep measurement overhead: 52909 nsecs
the run test took 1000015 nsecs
the sleep test took 1054253 nsecs
File perf.data not owned by current user or root (use -f to override)
As shown above, the -f option does not work at all.
After this patch:
$ sudo perf sched replay -f
run measurement overhead: 221 nsecs
sleep measurement overhead: 40514 nsecs
the run test took 1000003 nsecs
the sleep test took 1056098 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
...
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
------------------------------------------------------------
#1 : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
#2 : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
#3 : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
#4 : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
#5 : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
#6 : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
#7 : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
#8 : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
#9 : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
#10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45
As shown above, the -f option really works now.
Besides for replay, -f option can also work for latency and map.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
maximum open files
The soft maximum number of open files for a calling process is 1024,
which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
hard maximum number of open files for a calling process is 4096, which
is defined as INR_OPEN_MAX in include/uapi/linux/fs.h.
Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of
RLIMIT_NOFILE in include/asm-generic/resource.h.
And the soft maximum number finally decides the limitation of the
maximum files which are allowed to be opened.
That is to say a process can use at most 1024 file descriptors for its
o pened files, or an EMFILE error will happen.
This error can be fixed by increasing the soft maximum number, under the
constraint that the soft maximum number can not exceed the hard maximum
number, or both soft and hard maximum number should be increased
simultaneously with privilege.
For perf sched replay, it uses sys_perf_event_open to create the file
descriptor for each of the tasks in order to handle information of perf
events.
That is to say each task needs a unique file descriptor. In x86_64,
there may be over 1024 or 4096 tasks correspoinding to the record in
perf.data, which causes that no enough file descriptors can be used.
As a result, EMFILE error happens and stops the replay process. To solve
this problem, we adaptively increase the soft and hard maximum number of
open files with a '-f' option.
Example:
Test environment: x86_64 with 160 cores
$ cat /proc/sys/kernel/pid_max
163840
$ cat /proc/sys/fs/file-max
6815744
$ ulimit -Sn
1024
$ ulimit -Hn
4096
Before this patch:
$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
After this patch:
$ perf sched replay
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
Have a try with -f option
$ perf sched replay -f
...
task 1549 ( :163132: 163132), nr_events: 1
task 1550 ( :163540: 163540), nr_events: 1
task 1551 ( <unknown>: 0), nr_events: 10
------------------------------------------------------------
#1 : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
#2 : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
#3 : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
#4 : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
#5 : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
#6 : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
#7 : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
#8 : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
#9 : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
#10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
fails for any task
Since there is sem_wait for each task in the wait_for_tasks(), e.g.
sem_wait(&task->work_done_sem).
The sem_wait can continue only when work_done_sem is greater than 0, or
it will be blocked.
For perf sched replay, one task may sem_post the work_done_sem of
another task, which causes the work_done_sem of that task processed in a
reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...
This sequence simulates the sched process of the running tasks at the
time when perf sched record runs.
As a result, all the tasks are required and their threads must be
successfully created.
If any one (task A) of the tasks fails to create its thread, then
another task (task B), whose work_done_sem needs sem_post from that
failed task A, may likely block itself due to seg_wait.
And this is a dead halt, since task B's thread_func cannot continue at
all.
To solve this problem, perf sched replay should exit once any task fails
to create its thread.
Example:
Test environment: x86_64 with 160 cores
Before this patch:
$ perf sched replay
...
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
------------------------------------------------------------ <- dead halt
After this patch:
$ perf sched replay
...
task 1551 ( <unknown>: 0), nr_events: 10
Error: sys_perf_event_open() syscall returned with -1 (Too many open
files)
$
As shown above, perf sched replay finishes the process after printing an
error message and does not block itself.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|