Age | Commit message (Collapse) | Author |
|
Same as for Spectrum-1, test the ability to add the maximum number of
routes possible to the switch.
Invoke the test from the 'resource_scale' wrapper script.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The helper mlxsw_sp_ipip_dev_ul_tb_id() determines the underlay VRF of a
GRE tunnel. For a tunnel without a bound device, it uses the same VRF that
the tunnel is in. However in Linux, a GRE tunnel without a bound device
uses the main VRF as the underlay. Fix the function accordingly.
mlxsw further assumed that moving a tunnel to a different VRF could cause
conflict in local tunnel endpoint address, which cannot be offloaded.
However, the only way that an underlay could be changed by moving the
tunnel device itself is if the tunnel device does not have a bound device.
But in that case the underlay is always the main VRF, so there is no
opportunity to introduce a conflict by moving such device. Thus this check
constitutes a dead code, and can be removed, which do.
Fixes: 6ddb7426a7d4 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case of errors in unlink_clip_vcc, the logging level is set to
pr_crit but failures in clip_setentry are handled by pr_err().
The patch changes the severity consistent across invocations.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jesper Dangaard says:
====================
page_pool: followup changes to restore tracepoint features
This patchset is a followup to Jonathan patch, that do not release
pool until inflight == 0. That changed page_pool to be responsible for
its own delayed destruction instead of relying on xdp memory model.
As the page_pool maintainer, I'm promoting the use of tracepoint to
troubleshoot and help driver developers verify correctness when
converting at driver to use page_pool. The role of xdp:mem_disconnect
have changed, which broke my bpftrace tools for shutdown verification.
With these changes, the same capabilities are regained.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MM tracepoint for page free (called kmem:mm_page_free) doesn't provide
the page pointer directly, instead it provides the PFN (Page Frame Number).
This is annoying when writing a page_pool leak detector in BPF.
This patch change page_pool tracepoints to also provide the PFN.
The page pointer is still provided to allow other kinds of
troubleshooting from BPF.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When Jonathan change the page_pool to become responsible to its
own shutdown via deferred work queue, then the disconnect_cnt
counter was removed from xdp memory model tracepoint.
This patch change the page_pool_inflight tracepoint name to
page_pool_release, because it reflects the new responsability
better. And it reintroduces a counter that reflect the number of
times page_pool_release have been tried.
The counter is also used by the code, to only empty the alloc
cache once. With a stuck work queue running every second and
counter being 64-bit, it will overrun in approx 584 billion
years. For comparison, Earth lifetime expectancy is 7.5 billion
years, before the Sun will engulf, and destroy, the Earth.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When looking at the details I realised that the memory poison in
__xdp_mem_allocator_rcu_free doesn't make sense. This is because the
SLUB allocator uses the first 16 bytes (on 64 bit), for its freelist,
which overlap with members in struct xdp_mem_allocator, that were
updated. Thus, SLUB already does the "poisoning" for us.
I still believe that poisoning memory make sense in other cases.
Kernel have gained different use-after-free detection mechanism, but
enabling those is associated with a huge overhead. Experience is that
debugging facilities can change the timing so much, that that a race
condition will not be provoked when enabled. Thus, I'm still in favour
of poisoning memory where it makes sense.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently match clause 45 PHYs using any ID read from a MMD marked
as present in the "Devices in package" registers 5 and 6. However,
this is incorrect. 45.2 says:
"The definition of the term package is vendor specific and could be
a chip, module, or other similar entity."
so a package could be more or less than the whole PHY - a PHY could be
made up of several modules instantiated onto a single chip such as the
Marvell 88x3310, or some of the MMDs could be disabled according to
chip configuration, such as the Broadcom 84881.
In the case of Broadcom 84881, the "Devices in package" registers
contain 0xc000009b, meaning that there is a PHYXS present in the
package, but all registers in MMD 4 return 0xffff. This leads to our
matching code incorrectly binding this PHY to one of our generic PHY
drivers.
This patch changes the way we determine whether to attempt to match a
MMD identifier, or use it to request a module - if the identifier is
all-ones, then we skip over it. When reading the identifiers, we
initialise phydev->c45_ids.device_ids to all-ones, only reading the
device ID if the "Devices in package" registers indicates we should.
This avoids the generic drivers incorrectly matching on a PHY ID of
0xffffffff.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King says:
====================
Add support for SFPs behind PHYs
This series adds partial support for SFP cages connected to PHYs,
specifically optical SFPs.
We add core infrastructure to phylib for this, and arrange for
minimal code in the PHY driver - currently, this is code to verify
that the module is one that we can support for Marvell 10G PHYs.
v2: add yaml binding patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for SFP+ cages to the Marvell 10G PHY driver. This is
slightly complicated by the way phylib works in that we need to use
a multi-step process to attach the SFP bus, and we also need to track
the phylink state machine to know when the module's transmit disable
signal should change state.
With appropriate DT changes, this allows the SFP+ canges on the
Macchiatobin platform to be functional.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add core phylib help for supporting SFP sockets on PHYs. This provides
a mechanism to inform the SFP layer about PHY up/down events, and also
unregister the SFP bus when the PHY is going away.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Document the missing sfp property for ethernet controllers (which
has existed for some time) which is being extended to ethernet PHYs.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Wildcard support for the net,iface set from Kristian Evensen.
2) Offload support for matching on the input interface.
3) Simplify matching on vlan header fields.
4) Add nft_payload_rebuild_vlan_hdr() function to rebuild the vlan
header from the vlan sk_buff metadata.
5) Pass extack to nft_flow_cls_offload_setup().
6) Add C-VLAN matching support.
7) Use time64_t in xt_time to fix y2038 overflow, from Arnd Bergmann.
8) Use time_t in nft_meta to fix y2038 overflow, also from Arnd.
9) Add flow_action_entry_next() helper function to flowtable offload
infrastructure.
10) Add IPv6 support to the flowtable offload infrastructure.
11) Support for input interface matching from postrouting,
from Phil Sutter.
12) Missing check for ndo callback in flowtable offload, from wenxu.
13) Remove conntrack parameter from flow_offload_fill_dir(), from wenxu.
14) Do not pass flow_rule object for rule removal, cookie is sufficient
to achieve this.
15) Release flow_rule object in case of error from the offload commit
path.
16) Undo offload ruleset updates if transaction fails.
17) Check for error when binding flowtable callbacks, from wenxu.
18) Always unbind flowtable callbacks when unregistering hooks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After previous patches removing bdev being passed around to set it to
bio, it has become unused in submit_extent_page. So it now has "only" 13
parameters.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We can now remove the bdev from extent_map. Previous patches made sure
that bio_set_dev is correctly in all places and that we don't need to
grab it from latest_bdev or pass it around inside the extent map.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
bio_set_dev sets a bdev to a bio and is not only setting a pointer bug
also changing some state bits if there was a different bdev set before.
This is one thing that's not needed.
Another thing is that setting a bdev at bio allocation time is too early
and actually does not work with plain redundancy profiles, where each
time we submit a bio to a device, the bdev is set correctly.
In many places the bio bdev is set to latest_bdev that seems to serve as
a stub pointer "just to put something to bio". But we don't have to do
that.
Where do we know which bdev to set:
* for regular IO: submit_stripe_bio that's called by btrfs_map_bio
* repair IO: repair_io_failure, read or write from specific device
* super block write (using buffer_heads but uses raw bdev) and barriers
* scrub: this does not use all regular IO paths as it needs to reach all
copies, verify and fixup eventually, and for that all bdev management
is independent
* raid56: rbio_add_io_page, for the RMW write
* integrity-checker: does it's own low-level block tracking
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This is preparatory patch to remove @bdev parameter from
submit_extent_page. It can't be removed completely, because the cgroups
need it for wbc when initializing the bio
wbc_init_bio
bio_associate_blkg_from_css
dereference bdev->bi_disk->queue
The bdev pointer is the same as latest_bdev, thus no functional change.
We can retrieve it from fs_devices that's reachable through several
dereferences. The local variable shadows the parameter, but that's only
temporary.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Record the first event parsing error and report. Implementing feedback
from Jiri Olsa:
https://lkml.org/lkml/2019/10/28/680
An example error is:
$ tools/perf/perf stat -e c/c/
WARNING: multiple event parsing errors
event syntax error: 'c/c/'
\___ unknown term
valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore
Initial error:
event syntax error: 'c/c/'
\___ Cannot find PMU `c'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20191116074652.9960-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Trace a magic number as immediate value if the target variable is not
found at some probe points which is based on one probe event.
This feature is good for the case if you trace a source code line with
some local variables, which is compiled into several instructions and
some of the variables are optimized out on some instructions.
Even if so, with this feature, perf probe trace a magic number instead
of such disappeared variables and fold those probes on one event.
E.g. without this patch:
# perf probe -D "pud_page_vaddr pud"
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
Failed to find 'pud' in this function.
p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64
p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64
p:probe/pud_page_vaddr _text+23558082 pud=%ax:x64
p:probe/pud_page_vaddr _text+328373 pud=%r8:x64
p:probe/pud_page_vaddr _text+348448 pud=%bx:x64
p:probe/pud_page_vaddr _text+23816818 pud=%bx:x64
With this patch:
# perf probe -D "pud_page_vaddr pud" | head
spurious_kernel_fault is blacklisted function, skip it.
vmalloc_fault is blacklisted function, skip it.
p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64
p:probe/pud_page_vaddr _text+149051 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64
p:probe/pud_page_vaddr _text+315926 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+23807209 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+23557365 pud=%ax:x64
p:probe/pud_page_vaddr _text+314097 pud=%di:x64
p:probe/pud_page_vaddr _text+314015 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+313893 pud=\deade12d:x64
p:probe/pud_page_vaddr _text+324083 pud=\deade12d:x64
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406476931.24476.6261475888681844285.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support DW_AT_const_value for variable assignment instead of location.
Note that this requires ftrace supporting immediate value.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406476012.24476.16096289871757175775.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support multiprobe event if the event is based on function and lines and
kernel supports it. In this case, perf probe creates the first probe
with an event, and tries to append following probes on that event, since
those probes must be on the same source code line.
Before this patch;
# perf probe -a vfs_read:18
Added new events:
probe:vfs_read_L18 (on vfs_read:18)
probe:vfs_read_L18_1 (on vfs_read:18)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read_L18_1 -aR sleep 1
#
After this patch (on multiprobe supported kernel)
# perf probe -a vfs_read:18
Added new events:
probe:vfs_read_L18 (on vfs_read:18)
probe:vfs_read_L18 (on vfs_read:18)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read_L18 -aR sleep 1
#
Committer testing:
On a kernel that doesn't support multiprobe events, after this patch:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# grep append /sys/kernel/debug/tracing/README
be modified by appending '.descending' or '.ascending' to a
can be modified by appending any of the following modifiers
#
# perf probe -a vfs_read:18
Added new events:
probe:vfs_read_L18 (on vfs_read:18)
probe:vfs_read_L18_1 (on vfs_read:18)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read_L18_1 -aR sleep 1
# perf probe -l
probe:vfs_read_L18 (on vfs_read:18@fs/read_write.c)
probe:vfs_read_L18_1 (on vfs_read:18@fs/read_write.c)
#
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406475010.24476.586290752591512351.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Generate event name from function name with line number as
<function>_L<line_number>. Note that this is only for the new event
which is defined by the line number of function (except for line 0).
If there is another event on same line, you have to use
"-f" option. In that case, the new event has "_1" suffix.
e.g.
# perf probe -a kernel_read:2
Added new event:
probe:kernel_read_L2 (on kernel_read:2)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read_L2 -aR sleep 1
But if we omit the line number or 0th line, it will
have no suffix.
# perf probe -a kernel_read:0
Added new event:
probe:kernel_read (on kernel_read)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read -aR sleep 1
probe:kernel_read (on kernel_read@linux-5.0.0/fs/read_write.c)
probe:kernel_read_L2 (on kernel_read:2@linux-5.0.0/fs/read_write.c)
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406474026.24476.2828897745502059569.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since perf probe -L shows non representive lines, it can be mislead
users where user can put probes. This prevents to show such non
representive lines so that user can understand which lines user can
probe.
# perf probe -L kernel_read
<kernel_read@/build/linux-pvZVvI/linux-5.0.0/fs/read_write.c:0>
0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
2 mm_segment_t old_fs;
ssize_t result;
old_fs = get_fs();
6 set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
8 result = vfs_read(file, (void __user *)buf, count, pos);
9 set_fs(old_fs);
10 return result;
}
EXPORT_SYMBOL(kernel_read);
Committer testing:
Before:
# perf probe -L kernel_read
<kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0>
0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
1 {
2 mm_segment_t old_fs;
3 ssize_t result;
5 old_fs = get_fs();
6 set_fs(KERNEL_DS);
/* The cast to a user pointer is valid due to the set_fs() */
8 result = vfs_read(file, (void __user *)buf, count, pos);
9 set_fs(old_fs);
10 return result;
}
EXPORT_SYMBOL(kernel_read);
#
See the 1, 3, 5 lines? They shouldn't be there, after this patch:
# perf probe -L kernel_read
<kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0>
0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
2 mm_segment_t old_fs;
ssize_t result;
old_fs = get_fs();
6 set_fs(KERNEL_DS);
/* The cast to a user pointer is valid due to the set_fs() */
8 result = vfs_read(file, (void __user *)buf, count, pos);
9 set_fs(old_fs);
10 return result;
}
EXPORT_SYMBOL(kernel_read);
#
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406473064.24476.2913278267727587314.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Verify user given probe line is a representive line (which doesn't share
the address with other lines or the line is the least line among the
lines which shares same address), and if not, it shows what is the
representive line.
Without this fix, user can put a probe on the lines which is not a a
representive line. But since this is not a representive line, perf probe
-l shows a representive line number instead of user given line number.
e.g. (put kernel_read:3, but listed as kernel_read:2)
# perf probe -a kernel_read:3
Added new event:
probe:kernel_read (on kernel_read:3)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read -aR sleep 1
# perf probe -l
probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c)
With this fix, perf probe doesn't allow user to put a probe on a
representive line, and tell what is the representive line.
# perf probe -a kernel_read:3
This line is sharing the addrees with other lines.
Please try to probe at kernel_read:2 instead.
Error: Failed to add events.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406472071.24476.14915451439785001021.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The dwarf_getsrc_die() can return the line which is not a statement nor
the least line number among the lines which shares same address.
This can lead perf probe --list shows incorrect line number for probed
address.
To fix this, this introduces cu_getsrc_die() which returns only a
statement line and which is the least line number (we call it the
representive line for an address), and use it in cu_find_lineinfo().
Also, if the given address is the entry address of a real function,
cu_find_lineinfo() returns the function declared line number instead of
the start line number of the function body.
For example, without this change perf probe -l shows incorrect line as
below.
# perf probe -a kernel_read:2
Added new event:
probe:kernel_read (on kernel_read:2)
You can now use it in all perf tools, such as:
perf record -e probe:kernel_read -aR sleep 1
# perf probe -l
probe:kernel_read (on kernel_read:1@linux-5.0.0/fs/read_write.c)
With this fix, it shows correct line number as below;
# perf probe -l
probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c)
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: http://lore.kernel.org/lkml/157406471067.24476.17463149618465494448.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add to the opcode map the following instructions:
cldemote
tpause
umonitor
umwait
movdiri
movdir64b
enqcmd
enqcmds
encls
enclu
enclv
pconfig
wbnoinvd
For information about the instructions, refer Intel SDM May 2019
(325462-070US) and Intel Architecture Instruction Set Extensions
May 2019 (319433-037).
The instruction decoding can be tested using the perf tools'
"x86 instruction decoder - new instructions" test as folllows:
$ perf test -v "new " 2>&1 | grep -i cldemote
Decoded ok: 0f 1c 00 cldemote (%eax)
Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8)
Decoded ok: 0f 1c 00 cldemote (%rax)
Decoded ok: 41 0f 1c 00 cldemote (%r8)
Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8)
Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8)
$ perf test -v "new " 2>&1 | grep -i tpause
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 41 0f ae f0 tpause %r8d
$ perf test -v "new " 2>&1 | grep -i umonitor
Decoded ok: 67 f3 0f ae f0 umonitor %ax
Decoded ok: f3 0f ae f0 umonitor %eax
Decoded ok: 67 f3 0f ae f0 umonitor %eax
Decoded ok: f3 0f ae f0 umonitor %rax
Decoded ok: 67 f3 41 0f ae f0 umonitor %r8d
$ perf test -v "new " 2>&1 | grep -i umwait
Decoded ok: f2 0f ae f0 umwait %eax
Decoded ok: f2 0f ae f0 umwait %eax
Decoded ok: f2 41 0f ae f0 umwait %r8d
$ perf test -v "new " 2>&1 | grep -i movdiri
Decoded ok: 0f 38 f9 03 movdiri %eax,(%ebx)
Decoded ok: 0f 38 f9 88 78 56 34 12 movdiri %ecx,0x12345678(%eax)
Decoded ok: 48 0f 38 f9 03 movdiri %rax,(%rbx)
Decoded ok: 48 0f 38 f9 88 78 56 34 12 movdiri %rcx,0x12345678(%rax)
$ perf test -v "new " 2>&1 | grep -i movdir64b
Decoded ok: 66 0f 38 f8 18 movdir64b (%eax),%ebx
Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx
Decoded ok: 67 66 0f 38 f8 1c movdir64b (%si),%bx
Decoded ok: 67 66 0f 38 f8 8c 34 12 movdir64b 0x1234(%si),%cx
Decoded ok: 66 0f 38 f8 18 movdir64b (%rax),%rbx
Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%rax),%rcx
Decoded ok: 67 66 0f 38 f8 18 movdir64b (%eax),%ebx
Decoded ok: 67 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i enqcmd
Decoded ok: f2 0f 38 f8 18 enqcmd (%eax),%ebx
Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx
Decoded ok: 67 f2 0f 38 f8 1c enqcmd (%si),%bx
Decoded ok: 67 f2 0f 38 f8 8c 34 12 enqcmd 0x1234(%si),%cx
Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx
Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx
Decoded ok: f2 0f 38 f8 18 enqcmd (%rax),%rbx
Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%rax),%rcx
Decoded ok: 67 f2 0f 38 f8 18 enqcmd (%eax),%ebx
Decoded ok: 67 f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx
Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx
Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i enqcmds
Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx
Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx
Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx
Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i encls
Decoded ok: 0f 01 cf encls
Decoded ok: 0f 01 cf encls
$ perf test -v "new " 2>&1 | grep -i enclu
Decoded ok: 0f 01 d7 enclu
Decoded ok: 0f 01 d7 enclu
$ perf test -v "new " 2>&1 | grep -i enclv
Decoded ok: 0f 01 c0 enclv
Decoded ok: 0f 01 c0 enclv
$ perf test -v "new " 2>&1 | grep -i pconfig
Decoded ok: 0f 01 c5 pconfig
Decoded ok: 0f 01 c5 pconfig
$ perf test -v "new " 2>&1 | grep -i wbnoinvd
Decoded ok: f3 0f 09 wbnoinvd
Decoded ok: f3 0f 09 wbnoinvd
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20191115135447.6519-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add to the "x86 instruction decoder - new instructions" test the following
instructions:
cldemote
tpause
umonitor
umwait
movdiri
movdir64b
enqcmd
enqcmds
encls
enclu
enclv
pconfig
wbnoinvd
For information about the instructions, refer Intel SDM May 2019
(325462-070US) and Intel Architecture Instruction Set Extensions
May 2019 (319433-037).
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20191115135447.6519-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently, with latest llvm trunk, selftest test_progs failed obj
file test_seg6_loop.o with the following error in verifier:
infinite loop detected at insn 76
The byte code sequence looks like below, and noted that alu32 has been
turned off by default for better generated codes in general:
48: w3 = 100
49: *(u32 *)(r10 - 68) = r3
...
; if (tlv.type == SR6_TLV_PADDING) {
76: if w3 == 5 goto -18 <LBB0_19>
...
85: r1 = *(u32 *)(r10 - 68)
; for (int i = 0; i < 100; i++) {
86: w1 += -1
87: if w1 == 0 goto +5 <LBB0_20>
88: *(u32 *)(r10 - 68) = r1
The main reason for verification failure is due to partial spills at
r10 - 68 for induction variable "i".
Current verifier only handles spills with 8-byte values. The above 4-byte
value spill to stack is treated to STACK_MISC and its content is not
saved. For the above example:
w3 = 100
R3_w=inv100 fp-64_w=inv1086626730498
*(u32 *)(r10 - 68) = r3
R3_w=inv100 fp-64_w=inv1086626730498
...
r1 = *(u32 *)(r10 - 68)
R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
fp-64=inv1086626730498
To resolve this issue, verifier needs to be extended to track sub-registers
in spilling, or llvm needs to enhanced to prevent sub-register spilling
in register allocation phase. The former will increase verifier complexity
and the latter will need some llvm "hacking".
Let us workaround this issue by declaring the induction variable as "long"
type so spilling will happen at non sub-register level. We can revisit this
later if sub-register spilling causes similar or other verification issues.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191117214036.1309510-1-yhs@fb.com
|
|
When run_kselftests.sh is run, it hangs after test_tc_tunnel.sh. The reason
is test_tc_tunnel.sh ensures the server ('nc -l') is run all the time,
starting it again every time it is expected to terminate. The exception is
the final client_connect: the server is not started anymore, which ensures
no process is kept running after the test is finished.
For a sit test, though, the script is terminated prematurely without the
final client_connect and the 'nc' process keeps running. This in turn causes
the run_one function in kselftest/runner.sh to hang forever, waiting for the
runaway process to finish.
Ensure a remaining server is terminated on cleanup.
Fixes: f6ad6accaa99 ("selftests/bpf: expand test_tc_tunnel with SIT encap")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/60919291657a9ee89c708d8aababc28ebe1420be.1573821780.git.jbenc@redhat.com
|
|
The actual test to run is test_xdping.sh, which is already in TEST_PROGS.
The xdping program alone is not runnable with 'make run_tests', it
immediatelly fails due to missing arguments.
Move xdping to TEST_GEN_PROGS_EXTENDED in order to be built but not run.
Fixes: cd5385029f1d ("selftests/bpf: measure RTT from xdp using xdping")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/4365c81198f62521344c2215909634407184387e.1573821726.git.jbenc@redhat.com
|
|
So we start with:
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u32 flags; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
u64 pgoff; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 reloc; /* 64 8 */
u32 maj; /* 72 4 */
u32 min; /* 76 4 */
u64 ino; /* 80 8 */
u64 ino_generation; /* 88 8 */
u64 (*map_ip)(struct map *, u64); /* 96 8 */
u64 (*unmap_ip)(struct map *, u64); /* 104 8 */
struct dso * dso; /* 112 8 */
refcount_t refcnt; /* 120 4 */
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 116, holes: 2, sum holes: 7 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* padding: 4 */
/* forced alignments: 1 */
} __attribute__((__aligned__(8)));
$
and 'flags' is seldom used when printing details about the map or with
the "cacheline" sort order, we can move them it to the second cacheline,
that will allow combining it with 'refcnt', that is only four bytes:
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u64 pgoff; /* 48 8 */
u64 reloc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 maj; /* 64 4 */
u32 min; /* 68 4 */
u64 ino; /* 72 8 */
u64 ino_generation; /* 80 8 */
u64 (*map_ip)(struct map *, u64); /* 88 8 */
u64 (*unmap_ip)(struct map *, u64); /* 96 8 */
struct dso * dso; /* 104 8 */
refcount_t refcnt; /* 112 4 */
u32 flags; /* 116 4 */
/* size: 120, cachelines: 2, members: 17 */
/* sum members: 116, holes: 1, sum holes: 3 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* forced alignments: 1 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-2cdw3zlw1mkamaf7nqtdlxfi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The map->priv and map->erange_warned are seldom used, the first only in
tests/vmlinux-kallsyms.c, the later only when hist_entry__inc_addr_samples()
returns -ERANGE in 'perf top', which are really rare occasions, so make
them a bool bitfield.
This will open up space for other members on the first cacheline.
$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */
/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */
u32 prot; /* 44 4 */
u32 flags; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
u64 pgoff; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 reloc; /* 64 8 */
u32 maj; /* 72 4 */
u32 min; /* 76 4 */
u64 ino; /* 80 8 */
u64 ino_generation; /* 88 8 */
u64 (*map_ip)(struct map *, u64); /* 96 8 */
u64 (*unmap_ip)(struct map *, u64); /* 104 8 */
struct dso * dso; /* 112 8 */
refcount_t refcnt; /* 120 4 */
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 116, holes: 2, sum holes: 7 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* padding: 4 */
/* forced alignments: 1 */
} __attribute__((__aligned__(8)));
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-g5545pcq4ff0wr17tfb1piqt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since the execlists_active() is no longer protected by the
engine->active.lock, we need to protect the request pointer with RCU to
prevent it being freed as we evaluate whether or not we need to preempt.
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191104090158.2959-2-chris@chris-wilson.co.uk
(cherry picked from commit 7d148635253328dda7cfe55d57e3c828e9564427)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 8eb4704b124cbd44f189709959137d77063ecfa1)
(cherry picked from commit 7e27238e149ce4f00d9cd801fe3aa0ea55e986a2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Testing with the new fsstress support for subvolumes uncovered a pretty
bad problem with rename exchange on subvolumes. We're modifying two
different subvolumes, but we only start the transaction on one of them,
so the other one is not added to the dirty root list. This is caught by
btrfs_cow_block() with a warning because the root has not been updated,
however if we do not modify this root again we'll end up pointing at an
invalid root because the root item is never updated.
Fix this by making sure we add the destination root to the trans list,
the same as we do with normal renames. This fixes the corruption.
Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Change to use SPI_CS_HIGH to support spi CS polarity setting
for chips support enhance_timing.
Signed-off-by: Luhua Xu <luhua.xu@mediatek.com>
Link: https://lore.kernel.org/r/1574053037-26721-2-git-send-email-luhua.xu@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
set_page_dirty says:
For pages with a mapping this should be done under the page lock
for the benefit of asynchronous memory errors who prefer a
consistent dirty state. This rule can be broken in some special
cases, but should be better not to.
Under those rules, it is only safe for us to use the plain set_page_dirty
calls for shmemfs/anonymous memory. Userptr may be used with real
mappings and so needs to use the locked version (set_page_dirty_lock).
However, following a try_to_unmap() we may want to remove the userptr and
so call put_pages(). However, try_to_unmap() acquires the page lock and
so we must avoid recursively locking the pages ourselves -- which means
that we cannot safely acquire the lock around set_page_dirty(). Since we
can't be sure of the lock, we have to risk skip dirtying the page, or
else risk calling set_page_dirty() without a lock and so risk fs
corruption.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112012
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
References: cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
References: 505a8ec7e11a ("Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"")
References: 6dcc693bc57f ("ext4: warn when page is dirtied without buffers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191111133205.11590-1-chris@chris-wilson.co.uk
(cherry picked from commit 0d4bbe3d407f79438dc4f87943db21f7134cfc65)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit cee7fb437edcdb2f9f8affa959e274997f5dca4d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
We report "frequencies" (actual-frequency, requested-frequency) as the
number of accumulated cycles so that the average frequency over that
period may be determined by the user. This means the units we report to
the user are Mcycles (or just M), not MHz.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191109105356.5273-1-chris@chris-wilson.co.uk
(cherry picked from commit e88866ef02851c88fe95a4bb97820b94b4d46f36)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit a7d87b70d6da96c6772e50728c8b4e78e4cbfd55)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The LUTs are single buffered so in order to program them without
tearing we'd have to do it during vblank (actually to be 100%
effective it has to happen between start of vblank and frame start).
We have no proper mechanism for that at the moment so we just
defer loading them after the vblank waits have happened. That
is not quite sufficient (especially when committing multiple pipes
whose vblanks don't line up) so the LUT load will often leak into
the following frame causing tearing.
However in case the hardware wasn't previously using the LUT we
can preload it before setting the enable bit (which is double
buffered so won't tear). Let's determine if we can do such
preloading and make it happen. Slight variation between the
hardware requires some platforms specifics in the checks.
Hans is seeing ugly colored flash on VLV/CHV macchines (GPD win
and Asus T100HA) when the gamma LUT gets loaded for the first
time as the BIOS has left some junk in the LUT memory.
v2: Deal with uapi vs. hw crtc state split
s/GCM/CGM/ typo fix
Cc: Hans de Goede <hdegoede@redhat.com>
Fixes: 051a6d8d3ca0 ("drm/i915: Move LUT programming to happen after vblank waits")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191030190815.7359-1-ville.syrjala@linux.intel.com
Tested-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
(cherry picked from commit 0ccc42a2fd5107a7f58e62c8b35b61de9a70ce82)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit f77021372e2880237278e0ee57faadc077a8256a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Make sure we have a crtc before probing its primary plane's
max stride. Initially I thought we can't get this far without
crtcs, but looks like we can via the dumb_create ioctl.
Not sure if we shouldn't disable dumb buffer support entirely
when we have no crtcs, but that would require some amount of work
as the only thing currently being checked is dev->driver->dumb_create
which we'd have to convert to some device specific dynamic thing.
Cc: stable@vger.kernel.org
Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Fixes: aa5ca8b7421c ("drm/i915: Align dumb buffer stride to 4k to allow for gtt remapping")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191106172349.11987-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit baea9ffe64200033499a4955f431e315bb807899)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit aeec766133f99d45aad60d650de50fb382104d95)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When doing a device replace, while at scrub.c:scrub_enumerate_chunks(), we
set the block group to RO mode and then wait for any ongoing writes into
extents of the block group to complete. While doing that wait we overwrite
the value of the variable 'ret' and can break out of the loop if an error
happens without turning the block group back into RW mode. So what happens
is the following:
1) btrfs_inc_block_group_ro() returns 0, meaning it set the block group
to RO mode (its ->ro field set to 1 or incremented to some value > 1);
2) Then btrfs_wait_ordered_roots() returns a value > 0;
3) Then if either joining or committing the transaction fails, we break
out of the loop wihtout calling btrfs_dec_block_group_ro(), leaving
the block group in RO mode forever.
To fix this, just remove the code that waits for ongoing writes to extents
of the block group, since it's not needed because in the initial setup
phase of a device replace operation, before starting to find all chunks
and their extents, we set the target device for replace while holding
fs_info->dev_replace->rwsem, which ensures that after releasing that
semaphore, any writes into the source device are made to the target device
as well (__btrfs_map_block() guarantees that). So while at
scrub_enumerate_chunks() we only need to worry about finding and copying
extents (from the source device to the target device) that were written
before we started the device replace operation.
Fixes: f0e9b7d6401959 ("Btrfs: fix race setting block group readonly during device replace")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
When running btrfs/072 with only one online CPU, it has a pretty high
chance to fail:
btrfs/072 12s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//btrfs/072.dmesg)
- output mismatch (see xfstests-dev/results//btrfs/072.out.bad)
--- tests/btrfs/072.out 2019-10-22 15:18:14.008965340 +0800
+++ /xfstests-dev/results//btrfs/072.out.bad 2019-11-14 15:56:45.877152240 +0800
@@ -1,2 +1,3 @@
QA output created by 072
Silence is golden
+Scrub find errors in "-m dup -d single" test
...
And with the following call trace:
BTRFS info (device dm-5): scrub: started on devid 1
------------[ cut here ]------------
BTRFS: Transaction aborted (error -27)
WARNING: CPU: 0 PID: 55087 at fs/btrfs/block-group.c:1890 btrfs_create_pending_block_groups+0x3e6/0x470 [btrfs]
CPU: 0 PID: 55087 Comm: btrfs Tainted: G W O 5.4.0-rc1-custom+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:btrfs_create_pending_block_groups+0x3e6/0x470 [btrfs]
Call Trace:
__btrfs_end_transaction+0xdb/0x310 [btrfs]
btrfs_end_transaction+0x10/0x20 [btrfs]
btrfs_inc_block_group_ro+0x1c9/0x210 [btrfs]
scrub_enumerate_chunks+0x264/0x940 [btrfs]
btrfs_scrub_dev+0x45c/0x8f0 [btrfs]
btrfs_ioctl+0x31a1/0x3fb0 [btrfs]
do_vfs_ioctl+0x636/0xaa0
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x43/0x50
do_syscall_64+0x79/0xe0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
---[ end trace 166c865cec7688e7 ]---
[CAUSE]
The error number -27 is -EFBIG, returned from the following call chain:
btrfs_end_transaction()
|- __btrfs_end_transaction()
|- btrfs_create_pending_block_groups()
|- btrfs_finish_chunk_alloc()
|- btrfs_add_system_chunk()
This happens because we have used up all space of
btrfs_super_block::sys_chunk_array.
The root cause is, we have the following bad loop of creating tons of
system chunks:
1. The only SYSTEM chunk is being scrubbed
It's very common to have only one SYSTEM chunk.
2. New SYSTEM bg will be allocated
As btrfs_inc_block_group_ro() will check if we have enough space
after marking current bg RO. If not, then allocate a new chunk.
3. New SYSTEM bg is still empty, will be reclaimed
During the reclaim, we will mark it RO again.
4. That newly allocated empty SYSTEM bg get scrubbed
We go back to step 2, as the bg is already mark RO but still not
cleaned up yet.
If the cleaner kthread doesn't get executed fast enough (e.g. only one
CPU), then we will get more and more empty SYSTEM chunks, using up all
the space of btrfs_super_block::sys_chunk_array.
[FIX]
Since scrub/dev-replace doesn't always need to allocate new extent,
especially chunk tree extent, so we don't really need to do chunk
pre-allocation.
To break above spiral, here we introduce a new parameter to
btrfs_inc_block_group(), @do_chunk_alloc, which indicates whether we
need extra chunk pre-allocation.
For relocation, we pass @do_chunk_alloc=true, while for scrub, we pass
@do_chunk_alloc=false.
This should keep unnecessary empty chunks from popping up for scrub.
Also, since there are two parameters for btrfs_inc_block_group_ro(),
add more comment for it.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
struct btrfs_fs_devices::rotating currently is declared as an integer
variable but only used as a boolean.
Change the variable definition to bool and update to code touching it to
set 'true' and 'false'.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
struct btrfs_fs_devices::seeding currently is declared as an integer
variable but only used as a boolean.
Change the variable definition to bool and update to code touching it to
set 'true' and 'false'.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The type name is misleading, a single entry is named 'cache' while this
normally means a collection of objects. Rename that everywhere. Also the
identifier was quite long, making function prototypes harder to format.
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
For read_one_block_group(), its only caller has already got the item key
to search next block group item.
So we can use that key directly without doing our own convertion on
stack.
Also, since that key used in btrfs_read_block_groups() is vital for
block group item search, add 'const' keyword for that parameter to
prevent read_one_block_group() to modify it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Refactor the work inside the loop of btrfs_read_block_groups() into one
separate function, read_one_block_group().
This allows read_one_block_group to be reused for later BG_TREE feature.
The refactor does the following extra fix:
- Use btrfs_fs_incompat() to replace open-coded feature check
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
A nice writeup of the LKMM (Linux Kernel Memory Model) rules for access
once policies can be found here
https://lwn.net/Articles/799218/#Access-Marking%20Policies .
The locked and unlocked access to eb::blocking_writers should be
annotated accordingly, following this:
Writes:
- locked write must use ONCE, may use plain read
- unlocked write must use ONCE
Reads:
- unlocked read must use ONCE
- locked read may use plain read iff not mixed with unlocked read
- unlocked read then locked must use ONCE
There's one difference on the assembly level, where
btrfs_tree_read_lock_atomic and btrfs_try_tree_read_lock used the cached
value and did not reevaluate it after taking the lock. This could have
missed some opportunities to take the lock in case blocking writers
changed between the calls, but the window is just a few instructions
long. As this is in try-lock, the callers handle that.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The increment and decrement was inherited from previous version that
used atomics, switched in commit 06297d8cefca ("btrfs: switch
extent_buffer blocking_writers from atomic to int"). The only possible
values are 0 and 1 so we can set them directly.
The generated assembly (gcc 9.x) did the direct value assignment in
btrfs_set_lock_blocking_write (asm diff after change in 06297d8cefca):
5d: test %eax,%eax
5f: je 62 <btrfs_set_lock_blocking_write+0x22>
61: retq
- 62: lock incl 0x44(%rdi)
- 66: add $0x50,%rdi
- 6a: jmpq 6f <btrfs_set_lock_blocking_write+0x2f>
+ 62: movl $0x1,0x44(%rdi)
+ 69: add $0x50,%rdi
+ 6d: jmpq 72 <btrfs_set_lock_blocking_write+0x32>
The part in btrfs_tree_unlock did a decrement because
BUG_ON(blockers > 1) is probably not a strong hint for the compiler, but
otherwise the output looks safe:
- lock decl 0x44(%rdi)
+ sub $0x1,%eax
+ mov %eax,0x44(%rdi)
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are two ifs that use eb::blocking_writers. As this is a variable
modified inside and outside of locks, we could minimize number of
accesses to avoid problems with getting different results at different
times.
The access here is locked so this can only race with btrfs_tree_unlock
that sets blocking_writers to 0 without lock and unsets the lock owner.
The first branch is taken only if the same thread already holds the
lock, the second if checks for blocking writers. Here we'd either unlock
and wait, or proceed. Both are valid states of the locking protocol.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|