Age | Commit message (Collapse) | Author |
|
The more intense grace-period processing resulting from the 50x RCU
Tasks Trace grace-period speedups exposed the following race condition:
o Task A running on CPU 0 executes rcu_read_lock_trace(),
entering a read-side critical section.
o When Task A eventually invokes rcu_read_unlock_trace()
to exit its read-side critical section, this function
notes that the ->trc_reader_special.s flag is zero and
and therefore invoke wil set ->trc_reader_nesting to zero
using WRITE_ONCE(). But before that happens...
o The RCU Tasks Trace grace-period kthread running on some other
CPU interrogates Task A, but this fails because this task is
currently running. This kthread therefore sends an IPI to CPU 0.
o CPU 0 receives the IPI, and thus invokes trc_read_check_handler().
Because Task A has not yet cleared its ->trc_reader_nesting
counter, this function sees that Task A is still within its
read-side critical section. This function therefore sets the
->trc_reader_nesting.b.need_qs flag, AKA the .need_qs flag.
Except that Task A has already checked the .need_qs flag, which
is part of the ->trc_reader_special.s flag. The .need_qs flag
therefore remains set until Task A's next rcu_read_unlock_trace().
o Task A now invokes synchronize_rcu_tasks_trace(), which cannot
start a new grace period until the current grace period completes.
And thus cannot return until after that time.
But Task A's .need_qs flag is still set, which prevents the current
grace period from completing. And because Task A is blocked, it
will never execute rcu_read_unlock_trace() until its call to
synchronize_rcu_tasks_trace() returns.
We are therefore deadlocked.
This race is improbable, but 80 hours of rcutorture made it happen twice.
The race was possible before the grace-period speedup, but roughly 50x
less probable. Several thousand hours of rcutorture would have been
necessary to have a reasonable chance of making this happen before this
50x speedup.
This commit therefore eliminates this deadlock by setting
->trc_reader_nesting to a large negative number before checking the
.need_qs and zeroing (or decrementing with respect to its initial
value) ->trc_reader_nesting. For its part, the IPI handler's
trc_read_check_handler() function adds a check for negative values,
deferring evaluation of the task in this case. Taken together, these
changes avoid this deadlock scenario.
Fixes: 276c410448db ("rcu-tasks: Split ->trc_reader_need_end")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The various RCU tasks flavors currently wait 100 milliseconds between each
grace period in order to prevent CPU-bound loops and to favor efficiency
over latency. However, RCU Tasks Trace needs to have a grace-period
latency of roughly 25 milliseconds, which is completely infeasible given
the 100-millisecond per-grace-period sleep. This commit therefore reduces
this sleep duration to 5 milliseconds (or one jiffy, whichever is longer)
in kernels built with CONFIG_TASKS_TRACE_RCU_READ_MB=y.
Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Many workloads are quite sensitive to IPIs, and such workloads should
build kernels with CONFIG_TASKS_TRACE_RCU_READ_MB=y to prevent RCU
Tasks Trace from using them under normal conditions. However, other
workloads are quite happy to permit more IPIs if doing so makes BPF
program updates go faster. This commit therefore sets the default
value for the rcupdate.rcu_task_ipi_delay kernel parameter to zero for
kernels that have been built with CONFIG_TASKS_TRACE_RCU_READ_MB=n,
while retaining the old default of (HZ / 10) for kernels that have
indicated an aversion to IPIs via CONFIG_TASKS_TRACE_RCU_READ_MB=y.
Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Commit 8344496e8b49 ("rcu-tasks: Conditionally compile
show_rcu_tasks_gp_kthreads()") introduced conditional
compilation of several functions, but forgot one occurrence of
show_rcu_tasks_classic_gp_kthread() that causes the compiler to warn of
an unused static function. This commit uses "static inline" to avoid
these complaints and possibly also to avoid emitting an actual definition
of this function.
Fixes: 8344496e8b49 ("rcu-tasks: Conditionally compile show_rcu_tasks_gp_kthreads()")
Cc: <stable@vger.kernel.org> # 5.8.x
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The RCU Tasks Trace grace periods are too slow, as in 40x slower than
those of RCU Tasks. This is due to my having assumed a one-second grace
period was OK, and thus not having optimized any further. This commit
provides the first step in this optimization process, namely by allowing
the task_list scan backoff interval to be specified on a per-flavor basis,
and then speeding up the scans for RCU Tasks Trace. However, kernels
built with CONFIG_TASKS_TRACE_RCU_READ_MB=y continue to use the old slower
backoff, consistent with that Kconfig option's goal of reducing IPIs.
Link: https://lore.kernel.org/bpf/CAADnVQK_AiX+S_L_A4CQWT11XyveppBbQSQgH_qWGyzu_E8Yeg@mail.gmail.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The n_heavy_reader_attempts, n_heavy_reader_updates, and
n_heavy_reader_ofl_updates variables are not used outside of their
translation unit, so this commit marks them static.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
if of_find_device_by_node() succeed, mtk_drm_kms_init() doesn't have
a corresponding put_device(). Thus add jump target to fix the exception
handling for this function implementation.
Fixes: 8f83f26891e1 ("drm/mediatek: Add HDMI support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
if of_find_device_by_node() succeed, mtk_drm_kms_init() doesn't have
a corresponding put_device(). Thus add jump target to fix the exception
handling for this function implementation.
Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
mtk_ddp_comp_init() is called in a loop in mtk_drm_probe(), if it
fail, previous successive init component is not proccessed.
Thus uninitialize valid component and put their device if component
init failed.
Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
if of_find_device_by_node() succeed, mtk_ddp_comp_init() doesn't have
a corresponding put_device(). Thus add put_device() to fix the exception
handling for this function implementation.
Fixes: d0afe37f5209 ("drm/mediatek: support CMDQ interface in ddp component")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
Even though cmdq client is created successfully, without the cmdq event,
cmdq could not work correctly, so use CPU when fail to get cmdq event.
Fixes: 60fa8c13ab1a ("drm/mediatek: Move gce event property to mutex device node")
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
Remove mtk_drm_ddp.h which is included more than once
Fixes: 9aef5867c86c ("drm/mediatek: drop use of drmP.h")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
This patch add support to --show-ring & --set-ring Ethtool functions:
- Adding min, max, power of two check to new ring parameter's value.
- Bring down the network interface before changing the value of ring
parameters.
- Bring up the network interface after changing the value of ring
parameters.
Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: Refactor headroom management
Petr says:
On Spectrum, port buffers, also called port headroom, is where packets are
stored while they are parsed and the forwarding decision is being made. For
lossless traffic flows, in case shared buffer admission is not allowed,
headroom is also where to put the extra traffic received before the sent
PAUSE takes effect. Another aspect of the port headroom is the so called
internal buffer, which is used for egress mirroring.
Linux supports two DCB interfaces related to the headroom: dcbnl_setbuffer
for configuration, and dcbnl_getbuffer for inspection. In order to make it
possible to implement these interfaces, it is first necessary to clean up
headroom handling, which is currently strewn in several places in the
driver.
The end goal is an architecture whereby it is possible to take a copy of
the current configuration, adjust parameters, and then hand the proposed
configuration over to the system to implement it. When everything works,
the proposed configuration is accepted and saved. First, this centralizes
the reconfiguration handling to one function, which takes care of
coordinating buffer size changes and priority map changes to avoid
introducing drops. Second, the fact that the configuration is all in one
place makes it easy to keep a backup and handle error path rollbacks, which
were previously hard to understand.
Patch #1 introduces struct mlxsw_sp_hdroom, which will keep port headroom
configuration.
Patch #2 unifies handling of delay provision between PFC and PAUSE. From
now on, delay is to be measured in bytes of extra space, and will not
include MTU. PFC handler sets the delay directly from the parameter it gets
through the DCB interface. For PAUSE, MLXSW_SP_PAUSE_DELAY is converted to
have the same meaning.
In patches #3-#5, MTU, lossiness and priorities are gradually moved over to
struct mlxsw_sp_hdroom.
In patches #6-#11, handling of buffer resizing and priority maps is moved
from spectrum.c and spectrum_dcb.c to spectrum_buffers.c. The API is
gradually adapted so that struct mlxsw_sp_hdroom becomes the main interface
through which the various clients express how the headroom should be
configured.
Patch #12 is a small cleanup that the previous transformation made
possible.
In patch #13, the port init code becomes a boring client of the headroom
code, instead of rolling its own thing.
Patches #14 and #15 move handling of internal mirroring buffer to the new
headroom code as well. Previously, this code was in the SPAN module. This
patchset converts the SPAN module to another boring client of the headroom
code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Traffic mirroring modes that are in-chip implemented on egress need an
internal buffer to work. As the only client, the SPAN module was managing
the buffer so far. However logically it belongs to the buffers module. E.g.
buffer size validation needs to take the size of the internal buffer into
account.
Therefore move the related code from SPAN to spectrum_buffers. Move over
the callbacks that determine the minimum buffer size as a function of
maximum speed and MTU. Add a field describing the internal buffer to struct
mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of
sizing the internal buffer as well. Change the SPAN module to invoke that
function and mlxsw_sp_hdroom_configure() like all the other hdroom clients.
Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable().
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The size of the internal buffer is currently calculated in the SPAN module.
Logically it belongs to the spectrum_buffers module, where it should be
moved. However, that being a chip-specific operation, it needs dynamic
dispatch. There currently is a chip-specific structure for description of
shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into
this structure would be confusing. Therefore introduce a new per-chip
structure, currently empty, and initialize the ops pointer as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently mlxsw_sp_port_headroom_init() configures both priomap and buffers
by hand. Additionally, for port buffers, it configures buffer 0 with a size
that it will never again have if PFC configuration is touched.
Rewrite the init code to become a client of the new hdroom code. The only
difference in invocation is that the configuration is forced, so that it is
issued even if the desired configuration happens to match what is contained
in (hitherto not initialized with meaningful values) mlxsw_sp_port->hdroom.
Since now mlxsw_sp_port_headroom_init() initializes all the PG buffers to
meaningful values, mlxsw_sp_hdroom_configure_buffers() can avoid querying
the current configuration, and can fill the whole PBMC itself.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This function is now only used from the buffers module, and is a trivial
field reference. Just inline it and drop the related artifacts.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move all the headroom code to the spectrum_buffers module, where it
belongs.
Rename mlxsw_sp_pg_buf_threshold_get() and mlxsw_sp_pg_buf_pack() to
..._hdroom_... to match the naming convention of the new headroom code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.
This sort of careful approach will also be useful for configuring port
buffer sizes and priority map by hand, through dcbnl_setbuffer. Therefore
move the code from the DCB handler to the generic headroom function.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The new hdroom code has certain conventions: iteration over priorities is
done through a variable named `prio', configuration is not pushed unless it
is dirty, but a `force' flag can be used to override this, updated
configuration is written to port. Convert the function
mlxsw_sp_port_pg_prio_map() to use these conventions and rename
appropriately to fit in.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ETS handler performs the headroom configuration in three steps: first
it resizes the buffers and adds any new ones. Then it redirects priorities
to the new buffers. And finally it sets the size of the now-unused buffers
to zero. This way no packet drops are introduced.
Both of the buffer size configuration operations are simply buffer size
configurations, there is no material difference between setting buffers to
zero and any other value. Therefore simply invoke the same
mlxsw_sp_hdroom_configure(), and drop mlxsw_sp_port_pg_destroy() and
mlxsw_sp_ets_has_pg() which are now unused.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split mlxsw_sp_port_headroom_set() to three functions.
mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG
buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the
configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that
the requested buffer configuration matches total headroom size
requirements.
Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually
perform full headroom configuration, but for now, only have them verify the
configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers().
Have them take the `force` argument to prepare for a later patch, even
though it is currently unused.
Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through
DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer,
it needs to keep the values queried from the FW. Eventually this function
should configure all the PGs.
Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That
function performs the headroom configuration in three steps: first it
resizes the buffers and adds any new ones. Then it redirects priorities to
the new buffers. And finally it sets the size of the now-unused buffers to
zero. This way no packet drops are introduced.
So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the
configuration to keep the old sizes of PG buffers for those buffers whose
size was set to zero.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So far, port buffers were always autoconfigured. When dcbnl_setbuffer
callback is implemented, it will allow the user to change the buffer size
configuration by hand. The sizes therefore need to be a configuration
parameter, not always deduced, and therefore belong to struct
mlxsw_sp_hdroom, where the configuration routine should take them from.
Update mlxsw_sp_port_headroom_set() to update these sizes. Have the
function update the sizes even for the case that a given buffer is not
used.
Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of
IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Client-side configuration has lossiness as an attribute of a priority.
Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio.
To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add
struct mlxsw_sp_hdroom_buf, which in the following patches will get more
attributes, but right now only use it to track port buffer lossiness.
Instead of passing around the primary indicators of PFC and pause_en, add a
function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer
lossiness from the priority map and priority lossiness. Change
mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the
headroom configuration. Have the PFC and pause handlers configure priority
lossiness in mlxsw_sp_hdroom, from where it will propagate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mapping from priorities to buffers determines which buffers should be
configured. Lossiness of these priorities combined with the mapping
determines whether a given buffer should be lossy.
Currently this configuration is stored implicitly in DCB ETS, PFC and
ethtool PAUSE configuration. Keeping it together with the rest of the
headroom configuration and deriving it as needed from PFC / ETS / PAUSE
will make things clearer. To that end, add a field "prios" to struct
mlxsw_sp_hdroom.
Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and
assumed that the same mapping as we use on the egress should be used on
ingress as well. Instead, track this configuration at each priority, so
that it can be adjusted flexibly.
In the following patches, as dcbnl_setbuffer is implemented, it will need
to store its own mapping, and it will also be sometimes necessary to revert
back to the original ETS mapping. Therefore track two buffer indices: the
one for chip configuration (buf_idx), and the source one (ets_buf_idx).
Introduce a function to configure the chip-level buffer index, and for now
have it simply copy the ETS mapping over to the chip mapping.
Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the
buf_idx recomputation.
Now that there is a canonical place to look for this configuration,
mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if
DCB is compiled out.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MTU influences sizes of auto-allocated buffers. Make it a part of port
buffer configuration and have __mlxsw_sp_port_headroom_set() take it from
there, instead of as an argument.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a priority is marked as lossless using DCB PFC, or when pause frames
are enabled on a port, mlxsw adds to port buffers an extra space to cover
the traffic that will arrive between the time that a pause or PFC frame is
emitted, and the time traffic actually stops. This is called the delay. The
concept is the same in PFC and pause, however the way the extra buffer
space is calculated differs.
In this patch, unify this handling. Delay is to be measured in bytes of
extra space, and will not include MTU. PFC handler sets the delay directly
from the parameter it gets through the DCB interface.
To convert pause handler, move MLXSW_SP_PAUSE_DELAY to ethtool module,
convert to bytes, and reduce it by maximum MTU, and divide by two. Then it
has the same meaning as the delay_bytes set by the PFC handler.
Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the
previous patch. Change PFC and pause handlers to store the new delay value
there and have __mlxsw_sp_port_headroom_set() take it from there.
Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(),
introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision.
Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory
comment describing the formula used.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The port headroom handling is currently strewn across several modules and
tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the
settings, and then there is the completely separate initial configuraion in
spectrum_buffers. A following patch will implement the dcbnl_setbuffer
callback, which is going to further complicate the landscape.
In order to simplify work with port buffers, the following patches are
going to centralize all port-buffer handling in spectrum_buffers. As a
first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will
keep the configuration parameters, and allocate and free it in appropriate
places.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2020-09-15
Various updates to mlx5 driver,
1) Eli adds support for TC trap action.
2) Eran, minor improvements to clock.c code structure
3) Better handling of error reporting in LAG from Jianbo
4) IPv6 traffic class (DSCP) header rewrite support from Maor
5) Ofer Levi adds support for CQE compression of multi-strides packets
6) Vu, Enables use of vport meta data by default.
7) Some minor code cleanup
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On A20R machines the interrupt pending bits in cause register need to be
updated by requesting the chipset to do it. This needs to be done to
find the interrupt cause and after interrupt service. In
commit 0b888c7f3a03 ("MIPS: SNI: Convert to new irq_chip functions") the
function to do after service update got lost, which caused spurious
interrupts.
Fixes: 0b888c7f3a03 ("MIPS: SNI: Convert to new irq_chip functions")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Set PERF_SAMPLE_PERIOD if attr->freq is set.
- Remove trailing commas from AMD JSON vendor event files.
- Don't clear event's period if set by a event definition term.
- Leader sampling shouldn't clear sample period in 'perf test'.
- Fix the "signal" test inline assembly when built with DEBUG=1.
- Fix memory leaks detected by ASAN, some in normal paths, some in
error paths.
- Fix 2 memory sanitizer warnings in 'perf bench'.
- Fix the ratio comments of miss-events in 'perf stat'.
- Prevent override of attr->sample_period for libpfm4 events.
- Sync kvm.h and in.h headers with the kernel sources.
* tag 'perf-tools-fixes-for-v5.9-2020-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf stat: Fix the ratio comments of miss-events
perf test: Free formats for perf pmu parse test
perf metric: Do not free metric when failed to resolve
perf metric: Free metric when it failed to resolve
perf metric: Release expr_parse_ctx after testing
perf test: Fix memory leaks in parse-metric test
perf parse-event: Fix memory leak in evsel->unit
perf evlist: Fix cpu/thread map leak
perf metric: Fix some memory leaks - part 2
perf metric: Fix some memory leaks
perf test: Free aliases for PMU event map aliases test
perf vendor events amd: Remove trailing commas
perf test: Leader sampling shouldn't clear sample period
perf record: Don't clear event's period if set by a term
tools headers UAPI: update linux/in.h copy
tools headers UAPI: Sync kvm.h headers with the kernel sources
perf record: Prevent override of attr->sample_period for libpfm4 events
perf record: Set PERF_RECORD_PERIOD if attr->freq is set.
perf bench: Fix 2 memory sanitizer warnings
perf test: Fix the "signal" test inline assembly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk driver fixes from Stephen Boyd:
"A handful of clk driver fixes. Mostly they're for error paths or
improper memory allocations sizes. Nothing as exciting as a wildfire"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: lpass: Correct goto target in lpass_core_sc7180_probe()
clk: versatile: Add of_node_put() before return statement
clk: bcm: dvp: Select the reset framework
clk: rockchip: Fix initialization of mux_pll_src_4plls_p
clk: davinci: Use the correct size when allocating memory
|
|
Max's and Shravan's usernames were changed while @mellanox.com emails
were transferred to be @nvidia.com.
Fixes: f6da70d99c96 ("MAINTAINERS: Update Mellanox and Cumulus Network addresses to new domain")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The C3 BusMaster idle code takes lock in a number of places, some deep
inside the ACPI code. Instead of wrapping it all in RCU_NONIDLE, have
the driver take over RCU-idle duty and avoid flipping RCU state back
and forth a lot.
( by marking 'C3 && bm_check' as RCU_IDLE, we _must_ call enter_bm() for
that combination, otherwise we'll loose RCU-idle, this requires
shuffling some code around )
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Some drivers have to do significant work, some of which relies on RCU
still being active. Instead of using RCU_NONIDLE in the drivers and
flipping RCU back on, allow drivers to take over RCU-idle duty.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Make acpi_processor_idle() use the generic TLB flushing code.
This again removes RCU usage after rcu_idle_enter().
(XXX make every C3 invalidate TLBs, not just C3-BM)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Make acpi_processor_idle use the common broadcast code, there's no
reason not to. This also removes some RCU usage after
rcu_idle_enter().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
that percpu-rwsem is a blocking primitive, should be just fine.
However, file_end_write() is used from IRQ context and will cause
load-store issues on architectures where the per-cpu accessors are not
natively irq-safe.
Fix it by using the IRQ-safe this_cpu_*() for operations on
read_count. This will generate more expensive code on a number of
platforms, which might cause a performance regression for some of the
other percpu-rwsem users.
If any such is reported, we can consider alternative solutions.
Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com
|
|
'perf stat' displays miss ratio of L1-dcache, L1-icache, dTLB cache,
iTLB cache and LL-cache. Take L1-dcache for example, miss ratio is
caculated as "L1-dcache-load-misses/L1-dcache-loads". So "of all
L1-dcache hits" is unsuitable to describe it, and "of all L1-dcache
accesses" seems better.
The comments of L1-icache, dTLB cache, iTLB cache and LL-cache are
fixed in the same way.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1600253331-10535-1-git-send-email-liuqi115@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
syzbot is reporting OOB read at fbcon_resize() [1], for
commit 39b3cffb8cf31117 ("fbcon: prevent user font height or width change
from causing potential out-of-bounds access") is by error using
registered_fb[con2fb_map[vc->vc_num]]->fbcon_par->p->userfont (which was
set to non-zero) instead of fb_display[vc->vc_num].userfont (which remains
zero for that display).
We could remove tricky userfont flag [2], for we can determine it by
comparing address of the font data and addresses of built-in font data.
But since that commit is failing to fix the original OOB read [3], this
patch keeps the change minimal in case we decide to revert altogether.
[1] https://syzkaller.appspot.com/bug?id=ebcbbb6576958a496500fee9cf7aa83ea00b5920
[2] https://syzkaller.appspot.com/text?tag=Patch&x=14030853900000
[3] https://syzkaller.appspot.com/bug?id=6fba8c186d97cf1011ab17660e633b1cc4e080c9
Reported-by: syzbot <syzbot+b38b1ef6edf0c74a8d97@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 39b3cffb8cf31117 ("fbcon: prevent user font height or width change from causing potential out-of-bounds access")
Cc: George Kennedy <george.kennedy@oracle.com>
Link: https://lore.kernel.org/r/f6e3e611-8704-1263-d163-f52c906a4f06@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Simply add Lakefield model ID. No additional changes are needed.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[ rjw: Minor subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
These serial ports are exposed by the OOB-management-engine on
RealManage-enabled network cards (e.g. AMD DASH enabled systems using
Realtek cards).
Because these have 3 BARs, they fail the "num_iomem <= 1" check in
serial_pci_guess_board.
I've manually checked the two IOMEM regions and BAR 2 doesn't seem to
respond to reads, but BAR 4 seems to be an MMIO version of the IO ports
(untested).
With this change, the ports are detected:
0000:02:00.1: ttyS0 at I/O 0x2200 (irq = 82, base_baud = 115200) is a 16550A
0000:02:00.2: ttyS1 at I/O 0x2100 (irq = 55, base_baud = 115200) is a 16550A
lspci output:
02:00.1 0700: 10ec:816a (rev 0e) (prog-if 02 [16550])
Subsystem: 17aa:5082
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin B routed to IRQ 82
IOMMU group: 11
Region 0: I/O ports at 2200 [size=256]
Region 2: Memory at fd715000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at fd704000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
Not readable
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [170 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [178 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=150us PortTPowerOnTime=150us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
02:00.2 0700: 10ec:816b (rev 0e)
[...same...]
Signed-off-by: Tobias Diedrich <tobiasdiedrich@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200914173628.GA22508@yamamaya.is-a-geek.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the port-lock initialisation regression introduced by commit
a3cb39d258ef ("serial: core: Allow detach and attach serial device for
console") by making sure that the lock is again initialised during
console setup.
The console may be registered before the serial controller has been
probed in which case the port lock needs to be initialised during
console setup by a call to uart_set_options(). The console-detach
changes introduced a regression in several drivers by effectively
removing that initialisation by not initialising the lock when the port
is used as a console (which is always the case during console setup).
Add back the early lock initialisation and instead use a new
console-reinit flag to handle the case where a console is being
re-attached through sysfs.
The question whether the console-detach interface should have been added
in the first place is left for another discussion.
Note that the console-enabled check in uart_set_options() is not
redundant because of kgdboc, which can end up reinitialising an already
enabled console (see commit 42b6a1baa3ec ("serial_core: Don't
re-initialize a previously initialized spinlock.")).
Fixes: a3cb39d258ef ("serial: core: Allow detach and attach serial device for console")
Cc: stable <stable@vger.kernel.org> # 5.7
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200909143101.15389-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit f743061a85f5 ("serial: core: Initialise spin lock before use in
uart_configure_port()") tried to work around a breakage introduced by
commit a3cb39d258ef ("serial: core: Allow detach and attach serial
device for console") by adding a second initialisation of the port lock
when registering the port.
As reported by the build robots [1], this doesn't really solve the
regression introduced by the console-detach changes and also adds a
second redundant initialisation of the lock for normal ports.
Start cleaning up this mess by removing the redundant initialisation and
making sure that the port lock is again initialised once-only for ports
that aren't already in use as a console.
[1] https://lore.kernel.org/r/20200802054852.GR23458@shao2-debian
Fixes: f743061a85f5 ("serial: core: Initialise spin lock before use in uart_configure_port()")
Fixes: a3cb39d258ef ("serial: core: Allow detach and attach serial device for console")
Cc: stable <stable@vger.kernel.org> # 5.7
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200909143101.15389-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Check and return if there are errors. The response bits are valid
only on no errors.
Fixes: b7404a29cd3d ("usb: typec: intel_pmc_mux: Definitions for response status bits")
Signed-off-by: Madhusudanarao Amara <madhusudanarao.amara@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20200916091102.27118-4-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk for the BYD zhaoxin notebook.
This notebook come with usb touchpad. And we would like to disable
touchpad wakeup on this notebook by default.
Signed-off-by: Penghao <penghao@uniontech.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200907023026.28189-1-penghao@uniontech.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The SCSI layer can go into an ugly loop if you ignore that a device is
gone. You need to report an error in the command rather than in the
return value of the queue method.
We need to specifically check for ENODEV. The issue goes back to the
introduction of the driver.
Fixes: 115bb1ffa54c3 ("USB: Add UAS driver")
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200916094026.30085-2-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Sometimes the embedded controller firmware does not
terminate the list of alternate modes that the partner
supports in its response to the GET_ALTERNATE_MODES command.
Instead the firmware returns the supported alternate modes
over and over again until the driver stops requesting them.
If that happens, the number of modes for each alternate mode
will exceed the maximum 6 that is defined in the USB Power
Delivery specification. Making sure that can't happen by
adding a check for it.
This fixes NULL pointer dereference that is caused by the
overrun.
Fixes: ad74b8649beaf ("usb: typec: ucsi: Preliminary support for alternate modes")
Cc: stable@vger.kernel.org
Reported-by: Zwane Mwaikambo <zwanem@gmail.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20200916090034.25119-3-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|