Age | Commit message (Collapse) | Author |
|
The timerlat tracer aims to help the preemptive kernel developers to
found souces of wakeup latencies of real-time threads. Like cyclictest,
the tracer sets a periodic timer that wakes up a thread. The thread then
computes a *wakeup latency* value as the difference between the *current
time* and the *absolute time* that the timer was set to expire. The main
goal of timerlat is tracing in such a way to help kernel developers.
Usage
Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).
For example:
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo timerlat > current_tracer
It is possible to follow the trace by reading the trace trace file:
[root@f32 tracing]# cat trace
# tracer: timerlat
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# || /
# |||| ACTIVATION
# TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY
# | | | |||| | | | |
<idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns
<...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns
<idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns
<...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns
<idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns
<...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns
<idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns
<...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns
The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread, which is the
same level that cyclictest reports. The ACTIVATION ID field
serves to relate the *irq* execution to its respective *thread* execution.
The irq/thread splitting is important to clarify at which context
the unexpected high value is coming from. The *irq* context can be
delayed by hardware related actions, such as SMIs, NMIs, IRQs
or by a thread masking interrupts. Once the timer happens, the delay
can also be influenced by blocking caused by threads. For example, by
postponing the scheduler execution via preempt_disable(), by the
scheduler execution, or by masking interrupts. Threads can
also be delayed by the interference from other threads and IRQs.
The timerlat can also take advantage of the osnoise: traceevents.
For example:
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo timerlat > current_tracer
[root@f32 tracing]# echo osnoise > set_event
[root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
[root@f32 tracing]# tail -10 trace
cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 1585 ns
cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 4597 ns
cc1-87882 [005] dNLh2.. 548.771083: irq_noise: reschedule:253 start 548.771083017 duration 56 ns
cc1-87882 [005] dNLh2.. 548.771086: irq_noise: call_function_single:251 start 548.771083811 duration 2048 ns
cc1-87882 [005] dNLh2.. 548.771088: irq_noise: call_function_single:251 start 548.771086814 duration 1495 ns
cc1-87882 [005] dNLh2.. 548.771091: irq_noise: call_function_single:251 start 548.771089194 duration 1558 ns
cc1-87882 [005] dNLh2.. 548.771094: irq_noise: call_function_single:251 start 548.771091719 duration 1932 ns
cc1-87882 [005] dNLh2.. 548.771096: irq_noise: call_function_single:251 start 548.771094696 duration 1050 ns
cc1-87882 [005] d...3.. 548.771101: thread_noise: cc1:87882 start 548.771078243 duration 10909 ns
timerlat/5-1035 [005] ....... 548.771103: #402268 context thread timer_latency 25960 ns
For further information see: Documentation/trace/timerlat-tracer.rst
Link: https://lkml.kernel.org/r/71f18efc013e1194bcaea1e54db957de2b19ba62.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In the context of high-performance computing (HPC), the Operating System
Noise (*osnoise*) refers to the interference experienced by an application
due to activities inside the operating system. In the context of Linux,
NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
system. Moreover, hardware-related jobs can also cause noise, for example,
via SMIs.
The osnoise tracer leverages the hwlat_detector by running a similar
loop with preemption, SoftIRQs and IRQs enabled, thus allowing all
the sources of *osnoise* during its execution. Using the same approach
of hwlat, osnoise takes note of the entry and exit point of any
source of interferences, increasing a per-cpu interference counter. The
osnoise tracer also saves an interference counter for each source of
interference. The interference counter for NMI, IRQs, SoftIRQs, and
threads is increased anytime the tool observes these interferences' entry
events. When a noise happens without any interference from the operating
system level, the hardware noise counter increases, pointing to a
hardware-related noise. In this way, osnoise can account for any
source of interference. At the end of the period, the osnoise tracer
prints the sum of all noise, the max single noise, the percentage of CPU
available for the thread, and the counters for the noise sources.
Usage
Write the ASCII text "osnoise" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).
For example::
[root@f32 ~]# cd /sys/kernel/tracing/
[root@f32 tracing]# echo osnoise > current_tracer
It is possible to follow the trace by reading the trace trace file::
[root@f32 tracing]# cat trace
# tracer: osnoise
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth MAX
# || / SINGLE Interference counters:
# |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+
# TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD
# | | | |||| | | | | | | | | | |
<...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1
<...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3
<...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21
<...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0
<...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41
<...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2
<...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1
<...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19
In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
tracer prints a message at the end of each period for each CPU that is
running an osnoise/CPU thread. The osnoise specific fields report:
- The RUNTIME IN USE reports the amount of time in microseconds that
the osnoise thread kept looping reading the time.
- The NOISE IN US reports the sum of noise in microseconds observed
by the osnoise tracer during the associated runtime.
- The % OF CPU AVAILABLE reports the percentage of CPU available for
the osnoise thread during the runtime window.
- The MAX SINGLE NOISE IN US reports the maximum single noise observed
during the runtime window.
- The Interference counters display how many each of the respective
interference happened during the runtime window.
Note that the example above shows a high number of HW noise samples.
The reason being is that this sample was taken on a virtual machine,
and the host interference is detected as a hardware interference.
Tracer options
The tracer has a set of options inside the osnoise directory, they are:
- osnoise/cpus: CPUs at which a osnoise thread will execute.
- osnoise/period_us: the period of the osnoise thread.
- osnoise/runtime_us: how long an osnoise thread will look for noise.
- osnoise/stop_tracing_us: stop the system tracing if a single noise
higher than the configured value happens. Writing 0 disables this
option.
- osnoise/stop_tracing_total_us: stop the system tracing if total noise
higher than the configured value happens. Writing 0 disables this
option.
- tracing_threshold: the minimum delta between two time() reads to be
considered as noise, in us. When set to 0, the default value will
be used, which is currently 5 us.
Additional Tracing
In addition to the tracer, a set of tracepoints were added to
facilitate the identification of the osnoise source.
- osnoise:sample_threshold: printed anytime a noise is higher than
the configurable tolerance_ns.
- osnoise:nmi_noise: noise from NMI, including the duration.
- osnoise:irq_noise: noise from an IRQ, including the duration.
- osnoise:softirq_noise: noise from a SoftIRQ, including the
duration.
- osnoise:thread_noise: noise from a thread, including the duration.
Note that all the values are *net values*. For example, if while osnoise
is running, another thread preempts the osnoise thread, it will start a
thread_noise duration at the start. Then, an IRQ takes place, preempting
the thread_noise, starting a irq_noise. When the IRQ ends its execution,
it will compute its duration, and this duration will be subtracted from
the thread_noise, in such a way as to avoid the double accounting of the
IRQ execution. This logic is valid for all sources of noise.
Here is one example of the usage of these tracepoints::
osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2
In this example, a noise sample of 8 microseconds was reported in the last
line, pointing to two interferences. Looking backward in the trace, the
two previous entries were about the migration thread running after a
timer IRQ execution. The first event is not part of the noise because
it took place one millisecond before.
It is worth noticing that the sum of the duration reported in the
tracepoints is smaller than eight us reported in the sample_threshold.
The reason roots in the overhead of the entry and exit code that happens
before and after any interference execution. This justifies the dual
approach: measuring thread and tracing.
Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
[
Made the following functions static:
trace_irqentry_callback()
trace_irqexit_callback()
trace_intel_irqentry_callback()
trace_intel_irqexit_callback()
Added to include/trace.h:
osnoise_arch_register()
osnoise_arch_unregister()
Fixed define logic for LATENCY_FS_NOTIFY
Reported-by: kernel test robot <lkp@intel.com>
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Considering the following irq_domain_ops call chain:
- .alloc() is called when a clients calls platform_get_irq() or
gpiod_to_irq()
- .activate() is called next, when the clients calls
request_threaded_irq()
Check for the IRQ MUX conflict during the first stage (alloc instead of
activate). This avoids to provide the client with an IRQ that can't be
used.
Signed-off-by: Fabien Dessenne <fabien.dessenne@foss.st.com>
Link: https://lore.kernel.org/r/20210617144602.2557619-1-fabien.dessenne@foss.st.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
With the coming addition of the osnoise tracer, the configs needed to
include the latency_fsnotify() has become more complex, and to keep the
declaration in the header file the same as in the C file, just have the
logic needed to define it in one place, and that defines LATENCY_FS_NOTIFY
which will be used in the C code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Some minor code cleanups and updates which includes
- Mention module name under help in Kconfig.
- Remove extra lines and duplicate Pin range checks.
- Replace 'return ret' with 'return 0' in success path.
- Copyright year update.
- use devm_pinctrl_register() instead pinctrl_register() in probe.
Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com>
Link: https://lore.kernel.org/r/1624273214-66849-1-git-send-email-lakshmi.sai.krishna.potthuri@xilinx.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Reduce #ifdefs a bit by making exit_must_hard_disable() return
true on PPC32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/52531029563c1fc823b790058e799d0ca71b028c.1624631463.git.christophe.leroy@csgroup.eu
|
|
The purpose of the device-managed functions is to bind the life-time of an
object to that of a parent device object.
This is not the case for the 'vdd-cpu' regulator in this driver. A
reference is obtained via devm_regulator_get() and immediately released
with devm_regulator_put().
In this case, the usage of devm_ functions is slightly excessive, as the
un-managed versions of these functions is a little cleaner (and slightly
more economical in terms of allocation).
This change converts the devm_regulator_{get,put}() to
regulator_{get,put}() in the get_alignment_from_regulator() function of
this driver.
Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
Link: https://lore.kernel.org/r/20210624084737.42336-1-aardelean@deviqon.com
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
'clk_hw_set_rate_range()' does not return any error code and 'ret' is
known to be 0 at this point, so this message can never be displayed.
Remove it.
Fixes: 3fde0e16d016 ("drivers: clk: Add ZynqMP clock driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/71a9fed5f762a71248b8ac73c0a15af82f3ce1e2.1619867987.git.christophe.jaillet@wanadoo.fr
Reviewed-by: Michael Tretter <m.tretter@pengutronix.de>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
When the firmware code is disabled, the incomplete error handling
in the clk driver causes compile-time warnings:
drivers/clk/zynqmp/pll.c: In function 'zynqmp_pll_recalc_rate':
drivers/clk/zynqmp/pll.c:147:29: error: 'fbdiv' is used uninitialized [-Werror=uninitialized]
147 | rate = parent_rate * fbdiv;
| ~~~~~~~~~~~~^~~~~~~
In function 'zynqmp_pll_get_mode',
inlined from 'zynqmp_pll_recalc_rate' at drivers/clk/zynqmp/pll.c:148:6:
drivers/clk/zynqmp/pll.c:61:27: error: 'ret_payload' is used uninitialized [-Werror=uninitialized]
61 | return ret_payload[1];
| ~~~~~~~~~~~^~~
drivers/clk/zynqmp/pll.c: In function 'zynqmp_pll_recalc_rate':
drivers/clk/zynqmp/pll.c:53:13: note: 'ret_payload' declared here
53 | u32 ret_payload[PAYLOAD_ARG_CNT];
| ^~~~~~~~~~~
drivers/clk/zynqmp/clk-mux-zynqmp.c: In function 'zynqmp_clk_mux_get_parent':
drivers/clk/zynqmp/clk-mux-zynqmp.c:57:16: error: 'val' is used uninitialized [-Werror=uninitialized]
57 | return val;
| ^~~
As it was apparently intentional to support this for compile testing
purposes, change the code to have just enough error handling for the
compiler to not notice the remaining bugs.
Fixes: 21f237534661 ("clk: zynqmp: Drop dependency on ARCH_ZYNQMP")
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/f1c4e8c903fe2d5df5413421920a56890a46387a.1624356908.git.michal.simek@xilinx.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in upper layer drivers (scsi disk and cdrom).
The sd one is fixing a commit changing revalidation that came from the
block tree a while ago (5.10) and the sr one adds handling of a
condition we didn't previously handle for manually removed media"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)
scsi: sr: Return appropriate error code when disk is ejected
|
|
The Qualcomm SC8180X has the typical ADSP, CDSP and MPSS remote
processors operated using the PAS interface, add support for these.
Attempts to configuring mss.lvl is failing, so a new adsp_data is
provided that skips this resource, for now.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20210608174944.2045215-2-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Add compatibles for the Audio DSP, Compute DSP and Modem subsystem found
in the Qualcomm SC8180x to the Peripheral Authentication Service
remoteproc binding.
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210608174944.2045215-1-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Convert the simple GPIO clock multiplexer Device Tree binding
documentation to json-schema.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/14cb3b4da446f26a4780e0bd1b58788eb6085d05.1623414619.git.geert+renesas@glider.be
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
To have nanosecond output displayed in a more human readable format, its
nicer to convert it to a seconds format (XXX.YYYYYYYYY). The problem is that
to do so, the numbers must be divided by NSEC_PER_SEC, and moded too. But as
these numbers are 64 bit, this can not be done simply with '/' and '%'
operators, but must use do_div() instead.
Instead of performing the expensive do_div() in the hot path of the
tracepoint, it is more efficient to perform it during the output phase. But
passing in do_div() can confuse the parser, and do_div() doesn't work
exactly like a normal C function. It modifies the number in place, and we
don't want to modify the actual values in the ring buffer.
Two helper functions are now created:
__print_ns_to_secs() and __print_ns_without_secs()
They both take a value of nanoseconds, and the former will return that
number divided by NSEC_PER_SEC, and the latter will mod it with NSEC_PER_SEC
giving a way to print a nice human readable format:
__print_fmt("time=%llu.%09u",
__print_ns_to_secs(REC->nsec_val),
__print_ns_without_secs(REC->nsec_val))
Link: https://lkml.kernel.org/r/e503b903045496c4ccde52843e1e318b422f7a56.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
hwlat has some time operation checks on the sample loop, and it is
currently using pr_err (printk) to report them. The problem is that
this can lead the system to an unresponsible state due to an overflow of
printk messages. This problem can be mitigated by writing the error
message to the trace buffer.
Remove the printk messages from the sampling loop, switching the to
messages in the trace buffer.
No functional change.
Link: https://lkml.kernel.org/r/9d77c34869748aa105e965c769d24642914eea3a.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Use the trace_min_max_param to reduce code duplication.
No functional change.
Link: https://lkml.kernel.org/r/b91accd5a7c6c14ea02d3379aae974ba22b47dd6.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The hwlat detector and (in preparation for) the osnoise/timerlat tracers
have a set of u64 parameters that the user can read/write via tracefs.
For instance, we have hwlat_detector's window and width.
To reduce the code duplication, hwlat's window and width share the same
read function. However, they do not share the write functions because
they do different parameter checks. For instance, the width needs to
be smaller than the window, while the window needs to be larger
than the window. The same pattern repeats on osnoise/timerlat, and
a large portion of the code was devoted to the write function.
Despite having different checks, the write functions have the same
structure:
read a user-space buffer
take the lock that protects the value
check for minimum and maximum acceptable values
save the value
release the lock
return success or error
To reduce the code duplication also in the write functions, this patch
provides a generic read and write implementation for u64 values that
need to be within some minimum and/or maximum parameters, while
(potentially) being protected by a lock.
To use this interface, the structure trace_min_max_param needs to be
filled:
struct trace_min_max_param {
struct mutex *lock;
u64 *val;
u64 *min;
u64 *max;
};
The desired value is stored on the variable pointed by *val. If *min
points to a minimum acceptable value, it will be checked during the
write operation. Likewise, if *max points to a maximum allowable value,
it will be checked during the write operation. Finally, if *lock points
to a mutex, it will be taken at the beginning of the operation and
released at the end.
The definition of a trace_min_max_param needs to passed as the
(private) *data for tracefs_create_file(), and the trace_min_max_fops
(added by this patch) as the *fops file_operations.
Link: https://lkml.kernel.org/r/3e35760a7c8b5c55f16ae5ad5fc54a0e71cbe647.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Implements the per-cpu mode in which a sampling thread is created for
each cpu in the "cpus" (and tracing_mask).
The per-cpu mode has the potention to speed up the hwlat detection by
running on multiple CPUs at the same time, at the cost of higher cpu
usage with irqs disabled. Use with care.
[
Changed get_cpu_data() to static.
Reported-by: kernel test robot <lkp@intel.com>
]
Link: https://lkml.kernel.org/r/ec06d0ab340e8460d293772faba19ad8a5c371aa.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Kate Carcia <kcarcia@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Clark Willaims <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The iclogbuf ring attached to the struct xlog is circular, hence the
first and last iclogs in the ring can only be determined by
comparing them against the log->l_iclog pointer.
In xfs_cil_push_work(), we want to wait on previous iclogs that were
issued so that we can flush them to stable storage with the commit
record write, and it simply waits on the previous iclog in the ring.
This, however, leads to CIL push hangs in generic/019 like so:
task:kworker/u33:0 state:D stack:12680 pid: 7 ppid: 2 flags:0x00004000
Workqueue: xfs-cil/pmem1 xlog_cil_push_work
Call Trace:
__schedule+0x30b/0x9f0
schedule+0x68/0xe0
xlog_wait_on_iclog+0x121/0x190
? wake_up_q+0xa0/0xa0
xlog_cil_push_work+0x994/0xa10
? _raw_spin_lock+0x15/0x20
? xfs_swap_extents+0x920/0x920
process_one_work+0x1ab/0x390
worker_thread+0x56/0x3d0
? rescuer_thread+0x3c0/0x3c0
kthread+0x14d/0x170
? __kthread_bind_mask+0x70/0x70
ret_from_fork+0x1f/0x30
With other threads blocking in either xlog_state_get_iclog_space()
waiting for iclog space or xlog_grant_head_wait() waiting for log
reservation space.
The problem here is that the previous iclog on the ring might
actually be a future iclog. That is, if log->l_iclog points at
commit_iclog, commit_iclog is the first (oldest) iclog in the ring
and there are no previous iclogs pending as they have all completed
their IO and been activated again. IOWs, commit_iclog->ic_prev
points to an iclog that will be written in the future, not one that
has been written in the past.
Hence, in this case, waiting on the ->ic_prev iclog is incorrect
behaviour, and depending on the state of the future iclog, we can
end up with a circular ABA wait cycle and we hang.
The fix is made more complex by the fact that many iclogs states
cannot be used to determine if the iclog is a past or future iclog.
Hence we have to determine past iclogs by checking the LSN of the
iclog rather than their state. A past ACTIVE iclog will have a LSN
of zero, while a future ACTIVE iclog will have a LSN greater than
the current iclog. We don't wait on either of these cases.
Similarly, a future iclog that hasn't completed IO will have an LSN
greater than the current iclog and so we don't wait on them. A past
iclog that is still undergoing IO completion will have a LSN less
than the current iclog and those are the only iclogs that we need to
wait on.
Hence we can use the iclog LSN to determine what iclogs we need to
wait on here.
Fixes: 5fd9256ce156 ("xfs: separate CIL commit record IO")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Implementation reuses dm_io_tracker, that until now was only used by
dm-cache, to track if any writes were issued directly to the origin
(due to cache being full) within the last second. If so writeback is
paused for a second.
This change improves performance for when the cache is full and IO is
issued directly to the origin device (rather than through the cache).
Depends-on: d53f1fafec9d ("dm writecache: do direct write if the cache is full")
Suggested-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Allow other code to use dm_io_tracker.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
remove_raw() in dm_btree_remove() may fail due to IO read error
(e.g. read the content of origin block fails during shadowing),
and the value of shadow_spine::root is uninitialized, but
the uninitialized value is still assign to new_root in the
end of dm_btree_remove().
For dm-thin, the value of pmd->details_root or pmd->root will become
an uninitialized value, so if trying to read details_info tree again
out-of-bound memory may occur as showed below:
general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
Hardware name: QEMU Standard PC
RIP: 0010:metadata_ll_load_ie+0x14/0x30
Call Trace:
sm_metadata_count_is_more_than_one+0xb9/0xe0
dm_tm_shadow_block+0x52/0x1c0
shadow_step+0x59/0xf0
remove_raw+0xb2/0x170
dm_btree_remove+0xf4/0x1c0
dm_pool_delete_thin_device+0xc3/0x140
pool_message+0x218/0x2b0
target_message+0x251/0x290
ctl_ioctl+0x1c4/0x4d0
dm_ctl_ioctl+0xe/0x20
__x64_sys_ioctl+0x7b/0xb0
do_syscall_64+0x40/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixing it by only assign new_root when removal succeeds
Signed-off-by: Hou Tao <houtao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Make sure that the zone write pointer offset array is allocated with a
vmalloc in dm_zone_revalidate_cb() by passing GFP_KERNEL gfp flag to
kvcalloc(). However, since we do not want to trigger IOs while
revalidating zones, change dm_revalidate_zones() to have the zone scan
done in GFP_NOIO context using memalloc_noio_save/restore calls.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: bb37d77239af ("dm: introduce zone append emulation")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The continue statement at the end of a for-loop has no effect,
remove it.
Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Add a "metadata_only" parameter that when present: only metadata is
promoted to the cache. This option improves performance for heavier
REQ_META workloads (e.g. device-mapper-test-suite's "git clone and
checkout" benchmark improves from 341s to 312s).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Backfill missing Documentation.
Fixes: 93de44eb3fc8 ("dm writecache: implement the "cleaner" policy")
Fixes: 3923d4854e18 ("dm writecache: implement gradual cleanup")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
There was one place where we weren't locking CurrentMid, and although
likely to be safe since even without the lock since it is during
negotiate protocol, it is more consistent to lock it in this last remaining
place, and avoids confusing Coverity warning.
Addresses-Coverity: 1486665 ("Data race condition")
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-06-25
This series contains updates to ice driver only.
Jesse adds support for tracepoints to aide in debugging.
Maciej adds support for PTP auxiliary pin support.
Victor removes the VSI info from the old aggregator when moving the VSI
to another aggregator.
Tony removes an unnecessary VSI assignment.
Christophe Jaillet fixes a memory leak for failed allocation in
ice_pf_dcb_cfg().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When smc_sendmsg() is called before the SMC socket initialization has
completed, smc_tx_sendmsg() will access un-initialized fields of the
SMC socket which results in a null-pointer dereference.
Fix this by checking the socket state first in smc_tx_sendmsg().
Fixes: e0e4b8fa5338 ("net/smc: Add SMC statistics support")
Reported-by: syzbot+5dda108b672b54141857@syzkaller.appspotmail.com
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.14
Second, and most likely the last, set of patches for v5.14. mt76 and
iwlwifi have most patches in this round, but rtw88 also has some new
features. Nothing special really standing out.
mt76
* mt7915 MSI support
* disable ASPM on mt7915
* mt7915 tx status reporting
* mt7921 decap offload
rtw88
* beacon filter support
* path diversity support
* firmware crash information via devcoredump
* quirks for disabling pci capabilities
mt7601u
* add USB ID for a XiaoDu WiFi Dongle
ath11k
* enable support for QCN9074 PCI devices
brcmfmac
* support parse country code map from DeviceTree
iwlwifi
* support for new hardware
* support for BIOS control of 11ax enablement in Russia
* support UNII4 band enablement from BIOS
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The newly implemented fwnode_mdbiobus_register turned out to be
problematic - in case the fwnode_/of_/acpi_mdio are built as
modules, a dependency cycle can be observed during the depmod phase of
modules_install, eg.:
depmod: ERROR: Cycle detected: fwnode_mdio -> of_mdio -> fwnode_mdio
depmod: ERROR: Found 2 modules in dependency cycles!
OR:
depmod: ERROR: Cycle detected: acpi_mdio -> fwnode_mdio -> acpi_mdio
depmod: ERROR: Found 2 modules in dependency cycles!
A possible solution could be to rework fwnode_mdiobus_register,
so that to merge the contents of acpi_mdiobus_register and
of_mdiobus_register. However feasible, such change would
be very intrusive and affect huge amount of the of_mdiobus_register
users.
Since there are currently 2 users of ACPI and MDIO
(xgmac_mdio and mvmdio), withdraw the fwnode_mdbiobus_register
and roll back to a simple 'if' condition in affected drivers.
Fixes: 62a6ef6a996f ("net: mdiobus: Introduce fwnode_mdbiobus_register()")
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Patch was based on wrong presumption that be_poll can be called only
from bh context. It reintroducing old regression (also reverted) and
causing deadlock when we use netconsole with benet in bonding.
Old revert: commit 072a9c486004 ("netpoll: revert 6bdb7fe3104 and fix
be_poll() instead")
[ 331.269715] bond0: (slave enp0s7f0): Releasing backup interface
[ 331.270121] CPU: 4 PID: 1479 Comm: ifenslave Not tainted 5.13.0-rc7+ #2
[ 331.270122] Call Trace:
[ 331.270122] [c00000001789f200] [c0000000008c505c] dump_stack+0x100/0x174 (unreliable)
[ 331.270124] [c00000001789f240] [c008000001238b9c] be_poll+0x64/0xe90 [be2net]
[ 331.270125] [c00000001789f330] [c000000000d1e6e4] netpoll_poll_dev+0x174/0x3d0
[ 331.270127] [c00000001789f400] [c008000001bc167c] bond_poll_controller+0xb4/0x130 [bonding]
[ 331.270128] [c00000001789f450] [c000000000d1e624] netpoll_poll_dev+0xb4/0x3d0
[ 331.270129] [c00000001789f520] [c000000000d1ed88] netpoll_send_skb+0x448/0x470
[ 331.270130] [c00000001789f5d0] [c0080000011f14f8] write_msg+0x180/0x1b0 [netconsole]
[ 331.270131] [c00000001789f640] [c000000000230c0c] console_unlock+0x54c/0x790
[ 331.270132] [c00000001789f7b0] [c000000000233098] vprintk_emit+0x2d8/0x450
[ 331.270133] [c00000001789f810] [c000000000234758] vprintk+0xc8/0x270
[ 331.270134] [c00000001789f850] [c000000000233c28] printk+0x40/0x54
[ 331.270135] [c00000001789f870] [c000000000ccf908] __netdev_printk+0x150/0x198
[ 331.270136] [c00000001789f910] [c000000000ccfdb4] netdev_info+0x68/0x94
[ 331.270137] [c00000001789f950] [c008000001bcbd70] __bond_release_one+0x188/0x6b0 [bonding]
[ 331.270138] [c00000001789faa0] [c008000001bcc6f4] bond_do_ioctl+0x42c/0x490 [bonding]
[ 331.270139] [c00000001789fb60] [c000000000d0d17c] dev_ifsioc+0x17c/0x400
[ 331.270140] [c00000001789fbc0] [c000000000d0db70] dev_ioctl+0x390/0x890
[ 331.270141] [c00000001789fc10] [c000000000c7c76c] sock_do_ioctl+0xac/0x1b0
[ 331.270142] [c00000001789fc90] [c000000000c7ffac] sock_ioctl+0x31c/0x6e0
[ 331.270143] [c00000001789fd60] [c0000000005b9728] sys_ioctl+0xf8/0x150
[ 331.270145] [c00000001789fdb0] [c0000000000336c0] system_call_exception+0x160/0x2f0
[ 331.270146] [c00000001789fe10] [c00000000000d35c] system_call_common+0xec/0x278
[ 331.270147] --- interrupt: c00 at 0x7fffa6c6ec00
[ 331.270147] NIP: 00007fffa6c6ec00 LR: 0000000105c4185c CTR: 0000000000000000
[ 331.270148] REGS: c00000001789fe80 TRAP: 0c00 Not tainted (5.13.0-rc7+)
[ 331.270148] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000428 XER: 00000000
[ 331.270155] IRQMASK: 0
[ 331.270156] GPR00: 0000000000000036 00007fffd494d5b0 00007fffa6d57100 0000000000000003
[ 331.270158] GPR04: 0000000000008991 00007fffd494d6d0 0000000000000008 00007fffd494f28c
[ 331.270161] GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
[ 331.270164] GPR12: 0000000000000000 00007fffa6dfa220 0000000000000000 0000000000000000
[ 331.270167] GPR16: 0000000105c44880 0000000000000000 0000000105c60088 0000000105c60318
[ 331.270170] GPR20: 0000000105c602c0 0000000105c44560 0000000000000000 0000000000000000
[ 331.270172] GPR24: 00007fffd494dc50 00007fffd494d6a8 0000000105c60008 00007fffd494d6d0
[ 331.270175] GPR28: 00007fffd494f27e 0000000105c6026c 00007fffd494f284 0000000000000000
[ 331.270178] NIP [00007fffa6c6ec00] 0x7fffa6c6ec00
[ 331.270178] LR [0000000105c4185c] 0x105c4185c
[ 331.270179] --- interrupt: c00
This reverts commit d0d006a43e9a7a796f6f178839c92fcc222c564d.
Fixes: d0d006a43e9a7a ("be2net: disable bh with spin_lock in be_process_mcc")
Signed-off-by: Petr Oros <poros@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If this 'kzalloc()' fails we must free some resources as in all the other
error handling paths of this function.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_get_vf_vsi() is being called twice for the same VSI. Remove the
unnecessary call/assignment.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
|
|
Remove the VSI info from previous aggregator after moving the VSI to a
new aggregator.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The E810 device supports programmable pins for enabling both input and
output events related to the PTP hardware clock. This includes both
output signals with programmable period, as well as timestamping of
events on input pins.
Add support for enabling these using the CONFIG_PTP_1588_CLOCK
interface.
This allows programming the software defined pins to take advantage of
the hardware clock features.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The iclog callback chain has it's own lock. That was added way back
in 2008 by myself to alleviate severe lock contention on the
icloglock in commit 114d23aae512 ("[XFS] Per iclog callback chain
lock"). This was long before delayed logging took the icloglock out
of the hot transaction commit path and removed all contention on it.
Hence the separate ic_callback_lock doesn't serve any scalability
purpose anymore, and hasn't for close on a decade.
Further, we only attach callbacks to iclogs in one place where we
are already taking the icloglock soon after attaching the callbacks.
We also have to drop the icloglock to run callbacks and grab it
immediately afterwards again. So given that the icloglock is no
longer hot, making it cover callbacks again doesn't really change
the locking patterns very much at all.
We also need to extend the icloglock to cover callback addition to
fix a zero-day UAF in the CIL push code. This occurs when shutdown
races with xlog_cil_push_work() and the shutdown runs the callbacks
before the push releases the iclog. This results in the CIL context
structure attached to the iclog being freed by the callback before
the CIL push has finished referencing it, leading to UAF bugs.
Hence, to avoid this UAF, we need the callback attachment to be
atomic with post processing of the commit iclog and references to
the structures being attached to the iclog. This requires holding
the icloglock as that's the only way to serialise iclog state
against a shutdown in progress.
The result is we need to be using the icloglock to protect the
callback list addition and removal and serialise them with shutdown.
That makes the ic_callback_lock redundant and so it can be removed.
Fixes: 71e330b59390 ("xfs: Introduce delayed logging core code")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
If we are processing callbacks on an iclog, nothing can be
concurrently adding callbacks to the loop. We only add callbacks to
the iclog when they are in ACTIVE or WANT_SYNC state, and we
explicitly do not add callbacks if the iclog is already in IOERROR
state.
The only way to have a dequeue racing with an enqueue is to be
processing a shutdown without a direct reference to an iclog in
ACTIVE or WANT_SYNC state. As the enqueue avoids this race
condition, we only ever need a single dequeue operation in
xlog_state_do_iclog_callbacks(). Hence we can remove the loop.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
It's completely unnecessary because callbacks are added to iclogs
without holding the icloglock, hence no amount of ordering between
the icloglock and ic_callback_lock will order the removal of
callbacks from the iclog.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
This patch adds build and driver logic for the "mlxbf_gige"
Ethernet driver from Mellanox Technologies. The second
generation BlueField SoC from Mellanox supports an
out-of-band GigaBit Ethernet management port to the Arm
subsystem. This driver supports TCP/IP network connectivity
for that port, and provides back-end routines to handle
basic ethtool requests.
The driver interfaces to the Gigabit Ethernet block of
BlueField SoC via MMIO accesses to registers, which contain
control information or pointers describing transmit and
receive resources. There is a single transmit queue, and
the port supports transmit ring sizes of 4 to 256 entries.
There is a single receive queue, and the port supports
receive ring sizes of 32 to 32K entries. The transmit and
receive rings are allocated from DMA coherent memory. There
is a 16-bit producer and consumer index per ring to denote
software ownership and hardware ownership, respectively.
The main driver logic such as probe(), remove(), and netdev
ops are in "mlxbf_gige_main.c". Logic in "mlxbf_gige_rx.c"
and "mlxbf_gige_tx.c" handles the packet processing for
receive and transmit respectively.
The logic in "mlxbf_gige_ethtool.c" supports the handling
of some basic ethtool requests: get driver info, get ring
parameters, get registers, and get statistics.
The logic in "mlxbf_gige_mdio.c" is the driver controlling
the Mellanox BlueField hardware that interacts with a PHY
device via MDIO/MDC pins. This driver does the following:
- At driver probe time, it configures several BlueField MDIO
parameters such as sample rate, full drive, voltage and MDC
- It defines functions to read and write MDIO registers and
registers the MDIO bus.
- It defines the phy interrupt handler reporting a
link up/down status change
- This driver's probe is invoked from the main driver logic
while the phy interrupt handler is registered in ndo_open.
Driver limitations
- Only supports 1Gbps speed
- Only supports GMII protocol
- Supports maximum packet size of 2KB
- Does not support scatter-gather buffering
Testing
- Successful build of kernel for ARM64, ARM32, X86_64
- Tested ARM64 build on FastModels & Palladium
- Tested ARM64 build on several Mellanox boards that are built with
the BlueField-2 SoC. The testing includes coverage in the areas
of networking (e.g. ping, iperf, ifconfig, route), file transfers
(e.g. SCP), and various ethtool options relevant to this driver.
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Liming Sun <limings@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A recent bug report generated a warning that a code path in
xfs_attr_remove_iter could potentially return error uninitialized in the
case of XFS_DAS_RM_SHRINK state. Fix this by initializing error.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Merge misc fixes from Andrew Morton:
"24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6.
Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb,
memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and
mailmap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
mailmap: add Marek's other e-mail address and identity without diacritics
MAINTAINERS: fix Marek's identity again
mm/page_alloc: do bulk array bounds check after checking populated elements
mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
mm/memory-failure: use a mutex to avoid memory_failure() races
mm, futex: fix shared futex pgoff on shmem huge page
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
kthread_worker: split code for canceling the delayed work timer
mm/vmalloc: unbreak kasan vmalloc support
KVM: s390: prepare for hugepage vmalloc
mm/vmalloc: add vmalloc_no_huge
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): crossing page table boundary
...
|
|
The goal is to keep the mark during a bpf_redirect(), like it is done for
legacy encapsulation / decapsulation, when there is no x-netns.
This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
mark within the same name space").
When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
second argument (xnet) was set to true to force a call to skb_orphan(). At
this time, the mark was always cleanned up by skb_scrub_packet(), whatever
xnet value was.
This call to skb_orphan() was removed later in commit
9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
But this 'true' stayed here without any real reason.
Let's correctly set xnet in ____dev_forward_skb(), this function has access
to the previous interface and to the new interface.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This ioctl request reads from uffdio_continue structure written by
userspace which justifies _IOC_WRITE flag. It also writes back to that
structure which justifies _IOC_READ flag.
See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This adds unit tests for kunit_filter_subsuite() and
kunit_filter_suites().
Note: what the executor means by "subsuite" is the array of suites
corresponding to each test file.
This patch lightly refactors executor.c to avoid the use of global
variables to make it testable.
It also includes a clever `kfree_at_end()` helper that makes this test
easier to write than it otherwise would have been.
Tested by running just the new tests using itself
$ ./tools/testing/kunit/kunit.py run '*exec*'
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three more driver bugfixes and an annotation fix for the core"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: robotfuzz-osif: fix control-request directions
i2c: dev: Add __user annotation
i2c: cp2615: check for allocation failure in cp2615_i2c_recv()
i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access
|
|
Make use of the recently added kunit_skip() to skip tests, as it permits
TAP parsers to recognize if a test was deliberately skipped.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Add two new tests to the example test suite, both of which are always
skipped. This is used as an example for how to write tests which are
skipped, and to demonstrate the difference between kunit_skip() and
kunit_mark_skipped().
Note that these tests are enabled by default, so a default run of KUnit
will have two skipped tests.
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Add support for the SKIP directive to kunit_tool's TAP parser.
Skipped tests now show up as such in the printed summary. The number of
skipped tests is counted, and if all tests in a suite are skipped, the
suite is also marked as skipped. Otherwise, skipped tests do affect the
suite result.
Example output:
[00:22:34] ======== [SKIPPED] example_skip ========
[00:22:34] [SKIPPED] example_skip_test # SKIP this test should be skipped
[00:22:34] [SKIPPED] example_mark_skipped_test # SKIP this test should be skipped
[00:22:34] ============================================================
[00:22:34] Testing complete. 2 tests run. 0 failed. 0 crashed. 2 skipped.
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|