Age | Commit message (Collapse) | Author |
|
Currently hist trigger expressions don't support the use of numeric
literals:
e.g. echo 'hist:keys=common_pid:x=$y-1234'
--> is not valid expression syntax
Having the ability to use numeric constants in hist triggers supports
a wider range of expressions for creating variables.
Add support for creating trace event histogram variables from numeric
literals.
e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger
A negative numeric constant is created, using unary minus operator
(parentheses are required).
e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger
Constants can be used with division/multiplication (added in the
next patch in this series) to implement granularity filters for frequent
trace events. For instance we can limit emitting the rss_stat
trace event to when there is a 512KB cross over in the rss size:
# Create a synthetic event to monitor instead of the high frequency
# rss_stat event
echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr;
int member; long size' >> tracing/synthetic_events
# Create a hist trigger that emits the synthetic rss_stat_throttled
# event only when the rss size crosses a 512KB boundary.
echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket)
.rss_stat_throttled(mm_id,curr,member,size)'
>> events/kmem/rss_stat/trigger
A use case for using constants with addition/subtraction is not yet
known, but for completeness the use of constants are supported for all
operators.
Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Stop tracing while reading the trace file by default, to prevent
the test results while checking it and to avoid taking a long time
to check the result.
If there is any testcase which wants to test the tracing while reading
the trace file, please override this setting inside the test case.
This also recovers the pause-on-trace when clean it up.
Link: https://lkml.kernel.org/r/163529053143.690749.15365238954175942026.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.15-2021-10-21:
amdgpu:
- Fix a potential out of bounds write in debugfs
- Fix revision handling for Yellow Carp
- Display fixes for Yellow Carp
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021203430.4578-1-alexander.deucher@amd.com
|
|
sizeof when applied to a pointer typed expression gives the size of
the pointer.
./drivers/firmware/psci/psci_checker.c:158:41-47: ERROR application of sizeof to pointer
This issue was detected with the help of Coccinelle.
Fixes: 7401056de5f8 ("drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests")
Cc: stable@vger.kernel.org
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: jing yangyang <jing.yangyang@zte.com.cn>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"One last set of small fixes for the soc tree:
- Incorrect ethernet phy settings found on i.mx and allwinner
platforms
- a revert for a Qualcomm DT change that caused a boot regression
- four patches for incorrect settings in i.MX DT files
- new MAINTAINER file entries for dhcom boards
- a Kconfig fix for a reset driver that became unselectable
- three more code changes for bugs in reset drivers"
* tag 'arm-soc-fixes-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: Add maintainers for DHCOM i.MX6 and DHCOM/DHCOR STM32MP1
Revert "arm64: dts: qcom: sm8250: remove bus clock from the mdss node for sm8250 target"
arm64: dts: imx8mm-kontron: Fix connection type for VSC8531 RGMII PHY
arm64: dts: imx8mm-kontron: Fix CAN SPI clock frequency
arm64: dts: imx8mm-kontron: Fix polarity of reg_rst_eth2
arm64: dts: imx8mm-kontron: Set lower limit of VDD_SNVS to 800 mV
arm64: dts: imx8mm-kontron: Make sure SOC and DRAM supply voltages are correct
reset: socfpga: add empty driver allowing consumers to probe
reset: tegra-bpmp: Handle errors in BPMP response
reset: pistachio: Re-enable driver selection
reset: brcmstb-rescal: fix incorrect polarity of status bit
ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode
arm64: dts: allwinner: h5: NanoPI Neo 2: Fix ethernet node
|
|
When dispatching a zone append write request to a SCSI zoned block device,
if the target zone of the request is already locked, the device driver will
return BLK_STS_ZONE_RESOURCE and the request will be pushed back to the
hctx dipatch queue. The queue will be marked as RESTART in
dd_finish_request() and restarted in __blk_mq_free_request(). However, this
restart applies to the hctx of the completed request. If the requeued
request is on a different hctx, dispatch will no be retried until another
request is submitted or the next periodic queue run triggers, leading to up
to 30 seconds latency for the requeued request.
Fix this problem by scheduling a queue restart similarly to the
BLK_STS_RESOURCE case or when we cannot get the budget.
Also, consolidate the checks into the "need_resource" variable to simplify
the condition.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Link: https://lore.kernel.org/r/20211026165127.4151055-1-naohiro.aota@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move this out of the generic read/write prep path, and place it in the
write specific kiocb setup instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-10-26
We've added 12 non-merge commits during the last 7 day(s) which contain
a total of 23 files changed, 118 insertions(+), 98 deletions(-).
The main changes are:
1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen.
2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang.
3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai.
4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer.
5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun.
6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian.
7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix potential race in tail call compatibility check
bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET
selftests/bpf: Use recv_timeout() instead of retries
net: Implement ->sock_is_readable() for UDP and AF_UNIX
skmsg: Extract and reuse sk_msg_is_readable()
net: Rename ->stream_memory_read to ->sock_is_readable
tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
cgroup: Fix memory leak caused by missing cgroup_bpf_offline
bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()
bpf: Prevent increasing bpf_jit_limit above max
bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
bpf: Define bpf_jit_alloc_exec_limit for riscv JIT
====================
Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is no git tree for KPROBES in MAINTAINERS, it is not convinent to
rebase, lib/test_kprobes.c and samples/kprobes belong to kprobe, so add
git tree and missing files for KPROBES, and also use linux-trace.git for
TRACING to avoid confusing.
Link: https://lkml.kernel.org/r/1635213091-24387-5-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Since config KPROBES_SANITY_TEST is in lib/Kconfig.debug, it is better to
let test_kprobes.c in lib/, just like other similar tests found in lib/.
Link: https://lkml.kernel.org/r/1635213091-24387-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The following reference is invalid, remove it.
https://www.ibm.com/developerworks/library/l-kprobes/index.html
Add the following new reference "An introduction to KProbes":
https://lwn.net/Articles/132196/
Link: https://lkml.kernel.org/r/1635213091-24387-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Use the actual return value instead of always -1 if register_kretprobe()
failed.
E.g. without this patch:
# insmod samples/kprobes/kretprobe_example.ko func=no_such_func
insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Operation not permitted
With this patch:
# insmod samples/kprobes/kretprobe_example.ko func=no_such_func
insmod: ERROR: could not insert module samples/kprobes/kretprobe_example.ko: Unknown symbol in module
Link: https://lkml.kernel.org/r/1635213091-24387-2-git-send-email-yangtiezhu@loongson.cn
Fixes: 804defea1c02 ("Kprobes: move kprobe examples to samples/")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix the kernel doc of xbc_get_info() to add '@' to the parameters.
Link: https://lkml.kernel.org/r/163525086738.676803.15352231787913236933.stgit@devnote2
Fixes: e306220cb7b7 ("bootconfig: Add xbc_get_info() for the node information")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Add a test case for stacktrace from kretprobe handler and
nested kretprobe handlers.
This test checks both of stack trace inside kretprobe handler
and stack trace from pt_regs. Those stack trace must include
actual function return address instead of kretprobe trampoline.
The nested kretprobe stacktrace test checks whether the unwinder
can correctly unwind the call frame on the stack which has been
modified by the kretprobe.
Since the stacktrace on kretprobe is correctly fixed only on x86,
this introduces a meta kconfig ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
which tells user that the stacktrace on kretprobe is correct or not.
The test results will be shown like below;
TAP version 14
1..1
# Subtest: kprobes_test
1..6
ok 1 - test_kprobe
ok 2 - test_kprobes
ok 3 - test_kretprobe
ok 4 - test_kretprobes
ok 5 - test_stacktrace_on_kretprobe
ok 6 - test_stacktrace_on_nested_kretprobe
# kprobes_test: pass:6 fail:0 skip:0 total:6
# Totals: pass:6 fail:0 skip:0 total:6
ok 1 - kprobes_test
Link: https://lkml.kernel.org/r/163516211244.604541.18350507860972214415.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Since the xbc_alloc_mem() and xbc_free_mem() are used from
the __init functions and memblock_alloc() is __init function,
make them __init functions too.
Link: https://lkml.kernel.org/r/163515075747.547467.5746167540626712819.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 4ee1b4cac236 ("bootconfig: Cleanup dummy headers in tools/bootconfig")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
link
Using the linker script to fix an issue where some archs call the
function tracer with just the ip (instruction pointer) and pip (parent
instruction pointer) where as more up to date archs also pass in the
associated ftrace_ops and the ftrace_regs pointer, the generic code
will be called either with two parameters or four. To avoid any C
undefined behavior of calling two parameters to four or four to two
parameter function, two functions are created, where a preprocessor
macro uses the one that matches the architecture. As the function
pointers for them may be different, a typecast is used. But this
triggers issues with newer compilers that will fail due to -Werror.
A linker trick is now used to map the generic function to the function
that is used (note the generic function is only used to set the default
function callback). The linker trick defines ftrace_ops_list_func (the
generic function) to arch_ftrace_ops_list_func (the arch defined one).
Link: https://lore.kernel.org/all/20200617165616.52241bde@oasis.local.home/
But this fails sh arch because their linker script is included in their
compressed image that does not define arch_ftrace_ops_list_func at all
sh4-linux-ld:arch/sh/boot/compressed/../../kernel/vmlinux.lds:32: undefined symbol `arch_ftrace_ops_list_func' referenced in expression
Included a stub by that name in the misc.c to allow the code to
compile and link, even though it's not used.
This is similar to what was done for ftrace_stub:
b83b43ffc6e4b ("fgraph: Fix function type mismatches of
ftrace_graph_return using ftrace_stub")
Link: https://lkml.kernel.org/r/20211021221627.5d7270de@rorschach.local.home
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fixed warning: Function parameter or member 'enable' not
described in 'genphy_c45_fast_retrain'
Signed-off-by: Luo Jie <luoj@codeaurora.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20211026102957.17100-1-luoj@codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lorenzo noticed that the code testing for program type compatibility of
tail call maps is potentially racy in that two threads could encounter a
map with an unset type simultaneously and both return true even though they
are inserting incompatible programs.
The race window is quite small, but artificially enlarging it by adding a
usleep_range() inside the check in bpf_prog_array_compatible() makes it
trivial to trigger from userspace with a program that does, essentially:
map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
pid = fork();
if (pid) {
key = 0;
value = xdp_fd;
} else {
key = 1;
value = tc_fd;
}
err = bpf_map_update_elem(map_fd, &key, &value, 0);
While the race window is small, it has potentially serious ramifications in
that triggering it would allow a BPF program to tail call to a program of a
different type. So let's get rid of it by protecting the update with a
spinlock. The commit in the Fixes tag is the last commit that touches the
code in question.
v2:
- Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
v3:
- Put lock and the members it protects into an embedded 'owner' struct (Daniel)
Fixes: 3324b584b6f6 ("ebpf: misc core cleanup")
Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
|
|
bpf_types.h has BPF_MAP_TYPE_INODE_STORAGE and BPF_MAP_TYPE_TASK_STORAGE
declared inside #ifdef CONFIG_NET although they are built regardless of
CONFIG_NET. So, when CONFIG_BPF_SYSCALL && !CONFIG_NET, they are built
without the declarations leading to spurious build failures and not
registered to bpf_map_types making them unavailable.
Fix it by moving the BPF_MAP_TYPE for the two map types outside of
CONFIG_NET.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/YXG1cuuSJDqHQfRY@slm.duckdns.org
|
|
Cong Wang says:
====================
This patchset fixes ->poll() for sockets in sockmap and updates
selftests accordingly with select(). Please check each patch
for more details.
Fixes: c50524ec4e3a ("Merge branch 'sockmap: add sockmap support for unix datagram socket'")
Fixes: 89d69c5d0fbc ("Merge branch 'sockmap: introduce BPF_SK_SKB_VERDICT and support UDP'")
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
v4: add a comment in udp_poll()
v3: drop sk_psock_get_checked()
reuse tcp_bpf_sock_is_readable()
v2: rename and reuse ->stream_memory_read()
fix a compile error in sk_psock_get_checked()
Cong Wang (3):
net: rename ->stream_memory_read to ->sock_is_readable
skmsg: extract and reuse sk_msg_is_readable()
net: implement ->sock_is_readable() for UDP and AF_UNIX
====================
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We use non-blocking sockets in those tests, retrying for
EAGAIN is ugly because there is no upper bound for the packet
arrival time, at least in theory. After we fix poll() on
sockmap sockets, now we can switch to select()+recv().
Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211008203306.37525-5-xiyou.wangcong@gmail.com
|
|
Yucong noticed we can't poll() sockets in sockmap even
when they are the destination sockets of redirections.
This is because we never poll any psock queues in ->poll(),
except for TCP. With ->sock_is_readable() now we can
overwrite >sock_is_readable(), invoke and implement it for
both UDP and AF_UNIX sockets.
Reported-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
|
|
tcp_bpf_sock_is_readable() is pretty much generic,
we can extract it and reuse it for non-TCP sockets.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211008203306.37525-3-xiyou.wangcong@gmail.com
|
|
The proto ops ->stream_memory_read() is currently only used
by TCP to check whether psock queue is empty or not. We need
to rename it before reusing it for non-TCP protocols, and
adjust the exsiting users accordingly.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
|
|
With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls (or
multiple cores) on a single socket 'sk' we could get the following flow.
msgA, sk msgB, sk
----------- ---------------
tcp_bpf_sendmsg()
lock(sk)
psock = sk->psock
tcp_bpf_sendmsg()
lock(sk) ... blocking
tcp_bpf_send_verdict
if (psock->eval == NONE)
psock->eval = sk_psock_msg_verdict
..
< handle SK_REDIRECT case >
release_sock(sk) < lock dropped so grab here >
ret = tcp_bpf_sendmsg_redir
psock = sk->psock
tcp_bpf_send_verdict
lock_sock(sk) ... blocking on B
if (psock->eval == NONE) <- boom.
psock->eval will have msgA state
The problem here is we dropped the lock on msgA and grabbed it with msgB.
Now we have old state in psock and importantly psock->eval has not been
cleared. So msgB will run whatever action was done on A and the verdict
program may never see it.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211012052019.184398-1-liujian56@huawei.com
|
|
Use asm/unwind.h to implement wchan, since we cannot always rely on
STACKTRACE=y.
Fixes: bc9bbb81730e ("x86: Fix get_wchan() to support the ORC unwinder")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20211022152104.137058575@infradead.org
|
|
rpcif_sw_init() can fail so make sure we check the return value
of it and on error exit rpcif_spi_probe() callback with error code.
Fixes: eb8d6d464a27 ("spi: add Renesas RPC-IF driver")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20211025205631.21151-4-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
pm_runtime_disable() cancels all pending power requests, while they
should be completed for the Tegra SPI driver. Otherwise SPI clock won't
be disabled ever again because clk refcount will become unbalanced.
Enforce runtime PM suspension to put device into expected state before
driver is unbound and device's RPM state is reset by driver's core.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20211023225951.14253-2-digetx@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
pm_runtime_disable() cancels all pending power requests, while they
should be completed for the Tegra SPI driver. Otherwise SPI clock won't
be disabled ever again because clk refcount will become unbalanced.
Enforce runtime PM suspension to put device into expected state before
driver is unbound and device's RPM state is reset by driver's core.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Link: https://lore.kernel.org/r/20211023225951.14253-1-digetx@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fix the missing clk_disable_unprepare() before return
from bcm_qspi_probe() in the error handling case.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20211018073413.2029081-1-yangyingliang@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fixes screen orientation for GPD Win 3 handheld gaming console.
Signed-off-by: Mario Risoldi <awxkrnl@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211026112737.9181-1-awxkrnl@gmail.com
|
|
TI's implementation does not service the watchdog even if the kernel
command line parameter omap_wdt.early_enable is set to 1. This patch
fixes the issue.
Signed-off-by: Walter Stoll <walter.stoll@duagon.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/88a8fe5229cd68fa0f1fd22f5d66666c1b7057a0.camel@duagon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
sparse reports the following address space warning.
drivers/watchdog/ixp4xx_wdt.c:122:20: sparse:
incorrect type in assignment (different address spaces)
drivers/watchdog/ixp4xx_wdt.c:122:20: sparse:
expected void [noderef] __iomem *base
drivers/watchdog/ixp4xx_wdt.c:122:20: sparse:
got void *platform_data
Add a typecast to solve the problem.
Fixes: 21a0a29d16c6 ("watchdog: ixp4xx: Rewrite driver to use core")
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210911042925.556889-1-linux@roeck-us.net
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
The MODULE_DEVICE_TABLE already creates proper alias for platform
driver. Having another MODULE_ALIAS causes the alias to be duplicated.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210917092024.19323-1-krzysztof.kozlowski@canonical.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
SBSA says of the generic watchdog:
All registers are 32 bits in size and should be accessed using 32-bit
reads and writes. If an access size other than 32 bits is used then
the results are IMPLEMENTATION DEFINED.
and for qemu, the implementation will only allow 32-bit accesses
resulting in a synchronous external abort when configuring the watchdog.
Use lo_hi_* accessors rather than a readq/writeq.
Fixes: abd3ac7902fb ("watchdog: sbsa: Support architecture version 1")
Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20210903112101.493552-1-quic_jiles@quicinc.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
This reverts commit cb011044e34c ("watchdog: iTCO_wdt: Account for
rebooting on second timeout") and commit aec42642d91f ("watchdog: iTCO_wdt:
Fix detection of SMI-off case") since those patches cause a regression
on certain boards (https://bugzilla.kernel.org/show_bug.cgi?id=213809).
While this revert may result in some boards to only reset after twice
the configured timeout value, that is still better than a watchdog reset
after half the configured value.
Fixes: cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on second timeout")
Fixes: aec42642d91f ("watchdog: iTCO_wdt: Fix detection of SMI-off case")
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Mantas Mikulėnas <grawity@gmail.com>
Reported-by: Javier S. Pedro <debbugs@javispedro.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20211008003302.1461733-1-linux@roeck-us.net
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
|
|
revert commit 46ae40b94d88 ("net/mlx5: Let user configure io_eq_size param")
revert commit a6cb08daa3b4 ("net/mlx5: Let user configure event_eq_size param")
revert commit 554604061979 ("net/mlx5: Let user configure max_macs param")
The EQE parameters are applicable to more drivers, they should
be configured via standard API, probably ethtool. Example of
another driver needing something similar:
https://lore.kernel.org/all/1633454136-14679-3-git-send-email-sbhatta@marvell.com/
The last param for "max_macs" is probably fine but the documentation
is severely lacking. The meaning and implications for changing the
param need to be stated.
Link: https://lore.kernel.org/r/20211026152939.3125950-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 95ea0486b20e ("btrfs: allow read-write for 4K sectorsize on 64K
page size systems") added write support for 4K sectorsize on a 64K
systems. Fix the now stale comments.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Christoph pointed out that I'm updating bdev->bd_inode for the device
time when we remove block devices from a btrfs file system, however this
isn't actually exposed to anything. The inode we want to update is the
one that's associated with the path to the device, usually on devtmpfs,
so that blkid notices the difference.
We still don't want to do the blkdev_open, so use kern_path() to get the
path to the given device and do the update time on that inode.
Fixes: 8f96a5bfa150 ("btrfs: update the bdev time directly when closing")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If you already have an inode and need to update the time on the inode
there is no way to do this properly. Export this helper to allow file
systems to update time on the inode so the appropriate handler is
called, either ->update_time or generic_update_time.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Attempting to defragment a Btrfs file containing a transparent huge page
immediately deadlocks with the following stack trace:
#0 context_switch (kernel/sched/core.c:4940:2)
#1 __schedule (kernel/sched/core.c:6287:8)
#2 schedule (kernel/sched/core.c:6366:3)
#3 io_schedule (kernel/sched/core.c:8389:2)
#4 wait_on_page_bit_common (mm/filemap.c:1356:4)
#5 __lock_page (mm/filemap.c:1648:2)
#6 lock_page (./include/linux/pagemap.h:625:3)
#7 pagecache_get_page (mm/filemap.c:1910:4)
#8 find_or_create_page (./include/linux/pagemap.h:420:9)
#9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9)
#10 defrag_one_range (fs/btrfs/ioctl.c:1326:14)
#11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9)
#12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9)
#13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9)
#14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10)
#15 vfs_ioctl (fs/ioctl.c:51:10)
#16 __do_sys_ioctl (fs/ioctl.c:874:11)
#17 __se_sys_ioctl (fs/ioctl.c:860:1)
#18 __x64_sys_ioctl (fs/ioctl.c:860:1)
#19 do_syscall_x64 (arch/x86/entry/common.c:50:14)
#20 do_syscall_64 (arch/x86/entry/common.c:80:7)
#21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113)
A huge page is represented by a compound page, which consists of a
struct page for each PAGE_SIZE page within the huge page. The first
struct page is the "head page", and the remaining are "tail pages".
Defragmentation attempts to lock each page in the range. However,
lock_page() on a tail page actually locks the corresponding head page.
So, if defragmentation tries to lock more than one struct page in a
compound page, it tries to lock the same head page twice and deadlocks
with itself.
Ideally, we should be able to defragment transparent huge pages.
However, THP for filesystems is currently read-only, so a lot of code is
not ready to use huge pages for I/O. For now, let's just return
ETXTBUSY.
This can be reproduced with the following on a kernel with
CONFIG_READ_ONLY_THP_FOR_FS=y:
$ cat create_thp_file.c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
static const char zeroes[1024 * 1024];
static const size_t FILE_SIZE = 2 * 1024 * 1024;
int main(int argc, char **argv)
{
if (argc != 2) {
fprintf(stderr, "usage: %s PATH\n", argv[0]);
return EXIT_FAILURE;
}
int fd = creat(argv[1], 0777);
if (fd == -1) {
perror("creat");
return EXIT_FAILURE;
}
size_t written = 0;
while (written < FILE_SIZE) {
ssize_t ret = write(fd, zeroes,
sizeof(zeroes) < FILE_SIZE - written ?
sizeof(zeroes) : FILE_SIZE - written);
if (ret < 0) {
perror("write");
return EXIT_FAILURE;
}
written += ret;
}
close(fd);
fd = open(argv[1], O_RDONLY);
if (fd == -1) {
perror("open");
return EXIT_FAILURE;
}
/*
* Reserve some address space so that we can align the file mapping to
* the huge page size.
*/
void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (placeholder_map == MAP_FAILED) {
perror("mmap (placeholder)");
return EXIT_FAILURE;
}
void *aligned_address =
(void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1));
void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC,
MAP_SHARED | MAP_FIXED, fd, 0);
if (map == MAP_FAILED) {
perror("mmap");
return EXIT_FAILURE;
}
if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) {
perror("madvise");
return EXIT_FAILURE;
}
char *line = NULL;
size_t line_capacity = 0;
FILE *smaps_file = fopen("/proc/self/smaps", "r");
if (!smaps_file) {
perror("fopen");
return EXIT_FAILURE;
}
for (;;) {
for (size_t off = 0; off < FILE_SIZE; off += 4096)
((volatile char *)map)[off];
ssize_t ret;
bool this_mapping = false;
while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) {
unsigned long start, end, huge;
if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
this_mapping = (start <= (uintptr_t)map &&
(uintptr_t)map < end);
} else if (this_mapping &&
sscanf(line, "FilePmdMapped: %ld", &huge) == 1 &&
huge > 0) {
return EXIT_SUCCESS;
}
}
sleep(6);
rewind(smaps_file);
fflush(smaps_file);
}
}
$ ./create_thp_file huge
$ btrfs fi defrag -czstd ./huge
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 2efc459d06f1 ("sysfs: Add sysfs_emit and sysfs_emit_at to format
sysfs out") merged in 5.10 introduced two new functions sysfs_emit() and
sysfs_emit_at() which are aware of the PAGE_SIZE limit of the output
buffer.
Use the above two new functions instead of scnprintf() and snprintf()
in various sysfs show().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
It's a common practice to avoid use sizeof(struct btrfs_super_block)
(3531), but to use BTRFS_SUPER_INFO_SIZE (4096).
The problem is that, sizeof(struct btrfs_super_block) doesn't match
BTRFS_SUPER_INFO_SIZE from the very beginning.
Furthermore, for all call sites except selftests, we always allocate
BTRFS_SUPER_INFO_SIZE space for super block, there isn't any real reason
to use the smaller value, and it doesn't really save any space.
So let's get rid of such confusing behavior, and unify those two values.
This modification also adds a new static_assert() to verify the size,
and moves the BTRFS_SUPER_INFO_* macros to the definition of
btrfs_super_block for the static_assert().
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Update the comments at btrfs_chunk_alloc() and do_chunk_alloc() that
describe which cases can lead to a failure to allocate metadata and system
space despite having previously reserved space. This adds one more reason
that I previously forgot to mention.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When a task is doing some modification to the chunk btree and it is not in
the context of a chunk allocation or a chunk removal, it can deadlock with
another task that is currently allocating a new data or metadata chunk.
These contexts are the following:
* When relocating a system chunk, when we need to COW the extent buffers
that belong to the chunk btree;
* When adding a new device (ioctl), where we need to add a new device item
to the chunk btree;
* When removing a device (ioctl), where we need to remove a device item
from the chunk btree;
* When resizing a device (ioctl), where we need to update a device item in
the chunk btree and may need to relocate a system chunk that lies beyond
the new device size when shrinking a device.
The problem happens due to a sequence of steps like the following:
1) Task A starts a data or metadata chunk allocation and it locks the
chunk mutex;
2) Task B is relocating a system chunk, and when it needs to COW an extent
buffer of the chunk btree, it has locked both that extent buffer as
well as its parent extent buffer;
3) Since there is not enough available system space, either because none
of the existing system block groups have enough free space or because
the only one with enough free space is in RO mode due to the relocation,
task B triggers a new system chunk allocation. It blocks when trying to
acquire the chunk mutex, currently held by task A;
4) Task A enters btrfs_chunk_alloc_add_chunk_item(), in order to insert
the new chunk item into the chunk btree and update the existing device
items there. But in order to do that, it has to lock the extent buffer
that task B locked at step 2, or its parent extent buffer, but task B
is waiting on the chunk mutex, which is currently locked by task A,
therefore resulting in a deadlock.
One example report when the deadlock happens with system chunk relocation:
INFO: task kworker/u9:5:546 blocked for more than 143 seconds.
Not tainted 5.15.0-rc3+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u9:5 state:D stack:25936 pid: 546 ppid: 2 flags:0x00004000
Workqueue: events_unbound btrfs_async_reclaim_metadata_space
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xcd9/0x2530 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
rwsem_down_read_slowpath+0x4ee/0x9d0 kernel/locking/rwsem.c:993
__down_read_common kernel/locking/rwsem.c:1214 [inline]
__down_read kernel/locking/rwsem.c:1223 [inline]
down_read_nested+0xe6/0x440 kernel/locking/rwsem.c:1590
__btrfs_tree_read_lock+0x31/0x350 fs/btrfs/locking.c:47
btrfs_tree_read_lock fs/btrfs/locking.c:54 [inline]
btrfs_read_lock_root_node+0x8a/0x320 fs/btrfs/locking.c:191
btrfs_search_slot_get_root fs/btrfs/ctree.c:1623 [inline]
btrfs_search_slot+0x13b4/0x2140 fs/btrfs/ctree.c:1728
btrfs_update_device+0x11f/0x500 fs/btrfs/volumes.c:2794
btrfs_chunk_alloc_add_chunk_item+0x34d/0xea0 fs/btrfs/volumes.c:5504
do_chunk_alloc fs/btrfs/block-group.c:3408 [inline]
btrfs_chunk_alloc+0x84d/0xf50 fs/btrfs/block-group.c:3653
flush_space+0x54e/0xd80 fs/btrfs/space-info.c:670
btrfs_async_reclaim_metadata_space+0x396/0xa90 fs/btrfs/space-info.c:953
process_one_work+0x9df/0x16d0 kernel/workqueue.c:2297
worker_thread+0x90/0xed0 kernel/workqueue.c:2444
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task syz-executor:9107 blocked for more than 143 seconds.
Not tainted 5.15.0-rc3+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor state:D stack:23200 pid: 9107 ppid: 7792 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xcd9/0x2530 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
btrfs_chunk_alloc+0x31a/0xf50 fs/btrfs/block-group.c:3631
find_free_extent_update_loop fs/btrfs/extent-tree.c:3986 [inline]
find_free_extent+0x25cb/0x3a30 fs/btrfs/extent-tree.c:4335
btrfs_reserve_extent+0x1f1/0x500 fs/btrfs/extent-tree.c:4415
btrfs_alloc_tree_block+0x203/0x1120 fs/btrfs/extent-tree.c:4813
__btrfs_cow_block+0x412/0x1620 fs/btrfs/ctree.c:415
btrfs_cow_block+0x2f6/0x8c0 fs/btrfs/ctree.c:570
btrfs_search_slot+0x1094/0x2140 fs/btrfs/ctree.c:1768
relocate_tree_block fs/btrfs/relocation.c:2694 [inline]
relocate_tree_blocks+0xf73/0x1770 fs/btrfs/relocation.c:2757
relocate_block_group+0x47e/0xc70 fs/btrfs/relocation.c:3673
btrfs_relocate_block_group+0x48a/0xc60 fs/btrfs/relocation.c:4070
btrfs_relocate_chunk+0x96/0x280 fs/btrfs/volumes.c:3181
__btrfs_balance fs/btrfs/volumes.c:3911 [inline]
btrfs_balance+0x1f03/0x3cd0 fs/btrfs/volumes.c:4301
btrfs_ioctl_balance+0x61e/0x800 fs/btrfs/ioctl.c:4137
btrfs_ioctl+0x39ea/0x7b70 fs/btrfs/ioctl.c:4949
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
So fix this by making sure that whenever we try to modify the chunk btree
and we are neither in a chunk allocation context nor in a chunk remove
context, we reserve system space before modifying the chunk btree.
Reported-by: Hao Sun <sunhao.th@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsax51i4mu6C0C3vJqQN3NR_iVuucoeG3U1HXjrgzn5FFQ@mail.gmail.com/
Fixes: 79bd37120b1495 ("btrfs: rework chunk allocation to avoid exhaustion of the system chunk array")
CC: stable@vger.kernel.org # 5.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently auto reclaim of unusable zones reclaims the block-groups in
the order they have been added to the reclaim list.
Change this to a greedy algorithm by sorting the list so we have the
block-groups with the least amount of valid bytes reclaimed first.
Note: we can't splice the block groups from reclaim_bgs to let the sort
happen outside of the lock. The block groups can be still in use by
other parts eg. via bg_list and we must hold unused_bgs_lock while
processing them.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ write note and comment why we can't splice the list ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Just use the %pg format specifier in all the debug printks previously
using it. Note that both bdevname and the %pg specifier never print
a pathname, so the kbasename call wasn't needed to start with.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ adjust messages and indentation ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
For device removal and replace we call btrfs_find_device_by_devspec,
which if we give it a device path and nothing else will call
btrfs_get_dev_args_from_path, which opens the block device and reads the
super block and then looks up our device based on that.
However at this point we're holding the sb write "lock", so reading the
block device pulls in the dependency of ->open_mutex, which produces the
following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #405 Not tainted
------------------------------------------------------
losetup/11576 is trying to acquire lock:
ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
but task is already holding lock:
ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
__mutex_lock+0x7d/0x750
lo_open+0x28/0x60 [loop]
blkdev_get_whole+0x25/0xf0
blkdev_get_by_dev.part.0+0x168/0x3c0
blkdev_open+0xd2/0xe0
do_dentry_open+0x161/0x390
path_openat+0x3cc/0xa20
do_filp_open+0x96/0x120
do_sys_openat2+0x7b/0x130
__x64_sys_openat+0x46/0x70
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #3 (&disk->open_mutex){+.+.}-{3:3}:
__mutex_lock+0x7d/0x750
blkdev_get_by_dev.part.0+0x56/0x3c0
blkdev_get_by_path+0x98/0xa0
btrfs_get_bdev_and_sb+0x1b/0xb0
btrfs_find_device_by_devspec+0x12b/0x1c0
btrfs_rm_device+0x127/0x610
btrfs_ioctl+0x2a31/0x2e70
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #2 (sb_writers#12){.+.+}-{0:0}:
lo_write_bvec+0xc2/0x240 [loop]
loop_process_work+0x238/0xd00 [loop]
process_one_work+0x26b/0x560
worker_thread+0x55/0x3c0
kthread+0x140/0x160
ret_from_fork+0x1f/0x30
-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
process_one_work+0x245/0x560
worker_thread+0x55/0x3c0
kthread+0x140/0x160
ret_from_fork+0x1f/0x30
-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
__lock_acquire+0x10ea/0x1d90
lock_acquire+0xb5/0x2b0
flush_workqueue+0x91/0x5e0
drain_workqueue+0xa0/0x110
destroy_workqueue+0x36/0x250
__loop_clr_fd+0x9a/0x660 [loop]
block_ioctl+0x3f/0x50
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Chain exists of:
(wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&lo->lo_mutex);
lock(&disk->open_mutex);
lock(&lo->lo_mutex);
lock((wq_completion)loop0);
*** DEADLOCK ***
1 lock held by losetup/11576:
#0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
stack backtrace:
CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack_lvl+0x57/0x72
check_noncircular+0xcf/0xf0
? stack_trace_save+0x3b/0x50
__lock_acquire+0x10ea/0x1d90
lock_acquire+0xb5/0x2b0
? flush_workqueue+0x67/0x5e0
? lockdep_init_map_type+0x47/0x220
flush_workqueue+0x91/0x5e0
? flush_workqueue+0x67/0x5e0
? verify_cpu+0xf0/0x100
drain_workqueue+0xa0/0x110
destroy_workqueue+0x36/0x250
__loop_clr_fd+0x9a/0x660 [loop]
? blkdev_ioctl+0x8d/0x2a0
block_ioctl+0x3f/0x50
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f31b02404cb
Instead what we want to do is populate our device lookup args before we
grab any locks, and then pass these args into btrfs_rm_device(). From
there we can find the device and do the appropriate removal.
Suggested-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We are going to want to populate our device lookup args outside of any
locks and then do the actual device lookup later, so add a helper to do
this work and make btrfs_find_device_by_devspec() use this helper for
now.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We have a lot of device lookup functions that all do something slightly
different. Clean this up by adding a struct to hold the different
lookup criteria, and then pass this around to btrfs_find_device() so it
can do the proper matching based on the lookup criteria.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|