Age | Commit message (Collapse) | Author |
|
We support using IORING_SETUP_ATTACH_WQ to share async backends between
rings created by the same process, this now also allows the same to
happen with SQPOLL. The setup procedure remains the same, the caller
sets io_uring_params->wq_fd to the 'parent' context, and then the newly
created ring will attach to that async backend.
This means that multiple rings can share the same SQPOLL thread, saving
resources.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the SQPOLL thread from the ctx, and use the io_sq_data as the
data structure we pass in. io_sq_data has a list of ctx's that we can
then iterate over and handle.
As of now we're ready to handle multiple ctx's, though we're still just
handling a single one after this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move all the necessary state out of io_ring_ctx, and into a new
structure, io_sq_data. The latter now deals with any state or
variables associated with the SQPOLL thread itself.
In preparation for supporting more than one io_ring_ctx per SQPOLL
thread.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is done in preparation for handling more than one ctx, but it also
cleans up the code a bit since io_sq_thread() was a bit too unwieldy to
get a get overview on.
__io_sq_thread() is now the main handler, and it returns an enum sq_ret
that tells io_sq_thread() what it ended up doing. The parent then makes
a decision on idle, spinning, or work handling based on that.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We need to decouple the clearing on wakeup from the the inline schedule,
as that is going to be required for handling multiple rings in one
thread.
Wrap our wakeup handler so we can clear it when we get the wakeup, by
definition that is when we no longer need the flag set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is in preparation to sharing the poller thread between rings. For
that we need per-ring wait_queue_entry storage, and we can't easily put
that on the stack if one thread is managing multiple rings.
We'll also be sharing the wait_queue_head across rings for the purposes
of wakeups, provide the usual private ring wait_queue_head for now but
make it a pointer so we can easily override it when sharing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have a set of flags that are shared between the two and inherired
in kiocb_set_rw_flags(), but we check and set these individually.
Reorder the IOCB flags so that the bottom part of the space is synced
with the RWF flag space, and then we can do them all in one mask and
set operation.
The only exception is RWF_SYNC, which needs to mark IOCB_SYNC and
IOCB_DSYNC. Do that one separately.
This shaves 15 bytes of text from kiocb_set_rw_flags() for me.
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're not handling signals by default in kernel threads, and we never
use TWA_SIGNAL for the SQPOLL thread internally. Hence we can never
have a signal pending, and we don't need to check for it (nor flush it).
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
During a context switch the scheduler invokes wq_worker_sleeping() with
disabled preemption. Disabling preemption is needed because it protects
access to `worker->sleeping'. As an optimisation it avoids invoking
schedule() within the schedule path as part of possible wake up (thus
preempt_enable_no_resched() afterwards).
The io-wq has been added to the mix in the same section with disabled
preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping()
acquires a spinlock_t. Also within the schedule() the spinlock_t must be
acquired after tsk_is_pi_blocked() otherwise it will block on the
sleeping lock again while scheduling out.
While playing with `io_uring-bench' I didn't notice a significant
latency spike after converting io_wqe::lock to a raw_spinlock_t. The
latency was more or less the same.
In order to keep the spinlock_t it would have to be moved after the
tsk_is_pi_blocked() check which would introduce a branch instruction
into the hot path.
The lock is used to maintain the `work_list' and wakes one task up at
most.
Should io_wqe_cancel_pending_work() cause latency spikes, while
searching for a specific item, then it would need to drop the lock
during iterations.
revert_creds() is also invoked under the lock. According to debug
cred::non_rcu is 0. Otherwise it should be moved outside of the locked
section because put_cred_rcu()->free_uid() acquires a sleeping lock.
Convert io_wqe::lock to a raw_spinlock_t.c
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch adds a new IORING_SETUP_R_DISABLED flag to start the
rings disabled, allowing the user to register restrictions,
buffers, files, before to start processing SQEs.
When IORING_SETUP_R_DISABLED is set, SQE are not processed and
SQPOLL kthread is not started.
The restrictions registration are allowed only when the rings
are disable to prevent concurrency issue while processing SQEs.
The rings can be enabled using IORING_REGISTER_ENABLE_RINGS
opcode with io_uring_register(2).
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The new io_uring_register(2) IOURING_REGISTER_RESTRICTIONS opcode
permanently installs a feature allowlist on an io_ring_ctx.
The io_ring_ctx can then be passed to untrusted code with the
knowledge that only operations present in the allowlist can be
executed.
The allowlist approach ensures that new features added to io_uring
do not accidentally become available when an existing application
is launched on a newer kernel version.
Currently is it possible to restrict sqe opcodes, sqe flags, and
register opcodes.
IOURING_REGISTER_RESTRICTIONS can only be made once. Afterwards
it is not possible to change restrictions anymore.
This prevents untrusted code from removing restrictions.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The enumeration allows us to keep track of the last
io_uring_register(2) opcode available.
Behaviour and opcodes names don't change.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now we have a io_uring kernel header, move this definition out of fs.h
and into io_uring.h where it belongs.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we don't get and assign the namespace for the async work, then certain
paths just don't work properly (like /dev/stdin, /proc/mounts, etc).
Anything that references the current namespace of the given task should
be assigned for async work on behalf of that task.
Cc: stable@vger.kernel.org # v5.5+
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.
With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.
Cc: stable@vger.kernel.org # v5.5+
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This allows us to selectively flush out pending overflows, depending on
the task and/or files_struct being passed in.
No intended functional changes in this patch.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Return whether we found and canceled requests or not. This is in
preparation for using this information, no functional changes in this
patch.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Sometimes we assign a weak reference to it, sometimes we grab a
reference to it. Clean this up and make it unconditional, and drop the
flag related to tracking this state.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We can grab a reference to the task instead of stashing away the task
files_struct. This is doable without creating a circular reference
between the ring fd and the task itself.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No functional changes in this patch, prep patch for grabbing references
to the files_struct.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently cancel these when the ring exits, and we cancel all of
them. This is in preparation for killing only the ones associated
with a given task.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* io_uring-5.9:
io_uring: fix async buffered reads when readahead is disabled
io_uring: fix potential ABBA deadlock in ->show_fdinfo()
io_uring: always delete double poll wait entry on match
|
|
Geetha sowjanya says:
====================
Fix bugs in Octeontx2 netdev driver
In existing Octeontx2 network drivers code has issues
like stale entries in broadcast replication list, missing
L3TYPE for IPv6 frames, running tx queues on error and
race condition in mbox reset.
This patch set fixes the above issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mbox implementation in octeontx2 driver has three states
alloc, send and reset in mbox response. VF allocate and
sends message to PF for processing, PF ACKs them back and
reset the mbox memory. In some case we see synchronization
issue where after msgs_acked is incremented and before
mbox_reset API is called, if current execution is scheduled
out and a different thread is scheduled in which checks for
msgs_acked. Since the new thread sees msgs_acked == msgs_sent
it will try to allocate a new message and to send a new mbox
message to PF.Now if mbox_reset is scheduled in, PF will see
'0' in msgs_send.
This patch fixes the issue by calling mbox_reset before
incrementing msgs_acked flag for last processing message and
checks for valid message size.
Fixes: d424b6c02 ("octeontx2-pf: Enable SRIOV and added VF mbox handling")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently in otx2_open on failure of nix_lf_start
transmit queues are not stopped which are already
started in link_event. Since the tx queues are not
stopped network stack still try's to send the packets
leading to driver crash while access the device resources.
Fixes: 50fe6c02e ("octeontx2-pf: Register and handle link notifications")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For TCP/UDP checksum offload feature in Octeontx2
expects L3TYPE to be set irrespective of IP header
checksum is being offloaded or not. Currently for
IPv6 frames L3TYPE is not being set resulting in
packet drop with checksum error. This patch fixes
this issue.
Fixes: 3ca6c4c88 ("octeontx2-pf: Add packet transmission support")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Packet replication feature present in Octeontx2
is a hardware linked list of PF and its VF
interfaces so that broadcast packets are sent
to all interfaces present in the list. It is
driver job to add and delete a PF/VF interface
to/from the list when the interface is brought
up and down. This patch fixes the
npc_enadis_default_entries function to handle
broadcast replication properly if packet replication
feature is present.
Fixes: 40df309e4166 ("octeontx2-af: Support to enable/disable default MCAM entries")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-09-30
This series contains updates to ice driver only.
Jake increases the wait time for firmware response as it can take longer
than the current wait time. Preserves the NVM capabilities of the device in
safe mode so the device reports its NVM update capabilities properly
when in this state.
v2: Added cover letter
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Link: https://lore.kernel.org/r/20200925092115.16546-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Petazzzoni <thomas.petazzoni@bootlin.com>
|
|
Jonathan reports that the strict policy for memory mapped by the
ACPI core breaks the use case of passing ACPI table overrides via
initramfs. This is due to the fact that the memory type used for
loading the initramfs in memory is not recognized as a memory type
that is typically used by firmware to pass firmware tables.
Since the purpose of the strict policy is to ensure that no AML or
other ACPI code can manipulate any memory that is used by the kernel
to keep its internal state or the state of user tasks, we can relax
the permission check, and allow mappings of memory that is reserved
and marked as NOMAP via memblock, and therefore not covered by the
linear mapping to begin with.
Fixes: 1583052d111f ("arm64/acpi: disallow AML memory opregions to access kernel memory")
Fixes: 325f5585ec36 ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Another batch of clk driver fixes:
- Make sure DRAM and ChipID region doesn't get disabled on Exynos
- Fix a SATA failure on Tegra
- Fix the emac_ptp clk divider on stratix10"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk
clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
clk: tegra: Fix missing prototype for tegra210_clk_register_emc()
clk: tegra: Always program PLL_E when enabled
clk: tegra: Capitalization fixes
clk: samsung: Keep top BPLL mux on Exynos542x enabled
|
|
The NoMMU kernel is broken for QEMU virt machine from Linux-5.9-rc6
because clint_time_val is used even before CLINT driver is probed
at following places:
1. rand_initialize() calls get_cycles() which in-turn uses
clint_time_val
2. boot_init_stack_canary() calls get_cycles() which in-turn
uses clint_time_val
The issue#1 (above) is fixed by providing custom random_get_entropy()
for RISC-V NoMMU kernel. For issue#2 (above), we remove dependency of
boot_init_stack_canary() on get_cycles() and this is aligned with the
boot_init_stack_canary() implementations of ARM, ARM64 and MIPS kernel.
Fixes: d5be89a8d118 ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
We use a device's allocation state tree to track ranges in a device used
for allocated chunks, and we set ranges in this tree when allocating a new
chunk. However after a device replace operation, we were not setting the
allocated ranges in the new device's allocation state tree, so that tree
is empty after a device replace.
This means that a fitrim operation after a device replace will trim the
device ranges that have allocated chunks and extents, as we trim every
range for which there is not a range marked in the device's allocation
state tree. It is also important during chunk allocation, since the
device's allocation state is used to determine if a range is already
allocated when allocating a new chunk.
This is trivial to reproduce and the following script triggers the bug:
$ cat reproducer.sh
#!/bin/bash
DEV1="/dev/sdg"
DEV2="/dev/sdh"
DEV3="/dev/sdi"
wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null
# Create a raid1 test fs on 2 devices.
mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null
mount $DEV1 /mnt/btrfs
xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo
echo "Starting to replace $DEV1 with $DEV3"
btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs
echo
echo "Running fstrim"
fstrim /mnt/btrfs
echo
echo "Unmounting filesystem"
umount /mnt/btrfs
echo "Mounting filesystem in degraded mode using $DEV3 only"
wipefs -a $DEV1 $DEV2 &> /dev/null
mount -o degraded $DEV3 /mnt/btrfs
if [ $? -ne 0 ]; then
dmesg | tail
echo
echo "Failed to mount in degraded mode"
exit 1
fi
echo
echo "File foo data (expected all bytes = 0xab):"
od -A d -t x1 /mnt/btrfs/foo
umount /mnt/btrfs
When running the reproducer:
$ ./replace-test.sh
wrote 10485760/10485760 bytes at offset 0
10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec)
Starting to replace /dev/sdg with /dev/sdi
Running fstrim
Unmounting filesystem
Mounting filesystem in degraded mode using /dev/sdi only
mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error.
[19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started
[19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished
[19582.208293] BTRFS info (device sdi): allowing degraded mounts
[19582.208298] BTRFS info (device sdi): disk space caching is enabled
[19582.208301] BTRFS info (device sdi): has skinny extents
[19582.212853] BTRFS warning (device sdi): devid 2 uuid 1f731f47-e1bb-4f00-bfbb-9e5a0cb4ba9f is missing
[19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed
[19582.213907] BTRFS error (device sdi): bad tree block start, want 30490624 have 0
[19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5
[19582.231576] BTRFS error (device sdi): open_ctree failed
Failed to mount in degraded mode
So fix by setting all allocated ranges in the replace target device when
the replace operation is finishing, when we are holding the chunk mutex
and we can not race with new chunk allocations.
A test case for fstests follows soon.
Fixes: 1c11b63eff2a67 ("btrfs: replace pending/pinned chunks lists with io tree")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When closing and freeing the source device we could end up doing our
final blkdev_put() on the bdev, which will grab the bd_mutex. As such
we want to be holding as few locks as possible, so move this call
outside of the dev_replace->lock_finishing_cancel_unmount lock. Since
we're modifying the fs_devices we need to make sure we're holding the
uuid_mutex here, so take that as well.
There's a report from syzbot probably hitting one of the cases where
the bd_mutex and device_list_mutex are taken in the wrong order, however
it's not with device replace, like this patch fixes. As there's no
reproducer available so far, we can't verify the fix.
https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/
dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419
WARNING: possible circular locking dependency detected
5.9.0-rc5-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.0/6878 is trying to acquire lock:
ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804
but task is already holding lock:
ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255
btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109
__btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916
find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline]
find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127
btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206
cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063
btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838
writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439
__extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653
extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249
extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370
do_writepages+0xec/0x290 mm/page-writeback.c:2352
__writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461
writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721
wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894
wb_do_writeback fs/fs-writeback.c:2039 [inline]
wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
-> #3 (sb_internal#2){.+.+}-{0:0}:
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write+0x234/0x470 fs/super.c:1672
sb_start_intwrite include/linux/fs.h:1690 [inline]
start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624
find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline]
find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127
btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206
cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063
btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838
writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439
__extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653
extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249
extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370
do_writepages+0xec/0x290 mm/page-writeback.c:2352
__writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461
writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721
wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894
wb_do_writeback fs/fs-writeback.c:2039 [inline]
wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
-> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
__flush_work+0x60e/0xac0 kernel/workqueue.c:3041
wb_shutdown+0x180/0x220 mm/backing-dev.c:355
bdi_unregister+0x174/0x590 mm/backing-dev.c:872
del_gendisk+0x820/0xa10 block/genhd.c:933
loop_remove drivers/block/loop.c:2192 [inline]
loop_control_ioctl drivers/block/loop.c:2291 [inline]
loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257
vfs_ioctl fs/ioctl.c:48 [inline]
__do_sys_ioctl fs/ioctl.c:753 [inline]
__se_sys_ioctl fs/ioctl.c:739 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (loop_ctl_mutex){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
lo_open+0x19/0xd0 drivers/block/loop.c:1893
__blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507
blkdev_get fs/block_dev.c:1639 [inline]
blkdev_open+0x227/0x300 fs/block_dev.c:1753
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3251 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3368
do_filp_open+0x17e/0x3c0 fs/namei.c:3395
do_sys_openat2+0x16d/0x420 fs/open.c:1168
do_sys_open fs/open.c:1184 [inline]
__do_sys_open fs/open.c:1192 [inline]
__se_sys_open fs/open.c:1188 [inline]
__x64_sys_open+0x119/0x1c0 fs/open.c:1188
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&bdev->bd_mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426
lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
blkdev_put+0x30/0x520 fs/block_dev.c:1804
btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
close_fs_devices fs/btrfs/volumes.c:1193 [inline]
btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149
generic_shutdown_super+0x144/0x370 fs/super.c:464
kill_anon_super+0x36/0x60 fs/super.c:1108
btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
deactivate_locked_super+0x94/0x160 fs/super.c:335
deactivate_super+0xad/0xd0 fs/super.c:366
cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of:
&bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_devs->device_list_mutex);
lock(sb_internal#2);
lock(&fs_devs->device_list_mutex);
lock(&bdev->bd_mutex);
*** DEADLOCK ***
3 locks held by syz-executor.0/6878:
#0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365
#1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178
#2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
stack backtrace:
CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
check_prev_add kernel/locking/lockdep.c:2496 [inline]
check_prevs_add kernel/locking/lockdep.c:2601 [inline]
validate_chain kernel/locking/lockdep.c:3218 [inline]
__lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426
lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
__mutex_lock_common kernel/locking/mutex.c:956 [inline]
__mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
blkdev_put+0x30/0x520 fs/block_dev.c:1804
btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
close_fs_devices fs/btrfs/volumes.c:1193 [inline]
btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149
generic_shutdown_super+0x144/0x370 fs/super.c:464
kill_anon_super+0x36/0x60 fs/super.c:1108
btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
deactivate_locked_super+0x94/0x160 fs/super.c:335
deactivate_super+0xad/0xd0 fs/super.c:366
cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x460027
RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027
RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0
RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b
R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460
R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add syzbot reference ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Using struct|enum :c:`type` here sounds a bad idea, as it may be
documenting data types that were already removed. Also, it makes
the file harder to read.
As automarkup.py should be able to handle those cases, let's
just get rid of the markup.
While not required by automarkup, in order to make easier for
grep, both type and names are now at the same line.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
There are several places there where struct was used twice,
on multiple lines, like:
struct
struct ``v4l2_zoomcap``
That's probably done during the DocBook to ReSt conversion.
Probably the cases where "struct struct" were at the same line
was addressed back then, but the multi-line cases are still
there.
Get rid of them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
As reported by Sphinx:
./Documentation/driver-api/media/v4l2-subdev:490: ./include/media/v4l2-subdev.h:384: WARNING: Unparseable C cross-reference: 'struct'
Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 6]
struct
------^
The markup there is wrong:
&struct &v4l2_input -> &struct v4l2_input
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
The new C domain code on Sphinx 3 is a lot more pedantic.
It only accepts real functions declared as c:function.
So, declarations like this are not valid:
.. c:function::
int (*adap_enable)(struct cec_adapter *adap, bool enable);
Also, no blank lines are allowed after ".. c:function:", and
continuation lines should be like:
.. c:function:
int (void foo, \
int bar);
Change the logic there, in order to avoid lots of warnings
when built with Sphinx 3.x.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
There are two kernel-doc markups for include/media/v4l2-async.h,
one at v4l2-async.rst and another one at v4l2-subdev.rst.
Sphinx 3.x checks it and complains for duplicated symbols.
So, get rid of one of them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
The same markup are there twice, causing warnings with
Sphinx 3.x. Drop the second copy.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
There are several :c:type: definitions there, in order to
do cross-references with the driver's documentation.
Those are broken when docs are built with Sphinx 3.x, as
it would require :c:struct: instead.
For Sphinx < 3.x, the automarkup.py extension is able to do the
replacement already, and a future improvement on it should make
it also work with Sphinx 3.x.
So, get rid of the usage of the :c:type: macro there.
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
expose us for tracepoints. For imx6q cpuidle, this leads to an RCU splat
according to below.
[6.870684] [<c0db7690>] (_raw_spin_lock) from [<c011f6a4>] (imx6q_enter_wait+0x18/0x9c)
[6.878846] [<c011f6a4>] (imx6q_enter_wait) from [<c09abfb0>] (cpuidle_enter_state+0x168/0x5e4)
To fix the problem, let's assign the corresponding idlestate->flags the
CPUIDLE_FLAG_RCU_IDLE bit, which enables us to call rcu_idle_enter|exit()
at the proper point.
Reported-by: Dong Aisheng <aisheng.dong@nxp.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix a reStructuredText syntax error in the cpuidle PM admin-guide
documentation: the ``...'' quotation marks are parsed as partial ''...''
reStructuredText markup and break the output formatting.
This change them to "...".
Signed-off-by: Yoann Congal <yoann.congal@smile.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix missing return statement when writing "off" to intel_pstate status
sysfs I/F.
Fixes: 55671ea3257a ("cpufreq: intel_pstate: Free memory only when turning off")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If the driver initializes in safe mode, it will call
ice_set_safe_mode_caps. This results in clearing the capabilities
structures, in order to set them up for operating in safe mode, ensuring
many features are disabled.
This has a side effect of also clearing the capability bits that relate
to NVM update. The result is that the device driver will not indicate
support for unified update, even if the firmware is capable.
Fix this by adding the relevant capability fields to the list of values
we preserve. To simplify the code, use a common_cap structure instead of
a handful of local variables. To reduce some duplication of the
capability name, introduce a couple of macros used to restore the
capabilities values from the cached copy.
Fixes: de9b277ee032 ("ice: Add support for unified NVM update flow capability")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice driver needs to wait for a firmware response to each command to
write a block of data to the scratch area used to update the device
firmware. The driver currently waits for up to 1 second for this to be
returned.
It turns out that firmware might take longer than 1 second to return
a completion in some cases. If this happens, the flash update will fail
to complete.
Fix this by increasing the maximum time that the driver will wait for
both writing a block of data, and for activating the new NVM bank. The
timeout for an erase command is already several minutes, as the firmware
had to erase the entire bank which was already expected to take a minute
or more in the worst case.
In the case where firmware really won't respond, we will now take longer
to fail. However, this ensures that if the firmware is simply slow to
respond, the flash update can still complete. This new maximum timeout
should not adversely increase the update time, as the implementation for
wait_event_interruptible_timeout, and should wake very soon after we get
a completion event. It is better for a flash update be slow but still
succeed than to fail because we gave up too quickly.
Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brijesh Behera <brijeshx.behera@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We must free the vqs array in the open failure path, because
vhost_vdpa_release will not be called.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1600712588-9514-2-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
As warned by Sphinx:
./Documentation/gpu/drm-kms-helpers:305: ./include/drm/drm_dsc.h:587: WARNING: Unparseable C cross-reference: 'struct'
Invalid C declaration: Expected identifier in nested name, got keyword: struct [error at 6]
struct
------^
The markup for one struct is wrong, as struct is used twice.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/3d467022325e15bba8dcb13da8fb730099303266.1601467849.git.mchehab+huawei@kernel.org
|
|
Also partially revert the follow-up change "drm: pl111: Absorb the
external register header".
This reverts the parts of commits
7e4e589db76a3cf4c1f534eb5a09cc6422766b93 and
0fb8125635e8eb5483fb095f98dcf0651206a7b8 that touch paths outside
of drivers/gpu/drm/pl111.
The fbdev driver is used by Android's FVP configuration. Using the
DRM driver together with DRM's fbdev emulation results in a failure
to boot Android. The root cause is that Android's generic fbdev
userspace driver relies on the ability to set the pixel format via
FBIOPUT_VSCREENINFO, which is not supported by fbdev emulation.
There have been other less critical behavioral differences identified
between the fbdev driver and the DRM driver with fbdev emulation. The
DRM driver exposes different values for the panel's width, height and
refresh rate, and the DRM driver fails a FBIOPUT_VSCREENINFO syscall
with yres_virtual greater than the maximum supported value instead
of letting the syscall succeed and setting yres_virtual based on yres.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200929195344.2219796-1-pcc@google.com
|
|
gfxoff is temporarily disabled for navy_flounder, since
at present the feature caused some tdr when performing
display operations.
Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|