Age | Commit message (Collapse) | Author |
|
While we are finishing a device replace operation we can have a concurrent
task trying to do a read repair operation, in which case it will call
btrfs_map_block() to get a struct btrfs_bio which can have a stripe that
points to the source device of the device replace operation. This allows
for the read repair task to dereference the stripe's device pointer after
the device replace operation has freed the source device, resulting in
an invalid memory access. This is similar to the problem solved by my
previous patch in the same series and named "Btrfs: fix race between
device replace and discard".
So fix this by surrounding the call to btrfs_map_block() and the code
that uses the returned struct btrfs_bio with calls to
btrfs_bio_counter_inc_blocked() and btrfs_bio_counter_dec(), giving the
proper serialization with the finishing phase of the device replace
operation.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
While we are finishing a device replace operation, we can make a discard
operation (fs mounted with -o discard) do an invalid memory access like
the one reported by the following trace:
[ 3206.384654] general protection fault: 0000 [#1] PREEMPT SMP
[ 3206.387520] Modules linked in: dm_mod btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis psmouse tpm ppdev sg parport_pc evdev i2c_piix4 parport
processor serio_raw i2c_core pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci
virtio_ring scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[ 3206.388595] CPU: 14 PID: 29194 Comm: fsstress Not tainted 4.6.0-rc7-btrfs-next-29+ #1
[ 3206.388595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[ 3206.388595] task: ffff88017ace0100 ti: ffff880171b98000 task.ti: ffff880171b98000
[ 3206.388595] RIP: 0010:[<ffffffff8124d233>] [<ffffffff8124d233>] blkdev_issue_discard+0x5c/0x2a7
[ 3206.388595] RSP: 0018:ffff880171b9bb80 EFLAGS: 00010246
[ 3206.388595] RAX: ffff880171b9bc28 RBX: 000000000090d000 RCX: 0000000000000000
[ 3206.388595] RDX: ffffffff82fa1b48 RSI: ffffffff8179f46c RDI: ffffffff82fa1b48
[ 3206.388595] RBP: ffff880171b9bcc0 R08: 0000000000000000 R09: 0000000000000001
[ 3206.388595] R10: ffff880171b9bce0 R11: 000000000090f000 R12: ffff880171b9bbe8
[ 3206.388595] R13: 0000000000000010 R14: 0000000000004868 R15: 6b6b6b6b6b6b6b6b
[ 3206.388595] FS: 00007f6182e4e700(0000) GS:ffff88023fdc0000(0000) knlGS:0000000000000000
[ 3206.388595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3206.388595] CR2: 00007f617c2bbb18 CR3: 000000017ad9c000 CR4: 00000000000006e0
[ 3206.388595] Stack:
[ 3206.388595] 0000000000004878 0000000000000000 0000000002400040 0000000000000000
[ 3206.388595] 0000000000000000 ffff880171b9bbe8 ffff880171b9bbb0 ffff880171b9bbb0
[ 3206.388595] ffff880171b9bbc0 ffff880171b9bbc0 ffff880171b9bbd0 ffff880171b9bbd0
[ 3206.388595] Call Trace:
[ 3206.388595] [<ffffffffa042899e>] btrfs_issue_discard+0x12f/0x143 [btrfs]
[ 3206.388595] [<ffffffffa042899e>] ? btrfs_issue_discard+0x12f/0x143 [btrfs]
[ 3206.388595] [<ffffffffa042e862>] btrfs_discard_extent+0x87/0xde [btrfs]
[ 3206.388595] [<ffffffffa04303b5>] btrfs_finish_extent_commit+0xb2/0x1df [btrfs]
[ 3206.388595] [<ffffffff8149c246>] ? __mutex_unlock_slowpath+0x150/0x15b
[ 3206.388595] [<ffffffffa04464c4>] btrfs_commit_transaction+0x7fc/0x980 [btrfs]
[ 3206.388595] [<ffffffff8149c246>] ? __mutex_unlock_slowpath+0x150/0x15b
[ 3206.388595] [<ffffffffa0459af6>] btrfs_sync_file+0x38f/0x428 [btrfs]
[ 3206.388595] [<ffffffff811a8292>] vfs_fsync_range+0x8c/0x9e
[ 3206.388595] [<ffffffff811a82c0>] vfs_fsync+0x1c/0x1e
[ 3206.388595] [<ffffffff811a8417>] do_fsync+0x31/0x4a
[ 3206.388595] [<ffffffff811a8637>] SyS_fsync+0x10/0x14
[ 3206.388595] [<ffffffff8149e025>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 3206.388595] [<ffffffff81100c6b>] ? time_hardirqs_off+0x9/0x14
[ 3206.388595] [<ffffffff8108e87d>] ? trace_hardirqs_off_caller+0x1f/0xaa
This happens because when we call btrfs_map_block() from
btrfs_discard_extent() to get a btrfs_bio structure, the device replace
operation has not finished yet, but before we use the device of one of the
stripes from the returned btrfs_bio structure, the device object is freed.
This is illustrated by the following diagram.
CPU 1 CPU 2
btrfs_dev_replace_start()
(...)
btrfs_dev_replace_finishing()
btrfs_start_transaction()
btrfs_commit_transaction()
(...)
btrfs_sync_file()
btrfs_start_transaction()
(...)
btrfs_commit_transaction()
btrfs_finish_extent_commit()
btrfs_discard_extent()
btrfs_map_block()
--> returns a struct btrfs_bio
with a stripe that has a
device field pointing to
source device of the replace
operation (the device that
is being replaced)
mutex_lock(&uuid_mutex)
mutex_lock(&fs_info->fs_devices->device_list_mutex)
mutex_lock(&fs_info->chunk_mutex)
btrfs_dev_replace_update_device_in_mapping_tree()
--> iterates the mapping tree and for each
extent map that has a stripe pointing to
the source device, it updates the stripe
to point to the target device instead
btrfs_rm_dev_replace_blocked()
--> waits for fs_info->bio_counter to go down to 0
btrfs_rm_dev_replace_remove_srcdev()
--> removes source device from the list of devices
mutex_unlock(&fs_info->chunk_mutex)
mutex_unlock(&fs_info->fs_devices->device_list_mutex)
mutex_unlock(&uuid_mutex)
btrfs_rm_dev_replace_free_srcdev()
--> frees the source device
--> iterates over all stripes
of the returned struct
btrfs_bio
--> for each stripe it
dereferences its device
pointer
--> it ends up finding a
pointer to the device
used as the source
device for the replace
operation and that was
already freed
So fix this by surrounding the call to btrfs_map_block(), and the code
that uses the returned struct btrfs_bio, with calls to
btrfs_bio_counter_inc_blocked() and btrfs_bio_counter_dec(), so that
the finishing phase of the device replace operation blocks until the
the bio counter decreases to zero before it frees the source device.
This is the same approach we do at btrfs_map_bio() for example.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
Merge lib/uuid fixes from Andy Shevchenko.
* emailed patches from Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
lib/uuid.c: use correct offset in uuid parser
lib/uuid: add a test module
|
|
Use '+ 0' and '+ 1' as offsets, like they were intended, instead of
adding to the result.
Fixes: 2b1b0d66704a ("lib/uuid.c: introduce a few more generic helpers")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It appears that somehow I missed a test of the latest UUID rework which
landed in the kernel. Present a small test module to avoid such cases
in the future.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- missing selection in public_key that may result in a build failure
- Potential crash in error path in omap-sham
- ccp AES XTS bug that affects requests larger than 4096"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Fix AES XTS error for request sizes above 4096
crypto: public_key: select CRYPTO_AKCIPHER
crypto: omap-sham - potential Oops on error in probe
|
|
Commit d30291b985d1 ("libceph: variable-sized ceph_object_id") changed
dout()s in what is now encode_request() and ceph_object_locator_to_pg()
to use %pE, mostly to document that, although all rbd and cephfs object
names are NULL-terminated strings, ceph_object_id will handle any RADOS
object name, including the one containing NULs, just fine.
However, it turns out that vbin_printf() can't handle anything but ints
and %s - all %p suffixes are ignored. The buffer %p** points to isn't
recorded, resulting in trash in the messages if the buffer had been
reused by the time bstr_printf() got to it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
handle_reply() may be called twice on the same request: on ack and then
on commit. This occurs on btrfs-formatted OSDs or if cephfs sync write
path is triggered - CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK.
handle_reply() handles this with the help of done_request().
Fixes: 5aea3dcd5021 ("libceph: a major OSD client update")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
For the benefit of every single caller, take osdc instead of map.
Also, now that osdc->osdmap can't ever be NULL, drop the check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When adding the gpiochip, the GPIO HW drivers' callback get_direction()
could get called in atomic context. Some of the GPIO HW drivers may
sleep when accessing the register.
Move the lock before initializing the descriptors.
Reported-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
In fdeb8e1547cb9dd39d5d7223b33f3565cf86c28e
("gpio: reflect base and ngpio into gpio_device")
assumed that GPIO descriptors are either valid or error
pointers, but gpiod_get_[index_]optional() actually return
NULL descriptors and then all subsequent calls should just
bail out.
Cc: stable@vger.kernel.org
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: fdeb8e1547cb ("gpio: reflect base and ngpio into gpio_device")
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Stop yelling the plane type. "STANDARD" doesn't mean anything anyway.
Let's just use the plane name here.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-8-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Rather than let the core generate usless encoder names, let's pass in
something that actually identifies the piece of hardware we're dealing
with.
v2: Use 'DSI %c' instead of 'MIPI %c' for DSI encoders (Jani)
v3: Use port_name() in DSI code since we have it
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-7-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Let's name our planes in a way that makes sense wrt. the spec:
- skl+ -> "plane 1A", "plane 2A", "plane 1C", "cursor A" etc.
- g4x+ -> "primary A", "primary B", "sprite A", "cursor C" etc.
- pre-g4x -> "plane A", "cursor B" etc.
v2: Rebase on top of the fixed/cleaned error paths
Use a local 'name' variable to make things easier
v3: Pass the name as a function argument to drm_universal_plane_init() (Jani)
v3: Pass the printf style string to drm_universal_plane_init()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-6-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
Call intel_plane_destroy() instead of drm_plane_cleanup() so that we
also free the plane struct itself when bailing out of the crtc init.
And make intel_plane_destroy() NULL tolerant to avoid having to check
for it in the caller.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-5-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
v2: Fix intel_crtc leak on failure to allocate the name
Use a local 'name' variable to make things easier
v3: Pass the name as a function arguemnt to drm_crtc_init_with_planes() (Jani)
v4: Pass the printf style format string to drm_crtc_init_with_planes()
v5: Drop spurious code changes
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We have plane->name, so let's use that in debug messages instead
of just printing the more or less useless object ID.
v2: slap on a commit message
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
We have crtc->name, so let's use that in debug messages instead
of just printing the more or less useless object ID.
v2: Rebased due to intel_dpll_mgr.c, slap on a commit message
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464371966-15190-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
|
|
If we're using the compatible ioctl() we need to handle the
argument pointer in a special way or there will be trouble.
Fixes: 3c702e9987e2 ("gpio: add a userspace chardev ABI for GPIOs")
Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This function cannot actually be called with npage = 0, so in practice
this doesn't return an uninitialized value.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Both the INTx and MSI/X disable paths do an eventfd_ctx_put() for the
trigger eventfd before calling vfio_virqfd_disable() any potential
mask and unmask eventfds. This opens a use-after-free race where an
inopportune irqfd can reference the freed signalling eventfd. Reorder
to avoid this possibility.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Downgrade pr_info to pr_debug for the "_PPC limits will be enforced"
message.
In server systems with many cores this message is annoying.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
While iterating and copying extents from the source device, the device
replace code keeps adjusting a left cursor that is used to make sure that
once we finish processing a device extent, any future writes to extents
from the corresponding block group will get into both the source and
target devices. This left cursor is also used for resuming the device
replace operation at mount time.
However using this left cursor to decide whether writes go into both
devices or only the source device is not enough to guarantee we don't
miss copying extents into the target device. There are two cases where
the current approach fails. The first one is related to when there are
holes in the device and they get allocated for new block groups while
the device replace operation is iterating the device extents (more on
this explained below). The second one is that when that loop over the
device extents finishes, we start dellaloc, wait for all ordered extents
and then commit the current transaction, we might have got new block
groups allocated that are now using a device extent that has an offset
greater then or equals to the value of the left cursor, in which case
writes to extents belonging to these new block groups will get issued
only to the source device.
For the first case where the current approach of using a left cursor
fails, consider the source device currently has the following layout:
[ extent bg A ] [ hole, unallocated space ] [extent bg B ]
3Gb 4Gb 5Gb
While we are iterating the device extents from the source device using
the commit root of the device tree, the following happens:
CPU 1 CPU 2
<we are at transaction N>
scrub_enumerate_chunks()
--> searches the device tree for
extents belonging to the source
device using the device tree's
commit root
--> 1st iteration finds extent belonging to
block group A
--> sets block group A to RO mode
(btrfs_inc_block_group_ro)
--> sets cursor left to found_key.offset
which is 3Gb
--> scrub_chunk() starts
copies all allocated extents from
block group's A stripe at source
device into target device
btrfs_alloc_chunk()
--> allocates device extent
in the range [4Gb, 5Gb[
from the source device for
a new block group C
extent allocated from block
group C for a direct IO,
buffered write or btree node/leaf
extent is written to, perhaps
in response to a writepages()
call from the VM or directly
through direct IO
the write is made only against
the source device and not against
the target device because the
extent's offset is in the interval
[4Gb, 5Gb[ which is larger then
the value of cursor_left (3Gb)
--> scrub_chunks() finishes
--> updates left cursor from 3Gb to
4Gb
--> btrfs_dec_block_group_ro() sets
block group A back to RW mode
<we are still at transaction N>
--> 2nd iteration finds extent belonging to
block group B - it did not find the new
extent in the range [4Gb, 5Gb[ for block
group C because we are using the device
tree's commit root or even because the
block group's items are not all yet
inserted in the respective btrees, that is,
the block group is still attached to some
transaction handle's new_bgs list and
btrfs_create_pending_block_groups() was
not called yet against that transaction
handle, so the device extent items were
not yet inserted into the devices tree
<we are still at transaction N>
--> so we end not copying anything from the newly
allocated device extent from the source device
to the target device
So fix this by making __btrfs_map_block() always redirect writes to the
target device as well, independently of the left cursor's value. With
this change the left cursor is now used only for the purpose of tracking
progress and allow a mount operation to resume a device replace.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
After it finishes processing a device extent, the device replace code sets
back the block group to RW mode and then after that it sets the left cursor
to match the logical end address of the block group, so that future writes
into extents belonging to the block group go both the source (old) and
target (new) devices. However from the moment we turn the block group
back to RW mode we have a short time window, that lasts until we update
the left cursor's value, where extents can be allocated from the block
group and written to, in which case they will not be copied/written to
the target (new) device. Fix this by updating the left cursor's value
before turning the block group back to RW mode.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
We were assigning new values to fields of the device replace object
without holding the respective lock after processing each device extent.
This is important for the left cursor field which can be accessed by a
concurrent task running __btrfs_map_block (which, correctly, takes the
device replace lock).
So change these fields while holding the device replace lock.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
When we do a device replace, for each device extent we find from the
source device, we set the corresponding block group to readonly mode to
prevent writes into it from happening while we are copying the device
extent from the source to the target device. However just before we set
the block group to readonly mode some concurrent task might have already
allocated an extent from it or decided it could perform a nocow write
into one of its extents, which can make the device replace process to
miss copying an extent since it uses the extent tree's commit root to
search for extents and only once it finishes searching for all extents
belonging to the block group it does set the left cursor to the logical
end address of the block group - this is a problem if the respective
ordered extents finish while we are searching for extents using the
extent tree's commit root and no transaction commit happens while we
are iterating the tree, since it's the delayed references created by the
ordered extents (when they complete) that insert the extent items into
the extent tree (using the non-commit root of course).
Example:
CPU 1 CPU 2
btrfs_dev_replace_start()
btrfs_scrub_dev()
scrub_enumerate_chunks()
--> finds device extent belonging
to block group X
<transaction N starts>
starts buffered write
against some inode
writepages is run against
that inode forcing dellaloc
to run
btrfs_writepages()
extent_writepages()
extent_write_cache_pages()
__extent_writepage()
writepage_delalloc()
run_delalloc_range()
cow_file_range()
btrfs_reserve_extent()
--> allocates an extent
from block group X
(which is not yet
in RO mode)
btrfs_add_ordered_extent()
--> creates ordered extent Y
flush_epd_write_bio()
--> bio against the extent from
block group X is submitted
btrfs_inc_block_group_ro(bg X)
--> sets block group X to readonly
scrub_chunk(bg X)
scrub_stripe(device extent from srcdev)
--> keeps searching for extent items
belonging to the block group using
the extent tree's commit root
--> it never blocks due to
fs_info->scrub_pause_req as no
one tries to commit transaction N
--> copies all extents found from the
source device into the target device
--> finishes search loop
bio completes
ordered extent Y completes
and creates delayed data
reference which will add an
extent item to the extent
tree when run (typically
at transaction commit time)
--> so the task doing the
scrub/device replace
at CPU 1 misses this
and does not copy this
extent into the new/target
device
btrfs_dec_block_group_ro(bg X)
--> turns block group X back to RW mode
dev_replace->cursor_left is set to the
logical end offset of block group X
So fix this by waiting for all cow and nocow writes after setting a block
group to readonly mode.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
When it's finishing, the device replace code iterates all extent maps
representing block group and for each one that has a stripe that refers
to the source device, it replaces its device with the target device.
However when it replaces the source device with the target device it,
the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
only after its ID is changed to match the one from the source device.
This leads to races with the chunk removal code that can temporarly see
a device with an ID of 0ULL and then attempt to use that ID to remove
items from the device tree and fail, causing a transaction abort:
[ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) to /dev/sde finished
[ 9238.594377] ------------[ cut here ]------------
[ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 btrfs_remove_chunk+0x2e5/0x793 [btrfs]
[ 9238.594403] BTRFS: Transaction aborted (error 1)
[ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 evdev sg i2c_core se
rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod fl
oppy [last unloaded: btrfs]
[ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 4.6.0-rc7-btrfs-next-29+ #1
[ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[ 9238.594421] 0000000000000000 ffff88017f1dbc60 ffffffff8126b42c ffff88017f1dbcb0
[ 9238.594422] 0000000000000000 ffff88017f1dbca0 ffffffff81052b14 00000ad37f1dbd18
[ 9238.594423] 0000000000000001 ffff88018068a558 ffff88005c4b9c00 ffff880233f60db0
[ 9238.594424] Call Trace:
[ 9238.594428] [<ffffffff8126b42c>] dump_stack+0x67/0x90
[ 9238.594430] [<ffffffff81052b14>] __warn+0xc2/0xdd
[ 9238.594432] [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
[ 9238.594434] [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
[ 9238.594450] [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
[ 9238.594452] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
[ 9238.594464] [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 [btrfs]
[ 9238.594476] [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
[ 9238.594489] [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
[ 9238.594490] [<ffffffff8106f403>] kthread+0xd4/0xdc
[ 9238.594494] [<ffffffff8149e242>] ret_from_fork+0x22/0x40
[ 9238.594495] [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
[ 9238.594496] ---[ end trace 183efbe50275f059 ]---
The sequence of steps leading to this is like the following:
CPU 1 CPU 2
btrfs_dev_replace_finishing()
at this point
dev_replace->tgtdev->devid ==
BTRFS_DEV_REPLACE_DEVID (0ULL)
...
btrfs_start_transaction()
btrfs_commit_transaction()
btrfs_delete_unused_bgs()
btrfs_remove_chunk()
looks up for the extent map
corresponding to the chunk
lock_chunks() (chunk_mutex)
check_system_chunk()
unlock_chunks() (chunk_mutex)
locks fs_info->chunk_mutex
btrfs_dev_replace_update_device_in_mapping_tree()
--> iterates fs_info->mapping_tree and
replaces the device in every extent
map's map->stripes[] with
dev_replace->tgtdev, which still has
an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
iterates over all stripes from
the extent map
--> calls btrfs_free_dev_extent()
passing it the target device
that still has an ID of 0ULL
--> btrfs_free_dev_extent() fails
--> aborts current transaction
finishes setting up the target device,
namely it sets tgtdev->devid to the value
of srcdev->devid (which is necessarily > 0)
frees the srcdev
unlocks fs_info->chunk_mutex
So fix this by taking the device list mutex while processing the stripes
for the chunk's extent map. This is similar to the race between device
replace and block group creation that was fixed by commit 50460e37186a
("Btrfs: fix race when finishing dev replace leading to transaction abort").
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
The list of devices is protected by the device_list_mutex and the device
replace code, in its finishing phase correctly takes that mutex before
removing the source device from that list. However the readahead code was
iterating that list without acquiring the respective mutex leading to
crashes later on due to invalid memory accesses:
[125671.831036] general protection fault: 0000 [#1] PREEMPT SMP
[125671.832129] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis tpm ppdev evdev parport_pc psmouse sg parport
processor ser
[125671.834973] CPU: 10 PID: 19603 Comm: kworker/u32:19 Tainted: G W 4.6.0-rc7-btrfs-next-29+ #1
[125671.834973] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[125671.834973] Workqueue: btrfs-readahead btrfs_readahead_helper [btrfs]
[125671.834973] task: ffff8801ac520540 ti: ffff8801ac918000 task.ti: ffff8801ac918000
[125671.834973] RIP: 0010:[<ffffffff81270479>] [<ffffffff81270479>] __radix_tree_lookup+0x6a/0x105
[125671.834973] RSP: 0018:ffff8801ac91bc28 EFLAGS: 00010206
[125671.834973] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6a RCX: 0000000000000000
[125671.834973] RDX: 0000000000000000 RSI: 00000000000c1bff RDI: ffff88002ebd62a8
[125671.834973] RBP: ffff8801ac91bc70 R08: 0000000000000001 R09: 0000000000000000
[125671.834973] R10: ffff8801ac91bc70 R11: 0000000000000000 R12: ffff88002ebd62a8
[125671.834973] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000c1bff
[125671.834973] FS: 0000000000000000(0000) GS:ffff88023fd40000(0000) knlGS:0000000000000000
[125671.834973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[125671.834973] CR2: 000000000073cae4 CR3: 00000000b7723000 CR4: 00000000000006e0
[125671.834973] Stack:
[125671.834973] 0000000000000000 ffff8801422d5600 ffff8802286bbc00 0000000000000000
[125671.834973] 0000000000000001 ffff8802286bbc00 00000000000c1bff 0000000000000000
[125671.834973] ffff88002e639eb8 ffff8801ac91bc80 ffffffff81270541 ffff8801ac91bcb0
[125671.834973] Call Trace:
[125671.834973] [<ffffffff81270541>] radix_tree_lookup+0xd/0xf
[125671.834973] [<ffffffffa04ae6a6>] reada_peer_zones_set_lock+0x3e/0x60 [btrfs]
[125671.834973] [<ffffffffa04ae8b9>] reada_pick_zone+0x29/0x103 [btrfs]
[125671.834973] [<ffffffffa04af42f>] reada_start_machine_worker+0x129/0x2d3 [btrfs]
[125671.834973] [<ffffffffa04880be>] btrfs_scrubparity_helper+0x185/0x3aa [btrfs]
[125671.834973] [<ffffffffa0488341>] btrfs_readahead_helper+0xe/0x10 [btrfs]
[125671.834973] [<ffffffff81069691>] process_one_work+0x271/0x4e9
[125671.834973] [<ffffffff81069dda>] worker_thread+0x1eb/0x2c9
[125671.834973] [<ffffffff81069bef>] ? rescuer_thread+0x2b3/0x2b3
[125671.834973] [<ffffffff8106f403>] kthread+0xd4/0xdc
[125671.834973] [<ffffffff8149e242>] ret_from_fork+0x22/0x40
[125671.834973] [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
So fix this by taking the device_list_mutex in the readahead code. We
can't use here the lighter approach of using a rcu_read_lock() and
rcu_read_unlock() pair together with a list_for_each_entry_rcu() call
because we end up doing calls to sleeping functions (kzalloc()) in the
respective code path.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
|
|
commit 059500940def (ACPI/video: export acpi_video_get_levels)
mistakenly dropped the correct value of max_level and that caused the
set_level function following failed and the acpi_video backlight interface
didn't get created. Fix this by passing back the correct max_level value.
While at it, also fix the param used in acpi_video_device_lcd_query_levels
where acpi_handle is expected but acpi_video_device is passed.
Fixes: 059500940def (ACPI/video: export acpi_video_get_levels)
Reported-and-tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If the documentation comment does not have params or sections, the
section heading may leak from the previous documentation comment.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
If there are multiple sections with the same section name, the current
implementation results in several sections by the same heading, with the
content duplicated from the last section to all. Even if there's the
error message, a more graceful approach is to combine all the
identically named sections into one, with concatenated contents.
With the supported sections already limited to select few, there are
massively fewer collisions than there used to be, but this is still
useful for e.g. when function parameters are documented in the middle of
a documentation comment, with description spread out above and
below. (This is not a recommended documentation style, but used in the
kernel nonetheless.)
We can now also demote the error to a warning.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
kernel-doc currently identifies anything matching "section header:"
(specifically a string of word characters and spaces followed by a
colon) as a new section in the documentation comment, and renders the
section header accordingly.
Unfortunately, this turns all uses of colon into sections, mostly
unintentionally. Considering the output, erroneously creating sections
when not intended is always worse than erroneously not creating sections
when intended. For example, a line with "http://example.com" turns into
a "http" heading followed by "//example.com" in normal text style, which
is quite ugly. OTOH, "WARNING: Beware of the Leopard" is just fine even
if "WARNING" does not turn into a heading.
It is virtually impossible to change all the kernel-doc comments, either
way. The compromise is to pick the most commonly used and depended on
section headers (with variants) and accept them as section headers.
The accepted section headers are, case insensitive:
* description:
* context:
* return:
* returns:
Additionally, case sensitive:
* @return:
All of the above are commonly used in the kernel-doc comments, and will
result in worse output if not identified as section headers. Also,
kernel-doc already has some special handling for all of them, so there's
nothing particularly controversial in adding more special treatment for
them.
While at it, improve the whitespace handling surrounding section
names. Do not consider the whitespace as part of the name.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Yes, for our purposes the type should contain typedef.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The latter isn't special to rst.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
If a param description spans multiple lines, check any leading
whitespace in the first continuation line, and remove same amount of
whitespace from following lines.
This allows indentation in the multi-line parameter descriptions for
aesthetical reasons while not causing accidentally significant
indentation in the rst output.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Handle whitespace on the first line of param text as if it was the empty
string. There is no need to add the newline in this case. This improves
the rst output in particular, where blank lines may be problematic in
parameter lists.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Move away from field lists, and simply use **strong emphasis** for
section headings on lines of their own. Do not use rst section headings,
because their nesting depth depends on the surrounding context, which
kernel-doc has no knowledge of. Also, they do not need to end up in any
table of contexts or indexes.
There are two related immediate benefits. Field lists are typically
rendered in two columns, while the new style uses the horizontal width
better. With no extra indent on the left, there's no need to be as fussy
about it. Field lists are more susceptible to indentation problems than
the new style.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The inline member markup allows whitespace lines before the actual
documentation starts. Strip the leading blank lines. This improves the
rst output.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Current approach leads to two blank lines, while one is enough.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The use of these is confusing in the script, and per this grep, they're
not used anywhere anyway:
$ git grep " \* [%$&][a-zA-Z0-9_]*:" -- *.[ch] | grep -v "\$\(Id\|Revision\|Date\)"
While at it, throw out the constants array, nothing is ever put there
again.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Let the user use @foo, &bar, %baz, etc. in the first kernel-doc purpose
line too.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
This bit is already done by xml_unescape() above.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Link "&foo->bar", "&foo->bar()", "&foo.bar", and "&foo.bar()" to the
struct/union/enum foo definition. The members themselves do not
currently have anchors to link to, but this is better than nothing, and
promotes a universal notation.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Let the user use "&union foo" and "&typedef foo" to reference foo. The
difference to using "union &foo", "typedef &foo", or just "&foo" (which
are valid too) is that "union" and "typedef" become part of the link
text.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
It's possible to use &foo to reference structs, enums, typedefs, etc. in
the Sphinx C domain. Thus do not prefix the links with "struct".
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The Sphinx C domain spec says function references should include the
parens ().
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
If the user requests a specific DOC: section by name, do not output its
section title. In these cases, the surrounding context already has a
heading, and the DOC: section title is only used as an identifier and a
heading for clarity in the source file.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Make the output selection a bit more readable by adding constants for
the various types of output selection. While at it, actually call the
variable for choosing what to output $output_selection.
No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Make the state machine a bit more readable by adding constants for
parser states and inline member documentation parser substates. While at
it, rename the "split" documentation to "inline" documentation.
No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|