Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, there is no special interesting feature, but we've
investigated a couple of tuning points with respect to the I/O flow.
Several major bug fixes and a bunch of clean-ups also have been made.
This patch-set includes the following major enhancement patches:
- enhance wait_on_page_writeback
- support SEEK_DATA and SEEK_HOLE
- enhance readahead flows
- enhance IO flushes
- support fiemap
- add some tracepoints
The other bug fixes are as follows:
- fix to support a large volume > 2TB correctly
- recovery bug fix wrt fallocated space
- fix recursive lock on xattr operations
- fix some cases on the remount flow
And, there are a bunch of cleanups"
* tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
f2fs: support f2fs_fiemap
f2fs: avoid not to call remove_dirty_inode
f2fs: recover fallocated space
f2fs: fix to recover data written by dio
f2fs: large volume support
f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
f2fs: avoid overflow when large directory feathure is enabled
f2fs: fix recursive lock by f2fs_setxattr
MAINTAINERS: add a co-maintainer from samsung for F2FS
MAINTAINERS: change the email address for f2fs
f2fs: use inode_init_owner() to simplify codes
f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
f2fs: add a tracepoint for f2fs_read_data_page
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
f2fs: add a tracepoint for f2fs_write_end
f2fs: add a tracepoint for f2fs_write_begin
f2fs: fix checkpatch warning
f2fs: deactivate inode page if the inode is evicted
f2fs: decrease the lock granularity during write_begin
...
|
|
Fixes WARN()s from the DRM core since the page flip rework.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=77521
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pull CIFS fixes from Steve French.
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix memory leaks in SMB2_open
cifs: ensure that vol->username is not NULL before running strlen on it
Clarify SMB2/SMB3 create context and add missing ones
Do not send ClientGUID on SMB2.02 dialect
cifs: Set client guid on per connection basis
fs/cifs/netmisc.c: convert printk to pr_foo()
fs/cifs/cifs.c: replace seq_printf by seq_puts
Update cifs version number to 2.03
fs: cifs: new helper: file_inode(file)
cifs: fix potential races in cifs_revalidate_mapping
cifs: new helper function: cifs_revalidate_mapping
cifs: convert booleans in cifsInodeInfo to a flags field
cifs: fix cifs_uniqueid_to_ino_t not to ever return 0
|
|
Updated powertune settings for certain SI asics.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This caused reduced performance for some users with advanced post
processing enabled. We need a better method to pick the
UVD state based on the amount of post processing required or tune
the advanced post processing to fit within the lower power state
envelope.
This reverts commit 14a9579ddbf15dd1992a9481a4ec80b0b91656d5.
Cc: "3.15" <stable@vger.kernel.org>
|
|
Query to find out how many compute units on a GPU.
Useful for OpenCL usermode drivers.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
v2: agd5f: simplify patch
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
And also domain to prefered_domains. That matches better
what those values represent.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The underlying reason for the crashes seems to be fixed now.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We never check the return value anyway and if the
index isn't valid would crash way before calling
the functions.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When we set the valid bit on invalid GART entries they are
loaded into the TLB when an adjacent entry is loaded. This
poisons the TLB with invalid entries which are sometimes
not correctly removed on TLB flush.
For stable inclusion the patch probably needs to be modified a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make sure that a hdmi deep color mode can't exceed the max tmds
clock limit of a hdmi sink if such a limit is defined by edid.
If requested deep color bpc would exceed the limit given the mode
to be set, try to degrade gracefully to lower supported deep color
bpc or to standard 8 bpc if needed.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
HDMI deep color setup must know which modes are supported if
it needs to degrade gracefully, as only 12 bpc / dc_36 is
guaranteed, but 10 bpc / dc_30 is optional. The maximum bpc
is not sufficient for this.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Hawaii has the same version of VCE as other CIK parts.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Replace occurrences of "v & 0xffffffff" with lower_32_bits(v)
when it's next to an upper_32_bits(v). Also remove unnecessary
"upper_32_bits(v) & 0xffffffff" code snippets.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This patch consists of the usual driver updates (qla2xxx, qla4xxx,
lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to
maintained status of a long neglected driver for older hardware. In
addition there are a lot of minor fixes and cleanups and some more
updates to make scsi mq ready"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (130 commits)
include/scsi/osd_protocol.h: remove unnecessary __constant
mvsas: Recognise device/subsystem 9485/9485 as 88SE9485
Revert "be2iscsi: Fix processing cqe for cxn whose endpoint is freed"
mptfusion: fix msgContext in mptctl_hp_hostinfo
acornscsi: remove linked command support
scsi/NCR5380: dprintk macro
fusion: Remove use of DEF_SCSI_QCMD
fusion: Add free msg frames to the head, not tail of list
mpt2sas: Add free smids to the head, not tail of list
mpt2sas: Remove use of DEF_SCSI_QCMD
mpt2sas: Remove uses of serial_number
mpt3sas: Remove use of DEF_SCSI_QCMD
mpt3sas: Remove uses of serial_number
qla2xxx: Use kmemdup instead of kmalloc + memcpy
qla4xxx: Use kmemdup instead of kmalloc + memcpy
qla2xxx: fix incorrect debug printk
be2iscsi: Bump the driver version
be2iscsi: Fix processing cqe for cxn whose endpoint is freed
be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
be2iscsi: Fix memory corruption in MBX path
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"A big update to the Atmel touchscreen driver, devm support for polled
input devices, several drivers have been converted to using managed
resources, and assorted driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (87 commits)
Input: synaptics - fix resolution for manually provided min/max
Input: atmel_mxt_ts - fix invalid return from mxt_get_bootloader_version
Input: max8997_haptic - add error handling for regulator and pwm
Input: elantech - don't set bit 1 of reg_10 when the no_hw_res quirk is set
Input: elantech - deal with clickpads reporting right button events
Input: edt-ft5x06 - fix an i2c write for M09 support
Input: omap-keypad - remove platform data support
ARM: OMAP2+: remove unused omap4-keypad file and code
Input: ab8500-ponkey - switch to using managed resources
Input: max8925_onkey - switch to using managed resources
Input: 88pm860x-ts - switch to using managed resources
Input: 88pm860x_onkey - switch to using managed resources
Input: intel-mid-touch - switch to using managed resources
Input: wacom - process outbound for newer Cintiqs
Input: wacom - set stylus_in_proximity when pen is in range
DTS: ARM: OMAP3-N900: Add tsc2005 support
Input: tsc2005 - add DT support
Input: add common DT binding for touchscreens
Input: jornada680_kbd - switch top using managed resources
Input: adp5520-keys - switch to using managed resources
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull OMAP DT fbdev updates from Tomi Valkeinen:
"Here are display related device tree data changes for OMAP. They are
based on an already merged branch to satisfy the dependencies for the
dts file changes.
Add OMAP DT data:
- omap5 display subsystem
- display data for omap5 uEVM board
- am43xx display subsystem
- display data for am43xx ePOS and GP boards (LCD only)
- display data for GTA04 board
- display data for overo board
- display data for duovero-parlor board
- display data for omap3 evm and ldp boards"
* tag 'fbdev-omap-dt-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
ARM: omap5.dtsi: Add audio related parameters to hdmi node
ARM: omap4.dtsi: Add audio related parametes to hdmi node
ARM: dts: duovero-parlor: Add HDMI output
ARM: dts: overo: Add support for 3.5'' LCD output
ARM: dts: overo: Add support for 4.3'' LCD output
ARM: dts: overo: Add support for DVI output
ARM: dts: Add LCD panel sharp ls037v7dw01 support for omap3-evm and ldp
ARM: dts: omap3-gta04: Add display support
ARM: dts: omap5-uevm.dts: add display nodes
ARM: dts: omap5-uevm.dts: add tca6424a
ARM: dts: omap5.dtsi: add DSS nodes
ARM: dts: am43x-epos-evm: add LCD data
ARM: dts: am437x-gp-evm: add LCD data
ARM: dts: am4372.dtsi: add DSS information
|
|
Pull MIPS updates from Ralf Baechle:
- three fixes for 3.15 that didn't make it in time
- limited Octeon 3 support.
- paravirtualization support
- improvment to platform support for Netlogix SOCs.
- add support for powering down the Malta eval board in software
- add many instructions to the in-kernel microassembler.
- add support for the BPF JIT.
- minor cleanups of the BCM47xx code.
- large cleanup of math emu code resulting in significant code size
reduction, better readability of the code and more accurate
emulation.
- improvments to the MIPS CPS code.
- support C3 power status for the R4k count/compare clock device.
- improvments to the GIO support for older SGI workstations.
- increase number of supported CPUs to 256; this can be reached on
certain embedded multithreaded ccNUMA configurations.
- various small cleanups, updates and fixes
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (173 commits)
MIPS: IP22/IP28: Improve GIO support
MIPS: Octeon: Add twsi interrupt initialization for OCTEON 3XXX, 5XXX, 63XX
DEC: Document the R4k MB ASIC mini interrupt controller
DEC: Add self as the maintainer
MIPS: Add microMIPS MSA support.
MIPS: Replace calls to obsolete strict_strto call with kstrto* equivalents.
MIPS: Replace obsolete strict_strto call with kstrto
MIPS: BFP: Simplify code slightly.
MIPS: Call find_vma with the mmap_sem held
MIPS: Fix 'write_msa_##' inline macro.
MIPS: Fix MSA toolchain support detection.
mips: Update the email address of Geert Uytterhoeven
MIPS: Add minimal defconfig for mips_paravirt
MIPS: Enable build for new system 'paravirt'
MIPS: paravirt: Add pci controller for virtio
MIPS: Add code for new system 'paravirt'
MIPS: Add functions for hypervisor call
MIPS: OCTEON: Add OCTEON3 to __get_cpu_type
MIPS: Add function get_ebase_cpunum
MIPS: Add minimal support for OCTEON3 to c-r4k.c
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
"Nothing too exciting here, just minor fixes/cleanup. Only noteworthy
ones are:
- Moving cache disabling to early boot
- ARC UART enabled only if earlyprintk setup in cmdline"
* tag 'arc-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Disable caches in early boot if so configured
ARC: [arcfpga] Early ARC UART to be only activated by cmdline
ARC: [arcfpga] Get rid of legacy BVCI latency unit support
ARC: remove duplicate header exports
ARC: arc_local_timer_setup() need not pass own cpu id
ARC: Fixed spelling errors within comments
ARC: make start_thread() out-of-line
ARC: fix mmuv2 warning
ARC: [SMP] ISS SMP extension bitrot
|
|
The raid5 sync_request() processing calls handle_stripe() within the context of
the resync-thread. The resync-thread issues the first set of read requests
and this adds execution latency and slows down the scheduling of the next
sync_request().
The current rebuild/resync speed of raid5 is not much faster than what
rotational HDDs can sustain.
Testing the following patch on a 6-drive array, I can increase the rebuild
speed from 100 MB/s to 175 MB/s.
The sync_request() now just sets STRIPE_HANDLE and releases the stripe. This
creates some more parallelism between the resync-thread and raid5 kernel daemon.
Signed-off-by: Eivind Sarto <esarto@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The skinny extents are intepreted incorrectly in scrub_print_warning(),
and end up hitting the BUG() in btrfs_extent_inline_ref_size.
Reported-by: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
When cloning into a file, we were correctly replacing the extent
items in the target range and removing the extent maps. However
we weren't replacing the extent maps with new ones that point to
the new extents - as a consequence, an incremental fsync (when the
inode doesn't have the full sync flag) was a NOOP, since it relies
on the existence of extent maps in the modified list of the inode's
extent map tree, which was empty. Therefore add new extent maps to
reflect the target clone range.
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We want to make sure the point is still within the extent item, not to verify
the memory it's pointing to.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The backref code was looking at nodes as well as leaves when we tried to
populate extent item entries. This is not good, and although we go away with it
for the most part because we'd skip where disk_bytenr != random_memory,
sometimes random_memory would match and suddenly boom. This fixes that problem.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
In inode.c:btrfs_page_exists_in_range(), if the page we got from
the radix tree is an exception entry, which can't be retried, we
exit the loop with a non-NULL page and then call page_cache_release
against it, which is not ok since it's not a valid page. This could
also make us return true when we shouldn't.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
In inode.c:btrfs_page_exists_in_range(), if the page we get from the
radix tree is an exception which should make us retry, set page to
NULL in order to really retry, because otherwise we don't get another
loop iteration executed (page != NULL makes the while loop exit).
This also was making us call page_cache_release after exiting the loop,
which isn't correct because page doesn't point to a valid page, and
possibly return true from the function when we shouldn't.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
In inode.c:btrfs_page_exists_in_range(), if we can't get the page
we need to retry. However we weren't retrying because we weren't
setting page to NULL, which makes the while loop exit immediately
and will make us call page_cache_release after exiting the loop
which is incorrect because our page get didn't succeed. This could
also make us return true when we shouldn't.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
To return EOPNOTSUPP is more user friendly than to return EINVAL,
and then user-space tool will show that the dev_replace operation
for raid56 is not currently supported rather than showing that
there is an invalid argument.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Several reports about leaf corruption has been floating on the list, one of them
points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted
after __btrfs_drop_extents(), it's really a rare case but it does exist.
The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents().
So in btrfs_next_leaf(), we release the current path to re-search the last key of
the leaf for locating next leaf, and we've taken it into account that there might
be balance operations between leafs during this 'unlock and re-lock' dance, so
we check the path again and advance it if there are now more items available.
But things are a bit different if that last key happens to be removed and balance
gets a bigger key as the last one, and btrfs_search_slot will return it with
ret > 0, IOW, nothing change in this leaf except the new last key, then we think
we're okay because there is no more item balanced in, fine, we thinks we can
go to the next leaf.
However, we should return that bigger key, otherwise we deserve leaf corruption,
for example, in endio, skipping that key means that __btrfs_drop_extents() thinks
it has dropped all extent matched the required range and finish_ordered_io can
safely insert a new extent, but it actually doesn't and ends up a leaf
corruption.
One may be asking that why our locking on extent io tree doesn't work as
expected, ie. it should avoid this kind of race situation. But in
__btrfs_drop_extents(), we don't always find extents which are included within
our locking range, IOW, extents can start before our searching start, in this
case locking on extent io tree doesn't protect us from the race.
This takes the special case into account.
Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We might have had an item with the previous key in the tree right
before we released our path. And after we released our path, that
item might have been pushed to the first slot (0) of the leaf we
were holding due to a tree balance. Alternatively, an item with the
previous key can exist as the only element of a leaf (big fat item).
Therefore account for these 2 cases, so that our callers (like
btrfs_previous_item) don't miss an existing item with a key matching
the previous key we computed above.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
If the NO_HOLES feature is enabled holes don't have file extent items in
the btree that represent them anymore. This made the clone operation
ignore the gaps that exist between consecutive file extent items and
therefore not create the holes at the destination. When not using the
NO_HOLES feature, the holes were created at the destination.
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
On heavy workloads, we're seeing soft lockup warnings on
root->inode_lock in __btrfs_release_delayed_node. The low hanging fruit
is to reduce the size of the critical section.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
To be accurate about the error case,
if the new size is beyond ULLONG_MAX, return ERANGE instead of EINVAL.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
If btrfs_log_dentry_safe() returns an error, we set ret to 1 and
fall through with the goal of committing the transaction. However,
in the case where the inode doesn't need a full sync, we would call
btrfs_wait_ordered_range() against the target range for our inode,
and if it returned an error, we would return without commiting or
ending the transaction.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
btrfs_punch_hole() will truncate unaligned pages or punch hole on a
already existed hole.
This will cause unneeded zero page or holes splitting the original huge
hole.
This patch will skip already existed holes before any page truncating or
hole punching.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
On snapshot creation (either writable or read-only), we do orphan cleanup
against the root of the snapshot. If the cleanup did remove any orphans,
then the current root node will be different from the commit root node
until the next transaction commit happens.
A send operation always uses the commit root of a snapshot - this means
it will see the orphans if it starts computing the send stream before the
next transaction commit happens (triggered by a timer or sync() for .e.g),
which is when the commit root gets assigned a reference to current root,
where the orphans are not visible anymore. The consequence of send seeing
the orphans is explained below.
For example:
mkfs.btrfs -f /dev/sdd
mount -o commit=999 /dev/sdd /mnt
# open a file with O_TMPFILE and leave it open
# write some data to the file
btrfs subvolume snapshot -r /mnt /mnt/snap1
btrfs send /mnt/snap1 -f /tmp/send.data
The send operation will fail with the following error:
ERROR: send ioctl failed with -116: Stale file handle
What happens here is that our snapshot has an orphan inode still visible
through the commit root, that corresponds to the tmpfile. However send
will attempt to call inode.c:btrfs_iget(), with the goal of reading the
file's data, which will return -ESTALE because it will use the current
root (and not the commit root) of the snapshot.
Of course, there are other cases where we can get orphans, but this
example using a tmpfile makes it much easier to reproduce the issue.
Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if
the commit root is different from the current root, just commit the
transaction associated with the snapshot's root (if it exists), so that
a send will not see any orphans that don't exist anymore. This also
guarantees a send will always see the same content regardless of whether
a transaction commit happened already before the send was requested and
after the orphan cleanup (meaning the commit root and current roots are
the same) or it hasn't happened yet (commit and current roots are
different).
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
In ioctl.c:lock_extent_range(), after locking our target range, the
ordered extent that btrfs_lookup_first_ordered_extent() returns us
may not overlap our target range at all. In this case we would just
unlock our target range, wait for any new ordered extents that overlap
the range to complete, lock again the range and repeat all these steps
until we don't get any ordered extent and the delalloc flag isn't set
in the io tree for our target range.
Therefore just stop if we get an ordered extent that doesn't overlap
our target range and the dealalloc flag isn't set for the range in
the inode's io tree.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
When cloning a range of a file, we were visiting all the extent items in
the btree that belong to our source inode. We don't need to visit those
extent items that don't overlap the range we are cloning, as doing so only
makes us waste time and do unnecessary btree navigations (btrfs_next_leaf)
for inodes that have a large number of file extent items in the btree.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We were setting the BTRFS_ROOT_SUBVOL_DEAD flag on the root of the
parent of our target snapshot, instead of setting it in the target
snapshot's root.
This is easy to observe by running the following scenario:
mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt
btrfs subvolume create /mnt/first_subvol
btrfs subvolume snapshot -r /mnt /mnt/mysnap1
btrfs subvolume delete /mnt/first_subvol
btrfs subvolume snapshot -r /mnt /mnt/mysnap2
btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data
The send command failed because the send ioctl returned -EPERM.
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We were cleaning the clone target file range from the page cache before
we did replace the file extent items in the fs tree. This was racy,
as right after cleaning the relevant range from the page cache and before
replacing the file extent items, a read against that range could be
performed by another task and populate again the page cache with stale
data (stale after the cloning finishes). This would result in reads after
the clone operation successfully finishes to get old data (and potentially
for a very long time). Therefore evict the pages after replacing the file
extent items, so that subsequent reads will always get the new data.
Similarly, we were prone to races while cloning the file extent items
because we weren't locking the target range and wait for any existing
ordered extents against that range to complete. It was possible that
after cloning the extent items, a write operation that was performed
before the clone operation and overlaps the same range, would end up
undoing all or part of the work the clone operation did (a worker task
running inode.c:btrfs_finish_ordered_io). Therefore lock the target
range in the io tree, wait for all pending ordered extents against that
range to finish and then safely perform the cloning.
The issue of reading stale data after the clone operation is easy to
reproduce by running the following C program in a loop until it exits
with return value 1.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <fcntl.h>
#include <assert.h>
#include <asm/types.h>
#include <linux/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#define SRC_FILE "/mnt/sdd/foo"
#define DST_FILE "/mnt/sdd/bar"
#define FILE_SIZE (16 * 1024)
#define PATTERN_SRC 'X'
#define PATTERN_DST 'Y'
struct btrfs_ioctl_clone_range_args {
__s64 src_fd;
__u64 src_offset, src_length;
__u64 dest_offset;
};
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_CLONE_RANGE _IOW(BTRFS_IOCTL_MAGIC, 13, \
struct btrfs_ioctl_clone_range_args)
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int clone_done = 0;
static int reader_ready = 0;
static int stale_data = 0;
static void *reader_loop(void *arg)
{
char buf[4096], want_buf[4096];
memset(want_buf, PATTERN_SRC, 4096);
pthread_mutex_lock(&mutex);
reader_ready = 1;
pthread_mutex_unlock(&mutex);
while (1) {
int done, fd, ret;
fd = open(DST_FILE, O_RDONLY);
assert(fd != -1);
pthread_mutex_lock(&mutex);
done = clone_done;
pthread_mutex_unlock(&mutex);
ret = read(fd, buf, 4096);
assert(ret == 4096);
close(fd);
if (done) {
ret = memcmp(buf, want_buf, 4096);
if (ret == 0) {
printf("Found new content\n");
} else {
printf("Found old content\n");
pthread_mutex_lock(&mutex);
stale_data = 1;
pthread_mutex_unlock(&mutex);
}
break;
}
}
return NULL;
}
int main(int argc, char *argv[])
{
pthread_t reader;
int ret, i, fd;
struct btrfs_ioctl_clone_range_args clone_args;
int fd1, fd2;
ret = remove(SRC_FILE);
if (ret == -1 && errno != ENOENT) {
fprintf(stderr, "Error deleting src file: %s\n", strerror(errno));
return 1;
}
ret = remove(DST_FILE);
if (ret == -1 && errno != ENOENT) {
fprintf(stderr, "Error deleting dst file: %s\n", strerror(errno));
return 1;
}
fd = open(SRC_FILE, O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
assert(fd != -1);
for (i = 0; i < FILE_SIZE; i++) {
char c = PATTERN_SRC;
ret = write(fd, &c, 1);
assert(ret == 1);
}
close(fd);
fd = open(DST_FILE, O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
assert(fd != -1);
for (i = 0; i < FILE_SIZE; i++) {
char c = PATTERN_DST;
ret = write(fd, &c, 1);
assert(ret == 1);
}
close(fd);
sync();
ret = pthread_create(&reader, NULL, reader_loop, NULL);
assert(ret == 0);
while (1) {
int r;
pthread_mutex_lock(&mutex);
r = reader_ready;
pthread_mutex_unlock(&mutex);
if (r) break;
}
fd1 = open(SRC_FILE, O_RDONLY);
if (fd1 < 0) {
fprintf(stderr, "Error open src file: %s\n", strerror(errno));
return 1;
}
fd2 = open(DST_FILE, O_RDWR);
if (fd2 < 0) {
fprintf(stderr, "Error open dst file: %s\n", strerror(errno));
return 1;
}
clone_args.src_fd = fd1;
clone_args.src_offset = 0;
clone_args.src_length = 4096;
clone_args.dest_offset = 0;
ret = ioctl(fd2, BTRFS_IOC_CLONE_RANGE, &clone_args);
assert(ret == 0);
close(fd1);
close(fd2);
pthread_mutex_lock(&mutex);
clone_done = 1;
pthread_mutex_unlock(&mutex);
ret = pthread_join(reader, NULL);
assert(ret == 0);
pthread_mutex_lock(&mutex);
ret = stale_data ? 1 : 0;
pthread_mutex_unlock(&mutex);
return ret;
}
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
There is otherwise a risk of a possible null pointer dereference.
Was largely found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We are currently allocating space_info objects in an array when we
allocate space_info. When a user does something like:
# btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt
# btrfs balance start -mconvert=single -dconvert=single /mnt -f
# btrfs balance start -mconvert=raid1 -dconvert=raid1 /
We can end up with memory corruption since the kobject hasn't
been reinitialized properly and the name pointer was left set.
The rationale behind allocating them statically was to avoid
creating a separate kobject container that just contained the
raid type. It used the index in the array to determine the index.
Ultimately, though, this wastes more memory than it saves in all
but the most complex scenarios and introduces kobject lifetime
questions.
This patch allocates the kobjects dynamically instead. Note that
we also remove the kobject_get/put of the parent kobject since
kobject_add and kobject_del do that internally.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We were limiting the sum of the xattr name and value lengths to PATH_MAX,
which is not correct, specially on filesystems created with btrfs-progs
v3.12 or higher, where the default leaf size is max(16384, PAGE_SIZE), or
systems with page sizes larger than 4096 bytes.
Xattrs have their own specific maximum name and value lengths, which depend
on the leaf size, therefore use these limits to be able to send xattrs with
sizes larger than PATH_MAX.
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
If we are doing an incremental send and the base snapshot has a
directory with name X that doesn't exist anymore in the second
snapshot and a new subvolume/snapshot exists in the second snapshot
that has the same name as the directory (name X), the incremental
send would fail with -ENOENT error. This is because it attempts
to lookup for an inode with a number matching the objectid of a
root, which doesn't exist.
Steps to reproduce:
mkfs.btrfs -f /dev/sdd
mount /dev/sdd /mnt
mkdir /mnt/testdir
btrfs subvolume snapshot -r /mnt /mnt/mysnap1
rmdir /mnt/testdir
btrfs subvolume create /mnt/testdir
btrfs subvolume snapshot -r /mnt /mnt/mysnap2
btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data
A test case for xfstests follows.
Reported-by: Robert White <rwhite@pobox.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Delayed extent operations are triggered during transaction commits.
The goal is to queue up a healthly batch of changes to the extent
allocation tree and run through them in bulk.
This farms them off to async helper threads. The goal is to have the
bulk of the delayed operations being done in the background, but this is
also important to limit our stack footprint.
Signed-off-by: Chris Mason <clm@fb.com>
|
|
__extent_writepage has two unrelated parts. First it does the delayed
allocation dance and second it does the mapping and IO for the page
we're actually writing.
This splits it up into those two parts so the stack from one doesn't
impact the stack from the other.
Signed-off-by: Chris Mason <clm@fb.com>
|