Age | Commit message (Collapse) | Author |
|
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
|
|
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts. This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Miao pointed out there's a problem with mixing dio writes and buffered
reads. If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache. Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data. So we need to lock the extent and
check for uptodate bits in the range. If there are uptodate bits we need to
unlock and invalidate again. This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents. There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size. So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
|
|
The buffer reading code in xfs_dir2_leaf_getdents is complex and difficult to
follow due to the readahead and all the context is carries. it is also badly
indented and so difficult to read. Factor it out into a separate function to
make it easier to understand and optimise in future patches.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
The struct xfs_dabuf now only tracks a single xfs_buf and all the
information it holds can be gained directly from the xfs_buf. Hence
we can remove the struct dabuf and pass the xfs_buf around
everywhere.
Kill the struct dabuf and the associated infrastructure.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
First step in converting the directory code to use native
discontiguous buffers and replacing the dabuf construct.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
discontigous buffer in separate buffer format structures. This means log
recovery will recover all the changes on a per segment basis without
requiring any knowledge of the fact that it was logged from a
compound buffer.
To do this, we need to be able to determine what buffer segment any
given offset into the compound buffer sits over. This enables us to
translate the dirty bitmap in the number of separate buffer format
structures required.
We also need to be able to determine the number of bitmap elements
that a given buffer segment has, as this determines the size of the
buffer format structure. Hence we need to be able to determine the
both the start offset into the buffer and the length of a given
segment to be able to calculate this.
With this information, we can preallocate, build and format the
correct log vector array for each segment in a compound buffer to
appear exactly the same as individually logged buffers in the log.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Now that the buffer cache supports discontiguous buffers, add
support to the transaction buffer interface for getting and reading
buffers.
Note that this patch does not convert the buffer item logging to
support discontiguous buffers. That will be done as a separate
commit.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
With the internal interfaces supporting discontiguous buffer maps,
add external lookup, read and get interfaces so they can start to be
used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
While the external interface currently uses separate blockno/length
variables, we need to move internal interfaces to passing and
parsing vector maps. This will then allow us to add external
interfaces to support discontiguous buffer maps as the internal code
will already support them.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
To support discontiguous buffers in the buffer cache, we need to
separate the cache index variables from the I/O map. While this is
currently a 1:1 mapping, discontiguous buffer support will break
this relationship.
However, for caching purposes, we can still treat them the same as a
contiguous buffer - the block number of the first block and the
length of the buffer - as that is still a unique representation.
Also, the only way we will ever access the discontiguous regions of
buffers is via bulding the complete buffer in the first place, so
using the initial block number and entire buffer length is a sane
way to index the buffers.
Add a block mapping vector construct to the xfs_buf and use it in
the places where we are doing IO instead of the current
b_bn/b_length variables.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
The struct xfs_buf_log_format wants to think the dirty bitmap is
variable sized. In fact, it is variable size on disk simply due to
the way we map it from the in-memory structure, but we still just
use a fixed size memory allocation for the in-memory structure.
Hence it makes no sense to set the function up as a variable sized
structure when we already know it's maximum size, and we always
allocate it as such. Simplify the structure by making the dirty
bitmap a fixed sized array and just using the size of the structure
for the allocation size.
This will make it much simpler to allocate and manipulate an array
of format structures for discontiguous buffer support.
The previous struct xfs_buf_log_item size according to
/proc/slabinfo was 224 bytes. pahole doesn't give the same size
because of the variable size definition. With this modification,
pahole reports the same as /proc/slabinfo:
/* size: 224, cachelines: 4, members: 6 */
Because the xfs_buf_log_item size is now determined by the maximum
supported block size we introduce a dependency on xfs_alloc_btree.h.
Avoid this dependency by moving the idefines for the maximum block
sizes supported to xfs_types.h with all the other max/min type
defines to avoid any new dependencies.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Ext4 must make sure the transaction to be commited to the disk when
user opens a file with O_(D)SYNC flag and do a fallocate(2) call.
This problem had been reported by Christoph Hellwig in this thread:
http://www.spinics.net/lists/linux-btrfs/msg13621.html
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Currently ext4_mb_load_buddy is called for every group, irrespective
of whether the group info is already in memory, while reading
/proc/fs/ext4/<partition>/mb_groups proc file. For the purpose of
mb_groups proc file, it is unnecessary to load the file group info
from disk if it was loaded in past. These calls to ext4_mb_load_buddy
make reading the mb_groups proc file expensive.
Also, the locks around ext4_get_group_info are not required.
This patch modifies the code to call ext4_mb_load_buddy only if the
group info had never been loaded into memory in past. It also removes
the mb group locking around ext4_get_group_info call.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This gives pnfs a chance to do a layout commit inside the v4 code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
pNFS needs to select a write function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing writes.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
pNFS needs to select a read function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing reads.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This gives NFS v4 a way to set up callbacks and sessions without v2 or
v3 having to do them as well.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
NFS v4 needs a way to shut down callbacks and sessions, but v2 and v3
don't.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Delegations are a v4 feature, so push return_delegation out of the
generic client by creating a new rpc_op and renaming the old function to
be in the nfs v4 "namespace"
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Delegations are a v4 feature, so push them out of the generic code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
v2 and v3 don't need to worry about doing a pnfs layoutcommit.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
I can use this function to return delegations and unset the pnfs layout
driver rather than continuing to do these things in the generic client.
With this change, we no longer need an nfs4_kill_super().
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The generic client doesn't need to know about pnfs layout drivers, so
this should be done in the v4 code.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Convert the pNFS file layout to use the same system as the
object and block layout.
Remove unnecessary dependencies on NFS_FS
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We prepare for the largest possible GETDEVICEINFO response, which
can not be greater than the negotiated session maximum response size.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The 'committed' field is not needed once we have put the struct nfs_page
on the right list.
Also correct the type of the verifier: it is not an array of __be32, but
simply an 8 byte long opaque array.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The verifier returned by the GETDEVICELIST operation is not a write
verifier, but a nfs4_verifier.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Move the test for nfs4_has_session out of the nfs4_state_manager()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Handling a slot recall situation should always takes precedence over
state recovery to allow the server to manage its resources.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use the new xdr_stream_pos() helper instead.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use the xdr_stream position counter as the basis for the calculation
instead of assuming that we can calculate an offset to the start
of the iovec.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
xdr_read_pages will already do all of the buffer overflow checks that are
currently being open-coded in the various callers. This patch simplifies
the existing code by replacing the open coded checks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
"Make UDF more robust in presence of corrupted filesystem"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fortify loading of sparing table
udf: Avoid run away loop when partition table length is corrupted
udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
|
|
Pull ubi/ubifs fixes from Artem Bityutskiy:
"Fix the debugfs regression - we never enable it because incorrect
'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
but we had 'IS_ENABLED(DEBUG_FS)'. Also fix incorrect assertion."
* tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs:
UBI: correct usage of IS_ENABLED()
UBIFS: correct usage of IS_ENABLED()
UBIFS: fix assertion
|
|
Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
This patch fixes buffer_head double free in following code path:
gfs2_block_map
=> gfs2_meta_inode_buffer
=> gfs2_meta_indirect_buffer
=> gfs2_meta_read
=> release_metapath
gfs2_block_map calls gfs2_meta_inode_buffer with &mp.mp_bh[0]
as an argument. mp.mp_bh are filled with zero at the beginning
of gfs2_block_map.
If gfs2_meta_inode_buffer returns non-zero value, gfs2_block_map
calls release_metapath to free buffers chained to mp.mp_bh.
release_metapath checks each slot of mp.mp_bh[i] and
free(with brelse) unless the slot is filled with NULL.
&mp.mp_bh[0] passed to gfs2_meta_inode_buffer is filled at
gfs2_meta_read. gfs2_meta_read is filled a buffer allocated with
gfs2_getbuf even if EIO occurs. When EIO occurs, the allocated buffer
is brelse'ed though the pointer(wrong poiner) points the brelse'ed is
passed back to caller via an argument bhp.
gfs2_meta_indirect_buffer, the caller also pass the wrong pointer
to its caller with EIO. Finally gfs2_block_map gets both EIO and
&mp.mp_bh[0] filled with the wrong pointer. release_metapath
calls brelse again on the wrong pointer.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
With the tree mod log, we may end up with two roots (the current root and a
rewinded version of it) both pointing to two leaves, l1 and l2, of which l2
had already been cow-ed in the current transaction. If we don't rewind any
tree blocks, we cannot have two roots both pointing to an already cowed tree
block.
Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
the next (right) leaf. And there is push_leaf_left, which has a (cowed!)
leaf locked and wants a lock on the previous (left) leaf.
In order to solve this dead lock situation, we use try_lock in
btrfs_next_leaf (only in case it's called with a tree mod log time_seq
paramter) and if we fail to get a lock on the next leaf, we give up our lock
on the current leaf and retry from the very beginning.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
|
When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the
tree block. If its not the last key, removal involves a move operation.
This move operation was explicitly done before this commit.
However, at insertion time, there's a move operation before the actual
addition to make room for the new key, which is recorded in the tree mod
log as well. This means, we must drop the move operation when rewinding the
add operation, because the next operation we'll be rewinding will be the
corresponding MOD_LOG_MOVE_KEYS operation.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
|
When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref
mutex way longer than actually required. We ought to drop it immediately
after we're done collecting all the delayed refs.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|