Age | Commit message (Collapse) | Author |
|
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
- Fixes how eCryptfs handles msync to sync both the upper and lower
file
- A couple of MAINTAINERS updates
* tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Check return of filemap_write_and_wait during fsync
Update eCryptFS maintainers
ecryptfs: fixed msync to flush data
|
|
If sysfs_notify is called on a binary attribute, bad things can
happen, so prevent it.
Note, no in-kernel usage of this is currently present, but in the
future, it's good to be safe.
Changes in V2:
- Also ignore sysfs_notify on dirs, links
- Use WARN_ON rather than silently failing
- Compiled and tested (huge apologies about first submission)
Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull CIFS fix from Steve French:
"Fix one byte buffer overrun with prefixpaths on cifs mounts which can
cause a problem with mount depending on the string length"
* 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix off-by-one bug in build_unc_path_to_root
|
|
1d2ef5901483004d74947bbf78d5146c24038fe7 caused a regression in ncpfs such that
directories could no longer be removed. This was because ncp_rmdir checked
to see if a dentry could be unhashed before allowing it to be removed. Since
1d2ef5901483004d74947bbf78d5146c24038fe7 introduced a change that incremented
dentry->d_count causing it to always be greater than 1 unhash would always
fail. Thus causing the error path in ncp_rmdir to always be taken. Removing
this error path is safe as unhashing is still accomplished by calls to dput
from vfs_rmdir.
Signed-off-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It is possible that iput is skipped after iget during the recovery.
In recover_dentry(),
dir = f2fs_iget();
...
if (de && inode->i_ino == le32_to_cpu(de->ino))
goto out;
In this case, this dir is not able to be added in dirty_dir_inode_list.
The actual linking is done only when set_page_dirty() is called.
So let's add this newly got inode into the list explicitly, and put it at the
end of the recovery routine.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Pull more xfs updates from Ben Myers:
"Here are several fixes for filesystems with CRC support turned on:
fixes for quota, remote attributes, and recovery. There is also some
feature work related to CRCs: the implementation of CRCs for the inode
unlinked lists, disabling noattr2/attr2 options when appropriate, and
bumping the maximum number of ACLs.
I would have preferred to defer this last category of items to 3.11.
This would require setting a feature bit for the on-disk changes, so
there is some pressure to get these in 3.10. I believe this
represents the end of the CRC related queue.
- Rework of dquot CRCs
- Fix for remote attribute invalidation of a leaf
- Fix ordering of transaction replay in recovery
- Implement CRCs for inode unlinked list
- Disable noattr2/attr2 mount options when CRCs are enabled
- Bump the limitation of ACL entries for v5 superblocks"
* tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: increase number of ACL entries for V5 superblocks
xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
xfs: inode unlinked list needs to recalculate the inode CRC
xfs: fix log recovery transaction item reordering
xfs: fix remote attribute invalidation for a leaf
xfs: rework dquot CRCs
|
|
State recovery currently relies on being able to find a valid
nfs_open_context in the inode->open_files list.
We therefore need to put the nfs_open_context on the list while
we're still protected by the sp->so_reclaim_seqcount in order
to avoid reboot races.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Instead of having the callers set ctx->state, do it inside
_nfs4_open_and_get_state.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
All the callers have an open_context at this point, and since we always
need one in order to do state recovery, it makes sense to use it as the
basis for the nfs4_do_open() call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We already check the EXEC access mode in the lower layers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
ctx->cred == ctx->state->owner->so_cred, so let's just use the former.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use the EXCHGID4_FLAG_BIND_PRINC_STATEID exchange_id flag to enable
stateid protection. This means that if we create a stateid using a
particular principal, then we must use the same principal if we
want to change that state.
IOW: if we OPEN a file using a particular credential, then we have
to use the same credential in subsequent OPEN_DOWNGRADE, CLOSE,
or DELEGRETURN operations that use that stateid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This is not strictly needed, since get_deviceinfo is not allowed to
return NFS4ERR_ACCESS or NFS4ERR_WRONG_CRED, but lets do it anyway
for consistency with other pNFS operations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We want to use the same credential for reclaim_complete as we used
for the exchange_id call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We need to use the same credential as was used for the layoutget
and/or layoutcommit operations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Ensure that we use the same credential for layoutget, layoutcommit and
layoutreturn.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Rename ext4_da_writepages() to ext4_writepages() and use it for all
modes. We still need to iterate over all the pages in the case of
data=journalling, but in the case of nodelalloc/data=ordered (which is
what file systems mounted using ext3 backwards compatibility will use)
this will allow us to use a much more efficient I/O submission path.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format. For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 5c87d4bc1a86bd6e6754ac3d6e111d776ddcfe57)
|
|
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit d3eaace84e40bf946129e516dcbd617173c1cf14)
|
|
The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, ncluding during log recovery. This is trivial to do and
results in unlinked list operations always leaving a consistent
inode in the buffer.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 0a32c26e720a8b38971d0685976f4a7d63f9e2ef)
|
|
There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.
As it turns out, this ordering has been working mostly by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that is interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in an cancelled buffer gets replayed is essentially
left to chance.
Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit a775ad778073d55744ed6709ccede36310638911)
|
|
When invalidating an attribute leaf block block, there might be
remote attributes that it points to. With the recent rework of the
remote attribute format, we have to make sure we calculate the
length of the attribute correctly. We aren't doing that in
xfs_attr3_leaf_inactive(), so fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 59913f14dfe8eb772ff93eb442947451b4416329)
|
|
Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.
Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.
Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs are not calculated on the way
down to disk.
Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.
So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. That
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 6fcdc59de28817d1fbf1bd58cc01f4f3fac858fb)
|
|
The test_root() function could potentially loop forever due to
overflow issues. So rewrite test_root() to avoid this issue; as a
bonus, it is 38% faster when benchmarked via a test loop:
int main(int argc, char **argv)
{
int i;
for (i = 0; i < 1 << 24; i++) {
if (test_root(i, 7))
printf("%d\n", i);
}
}
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The group number passed to ext4_get_group_info() should be valid, but
let's add an assert to check this before we start creating a pointer
based on that group number and dereferencing it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Check the group number for sanity earilier, before calling routines
such as ext4_bg_has_super() or ext4_group_overhead_blocks().
Reported-by: Jonathan Salwan <jonathan.salwan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The bio_alloc() function can return NULL if the memory allocation
fails. So we need to check for this.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If kthread_run() fails, init_threads() returns
IS_ERR(p) instead of PTR_ERR(p).
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Fix the function get_victim_by_default, where it checks
for the condition that p.min_segno != NULL_SEGNO as
shown:
if (p.min_segno != NULL_SEGNO)
goto got_it;
and if above condition is true then
got_it:
if (p.min_segno != NULL_SEGNO) {
So this condition is being checked twice. Hence move the goto
statement after the if condition so that duplication of condition
check is avoided.
Also this function makes a call to get_max_cost() to compute
the max cost based on the f2fs_sbi_info and victim policy. Since
get_max_cost depends on on three parameters of victim_sel_policy
=> alloc_mode, gc_mode & ofs_unit, once this victim policy is
initialised, these value will not change till the execution
time of get_victim_by_default() & also f2fs_sbi_info structure
parameters will not change.
Hence making calls to get_max_cost() in while loop does not seems to
be a good point. Instead we can call it once in begining and store
the results in local variable, which later can serve our purpose
for comparing the cost with max cost inside the while loop.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Use a more current logging style.
Add __printf format and argument verification.
Remove embedded function names from formats.
Add %pf, __builtin_return_address(0) to jfs_error.
Add newlines to formats for kernel style consistency.
(One format already had an erroneous newline)
Coalesce formats and align arguments.
Object size reduced ~1KiB.
$ size fs/jfs/built-in.o*
text data bss dec hex filename
201891 35488 63936 301315 49903 fs/jfs/built-in.o.new
202821 35488 64192 302501 49da5 fs/jfs/built-in.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
CHECK fs/jfs/xattr.c
fs/jfs/xattr.c:1092:5: warning: symbol 'jfs_initxattrs' was not declared. Should it be static?
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format. For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, ncluding during log recovery. This is trivial to do and
results in unlinked list operations always leaving a consistent
inode in the buffer.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.
As it turns out, this ordering has been working mostly by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that is interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in an cancelled buffer gets replayed is essentially
left to chance.
Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
This wrapper function is no longer required, so get rid of it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Use PTR_RET in place of open coding this function.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
With recent changes to the transactions, it appears that we
are no longer using the "log ops" for resource groups. Since the
log commit code processes the array of log ops, eliminating this
should be marginally better for performance. Therefore this patch
eliminates it.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch simply sort the data and metadata buffer lists by their
inplace block number. This makes gfs2_log_flush issue the inplace IO
in sequential order, which will hopefully speed up writing the IO
out to disk.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Error out of ecryptfs_fsync() if filemap_write_and_wait() fails.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Paul Taysom <taysom@chromium.org>
Cc: Olof Johansson <olofj@chromium.org>
Cc: stable@vger.kernel.org # v3.6+
|
|
Pull gfs2 fixes from Steven Whitehouse:
"There are four patches this time.
The first fixes a problem where the wrong descriptor type was being
written into the log for journaled data blocks.
The second fixes a race relating to the deallocation of allocator
data.
The third provides a fallback if kmalloc is unable to satisfy a
request to allocate a directory hash table.
The fourth fixes the iopen glock caching so that inodes are deleted in
a more timely manner after rmdir/unlink"
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Don't cache iopen glocks
GFS2: Fall back to vmalloc if kmalloc fails for dir hash tables
GFS2: Increase i_writecount during gfs2_setattr_size
GFS2: Set log descriptor type for jdata blocks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"One patch fixes an Oops introduced in 3.9 with the readdirplus
feature. The rest are fixes for async-dio in 3.10"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix alignment in short read optimization for async_dio
fuse: return -EIOCBQUEUED from fuse_direct_IO() for all async requests
fuse: fix readdirplus Oops in fuse_dentry_revalidate
fuse: update inode size and invalidate attributes on fallocate
fuse: truncate pagecache range on hole punch
fuse: allocate for_background dio requests based on io->async state
|
|
When invalidating an attribute leaf block block, there might be
remote attributes that it points to. With the recent rework of the
remote attribute format, we have to make sure we calculate the
length of the attribute correctly. We aren't doing that in
xfs_attr3_leaf_inactive(), so fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.
Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.
Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs are not calculated on the way
down to disk.
Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.
So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. That
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Now that we clear PageWriteback after extent conversion, there's no
need to wait for io_end processing in ext4_evict_inode(). Running
AIO/DIO keeps file reference until aio_complete() is called so
ext4_evict_inode() cannot be called. For io_end structures resulting
from buffered IO waiting is happening because we wait for
PageWriteback in truncate_inode_pages().
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
We don't have to wait for extent conversion in ext4_punch_hole() as
buffered IO for the punched range has been flushed and waited upon
(thus all extent conversions for that range have completed). Also we
wait for all DIO to finish using inode_dio_wait() so there cannot be
any extent conversions pending due to direct IO.
Also remove ext4_flush_unwritten_io() since it's unused now.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
We don't have to wait for unwritten extent conversion in
ext4_ind_direct_IO() as all writes that happened before DIO are
flushed by the generic code and extent conversion has happened before
we cleared PageWriteback bit.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|