Age | Commit message (Collapse) | Author |
|
In f2fs_ioctl() function, it is using generic flags.
Since F2FS specific flags are defined. So lets use
those flags.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Return the FIEMAP_EXTENT_UNKNOWN flag as well except the
FIEMAP_EXTENT_DELALLOC because the data location of an
delayed allocation extent is unknown.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
|
|
Commit b6e96d0067d8 ("jbd2: use module parameters instead of debugfs
for jbd_debug") removed any need for a dependency on DEBUG_FS. It
also moved the /sys variables out from underneath the typical debugfs
mount point. Delete the dependency and update the /sys path to where
the debug settings are currently.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):
[ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079
i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved. The output should have been:
(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104
It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Currently we see this output:
$git grep phase fs/jbd2
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 1\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 2\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 3\n");
fs/jbd2/commit.c: jbd_debug(3, "JBD2: commit phase 4\n");
[...]
There is clearly a duplicate label for phase 2, and they are
both active (i.e. not in #if ... #else block). Rename them to
be "2a" and "2b" so the debug output is unambiguous.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
While trying to debug an an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:
rm D ffff8802203afbc0 4600 4878 4748 0x00000000
ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
[<ffffffff8172dc34>] schedule+0x24/0x70
[<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
[<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
[<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
[<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
[<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
[<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
[<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
[<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
[<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
[<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
[<ffffffff8114370f>] do_rmdir+0xdf/0x120
[<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
[<ffffffff81002889>] ? do_notify_resume+0x89/0x100
[<ffffffff817361ae>] ? int_signal+0x12/0x17
[<ffffffff81145d85>] sys_unlinkat+0x25/0x40
[<ffffffff81735f22>] system_call_fastpath+0x16/0x1b
What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space. And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.
It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction(). There does not seem to be
anything preempt-rt specific about this, other then perhaps increasing
the odds of it happening.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The state lock is taken after we are doing an assert on the state
value, not before. So we might in fact be doing an assert on a
transient value. Ensure the state check is within the scope of
the state lock being taken.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If filesystem was aborted after inode's write back is complete
but before its metadata was updated we may return success
results in data loss.
In order to handle fs abort correctly we have to check
fs state once we discover that it is in MS_RDONLY state
Test case: http://patchwork.ozlabs.org/patch/244297
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Inode's data or non journaled quota may be written w/o jounral so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.
Also fix data integrity for nojournal mode.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already have jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Merge misc fixes from Andrew Morton:
"Bunch of fixes and one little addition to math64.h"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
include/linux/math64.h: add div64_ul()
mm: memcontrol: fix lockless reclaim hierarchy iterator
frontswap: fix incorrect zeroing and allocation size for frontswap_map
kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
mm: migration: add migrate_entry_wait_huge()
ocfs2: add missing lockres put in dlm_mig_lockres_handler
mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
aio: fix io_destroy() regression by using call_rcu()
rtc-at91rm9200: use shadow IMR on at91sam9x5
rtc-at91rm9200: add shadow interrupt mask
rtc-at91rm9200: refactor interrupt-register handling
rtc-at91rm9200: add configuration support
rtc-at91rm9200: add match-table compile guard
fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
cciss: fix broken mutex usage in ioctl
audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
...
|
|
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: shencanquan <shencanquan@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There was a regression introduced by 36f5588905c1 ("aio: refcounting
cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
using RCU in the shutdown path, but the synchronize_rcu() was done in
the context of the io_destroy() syscall greatly increasing the time it
could block.
This patch switches it to call_rcu() and makes shutdown asynchronous
(more asynchronous than it was originally; before the refcount changes
io_destroy() would still wait on pending kiocbs).
Note that there's a global quota on the max outstanding kiocbs, and that
quota must be manipulated synchronously; otherwise io_setup() could
return -EAGAIN when there isn't quota available, and userspace won't
have any way of waiting until shutdown of the old kioctxs has finished
(besides busy looping).
So we release our quota before kioctx shutdown has finished, which
should be fine since the quota never corresponded to anything real
anyways.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
While removing a non-empty directory, the kernel dumps a message:
(rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39
Suppress the error message from being printed in the dmesg so users
don't panic.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir,
ocfs2_prep_new_orphaned_file will release the inode_ac, then when the
caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to
a NULL ocfs2_alloc_context struct in the following functions. A kernel
panic happens.
Signed-off-by: "Xiaowei.Hu" <xiaowei.hu@oracle.com>
Reviewed-by: shencanquan <shencanquan@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections. Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions. With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.
To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:
- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.
- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).
The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.
AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.
Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.
To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.
- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192
[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 18888cf0883c: "ext4: speed up truncate/unlink by not using
bforget() unless needed" removed the use of EXT4_FREE_BLOCKS_FORGET in
the most important codepath for file systems using extents, but a
similar optimization also can be done for file systems using indirect
blocks, and for the two special cases in the ext4 extents code.
Cc: Andrey Sidorov <qrxd43@motorola.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
For a file systems with a very large number of block groups, if all of
the block group bitmaps are in memory and the file system is
relatively badly fragmented, it's possible ext4_mb_regular_allocator()
to take a long time trying to find a good match. This is especially
true if the tuning parameter mb_max_to_scan has been sent to a very
large number. So add a cond_resched() to avoid soft lockup warnings
and to provide better system responsiveness.
For ext4_free_blocks(), if we are deleting a large range of blocks,
and data=journal is enabled so that EXT4_FREE_BLOCKS_FORGET is passed,
the loop to call sb_find_get_block() and to call ext4_forget() can
take over 10-15 milliseocnds or more. So it's better to add a
cond_resched() here a well.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
"There is a pair of fixes for double-frees in the recent bundle for
3.10, a couple of fixes for long-standing bugs (sleep while atomic and
an endianness fix), and a locking fix that can be triggered when osds
are going down"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: fix cleanup in rbd_add()
rbd: don't destroy ceph_opts in rbd_add()
ceph: ceph_pagelist_append might sleep while atomic
ceph: add cpu_to_le32() calls when encoding a reconnect capability
libceph: must hold mutex for reset_changed_osds()
|
|
If new dentry block is allocated and its i_size is updated, we should update
its inode block together in order to sync i_size and its block allocation.
Otherwise, we can loose additional dentry block due to the unconsistent i_size.
Errorneous Scenario
-------------------
In the recovery routine,
- recovery_dentry
| - __f2fs_add_link
| | - get_new_data_page
| | | - i_size_write(new_i_size)
| | | - mark_inode_dirty_sync(dir)
| | - update_parent_metadata
| | | - mark_inode_dirty(dir)
|
- write_checkpoint
- sync_dirty_dir_inodes
- filemap_flush(dentry_blocks)
- f2fs_write_data_page
- skip to write the last dentry block due to index < i_size
In the above flow, new_i_size is not updated to its inode block so that the
last dentry block will be lost accordingly.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Creation of a new inode requires a directory search in order to ensure
that we are not trying to create an inode with the same name as an
existing one. This was hidden away inside the create_ok() function.
In the case that there was an existing inode, and a lookup can be
substituted for a create (which is the case with regular files
when the O_EXCL flag is not in use) then we were doing a second
lookup in order to return the inode.
This patch merges these two lookups into one. This can be done by
passing a flag to gfs2_dir_search() to tell it to just return -EEXIST
in the cases where we don't actually want to look up the inode.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Basically an inode manages the number of allocated blocks with inode->i_blocks
which is represented in a unit of sectors, not file system blocks.
But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr
translates it to the number of sectors when fstat is called.
However, previously f2fs_file_inode_operations only has this, so this patch adds
it to all the types of inode_operations.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
In f2fs_fill_super(), set sb->s_fs_info before calling parse_options(), then we can get
f2fs_sb_info via F2FS_SB(sb) in parse_options().
So that the second argument "sbi" of func parse_options() is no longer needed.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch adds the support of security labels for f2fs, which will be used
by Linus Security Models (LSMs).
Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules:
"Linux Security Modules (LSM) is a framework that allows the Linux kernel to
support a variety of computer security models while avoiding favoritism toward
any single security implementation. The framework is licensed under the terms of
the GNU General Public License and is standard part of the Linux kernel since
Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted
modules in the official kernel.".
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
This patch fixes warnings due to missing lock on write error path.
WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]()
Hardware name: empty
Pid: 26563, comm: dd Tainted: P O 3.9.4 #12
Call Trace:
hpfs_truncate+0x75/0x80 [hpfs]
hpfs_write_begin+0x84/0x90 [hpfs]
_hpfs_bmap+0x10/0x10 [hpfs]
generic_file_buffered_write+0x121/0x2c0
__generic_file_aio_write+0x1c7/0x3f0
generic_file_aio_write+0x7c/0x100
do_sync_write+0x98/0xd0
hpfs_file_write+0xd/0x50 [hpfs]
vfs_write+0xa2/0x160
sys_write+0x51/0xa0
page_fault+0x22/0x30
system_call_fastpath+0x1a/0x1f
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
- Trivial: unused variable removal
- Posix-timers: Add the clock ID to the new proc interface to make it
useful. The interface is new and should be functional when we reach
the final 3.10 release.
- Cure a false positive warning in the tick code introduced by the
overhaul in 3.10
- Fix for a persistent clock detection regression introduced in this
cycle
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Correct run-time detection of persistent_clock.
ntp: Remove unused variable flags in __hardpps
posix-timers: Show clock ID in proc file
tick: Cure broadcast false positive pending bit warning
|
|
Dave reported a panic because the extent_root->commit_root was NULL in the
caching kthread. That is because we just unset it in free_root_pointers, which
is not the correct thing to do, we have to either wait for the caching kthread
to complete or hold the extent_commit_sem lock so we know the thread has exited.
This patch makes the kthreads all stop first and then we do our cleanup. This
should fix the race. Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
|
Commit be283b2e674a09457d4563729015adb637ce7cc1
( Btrfs: use helper to cleanup tree roots) introduced the following bug,
BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
[...]
Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox
RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730)
[...]
Call Trace:
[<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs]
[<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs]
[<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs]
[<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs]
[<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs]
[<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[<ffffffff81068d3d>] kthread+0x8d/0x95
[<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
[<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0
[<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
We've free'ed commit_root before actually getting to free block groups where
caching thread needs valid extent_root->commit_root.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
Dave reported a NULL pointer deref. This is caused because he thought he'd be
smart and add sanity checks to the extent_io bit operations, but he didn't
expect a tree to have a NULL mapping. To fix this we just need to init the
relocation's processed_blocks with the btree_inode->i_mapping. Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
There is a path where btrfs_drop_inode() is called with its inode's root
is NULL: In btrfs_new_inode(), when btrfs_set_inode_index() fails,
iput() is called. We should handle this case before taking look at the
root->root_item.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
We get a use after free if we had a transaction to cleanup since there could be
delayed inodes which refer to their respective fs_root. Thanks
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
- Fixes how eCryptfs handles msync to sync both the upper and lower
file
- A couple of MAINTAINERS updates
* tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Check return of filemap_write_and_wait during fsync
Update eCryptFS maintainers
ecryptfs: fixed msync to flush data
|
|
If sysfs_notify is called on a binary attribute, bad things can
happen, so prevent it.
Note, no in-kernel usage of this is currently present, but in the
future, it's good to be safe.
Changes in V2:
- Also ignore sysfs_notify on dirs, links
- Use WARN_ON rather than silently failing
- Compiled and tested (huge apologies about first submission)
Signed-off-by: Nick Dyer <nick.dyer@itdev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull CIFS fix from Steve French:
"Fix one byte buffer overrun with prefixpaths on cifs mounts which can
cause a problem with mount depending on the string length"
* 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix off-by-one bug in build_unc_path_to_root
|
|
1d2ef5901483004d74947bbf78d5146c24038fe7 caused a regression in ncpfs such that
directories could no longer be removed. This was because ncp_rmdir checked
to see if a dentry could be unhashed before allowing it to be removed. Since
1d2ef5901483004d74947bbf78d5146c24038fe7 introduced a change that incremented
dentry->d_count causing it to always be greater than 1 unhash would always
fail. Thus causing the error path in ncp_rmdir to always be taken. Removing
this error path is safe as unhashing is still accomplished by calls to dput
from vfs_rmdir.
Signed-off-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It is possible that iput is skipped after iget during the recovery.
In recover_dentry(),
dir = f2fs_iget();
...
if (de && inode->i_ino == le32_to_cpu(de->ino))
goto out;
In this case, this dir is not able to be added in dirty_dir_inode_list.
The actual linking is done only when set_page_dirty() is called.
So let's add this newly got inode into the list explicitly, and put it at the
end of the recovery routine.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Pull more xfs updates from Ben Myers:
"Here are several fixes for filesystems with CRC support turned on:
fixes for quota, remote attributes, and recovery. There is also some
feature work related to CRCs: the implementation of CRCs for the inode
unlinked lists, disabling noattr2/attr2 options when appropriate, and
bumping the maximum number of ACLs.
I would have preferred to defer this last category of items to 3.11.
This would require setting a feature bit for the on-disk changes, so
there is some pressure to get these in 3.10. I believe this
represents the end of the CRC related queue.
- Rework of dquot CRCs
- Fix for remote attribute invalidation of a leaf
- Fix ordering of transaction replay in recovery
- Implement CRCs for inode unlinked list
- Disable noattr2/attr2 mount options when CRCs are enabled
- Bump the limitation of ACL entries for v5 superblocks"
* tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: increase number of ACL entries for V5 superblocks
xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
xfs: inode unlinked list needs to recalculate the inode CRC
xfs: fix log recovery transaction item reordering
xfs: fix remote attribute invalidation for a leaf
xfs: rework dquot CRCs
|
|
Rename ext4_da_writepages() to ext4_writepages() and use it for all
modes. We still need to iterate over all the pages in the case of
data=journalling, but in the case of nodelalloc/data=ordered (which is
what file systems mounted using ext3 backwards compatibility will use)
this will allow us to use a much more efficient I/O submission path.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The limit of 25 ACL entries is arbitrary, but baked into the on-disk
format. For version 5 superblocks, increase it to the maximum nuber
of ACLs that can fit into a single xattr.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 5c87d4bc1a86bd6e6754ac3d6e111d776ddcfe57)
|
|
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit d3eaace84e40bf946129e516dcbd617173c1cf14)
|
|
The inode unlinked list manipulations operate directly on the inode
buffer, and so bypass the inode CRC calculation mechanisms. Hence an
inode on the unlinked list has an invalid CRC. Fix this by
recalculating the CRC whenever we modify an unlinked list pointer in
an inode, ncluding during log recovery. This is trivial to do and
results in unlinked list operations always leaving a consistent
inode in the buffer.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 0a32c26e720a8b38971d0685976f4a7d63f9e2ef)
|
|
There are several constraints that inode allocation and unlink
logging impose on log recovery. These all stem from the fact that
inode alloc/unlink are logged in buffers, but all other inode
changes are logged in inode items. Hence there are ordering
constraints that recovery must follow to ensure the correct result
occurs.
As it turns out, this ordering has been working mostly by chance
than good management. The existing code moves all buffers except
cancelled buffers to the head of the list, and everything else to
the tail of the list. The problem with this is that is interleaves
inode items with the buffer cancellation items, and hence whether
the inode item in an cancelled buffer gets replayed is essentially
left to chance.
Further, this ordering causes problems for log recovery when inode
CRCs are enabled. It typically replays the inode unlink buffer long before
it replays the inode core changes, and so the CRC recorded in an
unlink buffer is going to be invalid and hence any attempt to
validate the inode in the buffer is going to fail. Hence we really
need to enforce the ordering that the inode alloc/unlink code has
expected log recovery to have since inode chunk de-allocation was
introduced back in 2003...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit a775ad778073d55744ed6709ccede36310638911)
|
|
When invalidating an attribute leaf block block, there might be
remote attributes that it points to. With the recent rework of the
remote attribute format, we have to make sure we calculate the
length of the attribute correctly. We aren't doing that in
xfs_attr3_leaf_inactive(), so fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinuguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 59913f14dfe8eb772ff93eb442947451b4416329)
|
|
Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.
Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.
Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs are not calculated on the way
down to disk.
Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.
So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. That
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 6fcdc59de28817d1fbf1bd58cc01f4f3fac858fb)
|
|
The test_root() function could potentially loop forever due to
overflow issues. So rewrite test_root() to avoid this issue; as a
bonus, it is 38% faster when benchmarked via a test loop:
int main(int argc, char **argv)
{
int i;
for (i = 0; i < 1 << 24; i++) {
if (test_root(i, 7))
printf("%d\n", i);
}
}
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The group number passed to ext4_get_group_info() should be valid, but
let's add an assert to check this before we start creating a pointer
based on that group number and dereferencing it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Check the group number for sanity earilier, before calling routines
such as ext4_bg_has_super() or ext4_group_overhead_blocks().
Reported-by: Jonathan Salwan <jonathan.salwan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The bio_alloc() function can return NULL if the memory allocation
fails. So we need to check for this.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If kthread_run() fails, init_threads() returns
IS_ERR(p) instead of PTR_ERR(p).
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Fix the function get_victim_by_default, where it checks
for the condition that p.min_segno != NULL_SEGNO as
shown:
if (p.min_segno != NULL_SEGNO)
goto got_it;
and if above condition is true then
got_it:
if (p.min_segno != NULL_SEGNO) {
So this condition is being checked twice. Hence move the goto
statement after the if condition so that duplication of condition
check is avoided.
Also this function makes a call to get_max_cost() to compute
the max cost based on the f2fs_sbi_info and victim policy. Since
get_max_cost depends on on three parameters of victim_sel_policy
=> alloc_mode, gc_mode & ofs_unit, once this victim policy is
initialised, these value will not change till the execution
time of get_victim_by_default() & also f2fs_sbi_info structure
parameters will not change.
Hence making calls to get_max_cost() in while loop does not seems to
be a good point. Instead we can call it once in begining and store
the results in local variable, which later can serve our purpose
for comparing the cost with max cost inside the while loop.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|