Currently, the kernel uses strictly 512-byte sectors for EFI GPT parsing.
That's wrong.
The UEFI standard (version 2.3, May 2009, 5.3.1 GUID Format overview, page
95) defines that an LBA is always based on the logical block size. For
Linux, that means bdev_logical_block_size() (aka BLKSSZGET).
This patch removes the static sector size from the EFI GPT parser.
The problem is reproducible with the latest GNU Parted:
  # modprobe scsi_debug dev_size_mb=50 sector_size=4096
  # ./parted /dev/sdb print
  Model: Linux scsi_debug (scsi)
  Disk /dev/sdb: 52.4MB
  Sector size (logical/physical): 4096B/4096B
  Partition Table: gpt
  Number  Start   End     Size    File system  Name     Flags
   1      24.6kB  3002kB  2978kB               primary
   2      3002kB  6001kB  2998kB               primary
   3      6001kB  9003kB  3002kB               primary
  # blockdev --rereadpt /dev/sdb
  # dmesg | tail -1
  sdb: unknown partition table     <---- !!!
With this patch:
  # blockdev --rereadpt /dev/sdb
  # dmesg | tail -1
  sdb: sdb1 sdb2 sdb3
Signed-off-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
The block validity framework does a more comprehensive set of checks,
and using ext4_data_block_valid() saves object code space compared with
the limited open-coded version that had been in ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Add the facility for ext4_forget() to be called from
ext4_free_blocks(). This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.
Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion. As a result, if the system crashed during or shortly after
the extents migration, and the released indirect blocks were reused as
data blocks, the journal replay would corrupt those data blocks. With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flag to the call to ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
|
|
ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the
latter function doesn't really do much. So merge the two functions
together, such that ext4_free_blocks() is now found in
fs/ext4/mballoc.c. This saves about 200 bytes of compiled text space.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Convert the last two callers of ext4_journal_forget() to use
ext4_forget() instead, and then fold ext4_journal_forget() into
ext4_forget(). This reduces our code complexity and shortens our call
stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The only caller of ext4_journal_revoke() is ext4_forget(), so we can
fold ext4_journal_revoke() into ext4_forget() to simplify the code and
shorten the call stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The ext4_forget() function better belongs in ext4_jbd2.c. This will
allow us to do some cleanup of the ext4_journal_revoke() and
ext4_journal_forget() functions, as well as giving us better error
reporting since we can report the caller of ext4_forget() when things
go wrong.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Currently shipping discard-capable SSDs and arrays have rather sub-optimal
implementations of the command, and its use can cause massive
slowdowns. Make issuing these commands optional, as it already is in btrfs
and gfs2.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hirofumi@mail.parknet.co.jp: tweaks, and add "discard" to fat_show_options]
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
|
|
Provide a nop fscache_stat_d() macro when CONFIG_FSCACHE_STATS=n, lest
errors like the following occur:
fs/fscache/cache.c: In function 'fscache_withdraw_cache':
fs/fscache/cache.c:386: error: implicit declaration of function 'fscache_stat_d'
fs/fscache/cache.c:386: error: 'fscache_n_cop_sync_cache' undeclared (first use in this function)
fs/fscache/cache.c:386: error: (Each undeclared identifier is reported only once
fs/fscache/cache.c:386: error: for each function it appears in.)
fs/fscache/cache.c:392: error: 'fscache_n_cop_dissociate_pages' undeclared (first use in this function)
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
GFS2 has been altered to pass THIS_MODULE to slow_work_register_user(), but
hasn't been altered to #include <linux/module.h> to provide it, resulting in
the following error:
fs/gfs2/recovery.c:596: error: 'THIS_MODULE' undeclared here (not in a function)
Add the missing #include.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
As of the patch:
SLOW_WORK: Wait for outstanding work items belonging to a module to clear
Wait for outstanding slow work items belonging to a module to clear
when unregistering that module as a user of the facility. This
prevents the put_ref code of a work item from being taken away before
it returns.
slow_work_register_user() takes a module pointer as an argument. CIFS must now
pass THIS_MODULE as that argument, lest the following error be observed:
fs/cifs/cifsfs.c: In function 'init_cifs':
fs/cifs/cifsfs.c:1040: error: too few arguments to function 'slow_work_register_user'
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
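GFP_ATOMIC was used in reiserfs_get_block() to avoid losing the BKL, which
is released whenever the task sleeps, so that nobody could modify the tree
in the middle of its work. Now that we have kicked out the BKL, we can use
a friendlier flag. We use GFP_NOFS here because we already hold the
reiserfs lock, and GFP_NOFS keeps memory reclaim from calling back into
the filesystem while we hold it.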
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Trivial cleanup of jbd compatibility layer removal
ocfs2: Refresh documentation
ocfs2: return f_fsid info in ocfs2_statfs()
ocfs2: duplicate inline data properly during reflink.
ocfs2: Move ocfs2_complete_reflink to the right place.
ocfs2: Return -EINVAL when a device is not ocfs2.
|
|
This adds a "norecovery" mount option, which disables temporary write
access to read-only mounts or snapshots during mount/recovery.
Without this option, write access will be performed even for those
types of mounts; the temporary write access is needed to mount the root
file system read-only after an unclean shutdown.
This option will be helpful when the user wants to prevent any write
access to the device.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Eric Sandeen <sandeen@redhat.com>
|
|
This adds a helper function, nilfs_valid_fs(), which returns whether nilfs
is in a valid state or not.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Although mount recovery of nilfs is integrated into the load_nilfs()
procedure, the completion of recovery was isolated from that procedure
and performed at the end of the fill_super routine.
This was somewhat confusing, since the recovery is needed for the nilfs
object, not for a super block instance.
To resolve the inconsistency, this integrates the recovery
completion into load_nilfs().
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This inserts readahead in the recovery code. A readahead request is
issued per segment while searching for the latest super root block.
This shortens mount time after an unclean unmount. A measurement
showed the recovery time was reduced by more than 60 percent:
e.g. real 0m11.586s -> 0m3.918s (x 2.96)
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This eliminates the obsolete nilfs_get_sufile_get_segment_usage() and
nilfs_set_sufile_segment_usage() from sufile.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds a nilfs_sufile_set_segment_usage() function to sufile to
replace direct access to the sufile metadata in the log writer code.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds a nilfs_sufile_mark_dirty() function to sufile to replace the
nilfs_touch_segusage() function in the log writer code. This is a
preparation for further cleanup that will move low-level sufile
operations out of the log writer.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This implements cache operations in the get-block routines of the palloc
code: nilfs_palloc_get_desc_block(), nilfs_palloc_get_bitmap_block(), and
nilfs_palloc_get_entry_block().
This completes the palloc cache.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds the palloc cache to ifile. The palloc cache is allocated in
the extended region of the nilfs_mdt_info struct. The struct
nilfs_ifile_info defines the extended in-memory structure of ifile.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Data pages in the gcdat metadata file (i.e. the secondary DAT for GC) are
cleared or even moved back to the normal DAT when a pass of garbage
collection is done.
Buffer heads held by the palloc cache of gcdat must be released before
these page cache manipulations. This adds nilfs_palloc_clear_cache()
to ensure this.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds the palloc cache to the DAT file. The palloc cache is allocated
in the extended region of the nilfs_mdt_info struct. The struct
nilfs_dat_info defines the extended in-memory structure of DAT.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds setup and cleanup routines for the persistent object
allocator cache.
According to ftrace analyses, accessing buffers of the DAT file
repeatedly incurs non-negligible overhead. To mitigate the overhead,
this introduces a cache framework for the persistent object allocator
(palloc), which the DAT file and ifile use.
struct nilfs_palloc_cache represents the per-metadata-file cache object
for files using palloc.
The cache is initialized through nilfs_palloc_setup_cache() and
destroyed by nilfs_palloc_destroy_cache(); callers of the former
function will be added to the individual allocators of DAT and ifile in
subsequent patches.
nilfs_palloc_destroy_cache() will be called from nilfs_mdt_destroy()
if the cache is attached to a metadata file. A companion function,
nilfs_palloc_clear_cache(), is provided to allow releasing buffer head
references independently of the cleanup task. This helper will be used
before invalidating pages of a metadata file that has the cache.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This expands a trivial address calculation in the function into each of
its call sites. This expansion improves the readability of the callers.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This removes the obsolete nilfs_btnode_get() function and makes
nilfs_btree_get_block() directly call nilfs_btnode_submit_block().
This expansion will provide a better opportunity for code optimization.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This removes the obsolete argument from nilfs_btnode_submit_block().
This completes the separation of the btree node create function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This replaces uses of nilfs_btnode_get() for creating a new btree node
block with nilfs_btnode_create_block.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds a separate routine for creating a btree node block. This is a
preparation to reduce the depth of function calls when submitting
btree node buffers.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This turns off the readahead action of a metadata file when
nilfs_mdt_get_block is called with a create flag.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Previously, this function returned a status code to report possible
errors. The ("nilfs2: add local variable to cache the number of clean
segments") patch removed the possibility of errors.
So, this simplifies the function definition to make it directly return
the number of clean segments.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This makes it possible for sufile to get the number of clean segments
faster.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This unfolds the nilfs_sufile_block_get_header() function for
simplicity.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This hides the call of nilfs_mdt_clear() inside
nilfs_mdt_destroy().
This ensures that nilfs_mdt_destroy() performs the cleanup work done by
nilfs_mdt_clear().
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Removes two inline functions: nilfs_mdt_read_inode_direct() and
nilfs_mdt_write_inode_direct().
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This replaces the nilfs_mdt_read_inode_direct function with individual
read methods: nilfs_dat_read, nilfs_sufile_read, nilfs_cpfile_read.
This provides the opportunity to initialize local variables of each
metadata file after reading the inode.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This replaces the nilfs_mdt_new() constructor with individual
metadata file constructors like nilfs_dat_new(), new_sufile_new(),
nilfs_cpfile_new(), and nilfs_ifile_new().
This makes it possible for each metadata file to have its own
initialization code.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This adds an optional "object size" argument to the nilfs_mdt_new_common()
function; the argument specifies the size of the private object attached
to a newly allocated metadata file inode.
This provides space to keep local variables for metadata files.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called
mark_inode_dirty() after they changed the number of data blocks.
This moves these calls out to the outermost bmap functions, such as
nilfs_bmap_insert() and nilfs_bmap_truncate().
This mitigates overhead for truncate and delete operations, since
they repeatedly remove sets of blocks. A nearly 10 percent improvement
was observed for removal of a large file:
# dd if=/dev/zero of=/test/aaa bs=1M count=512
# time rm /test/aaa
real 2.968s -> 2.705s
Further optimization may be possible by eliminating these
mark_inode_dirty() uses, though I avoid mixing separate changes here.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Since metadata file routines mark the inode dirty after they
successfully change bmap objects, nilfs_mdt_mark_dirty() calls in
nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() are redundant.
This removes these overlapping calls from the bmap routines.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
lock_buffer() and unlock_buffer() calls in btree.c can be eliminated
because btree functions obtain buffer heads through nilfs_btnode_get(),
which never returns an on-the-fly buffer.
Although nilfs_clear_dirty_page() and nilfs_copy_back_pages() in
nilfs_commit_gcdat_inode() juggle btree node buffers of DAT, this is
safe because these operations are protected by a log writer lock or
the metadata file semaphore of DAT.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This lock can be eliminated because inodes on the buffer can be updated
independently. Although a log writer also fills in bmap data on the
on-disk inodes, this update is done exclusively under the log writer lock.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
The match_bool function is no longer used.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
Since most filesystems use nofoobar-style options, change the
barrier=off option to nobarrier.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
This is a trivial patch to expose struct nilfs_fs_btree_node.
The struct should be exposed outside the kernel, since it is part of the
on-disk format.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
The current btree lookup routines cause a kernel oops when they detect an
inconsistency in btree blocks. These routines should instead return a
proper error code, because the inconsistency usually comes from
corruption of on-disk metadata.
This fixes the issue by converting BUG_ON calls into proper error
handling.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr()
NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
|
|
Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.
In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.
Also show this status in /proc/mounts.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>