summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-11-05Merge branch 'jk/jbd2-revoke-overflow'Theodore Ts'o
2019-11-05jbd2: Fine tune estimate of necessary descriptor blocksJan Kara
Currently we reserve j_max_transaction_buffers / 32 for transaction descriptor blocks. Now that revoke descriptors are accounted for separately this estimate is unnecessarily high and we can actually compute much tighter estimate. In the common case of 32k journal blocks and 4k blocksize this actually reduces the amount of reserved descriptor blocks from 256 to ~25 which allows us to fit more real data into a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Provide trace event for handle restartsJan Kara
Provide trace event for handle restarts to ease debugging. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Reserve revoke credits for freed blocksJan Kara
So far we have reserved only relatively high fixed amount of revoke credits for each transaction. We over-reserved by large amount for most cases but when freeing large directories or files with data journalling, the fixed amount is not enough. In fact the worst case estimate is inconveniently large (maximum extent size) for freeing of one extent. We fix this by doing proper estimate of the amount of blocks that need to be revoked when removing blocks from the inode due to truncate or hole punching and otherwise reserve just a small amount of revoke credits for each transaction to accommodate freeing of xattrs block or so. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Make credit checking more strictJan Kara
Make checking of available credits in jbd2_journal_dirty_metadata() more strict. There should be always enough credits in the handle to write all potential revoke descriptors. Also we warn in case there are not enough credits since this is a bug in the filesystem. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Rename h_buffer_credits to h_total_creditsJan Kara
The credit counter now contains both buffer and revoke descriptor block credits. Rename to counter to h_total_credits to reflect that. No functional change. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Reserve space for revoke descriptor blocksJan Kara
Extend functions for starting, extending, and restarting transaction handles to take number of revoke records handle must be able to accommodate. These functions then make sure transaction has enough credits to be able to store resulting revoke descriptor blocks. Also revoke code tracks number of revoke records created by a handle to catch situation where some place didn't reserve enough space for revoke records. Similarly to standard transaction credits, space for unused reserved revoke records is released when the handle is stopped. On the ext4 side we currently take a simplistic approach of reserving space for 1024 revoke records for any transaction. This grows amount of credits reserved for each handle only by a few and is enough for any normal workload so that we don't hit warnings in jbd2. We will refine the logic in following commits. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Drop jbd2_space_needed()Jan Kara
The function is now just a trivial wrapper returning journal->j_max_transaction_buffers. Drop it. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Account descriptor blocks into t_outstanding_creditsJan Kara
Currently, journal descriptor blocks were not accounted in transaction->t_outstanding_credits and we were just leaving some slack space in the journal for them (in jbd2_log_space_left() and jbd2_space_needed()). This is making proper accounting (and reservation we want to add) of descriptor blocks difficult so switch to accounting descriptor blocks in transaction->t_outstanding_credits and just reserve the same amount of credits in t_outstanding credits for journal descriptor blocks when creating transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Factor out common parts of stopping and restarting a handleJan Kara
jbd2__journal_restart() has quite some code that is common with jbd2_journal_stop(). Factor this functionality into stop_this_handle() helper and use it from both functions. Note that this also drops t_handle_lock protection from jbd2__journal_restart() as jbd2_journal_stop() does the same thing without it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Drop pointless wakeup from jbd2_journal_stop()Jan Kara
When we drop last handle from a transaction and journal->j_barrier_count > 0, jbd2_journal_stop() wakes up journal->j_wait_transaction_locked wait queue. This looks pointless - wait for outstanding handles always happens on journal->j_wait_updates waitqueue. journal->j_wait_transaction_locked is used to wait for transaction state changes and by start_this_handle() for waiting until journal->j_barrier_count drops to 0. The first case is clearly irrelevant here since only jbd2 thread changes transaction state. The second case looks related but jbd2_journal_unlock_updates() is responsible for the wakeup in this case. So just drop the wakeup. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Drop pointless check from jbd2_journal_stop()Jan Kara
If a transaction is larger than journal->j_max_transaction_buffers, that is a bug and not a trigger for transaction commit. Also the very next attempt to start new handle will start transaction commit anyway. So just remove the pointless check. Arguably, we could start transaction commit whenever the transaction size is *close* to journal->j_max_transaction_buffers. This has a potential to reduce latency of the next jbd2_journal_start() at the cost of somewhat smaller transactions. However for this to have any effect, it would mean that there isn't someone already waiting in jbd2_journal_start() which means metadata load for the fs is pretty light anyway so probably this optimization is not worth it. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Reorganize jbd2_journal_stop()Jan Kara
Move code in jbd2_journal_stop() around a bit. It removes some unnecessary code duplication and will make factoring out parts common with jbd2__journal_restart() easier. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-14-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Fix statistics for the number of logged blocksJan Kara
jbd2 statistics counting number of blocks logged in a transaction was wrong. It didn't count the commit block and more importantly it didn't count revoke descriptor blocks. Make sure these get properly counted. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-13-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ocfs2: Use accessor function for h_buffer_creditsJan Kara
Use the jbd2 accessor function for h_buffer_credits. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-12-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4, jbd2: Provide accessor function for handle creditsJan Kara
Provide accessor function to get number of credits available in a handle and use it from ext4. Later, computation of available credits won't be so straightforward. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Provide function to handle transaction restartsJan Kara
Provide ext4_journal_ensure_credits_fn() function to ensure transaction has given amount of credits and call helper function to prepare for restarting a transaction. This allows to remove some boilerplate code from various places, add proper error handling for the case where transaction extension or restart fails, and reduces following changes needed for proper revoke record reservation tracking. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Avoid unnecessary revokes in ext4_alloc_branch()Jan Kara
Error cleanup path in ext4_alloc_branch() calls ext4_forget() on freshly allocated indirect blocks with 'metadata' set to 1. This results in generating revoke records for these blocks. However this is unnecessary as the freed blocks are only allocated in the current transaction and thus they will never be journalled. Make this cleanup path similar to e.g. cleanup in ext4_splice_branch() and use ext4_free_blocks() to handle block forgetting by passing EXT4_FREE_BLOCKS_FORGET and not EXT4_FREE_BLOCKS_METADATA to ext4_free_blocks(). This also allows allocating transaction not to reserve any credits for revoke records. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-9-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Use ext4_journal_extend() instead of jbd2_journal_extend()Jan Kara
Use ext4 helper ext4_journal_extend() instead of opencoding it in ext4_try_to_expand_extra_isize(). Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Fix ext4_should_journal_data() for EA inodesJan Kara
Similarly to directories, EA inodes do only journalled modifications to their data. Change ext4_should_journal_data() to return true for them so that we don't have to special-case them during truncate. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Fix credit estimate for final inode freeingJan Kara
Estimate for the number of credits needed for final freeing of inode in ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for orphan deletion, bitmap & group descriptor for inode freeing) and not just 3. [ Fixed minor whitespace nit. -- TYT ] Fixes: e50e5129f384 ("ext4: xattr-in-inode support") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05io-wq: use proper nesting IRQ disabling spinlocks for cancelJens Axboe
We don't know what context we'll be called in for cancel, it could very well be with IRQs disabled already. Use the IRQ saving variants of the locking primitives. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-05ext4: introduce direct I/O write using iomap infrastructureMatthew Bobrowski
This patch introduces a new direct I/O write path which makes use of the iomap infrastructure. All direct I/O writes are now passed from the ->write_iter() callback through to the new direct I/O handler ext4_dio_write_iter(). This function is responsible for calling into the iomap infrastructure via iomap_dio_rw(). Code snippets from the existing direct I/O write code within ext4_file_write_iter() such as, checking whether the I/O request is unaligned asynchronous I/O, or whether the write will result in an overwrite have effectively been moved out and into the new direct I/O ->write_iter() handler. The block mapping flags that are eventually passed down to ext4_map_blocks() from the *_get_block_*() suite of routines have been taken out and introduced within ext4_iomap_alloc(). For inode extension cases, ext4_handle_inode_extension() is effectively the function responsible for performing such metadata updates. This is called after iomap_dio_rw() has returned so that we can safely determine whether we need to potentially truncate any allocated blocks that may have been prepared for this direct I/O write. We don't perform the inode extension, or truncate operations from the ->end_io() handler as we don't have the original I/O 'length' available there. The ->end_io() however is responsible fo converting allocated unwritten extents to written extents. In the instance of a short write, we fallback and complete the remainder of the I/O using buffered I/O via ext4_buffered_write_iter(). The existing buffer_head direct I/O implementation has been removed as it's now redundant. [ Fix up ext4_dio_write_iter() per Jan's comments at https://lore.kernel.org/r/20191105135932.GN22379@quack2.suse.cz -- TYT ] Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/e55db6f12ae6ff017f36774135e79f3e7b0333da.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Do not iput inode under running transactionJan Kara
When ext4_mkdir(), ext4_symlink(), ext4_create(), or ext4_mknod() fail to add entry into directory, it ends up dropping freshly created inode under the running transaction and thus inode truncation happens under that transaction. That breaks assumptions that evict() does not get called from a transaction context and at least in ext4_symlink() case it can result in inode eviction deadlocking in inode_wait_for_writeback() when flush worker finds symlink inode, starts to write it back and blocks on starting a transaction. So change the code in ext4_mkdir() and ext4_add_nondir() to drop inode reference only after the transaction is stopped. We also have to add inode to the orphan list in that case as otherwise the inode would get leaked in case we crash before inode deletion is committed. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: Move marking of handle as sync to ext4_add_nondir()Jan Kara
Every caller of ext4_add_nondir() marks handle as sync if directory has DIRSYNC set. Move this marking to ext4_add_nondir() so reduce some duplication. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Completely fill journal descriptor blocksJan Kara
With 32-bit block numbers, we don't allocate the array for journal buffer heads large enough for corresponding descriptor tags to fill the descriptor block. Thus we end up writing out half-full descriptor blocks to the journal unnecessarily growing the transaction. Fix the logic to allocate the array large enough. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05jbd2: Fixup stale comment in commit codeJan Kara
jbd2_journal_next_log_block() does not look at transaction->t_outstanding_credits. Remove the misleading comment. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: update ext4_sync_file() to not use __generic_file_fsync()Matthew Bobrowski
When the filesystem is created without a journal, we eventually call into __generic_file_fsync() in order to write out all the modified in-core data to the permanent storage device. This function happens to try and obtain an inode_lock() while synchronizing the files buffer and it's associated metadata. Generally, this is fine, however it becomes a problem when there is higher level code that has already obtained an inode_lock() as this leads to a recursive lock situation. This case is especially true when porting across direct I/O to iomap infrastructure as we obtain an inode_lock() early on in the I/O within ext4_dio_write_iter() and hold it until the I/O has been completed. Consequently, to not run into this specific issue, we move away from calling into __generic_file_fsync() and perform the necessary synchronization tasks within ext4_sync_file(). Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/3495f35ef67f2021b567e28e6f59222e583689b8.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: move inode extension check out from ext4_iomap_alloc()Matthew Bobrowski
Lift the inode extension/orphan list handling code out from ext4_iomap_alloc() and apply it within the ext4_dax_write_iter(). Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/fd5c84db25d5d0da87d97ed4c36fd844f57da759.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: move inode extension/truncate code out from ->iomap_end() callbackMatthew Bobrowski
In preparation for implementing the iomap direct I/O modifications, the inode extension/truncate code needs to be moved out from the ext4_iomap_end() callback. For direct I/O, if the current code remained, it would behave incorrrectly. Updating the inode size prior to converting unwritten extents would potentially allow a racing direct I/O read to find unwritten extents before being converted correctly. The inode extension/truncate code now resides within a new helper ext4_handle_inode_extension(). This function has been designed so that it can accommodate for both DAX and direct I/O extension/truncate operations. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/d41ffa26e20b15b12895812c3cad7c91a6a59bc6.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: introduce direct I/O read using iomap infrastructureMatthew Bobrowski
This patch introduces a new direct I/O read path which makes use of the iomap infrastructure. The new function ext4_do_read_iter() is responsible for calling into the iomap infrastructure via iomap_dio_rw(). If the read operation performed on the inode is not supported, which is checked via ext4_dio_supported(), then we simply fallback and complete the I/O using buffered I/O. Existing direct I/O read code path has been removed, as it is now redundant. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/f98a6f73fadddbfbad0fc5ed04f712ca0b799f37.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: introduce new callback for IOMAP_REPORTMatthew Bobrowski
As part of the ext4_iomap_begin() cleanups that precede this patch, we also split up the IOMAP_REPORT branch into a completely separate ->iomap_begin() callback named ext4_iomap_begin_report(). Again, the raionale for this change is to reduce the overall clutter within ext4_iomap_begin(). Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/5c97a569e26ddb6696e3d3ac9fbde41317e029a0.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helperMatthew Bobrowski
In preparation for porting across the ext4 direct I/O path over to the iomap infrastructure, split up the IOMAP_WRITE branch that's currently within ext4_iomap_begin() into a separate helper ext4_alloc_iomap(). This way, when we add in the necessary code for direct I/O, we don't end up with ext4_iomap_begin() becoming a monstrous twisty maze. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/50eef383add1ea529651640574111076c55aca9f.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: move set iomap routines into a separate helper ext4_set_iomap()Matthew Bobrowski
Separate the iomap field population code that is currently within ext4_iomap_begin() into a separate helper ext4_set_iomap(). The intent of this function is self explanatory, however the rationale behind taking this step is to reeduce the overall clutter that we currently have within the ext4_iomap_begin() callback. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/1ea34da65eecffcddffb2386668ae06134e8deaf.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: iomap that extends beyond EOF should be marked dirtyMatthew Bobrowski
This patch addresses what Dave Chinner had discovered and fixed within commit: 7684e2c4384d. This changes does not have any user visible impact for ext4 as none of the current users of ext4_iomap_begin() that extend files depend on IOMAP_F_DIRTY. When doing a direct IO that spans the current EOF, and there are written blocks beyond EOF that extend beyond the current write, the only metadata update that needs to be done is a file size extension. However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that there is IO completion metadata updates required, and hence we may fail to correctly sync file size extensions made in IO completion when O_DSYNC writes are being used and the hardware supports FUA. Hence when setting IOMAP_F_DIRTY, we need to also take into account whether the iomap spans the current EOF. If it does, then we need to mark it dirty so that IO completion will call generic_write_sync() to flush the inode size update to stable storage correctly. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/8b43ee9ee94bee5328da56ba0909b7d2229ef150.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: update direct I/O read lock pattern for IOCB_NOWAITMatthew Bobrowski
This patch updates the lock pattern in ext4_direct_IO_read() to not block on inode lock in cases of IOCB_NOWAIT direct I/O reads. The locking condition implemented here is similar to that of 942491c9e6d6 ("xfs: fix AIM7 regression"). Fixes: 16c54688592c ("ext4: Allow parallel DIO reads") Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c5d5e759f91747359fbd2c6f9a36240cf75ad79f.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05ext4: reorder map.m_flags checks within ext4_iomap_begin()Matthew Bobrowski
For the direct I/O changes that follow in this patch series, we need to accommodate for the case where the block mapping flags passed through to ext4_map_blocks() result in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits set. In order for any allocated unwritten extents to be converted correctly in the ->end_io() handler, the iomap->type must be set to IOMAP_UNWRITTEN for cases where the EXT4_MAP_UNWRITTEN bit has been set within m_flags. Hence the reason why we need to reshuffle this conditional statement around. This change is a no-op for DAX as the block mapping flags passed through to ext4_map_blocks() i.e. EXT4_GET_BLOCKS_CREATE_ZERO never results in both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN being set at once. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/1309ad80d31a637b2deed55a85283d582a54a26a.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-05Merge branch 'iomap-for-next' into mb/dioTheodore Ts'o
2019-11-05xfs: make the assertion message functions take a mount parameterDarrick J. Wong
Make the assfail and asswarn functions take a struct xfs_mount so that we can start tying debugging and corruption messages to a particular mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-05xfs: add missing assert in xfs_fsmap_owner_from_rmapDarrick J. Wong
The fsmap handler shouldn't fail silently if the rmap code ever feeds it a special owner number that isn't known to the fsmap handler. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-05xfs: decrease indenting problems in xfs_dabuf_mapDarrick J. Wong
Refactor the code that complains when a dir/attr mapping doesn't exist but the caller requires a mapping. This small restructuring helps us to reduce the indenting level. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-05xfs: fold xfs_mount-alloc() into xfs_init_fs_context()Ian Kent
After switching to use the mount-api the only remaining caller of xfs_mount_alloc() is xfs_init_fs_context(), so fold xfs_mount_alloc() into it. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_fc_parse_param() above xfs_fc_get_tree()Ian Kent
Grouping the options parsing and mount handling functions above the struct fs_context_operations but below the struct super_operations should improve (some) the grouping of the super operations while also improving the grouping of the options parsing and mount handling code. Lastly move xfs_fc_parse_param() and related functions down to above xfs_fc_get_tree() and it's related functions. But leave the options enum, struct fs_parameter_spec and the struct fs_parameter_description declarations at the top since that's the logical place for them. This is a straight code move, there aren't any functional changes. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_fc_get_tree() above xfs_fc_reconfigure()Ian Kent
Grouping the options parsing and mount handling functions above the struct fs_context_operations but below the struct super_operations should improve (some) the grouping of the super operations while also improving the grouping of the options parsing and mount handling code. Now move xfs_fc_get_tree() and friends, also take the oppertunity to change STATIC to static for the xfs_fs_put_super() function. This is a straight code move, there aren't any functional changes. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_fc_reconfigure() above xfs_fc_free()Ian Kent
Grouping the options parsing and mount handling functions above the struct fs_context_operations but below the struct super_operations should improve (some) the grouping of the super operations while also improving the grouping of the options parsing and mount handling code. Start by moving xfs_fc_reconfigure() and friends. This is a straight code move, there aren't any functional changes. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: switch to use the new mount-apiIan Kent
Define the struct fs_parameter_spec table that's used by the new mount-api for options parsing. Create the various fs context operations methods and define the fs_context_operations struct. Create the fs context initialization method and update the struct file_system_type to utilize it. The initialization function is responsible for working storage initialization, allocation and initialization of file system private information storage and for setting the operations in the fs context. Also set struct file_system_type .parameters to the newly defined struct fs_parameter_spec options parsing table for use by the fs context methods and remove unused code. [darrick: add a comment pointing out the one place where mp->m_super is null] Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: dont set sb in xfs_mount_alloc()Ian Kent
When changing to use the new mount api the super block won't be available when the xfs_mount struct is allocated so move setting the super block in xfs_mount to xfs_fs_fill_super(). Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: move xfs_parseargs() validation to a helperIan Kent
Move the validation code of xfs_parseargs() into a helper for later use within the mount context methods. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: refactor xfs_parseags()Ian Kent
Refactor xfs_parseags(), move the entire token case block to a separate function in an attempt to highlight the code that actually changes in converting to use the new mount api. Also change the break in the switch to a return in the factored out xfs_fc_parse_param() function. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-05xfs: avoid redundant checks when options is emptyIan Kent
When options passed to xfs_parseargs() is NULL the checks performed after taking the branch are made with the initial values of dsunit, dswidth and iosizelog. But all the checks do nothing in this case so return immediately instead. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>