summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-12-30btrfs: punt all bios created in btrfs_submit_compressed_write()Dennis Zhou
Compressed writes happen in the background via kworkers. However, this causes bios to be attributed to root bypassing any cgroup limits from the actual writer. We tag the first bio with REQ_CGROUP_PUNT, which will punt the bio to an appropriate cgroup specific workqueue and attribute the IO properly. However, if btrfs_submit_compressed_write() creates a new bio, we don't tag it the same way. Add the appropriate tagging for subsequent bios. Fixes: ec39f7696ccfa ("Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios") Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-29Merge tag 'locks-v5.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull /proc/locks formatting fix from Jeff Layton: "This is a trivial fix for a _very_ long standing bug in /proc/locks formatting. Ordinarily, I'd wait for the merge window for something like this, but it is making it difficult to validate some overlayfs fixes. I've also gone ahead and marked this for stable" * tag 'locks-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: print unsigned ino in /proc/locks
2019-12-29Merge tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "One performance fix for large directory searches, and one minor style cleanup noticed by Clang" * tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Optimize readdir on reparse points cifs: Adjust indentation in smb2_open_file
2019-12-29locks: print unsigned ino in /proc/locksAmir Goldstein
An ino is unsigned, so display it as such in /proc/locks. Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-12-28block: add bio_truncate to fix guard_bio_eodMing Lei
Some filesystem, such as vfat, may send bio which crosses device boundary, and the worse thing is that the IO request starting within device boundaries can contain more than one segment past EOD. Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors") tries to fix this issue by returning -EIO for this situation. However, this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb() may hang for ever. Also the current truncating on last segment is dangerous by updating the last bvec, given bvec table becomes not immutable any more, and fs bio users may not retrieve the truncated pages via bio_for_each_segment_all() in its .end_io callback. Fixes this issue by supporting multi-segment truncating. And the approach is simpler: - just update bio size since block layer can make correct bvec with the updated bio size. Then bvec table becomes really immutable. - zero all truncated segments for read bio Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: linux-fsdevel@vger.kernel.org Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors") Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-27Merge tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Removal of now unused busy wqe list (Hillf) - Add cond_resched() to io-wq work processing (Hillf) - And then the series that I hinted at from last week, which removes the sqe from the io_kiocb and keeps all sqe handling on the prep side. This guarantees that an opcode can't do the wrong thing and read the sqe more than once. This is unchanged from last week, no issues have been observed with this in testing. Hence I really think we should fold this into 5.5. * tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-block: io-wq: add cond_resched() to worker thread io-wq: remove unused busy list from io_sqe io_uring: pass in 'sqe' to the prep handlers io_uring: standardize the prep methods io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler io_uring: move all prep state for IORING_OP_CONNECT to prep handler io_uring: add and use struct io_rw for read/writes io_uring: use u64_to_user_ptr() consistently
2019-12-26ext4: Optimize ext4 DIO overwritesJan Kara
Currently we start transaction for mapping every extent for writing using direct IO. This is unnecessary when we know we are overwriting already allocated blocks and the overhead of starting a transaction can be significant especially for multithreaded workloads doing small writes. Use iomap operations that avoid starting a transaction for direct IO overwrites. This improves throughput of 4k random writes - fio jobfile: [global] rw=randrw norandommap=1 invalidate=0 bs=4k numjobs=16 time_based=1 ramp_time=30 runtime=120 group_reporting=1 ioengine=psync direct=1 size=16G filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31 file_service_type=random nrfiles=32 from 3018MB/s to 4059MB/s in my test VM running test against simulated pmem device (note that before iomap conversion, this workload was able to achieve 3708MB/s because old direct IO path avoided transaction start for overwrites as well). For dax, the win is even larger improving throughput from 3042MB/s to 4311MB/s. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191218174433.19380-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26ext4: export information about first/last errors via /sys/fs/ext4/<dev>Theodore Ts'o
Make {first,last}_error_{ino,block,line,func,errcode} available via sysfs. Also add a missing newline for {first,last}_error_time. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26ext4: simulate various I/O and checksum errors when reading metadataTheodore Ts'o
This allows us to test various error handling code paths Link: https://lore.kernel.org/r/20191209012317.59398-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26ext4: save the error code which triggered an ext4_error() in the superblockTheodore Ts'o
This allows the cause of an ext4_error() report to be categorized based on whether it was triggered due to an I/O error, or an memory allocation error, or other possible causes. Most errors are caused by a detected file system inconsistency, so the default code stored in the superblock will be EXT4_ERR_EFSCORRUPTED. Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26Merge branch 'rk/inode_lock' into devTheodore Ts'o
These are ilock patches which helps improve the current inode lock scalabiliy problem in ext4 DIO mixed read/write workload case. The problem was first reported by Joseph [1]. This should help improve mixed read/write workload cases for databases which use directIO. These patches are based upon upstream discussion with Jan Kara & Joseph [2]. The problem really is that in case of DIO overwrites, we start with a exclusive lock and then downgrade it later to shared lock. This causes a scalability problem in case of mixed DIO read/write workload case. i.e. if we have any ongoing DIO reads and then comes a DIO writes, (since writes starts with excl. inode lock) then it has to wait until the shared lock is released (which only happens when DIO read is completed). Same is true for vice versa as well. The same can be easily observed with perf-tools trace analysis [3]. For more details, including performance numbers, please see [4]. [1] https://lore.kernel.org/linux-ext4/1566871552-60946-4-git-send-email-joseph.qi@linux.alibaba.com/ [2] https://lore.kernel.org/linux-ext4/20190910215720.GA7561@quack2.suse.cz/ [3] https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/perf.report [4] https://lore.kernel.org/r/20191212055557.11151-1-riteshh@linux.ibm.com
2019-12-25Merge branch 'core/kprobes' into perf/core, to pick up a completed branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25Merge tag 'v5.5-rc3' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-24io-wq: add cond_resched() to worker threadHillf Danton
Reschedule the current IO worker to cut the risk that it is becoming a cpu hog. Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-23io-wq: remove unused busy list from io_sqeHillf Danton
Commit e61df66c69b1 ("io-wq: ensure free/busy list browsing see all items") added a list for io workers in addition to the free and busy lists, not only making worker walk cleaner, but leaving the busy list unused. Let's remove it. Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-23cifs: Optimize readdir on reparse pointsPaulo Alcantara (SUSE)
When listing a directory with thounsands of files and most of them are reparse points, we simply marked all those dentries for revalidation and then sending additional (compounded) create/getinfo/close requests for each of them. Instead, upon receiving a response from an SMB2_QUERY_DIRECTORY (FileIdFullDirectoryInformation) command, the directory entries that have a file attribute of FILE_ATTRIBUTE_REPARSE_POINT will contain an EaSize field with a reparse tag in it, so we parse it and mark the dentry for revalidation only if it is a DFS or a symlink. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-23cifs: Adjust indentation in smb2_open_fileNathan Chancellor
Clang warns: ../fs/cifs/smb2file.c:70:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] if (oparms->tcon->use_resilient) { ^ ../fs/cifs/smb2file.c:66:2: note: previous statement is here if (rc) ^ 1 warning generated. This warning occurs because there is a space after the tab on this line. Remove it so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 592fafe644bf ("Add resilienthandles mount parm") Link: https://github.com/ClangBuiltLinux/linux/issues/826 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-22ext4: Move to shared i_rwsem even without dioread_nolock mount optRitesh Harjani
We were using shared locking only in case of dioread_nolock mount option in case of DIO overwrites. This mount condition is not needed anymore with current code, since:- 1. No race between buffered writes & DIO overwrites. Since buffIO writes takes exclusive lock & DIO overwrites will take shared locking. Also DIO path will make sure to flush and wait for any dirty page cache data. 2. No race between buffered reads & DIO overwrites, since there is no block allocation that is possible with DIO overwrites. So no stale data exposure should happen. Same is the case between DIO reads & DIO overwrites. 3. Also other paths like truncate is protected, since we wait there for any DIO in flight to be over. Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-4-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22ext4: Start with shared i_rwsem in case of DIO instead of exclusiveRitesh Harjani
Earlier there was no shared lock in DIO read path. But this patch (16c54688592ce: ext4: Allow parallel DIO reads) simplified some of the locking mechanism while still allowing for parallel DIO reads by adding shared lock in inode DIO read path. But this created problem with mixed read/write workload. It is due to the fact that in DIO path, we first start with exclusive lock and only when we determine that it is a ovewrite IO, we downgrade the lock. This causes the problem, since we still have shared locking in DIO reads. So, this patch tries to fix this issue by starting with shared lock and then switching to exclusive lock only when required based on ext4_dio_write_checks(). Other than that, it also simplifies below cases:- 1. Simplified ext4_unaligned_aio API to ext4_unaligned_io. Previous API was abused in the sense that it was not really checking for AIO anywhere also it used to check for extending writes. So this API was renamed and simplified to ext4_unaligned_io() which actully only checks if the IO is really unaligned. Now, in case of unaligned direct IO, iomap_dio_rw needs to do zeroing of partial block and that will require serialization against other direct IOs in the same block. So we take a exclusive inode lock for any unaligned DIO. In case of AIO we also need to wait for any outstanding IOs to complete so that conversion from unwritten to written is completed before anyone try to map the overlapping block. Hence we take exclusive inode lock and also wait for inode_dio_wait() for unaligned DIO case. Please note since we are anyway taking an exclusive lock in unaligned IO, inode_dio_wait() becomes a no-op in case of non-AIO DIO. 2. Added ext4_extending_io(). This checks if the IO is extending the file. 3. Added ext4_dio_write_checks(). In this we start with shared inode lock and only switch to exclusive lock if required. So in most cases with aligned, non-extending, dioread_nolock & overwrites, it tries to write with a shared lock. If not, then we restart the operation in ext4_dio_write_checks(), after acquiring exclusive lock. Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-3-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAITRitesh Harjani
Apparently our current rwsem code doesn't like doing the trylock, then lock for real scheme. So change our dax read/write methods to just do the trylock for the RWF_NOWAIT case. This seems to fix AIM7 regression in some scalable filesystems upto ~25% in some cases. Claimed in commit 942491c9e6d6 ("xfs: fix AIM7 regression") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-2-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22ext4: treat buffers contining write errors as valid in ext4_sb_bread()Theodore Ts'o
In commit 7963e5ac9012 ("ext4: treat buffers with write errors as containing valid data") we missed changing ext4_sb_bread() to use ext4_buffer_uptodate(). So fix this oversight. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Eric's s_inodes softlockup fixes + Jan's fix for recent regression from pipe rework" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: call fsnotify_sb_delete after evict_inodes fs: avoid softlockups in s_inodes iterators pipe: Fix bogus dereference in iov_iter_alignment()
2019-12-22Merge tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "Fix a few bugs that could lead to corrupt files, fsck complaints, and filesystem crashes: - Minor documentation fixes - Fix a file corruption due to read racing with an insert range operation. - Fix log reservation overflows when allocating large rt extents - Fix a buffer log item flags check - Don't allow administrators to mount with sunit= options that will cause later xfs_repair complaints about the root directory being suspicious because the fs geometry appeared inconsistent - Fix a non-static helper that should have been static" * tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Make the symbol 'xfs_rtalloc_log_count' static xfs: don't commit sunit/swidth updates to disk if that would cause repair failures xfs: split the sunit parameter update into two parts xfs: refactor agfl length computation function libxfs: resync with the userspace libxfs xfs: use bitops interface for buf log item AIL flag check xfs: fix log reservation overflows when allocating large rt extents xfs: stabilize insert range start boundary to avoid COW writeback race xfs: fix Sphinx documentation warning
2019-12-22Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Ext4 bug fixes, including a regression fix" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: clarify impact of 'commit' mount option ext4: fix unused-but-set-variable warning in ext4_add_entry() jbd2: fix kernel-doc notation warning ext4: use RCU API in debug_print_tree ext4: validate the debug_want_extra_isize mount option at parse time ext4: reserve revoke credits in __ext4_new_inode ext4: unlock on error in ext4_expand_extra_isize() ext4: optimize __ext4_check_dir_entry() ext4: check for directory entries too close to block end ext4: fix ext4_empty_dir() for directories with holes
2019-12-22pipe: fix empty pipe check in pipe_write()Jan Stancek
LTP pipeio_1 test is hanging with v5.5-rc2-385-gb8e382a185eb, with read side observing empty pipe and sleeping and write side running out of space and then sleeping as well. In this scenario there are 5 writers and 1 reader. Problem is that after pipe_write() reacquires pipe lock, it re-checks for empty pipe with potentially stale 'head' and doesn't wake up read side anymore. pipe->tail can advance beyond 'head', because there are multiple writers. Use pipe->head for empty pipe check after reacquiring lock to observe current state. Testing: With patch, LTP pipeio_1 ran successfully in loop for 1 hour. Without patch it hanged within a minute. Fixes: 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup logic") Reported-by: Rachel Sibley <rasibley@redhat.com> Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-21ext4: fix unused-but-set-variable warning in ext4_add_entry()Yunfeng Ye
Warning is found when compile with "-Wunused-but-set-variable": fs/ext4/namei.c: In function ‘ext4_add_entry’: fs/ext4/namei.c:2167:23: warning: variable ‘sbi’ set but not used [-Wunused-but-set-variable] struct ext4_sb_info *sbi; ^~~ Fix this by moving the variable @sbi under CONFIG_UNICODE. Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/cb5eb904-224a-9701-c38f-cb23514b1fff@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-20Merge tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Here's a set of fixes that should go into 5.5-rc3 for io_uring. This is bigger than I'd like it to be, mainly because we're fixing the case where an application reuses sqe data right after issue. This really must work, or it's confusing. With 5.5 we're flagging us as submit stable for the actual data, this must also be the case for SQEs. Honestly, I'd really like to add another series on top of this, since it cleans it up considerable and prevents any SQE reuse by design. I posted that here: https://lore.kernel.org/io-uring/20191220174742.7449-1-axboe@kernel.dk/T/#u and may still send it your way early next week once it's been looked at and had some more soak time (does pass all regression tests). With that series, we've unified the prep+issue handling, and only the prep phase even has access to the SQE. Anyway, outside of that, fixes in here for a few other issues that have been hit in testing or production" * tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block: io_uring: io_wq_submit_work() should not touch req->rw io_uring: don't wait when under-submitting io_uring: warn about unhandled opcode io_uring: read opcode and user_data from SQE exactly once io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable io_uring: make IORING_OP_CANCEL_ASYNC deferrable io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable io_uring: make HARDLINK imply LINK io_uring: any deferred command must have stable sqe data io_uring: remove 'sqe' parameter to the OP helpers that take it io_uring: fix pre-prepped issue with force_nonblock == true io-wq: re-add io_wq_current_is_worker() io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG io_uring: fix stale comment and a few typos
2019-12-20io_uring: pass in 'sqe' to the prep handlersJens Axboe
This moves the prep handlers outside of the opcode handlers, and allows us to pass in the sqe directly. If the sqe is non-NULL, it means that the request should be prepared for the first time. With the opcode handlers not having access to the sqe at all, we are guaranteed that the prep handler has setup the request fully by the time we get there. As before, for opcodes that need to copy in more data then the io_kiocb allows for, the io_async_ctx holds that info. If a prep handler is invoked with req->io set, it must use that to retain information for later. Finally, we can remove io_kiocb->sqe as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20io_uring: standardize the prep methodsJens Axboe
We currently have a mix of use cases. Most of the newer ones are pretty uniform, but we have some older ones that use different calling calling conventions. This is confusing. For the opcodes that currently rely on the req->io->sqe copy saving them from reuse, add a request type struct in the io_kiocb command union to store the data they need. Prepare for all opcodes having a standard prep method, so we can call it in a uniform fashion and outside of the opcode handler. This is in preparation for passing in the 'sqe' pointer, rather than storing it in the io_kiocb. Once we have uniform prep handlers, we can leave all the prep work to that part, and not even pass in the sqe to the opcode handler. This ensures that we don't reuse sqe data inadvertently. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20io_uring: read 'count' for IORING_OP_TIMEOUT in prep handlerJens Axboe
Add the count field to struct io_timeout, and ensure the prep handler has read it. Timeout also needs an async context always, set it up in the prep handler if we don't have one. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handlerJens Axboe
Add struct io_sr_msg in our io_kiocb per-command union, and ensure that the send/recvmsg prep handlers have grabbed what they need from the SQE by the time prep is done. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20io_uring: move all prep state for IORING_OP_CONNECT to prep handlerJens Axboe
Add struct io_connect in our io_kiocb per-command union, and ensure that io_connect_prep() has grabbed what it needs from the SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20io_uring: add and use struct io_rw for read/writesJens Axboe
Put the kiocb in struct io_rw, and add the addr/len for the request as well. Use the kiocb->private field for the buffer index for fixed reads and writes. Any use of kiocb->ki_filp is flipped to req->file. It's the same thing, and less confusing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20xfs: Make the symbol 'xfs_rtalloc_log_count' staticChen Wandun
Fix the following sparse warning: fs/xfs/libxfs/xfs_trans_resv.c:206:1: warning: symbol 'xfs_rtalloc_log_count' was not declared. Should it be static? Fixes: b1de6fc7520f ("xfs: fix log reservation overflows when allocating large rt extents") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-20io_uring: use u64_to_user_ptr() consistentlyJens Axboe
We use it in some spots, but not consistently. Convert the rest over, makes it easier to read as well. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-19nfsd: remove nfs4_reset_lease() declarationsArnd Bergmann
The function was removed a long time ago, but the declaration and a dummy implementation are still there, referencing the deprecated time_t type. Remove both. Fixes: f958a1320ff7 ("nfsd4: remove unnecessary lease-setting function") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: use ktime_get_real_seconds() in nfs4_verifierArnd Bergmann
gen_confirm() generates a unique identifier based on the current time. This overflows in year 2038, but that is harmless since it generally does not lead to duplicates, as long as the time has been initialized by a real-time clock or NTP. Using ktime_get_boottime_seconds() or ktime_get_seconds() would avoid the overflow, but it would be more likely to result in non-unique numbers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: use boottime for lease expiry calculationArnd Bergmann
A couple of time_t variables are only used to track the state of the lease time and its expiration. The code correctly uses the 'time_after()' macro to make this work on 32-bit architectures even beyond year 2038, but the get_seconds() function and the time_t type itself are deprecated as they behave inconsistently between 32-bit and 64-bit architectures and often lead to code that is not y2038 safe. As a minor issue, using get_seconds() leads to problems with concurrent settimeofday() or clock_settime() calls, in the worst case timeout never triggering after the time has been set backwards. Change nfsd to use time64_t and ktime_get_boottime_seconds() here. This is clearly excessive, as boottime by itself means we never go beyond 32 bits, but it does mean we handle this correctly and consistently without having to worry about corner cases and should be no more expensive than the previous implementation on 64-bit architectures. The max_cb_time() function gets changed in order to avoid an expensive 64-bit division operation, but as the lease time is at most one hour, there is no change in behavior. Also do the same for server-to-server copy expiration time. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [bfields@redhat.com: fix up copy expiration] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: fix jiffies/time_t mixup in LRU listArnd Bergmann
The nfsd4_blocked_lock->nbl_time timestamp is recorded in jiffies, but then compared to a CLOCK_REALTIME timestamp later on, which makes no sense. For consistency with the other timestamps, change this to use a time_t. This is a change in behavior, which may cause regressions, but the current code is not sensible. On a system with CONFIG_HZ=1000, the 'time_after((unsigned long)nbl->nbl_time, (unsigned long)cutoff))' check is false for roughly the first 18 days of uptime and then true for the next 49 days. Fixes: 7919d0a27f1e ("nfsd: add a LRU list for blocked locks") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: fix delay timer on 32-bit architecturesArnd Bergmann
The nfsd4_cb_layout_done() function takes a 'time_t' value, multiplied by NSEC_PER_SEC*2 to get a nanosecond value. This works fine on 64-bit architectures, but on 32-bit, any value over 1 second results in a signed integer overflow with unexpected results. Cast one input to a 64-bit type in order to produce the same result that we have on 64-bit architectures, regarless of the type of nfsd4_lease. Fixes: 6b9b21073d3b ("nfsd: give up on CB_LAYOUTRECALLs after two lease periods") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: use time64_t in nfsd_proc_setattr() checkArnd Bergmann
Change to time64_t and ktime_get_real_seconds() to make the logic work correctly on 32-bit architectures beyond 2038. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: pass a 64-bit guardtime to nfsd_setattr()Arnd Bergmann
Guardtime handling in nfs3 differs between 32-bit and 64-bit architectures, and uses the deprecated time_t type. Change it to using time64_t, which behaves the same way on 64-bit and 32-bit architectures, treating the number as an unsigned 32-bit entity with a range of year 1970 to 2106 consistently, and avoiding the y2038 overflow. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: make 'boot_time' 64-bit wideArnd Bergmann
The local boot time variable gets truncated to time_t at the moment, which can lead to slightly odd behavior on 32-bit architectures. Use ktime_get_real_seconds() instead of get_seconds() to always get a 64-bit result, and keep it that way wherever possible. It still gets truncated in a few places: - When assigning to cl_clientid.cl_boot, this is already documented and is only used as a unique identifier. - In clients_still_reclaiming(), the truncation is to 'unsigned long' in order to use the 'time_before() helper. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: use timespec64 in encode_time_deltaArnd Bergmann
The values in encode_time_delta are always small and don't overflow the range of 'struct timespec', so changing it has no effect. Change it to timespec64 as a prerequisite for removing the timespec definition later. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: handle nfs3 timestamps as unsignedArnd Bergmann
The decode_time3 function behaves differently on 32-bit and 64-bit architectures: on the former, a 32-bit timestamp gets converted into an signed number and then into a timestamp between 1902 and 2038, while on the latter it is interpreted as unsigned in the range 1970-2106. Change all the remaining 'timespec' in nfsd to 'timespec64' to make the behavior the same, and use the current interpretation of the dominant 64-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: print 64-bit timestamps in client_info_showArnd Bergmann
The nii_time field gets truncated to 'time_t' on 32-bit architectures before printing. Remove the use of 'struct timespec' to product the correct output beyond 2038. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: use ktime_get_seconds() for timestampsArnd Bergmann
The delegation logic in nfsd uses the somewhat inefficient seconds_since_boot() function to record time intervals. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: remove unnecessary assertion in nfsd4_encode_replayAditya Pakki
The replay variable is set in the only caller of nfsd4_encode_replay. The assertion is unnecessary and the patch removes this check. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd: Clone should commit src file metadata tooTrond Myklebust
vfs_clone_file_range() can modify the metadata on the source file too, so we need to commit that to stable storage as well. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Acked-by: Dave Chinner <david@fromorbit.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-19nfsd4: Remove unneeded semicolonzhengbin
Fixes coccicheck warning: fs/nfsd/nfs4state.c:3376:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>