summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-09-02afs: Support RCU pathwalkDavid Howells
Make afs_permission() and afs_d_revalidate() do initial checks in RCU-mode pathwalk to reduce latency in pathwalk elements that get done multiple times. We don't need to query the server unless we've received a notification from it that something has changed or the callback has expired. This requires that we can request a key and check permits under RCU conditions if we need to. Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02afs: Provide an RCU-capable key lookupDavid Howells
Provide an RCU-capable key lookup function. We don't want to call afs_request_key() in RCU-mode pathwalk as request_key() might sleep, even if we don't ask it to construct anything as it might find a key that is currently undergoing construction. Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02afs: Use afs_extract_discard() rather than iov_iter_discard()David Howells
Use afs_extract_discard() rather than iov_iter_discard() as the former is a wrapper for the latter, providing a place to put tracepoints. Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02afs: remove unused variable 'afs_zero_fid'YueHaibing
fs/afs/fsclient.c:18:29: warning: afs_zero_fid defined but not used [-Wunused-const-variable=] It is never used since commit 025db80c9e42 ("afs: Trace the initiation and completion of client calls") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02afs: remove unused variable 'afs_voltypes'YueHaibing
fs/afs/volume.c:15:26: warning: afs_voltypes defined but not used [-Wunused-const-variable=] It is not used since commit d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02cuse: fix broken releaseMiklos Szeredi
The inode parameter in cuse_release() is likely *not* a fuse inode. It's a small wonder it didn't blow up until now. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-02fuse: cleanup fuse_wait_on_page_writebackMaxim Patlasov
fuse_wait_on_page_writeback() always returns zero and nobody cares. Let's make it void. Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-02fuse: require /dev/fuse reads to have enough buffer capacity (take 2)Kirill Smelkov
[ This retries commit d4b13963f217 ("fuse: require /dev/fuse reads to have enough buffer capacity"), which was reverted. In this version we require only `sizeof(fuse_in_header) + sizeof(fuse_write_in)` instead of 4K for FUSE request header room, because, contrary to libfuse and kernel client behaviour, GlusterFS actually provides only so much room for request header. ] A FUSE filesystem server queues /dev/fuse sys_read calls to get filesystem requests to handle. It does not know in advance what would be that request as it can be anything that client issues - LOOKUP, READ, WRITE, ... Many requests are short and retrieve data from the filesystem. However WRITE and NOTIFY_REPLY write data into filesystem. Before getting into operation phase, FUSE filesystem server and kernel client negotiate what should be the maximum write size the client will ever issue. After negotiation the contract in between server/client is that the filesystem server then should queue /dev/fuse sys_read calls with enough buffer capacity to receive any client request - WRITE in particular, while FUSE client should not, in particular, send WRITE requests with > negotiated max_write payload. FUSE client in kernel and libfuse historically reserve 4K for request header. However an existing filesystem server - GlusterFS - was found which reserves only 80 bytes for header room (= `sizeof(fuse_in_header) + sizeof(fuse_write_in)`). Since `sizeof(fuse_in_header) + sizeof(fuse_write_in)` == `sizeof(fuse_in_header) + sizeof(fuse_read_in)` == `sizeof(fuse_in_header) + sizeof(fuse_notify_retrieve_in)` is the absolute minimum any sane filesystem should be using for header room, the contract is that filesystem server should queue sys_reads with `sizeof(fuse_in_header) + sizeof(fuse_write_in)` + max_write buffer. If the filesystem server does not follow this contract, what can happen is that fuse_dev_do_read will see that request size is > buffer size, and then it will return EIO to client who issued the request but won't indicate in any way that there is a problem to filesystem server. This can be hard to diagnose because for some requests, e.g. for NOTIFY_REPLY which mimics WRITE, there is no client thread that is waiting for request completion and that EIO goes nowhere, while on filesystem server side things look like the kernel is not replying back after successful NOTIFY_RETRIEVE request made by the server. We can make the problem easy to diagnose if we indicate via error return to filesystem server when it is violating the contract. This should not practically cause problems because if a filesystem server is using shorter buffer, writes to it were already very likely to cause EIO, and if the filesystem is read-only it should be too following FUSE_MIN_READ_BUFFER minimum buffer size. Please see [1] for context where the problem of stuck filesystem was hit for real (because kernel client was incorrectly sending more than max_write data with NOTIFY_REPLY; see also previous patch), how the situation was traced and for more involving patch that did not make it into the tree. [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2 Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Cc: Han-Wen Nienhuys <hanwen@google.com> Cc: Jakob Unterwurzacher <jakobunt@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-08-31ext4 crypto: fix to check feature status before get policyChao Yu
When getting fscrypt policy via EXT4_IOC_GET_ENCRYPTION_POLICY, if encryption feature is off, it's better to return EOPNOTSUPP instead of ENODATA, so let's add ext4_has_feature_encrypt() to do the check for that. This makes it so that all fscrypt ioctls consistently check for the encryption feature, and makes ext4 consistent with f2fs in this regard. Signed-off-by: Chao Yu <yuchao0@huawei.com> [EB - removed unneeded braces, updated the documentation, and added more explanation to commit message] Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-08-30xfs: remove the unused XFS_ALLOC_USERDATA flagChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: cleanup xfs_fsb_to_dbChristoph Hellwig
This function isn't a macro anymore, so remove various superflous braces, and explicit cast that is done implicitly due to the return value, use a normal if statement instead of trying to squeeze everything together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: fix the dax supported check in xfs_ioctl_setattr_dax_invalidateChristoph Hellwig
Setting the DAX flag on the directory of a file system that is not on a DAX capable device makes as little sense as setting it on a regular file on the same file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: Fix stale data exposure when readahead races with hole punchJan Kara
Hole puching currently evicts pages from page cache and then goes on to remove blocks from the inode. This happens under both XFS_IOLOCK_EXCL and XFS_MMAPLOCK_EXCL which provides appropriate serialization with racing reads or page faults. However there is currently nothing that prevents readahead triggered by fadvise() or madvise() from racing with the hole punch and instantiating page cache page after hole punching has evicted page cache in xfs_flush_unmap_range() but before it has removed blocks from the inode. This page cache page will be mapping soon to be freed block and that can lead to returning stale data to userspace or even filesystem corruption. Fix the problem by protecting handling of readahead requests by XFS_IOLOCK_SHARED similarly as we protect reads. CC: stable@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com/ Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: allocate xattr buffer on demandDave Chinner
When doing file lookups and checking for permissions, we end up in xfs_get_acl() to see if there are any ACLs on the inode. This requires and xattr lookup, and to do that we have to supply a buffer large enough to hold an maximum sized xattr. On workloads were we are accessing a wide range of cache cold files under memory pressure (e.g. NFS fileservers) we end up spending a lot of time allocating the buffer. The buffer is 64k in length, so is a contiguous multi-page allocation, and if that then fails we fall back to vmalloc(). Hence the allocation here is /expensive/ when we are looking up hundreds of thousands of files a second. Initial numbers from a bpf trace show average time in xfs_get_acl() is ~32us, with ~19us of that in the memory allocation. Note these are average times, so there are going to be affected by the worst case allocations more than the common fast case... To avoid this, we could just do a "null" lookup to see if the ACL xattr exists and then only do the allocation if it exists. This, however, optimises the path for the "no ACL present" case at the expense of the "acl present" case. i.e. we can halve the time in xfs_get_acl() for the no acl case (i.e down to ~10-15us), but that then increases the ACL case by 30% (i.e. up to 40-45us). To solve this and speed up both cases, drive the xattr buffer allocation into the attribute code once we know what the actual xattr length is. For the no-xattr case, we avoid the allocation completely, speeding up that case. For the common ACL case, we'll end up with a fast heap allocation (because it'll be smaller than a page), and only for the rarer "we have a remote xattr" will we have a multi-page allocation occur. Hence the common ACL case will be much faster, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: consolidate attribute value copyingDave Chinner
The same code is used to copy do the attribute copying in three different places. Consolidate them into a single function in preparation from on-demand buffer allocation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: move remote attr retrieval into xfs_attr3_leaf_getvalueDave Chinner
Because we repeat exactly the same code to get the remote attribute value after both calls to xfs_attr3_leaf_getvalue() if it's a remote attr. Just do it in xfs_attr3_leaf_getvalue() so the callers don't have to care about it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: remove unnecessary indenting from xfs_attr3_leaf_getvalueDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: make attr lookup returns consistentDave Chinner
Shortform, leaf and remote value attr value retrieval return different values for success. This makes it more complex to handle actual errors xfs_attr_get() as some errors mean success and some mean failure. Make the return values consistent for success and failure consistent for all attribute formats. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: reverse search directory freespace indexesDave Chinner
When a directory is growing rapidly, new blocks tend to get added at the end of the directory. These end up at the end of the freespace index, and when the directory gets large finding these new freespaces gets expensive. The code does a linear search across the frespace index from the first block in the directory to the last, hence meaning the newly added space is the last index searched. Instead, do a reverse order index search, starting from the last block and index in the freespace index. This makes most lookups for free space on rapidly growing directories O(1) instead of O(N), but should not have any impact on random insert workloads because the average search length is the same regardless of which end of the array we start at. The result is a major improvement in large directory grow rates: create time(sec) / rate (files/s) File count vanilla Prev commit Patched 10k 0.41 / 24.3k 0.42 / 23.8k 0.41 / 24.3k 20k 0.74 / 27.0k 0.76 / 26.3k 0.75 / 26.7k 100k 3.81 / 26.4k 3.47 / 28.8k 3.27 / 30.6k 200k 8.58 / 23.3k 7.19 / 27.8k 6.71 / 29.8k 1M 85.69 / 11.7k 48.53 / 20.6k 37.67 / 26.5k 2M 280.31 / 7.1k 130.14 / 15.3k 79.55 / 25.2k 10M 3913.26 / 2.5k 552.89 / 18.1k Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: speed up directory bestfree block scanningDave Chinner
When running a "create millions inodes in a directory" test recently, I noticed we were spending a huge amount of time converting freespace block headers from disk format to in-memory format: 31.47% [kernel] [k] xfs_dir2_node_addname 17.86% [kernel] [k] xfs_dir3_free_hdr_from_disk 3.55% [kernel] [k] xfs_dir3_free_bests_p We shouldn't be hitting the best free block scanning code so hard when doing sequential directory creates, and it turns out there's a highly suboptimal loop searching the the best free array in the freespace block - it decodes the block header before checking each entry inside a loop, instead of decoding the header once before running the entry search loop. This makes a massive difference to create rates. Profile now looks like this: 13.15% [kernel] [k] xfs_dir2_node_addname 3.52% [kernel] [k] xfs_dir3_leaf_check_int 3.11% [kernel] [k] xfs_log_commit_cil And the wall time/average file create rate differences are just as stark: create time(sec) / rate (files/s) File count vanilla patched 10k 0.41 / 24.3k 0.42 / 23.8k 20k 0.74 / 27.0k 0.76 / 26.3k 100k 3.81 / 26.4k 3.47 / 28.8k 200k 8.58 / 23.3k 7.19 / 27.8k 1M 85.69 / 11.7k 48.53 / 20.6k 2M 280.31 / 7.1k 130.14 / 15.3k The larger the directory, the bigger the performance improvement. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: factor free block index lookup from xfs_dir2_node_addname_int()Dave Chinner
Simplify the logic in xfs_dir2_node_addname_int() by factoring out the free block index lookup code that finds a block with enough free space for the entry to be added. The code that is moved gets a major cleanup at the same time, but there is no algorithm change here. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: factor data block addition from xfs_dir2_node_addname_int()Dave Chinner
Factor out the code that adds a data block to a directory from xfs_dir2_node_addname_int(). This makes the code flow cleaner and more obvious and provides clear isolation of upcoming optimsations. Signed-off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: move xfs_dir2_addname()Dave Chinner
This gets rid of the need for a forward declaration of the static function xfs_dir2_addname_int() and readies the code for factoring of xfs_dir2_addname_int(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-30xfs: remove all *_ITER_CONTINUE valuesDarrick J. Wong
Iterator functions already use 0 to signal "continue iterating", so get rid of the #defines and just do it directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-08-30fs/namei.c: new helper - legitimize_root()Al Viro
identical logics in unlazy_walk() and unlazy_child() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-08-30kill the last users of user_{path,lpath,path_dir}()Al Viro
old wrappers with few callers remaining; put them out of their misery... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-08-30kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.hAl Viro
The former has no users left; the latter was only to get LOOKUP_... values to remapper in audit_inode() and that's an ex-parrot now. All places that use symbols from namei.h include it either directly or (in a few cases) via a local header, like fs/autofs/autofs_i.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-08-30[PATCH] fix d_absolute_path() interplay with fsmount()Al Viro
stuff in anon namespace should be treated as unattached. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-08-30isofs: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Reference: http://www.ecma-international.org/publications/standards/Ecma-119.htm Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30pstore: fs superblock limitsDeepa Dinamani
Leaving granularity at 1ns because it is dependent on the specific attached backing pstore module. ramoops has microsecond resolution. Fix the readback of ramoops fractional timestamp microseconds, which has incorrectly been reporting the value as nanoseconds. Fixes: 3f8f80f0cfeb ("pstore/ram: Read and write to the 'compressed' flag of pstore"). Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: anton@enomsg.org Cc: ccross@android.com Cc: keescook@chromium.org Cc: tony.luck@intel.com
2019-08-30fs: omfs: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: me@bobcopeland.com Cc: linux-karma-devel@lists.sourceforge.net
2019-08-30fs: hpfs: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also change the local_to_gmt() to use time64_t instead of time32_t. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: mikulas@artax.karlin.mff.cuni.cz
2019-08-30fs: ceph: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. According to the disscussion in https://patchwork.kernel.org/patch/8308691/ we agreed to use unsigned 32 bit timestamps on ceph. Update the limits accordingly. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: zyan@redhat.com Cc: sage@redhat.com Cc: idryomov@gmail.com Cc: ceph-devel@vger.kernel.org
2019-08-30fs: sysv: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: hch@infradead.org
2019-08-30fs: affs: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also fix timestamp calculation to avoid overflow while converting from days to seconds. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: dsterba@suse.com
2019-08-30fs: fat: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Some FAT variants indicate that the years after 2099 are not supported. Since commit 7decd1cb0305 ("fat: Fix and cleanup timestamp conversion") we support the full range of years that can be represented, up to 2107. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: hirofumi@mail.parknet.co.jp
2019-08-30fs: cifs: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also fixed cnvrtDosUnixTm calculations to avoid int overflow while computing maximum date. References: http://cifs.com/ https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: sfrench@samba.org Cc: linux-cifs@vger.kernel.org
2019-08-30fs: nfs: Initialize filesystem timestamp rangesDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. The time formats for various verious is detailed in the RFCs as below: https://tools.ietf.org/html/rfc7862(time metadata) https://tools.ietf.org/html/rfc7530: nfstime4 struct nfstime4 { int64_t seconds; uint32_t nseconds; }; https://tools.ietf.org/html/rfc1094 struct timeval { unsigned int seconds; unsigned int useconds; }; https://tools.ietf.org/html/rfc1813 struct nfstime3 { uint32 seconds; uint32 nseconds; }; Use the limits as per the RFC. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: trond.myklebust@hammerspace.com Cc: anna.schumaker@netapp.com Cc: linux-nfs@vger.kernel.org
2019-08-30ext4: Initialize timestamps limitsDeepa Dinamani
ext4 has different overflow limits for max filesystem timestamps based on the extra bytes available. The timestamp limits are calculated according to the encoding table in a4dad1ae24f85i(ext4: Fix handling of extended tv_sec): * extra msb of adjust for signed * epoch 32-bit 32-bit tv_sec to * bits time decoded 64-bit tv_sec 64-bit tv_sec valid time range * 0 0 1 -0x80000000..-0x00000001 0x000000000 1901-12-13..1969-12-31 * 0 0 0 0x000000000..0x07fffffff 0x000000000 1970-01-01..2038-01-19 * 0 1 1 0x080000000..0x0ffffffff 0x100000000 2038-01-19..2106-02-07 * 0 1 0 0x100000000..0x17fffffff 0x100000000 2106-02-07..2174-02-25 * 1 0 1 0x180000000..0x1ffffffff 0x200000000 2174-02-25..2242-03-16 * 1 0 0 0x200000000..0x27fffffff 0x200000000 2242-03-16..2310-04-04 * 1 1 1 0x280000000..0x2ffffffff 0x300000000 2310-04-04..2378-04-22 * 1 1 0 0x300000000..0x37fffffff 0x300000000 2378-04-22..2446-05-10 Note that the time limits are not correct for deletion times. Added a warn when an inode cannot be extended to incorporate an extended timestamp. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: tytso@mit.edu Cc: adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org
2019-08-309p: Fill min and max timestamps in sbDeepa Dinamani
struct p9_wstat and struct p9_stat_dotl indicate that the wire transport uses u32 and u64 fields for timestamps. Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX. This is because that is the upper bound supported by vfs. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: ericvh@gmail.com Cc: lucho@ionkov.net Cc: asmadeus@codewreck.org Cc: v9fs-developer@lists.sourceforge.net
2019-08-30fs: Fill in max and min timestamps in superblockDeepa Dinamani
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Even though some filesystems are read-only, fill in the timestamps to reflect the on-disk representation. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-By: Tigran Aivazian <aivazian.tigran@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: aivazian.tigran@gmail.com Cc: al@alarsen.net Cc: coda@cs.cmu.edu Cc: darrick.wong@oracle.com Cc: dushistov@mail.ru Cc: dwmw2@infradead.org Cc: hch@infradead.org Cc: jack@suse.com Cc: jaharkes@cs.cmu.edu Cc: luisbg@kernel.org Cc: nico@fluxnic.net Cc: phillip@squashfs.org.uk Cc: richard@nod.at Cc: salah.triki@gmail.com Cc: shaggy@kernel.org Cc: linux-xfs@vger.kernel.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: reiserfs-devel@vger.kernel.org
2019-08-30utimes: Clamp the timestamps before updateDeepa Dinamani
POSIX is ambiguous on the behavior of timestamps for futimens, utimensat and utimes. Whether to return an error or silently clamp a timestamp beyond the range supported by the underlying filesystems is not clear. POSIX.1 section for futimens, utimensat and utimes says: (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html) The file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the specified time. If the tv_nsec field of a timespec structure has the special value UTIME_NOW, the file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the current time. [EINVAL] A new file timestamp would be a value whose tv_sec component is not a value supported by the file system. The patch chooses to clamp the timestamps according to the filesystem timestamp ranges and does not return an error. This is in line with the behavior of utime syscall also since the POSIX page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html) for utime does not mention returning an error or clamping like above. Same for utimes http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30mount: Add mount warning for impending timestamp expiryDeepa Dinamani
The warning reuses the uptime max of 30 years used by settimeofday(). Note that the warning is only emitted for writable filesystem mounts through the mount syscall. Automounts do not have the same warning. Print out the warning in human readable format using the struct tm. After discussion with Arnd Bergmann, we chose to print only the year number. The raw s_time_max is also displayed, and the user can easily decode it e.g. "date -u -d @$((0x7fffffff))". We did not want to consolidate struct rtc_tm and struct tm just to print the date using a format specifier as part of this series. Given that the rtc_tm is not compiled on all architectures, this is not a trivial patch. This can be added in the future. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30timestamp_truncate: Replace users of timespec64_truncDeepa Dinamani
Update the inode timestamp updates to use timestamp_truncate() instead of timespec64_trunc(). The change was mostly generated by the following coccinelle script. virtual context virtual patch @r1 depends on patch forall@ struct inode *inode; identifier i_xtime =~ "^i_[acm]time$"; expression e; @@ inode->i_xtime = - timespec64_trunc( + timestamp_truncate( ..., - e); + inode); Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: adrian.hunter@intel.com Cc: dedekind1@gmail.com Cc: gregkh@linuxfoundation.org Cc: hch@lst.de Cc: jaegeuk@kernel.org Cc: jlbec@evilplan.org Cc: richard@nod.at Cc: tj@kernel.org Cc: yuchao0@huawei.com Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-ntfs-dev@lists.sourceforge.net Cc: linux-mtd@lists.infradead.org
2019-08-30vfs: Add timestamp_truncate() apiDeepa Dinamani
timespec_trunc() function is used to truncate a filesystem timestamp to the right granularity. But, the function does not clamp tv_sec part of the timestamps according to the filesystem timestamp limits. The replacement api: timestamp_truncate() also alters the signature of the function to accommodate filesystem timestamp clamping according to flesystem limits. Note that the tv_nsec part is set to 0 if tv_sec is not within the range supported for the filesystem. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30vfs: Add file timestamp range supportDeepa Dinamani
Add fields to the superblock to track the min and max timestamps supported by filesystems. Initially, when a superblock is allocated, initialize it to the max and min values the fields can hold. Individual filesystems override these to match their actual limits. Pseudo filesystems are assumed to always support the min and max allowable values for the fields. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2019-08-30writeback: add tracepoints for cgroup foreign writebacksTejun Heo
cgroup foreign inode handling has quite a bit of heuristics and internal states which sometimes makes it difficult to understand what's going on. Add tracepoints to improve visibility. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-30erofs: reduntant assignment in __erofs_get_meta_page()Gao Xiang
As Joe Perches suggested [1], err = bio_add_page(bio, page, PAGE_SIZE, 0); - if (unlikely(err != PAGE_SIZE)) { + if (err != PAGE_SIZE) { err = -EFAULT; goto err_out; } The initial assignment to err is odd as it's not actually an error value -E<FOO> but a int size from a unsigned int len. Here the return is either 0 or PAGE_SIZE. This would be more legible to me as: if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) { err = -EFAULT; goto err_out; } [1] https://lore.kernel.org/r/74c4784319b40deabfbaea92468f7e3ef44f1c96.camel@perches.com/ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190829171741.225219-1-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-30erofs: remove all likely/unlikely annotationsGao Xiang
As Dan Carpenter suggested [1], I have to remove all erofs likely/unlikely annotations. [1] https://lore.kernel.org/linux-fsdevel/20190829154346.GK23584@kadam/ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190829163827.203274-1-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29xfs: remove all *_ITER_ABORT valuesDarrick J. Wong
Use -ECANCELED to signal "stop iterating" instead of these magical *_ITER_ABORT values, since it's duplicative. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>