Age | Commit message (Collapse) | Author |
|
Make afs_permission() and afs_d_revalidate() do initial checks in RCU-mode
pathwalk to reduce latency in pathwalk elements that get done multiple
times. We don't need to query the server unless we've received a
notification from it that something has changed or the callback has
expired.
This requires that we can request a key and check permits under RCU
conditions if we need to.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Provide an RCU-capable key lookup function. We don't want to call
afs_request_key() in RCU-mode pathwalk as request_key() might sleep, even if
we don't ask it to construct anything as it might find a key that is currently
undergoing construction.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Use afs_extract_discard() rather than iov_iter_discard() as the former is a
wrapper for the latter, providing a place to put tracepoints.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
fs/afs/fsclient.c:18:29: warning:
afs_zero_fid defined but not used [-Wunused-const-variable=]
It is never used since commit 025db80c9e42 ("afs: Trace
the initiation and completion of client calls")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
fs/afs/volume.c:15:26: warning:
afs_voltypes defined but not used [-Wunused-const-variable=]
It is not used since commit d2ddc776a458 ("afs: Overhaul
volume and server record caching and fileserver rotation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The inode parameter in cuse_release() is likely *not* a fuse inode. It's a
small wonder it didn't blow up until now.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_wait_on_page_writeback() always returns zero and nobody cares.
Let's make it void.
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
[ This retries commit d4b13963f217 ("fuse: require /dev/fuse reads to have
enough buffer capacity"), which was reverted. In this version we require
only `sizeof(fuse_in_header) + sizeof(fuse_write_in)` instead of 4K for
FUSE request header room, because, contrary to libfuse and kernel client
behaviour, GlusterFS actually provides only so much room for request
header. ]
A FUSE filesystem server queues /dev/fuse sys_read calls to get filesystem
requests to handle. It does not know in advance what would be that request
as it can be anything that client issues - LOOKUP, READ, WRITE, ... Many
requests are short and retrieve data from the filesystem. However WRITE and
NOTIFY_REPLY write data into filesystem.
Before getting into operation phase, FUSE filesystem server and kernel
client negotiate what should be the maximum write size the client will ever
issue. After negotiation the contract in between server/client is that the
filesystem server then should queue /dev/fuse sys_read calls with enough
buffer capacity to receive any client request - WRITE in particular, while
FUSE client should not, in particular, send WRITE requests with >
negotiated max_write payload. FUSE client in kernel and libfuse
historically reserve 4K for request header. However an existing filesystem
server - GlusterFS - was found which reserves only 80 bytes for header room
(= `sizeof(fuse_in_header) + sizeof(fuse_write_in)`).
Since
`sizeof(fuse_in_header) + sizeof(fuse_write_in)` ==
`sizeof(fuse_in_header) + sizeof(fuse_read_in)` ==
`sizeof(fuse_in_header) + sizeof(fuse_notify_retrieve_in)`
is the absolute minimum any sane filesystem should be using for header
room, the contract is that filesystem server should queue sys_reads with
`sizeof(fuse_in_header) + sizeof(fuse_write_in)` + max_write buffer.
If the filesystem server does not follow this contract, what can happen
is that fuse_dev_do_read will see that request size is > buffer size,
and then it will return EIO to client who issued the request but won't
indicate in any way that there is a problem to filesystem server.
This can be hard to diagnose because for some requests, e.g. for
NOTIFY_REPLY which mimics WRITE, there is no client thread that is
waiting for request completion and that EIO goes nowhere, while on
filesystem server side things look like the kernel is not replying back
after successful NOTIFY_RETRIEVE request made by the server.
We can make the problem easy to diagnose if we indicate via error return to
filesystem server when it is violating the contract. This should not
practically cause problems because if a filesystem server is using shorter
buffer, writes to it were already very likely to cause EIO, and if the
filesystem is read-only it should be too following FUSE_MIN_READ_BUFFER
minimum buffer size.
Please see [1] for context where the problem of stuck filesystem was hit
for real (because kernel client was incorrectly sending more than
max_write data with NOTIFY_REPLY; see also previous patch), how the
situation was traced and for more involving patch that did not make it
into the tree.
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When getting fscrypt policy via EXT4_IOC_GET_ENCRYPTION_POLICY, if
encryption feature is off, it's better to return EOPNOTSUPP instead of
ENODATA, so let's add ext4_has_feature_encrypt() to do the check for
that.
This makes it so that all fscrypt ioctls consistently check for the
encryption feature, and makes ext4 consistent with f2fs in this regard.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[EB - removed unneeded braces, updated the documentation, and
added more explanation to commit message]
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
This function isn't a macro anymore, so remove various superflous braces,
and explicit cast that is done implicitly due to the return value, use
a normal if statement instead of trying to squeeze everything together.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Setting the DAX flag on the directory of a file system that is not on a
DAX capable device makes as little sense as setting it on a regular file
on the same file system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Hole puching currently evicts pages from page cache and then goes on to
remove blocks from the inode. This happens under both XFS_IOLOCK_EXCL
and XFS_MMAPLOCK_EXCL which provides appropriate serialization with
racing reads or page faults. However there is currently nothing that
prevents readahead triggered by fadvise() or madvise() from racing with
the hole punch and instantiating page cache page after hole punching has
evicted page cache in xfs_flush_unmap_range() but before it has removed
blocks from the inode. This page cache page will be mapping soon to be
freed block and that can lead to returning stale data to userspace or
even filesystem corruption.
Fix the problem by protecting handling of readahead requests by
XFS_IOLOCK_SHARED similarly as we protect reads.
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com/
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
When doing file lookups and checking for permissions, we end up in
xfs_get_acl() to see if there are any ACLs on the inode. This
requires and xattr lookup, and to do that we have to supply a buffer
large enough to hold an maximum sized xattr.
On workloads were we are accessing a wide range of cache cold files
under memory pressure (e.g. NFS fileservers) we end up spending a
lot of time allocating the buffer. The buffer is 64k in length, so
is a contiguous multi-page allocation, and if that then fails we
fall back to vmalloc(). Hence the allocation here is /expensive/
when we are looking up hundreds of thousands of files a second.
Initial numbers from a bpf trace show average time in xfs_get_acl()
is ~32us, with ~19us of that in the memory allocation. Note these
are average times, so there are going to be affected by the worst
case allocations more than the common fast case...
To avoid this, we could just do a "null" lookup to see if the ACL
xattr exists and then only do the allocation if it exists. This,
however, optimises the path for the "no ACL present" case at the
expense of the "acl present" case. i.e. we can halve the time in
xfs_get_acl() for the no acl case (i.e down to ~10-15us), but that
then increases the ACL case by 30% (i.e. up to 40-45us).
To solve this and speed up both cases, drive the xattr buffer
allocation into the attribute code once we know what the actual
xattr length is. For the no-xattr case, we avoid the allocation
completely, speeding up that case. For the common ACL case, we'll
end up with a fast heap allocation (because it'll be smaller than a
page), and only for the rarer "we have a remote xattr" will we have
a multi-page allocation occur. Hence the common ACL case will be
much faster, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
The same code is used to copy do the attribute copying in three
different places. Consolidate them into a single function in
preparation from on-demand buffer allocation.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Because we repeat exactly the same code to get the remote attribute
value after both calls to xfs_attr3_leaf_getvalue() if it's a remote
attr. Just do it in xfs_attr3_leaf_getvalue() so the callers don't
have to care about it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Shortform, leaf and remote value attr value retrieval return
different values for success. This makes it more complex to handle
actual errors xfs_attr_get() as some errors mean success and some
mean failure. Make the return values consistent for success and
failure consistent for all attribute formats.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
When a directory is growing rapidly, new blocks tend to get added at
the end of the directory. These end up at the end of the freespace
index, and when the directory gets large finding these new
freespaces gets expensive. The code does a linear search across the
frespace index from the first block in the directory to the last,
hence meaning the newly added space is the last index searched.
Instead, do a reverse order index search, starting from the last
block and index in the freespace index. This makes most lookups for
free space on rapidly growing directories O(1) instead of O(N), but
should not have any impact on random insert workloads because the
average search length is the same regardless of which end of the
array we start at.
The result is a major improvement in large directory grow rates:
create time(sec) / rate (files/s)
File count vanilla Prev commit Patched
10k 0.41 / 24.3k 0.42 / 23.8k 0.41 / 24.3k
20k 0.74 / 27.0k 0.76 / 26.3k 0.75 / 26.7k
100k 3.81 / 26.4k 3.47 / 28.8k 3.27 / 30.6k
200k 8.58 / 23.3k 7.19 / 27.8k 6.71 / 29.8k
1M 85.69 / 11.7k 48.53 / 20.6k 37.67 / 26.5k
2M 280.31 / 7.1k 130.14 / 15.3k 79.55 / 25.2k
10M 3913.26 / 2.5k 552.89 / 18.1k
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
When running a "create millions inodes in a directory" test
recently, I noticed we were spending a huge amount of time
converting freespace block headers from disk format to in-memory
format:
31.47% [kernel] [k] xfs_dir2_node_addname
17.86% [kernel] [k] xfs_dir3_free_hdr_from_disk
3.55% [kernel] [k] xfs_dir3_free_bests_p
We shouldn't be hitting the best free block scanning code so hard
when doing sequential directory creates, and it turns out there's
a highly suboptimal loop searching the the best free array in
the freespace block - it decodes the block header before checking
each entry inside a loop, instead of decoding the header once before
running the entry search loop.
This makes a massive difference to create rates. Profile now looks
like this:
13.15% [kernel] [k] xfs_dir2_node_addname
3.52% [kernel] [k] xfs_dir3_leaf_check_int
3.11% [kernel] [k] xfs_log_commit_cil
And the wall time/average file create rate differences are
just as stark:
create time(sec) / rate (files/s)
File count vanilla patched
10k 0.41 / 24.3k 0.42 / 23.8k
20k 0.74 / 27.0k 0.76 / 26.3k
100k 3.81 / 26.4k 3.47 / 28.8k
200k 8.58 / 23.3k 7.19 / 27.8k
1M 85.69 / 11.7k 48.53 / 20.6k
2M 280.31 / 7.1k 130.14 / 15.3k
The larger the directory, the bigger the performance improvement.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Simplify the logic in xfs_dir2_node_addname_int() by factoring out
the free block index lookup code that finds a block with enough free
space for the entry to be added. The code that is moved gets a major
cleanup at the same time, but there is no algorithm change here.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Factor out the code that adds a data block to a directory from
xfs_dir2_node_addname_int(). This makes the code flow cleaner and
more obvious and provides clear isolation of upcoming optimsations.
Signed-off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
This gets rid of the need for a forward declaration of the static
function xfs_dir2_addname_int() and readies the code for factoring
of xfs_dir2_addname_int().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Iterator functions already use 0 to signal "continue iterating", so get
rid of the #defines and just do it directly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
identical logics in unlazy_walk() and unlazy_child()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
old wrappers with few callers remaining; put them out of their misery...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The former has no users left; the latter was only to get LOOKUP_...
values to remapper in audit_inode() and that's an ex-parrot now.
All places that use symbols from namei.h include it either directly
or (in a few cases) via a local header, like fs/autofs/autofs_i.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
stuff in anon namespace should be treated as unattached.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Reference: http://www.ecma-international.org/publications/standards/Ecma-119.htm
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
Leaving granularity at 1ns because it is dependent on the specific
attached backing pstore module. ramoops has microsecond resolution.
Fix the readback of ramoops fractional timestamp microseconds,
which has incorrectly been reporting the value as nanoseconds.
Fixes: 3f8f80f0cfeb ("pstore/ram: Read and write to the 'compressed' flag of pstore").
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: anton@enomsg.org
Cc: ccross@android.com
Cc: keescook@chromium.org
Cc: tony.luck@intel.com
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: me@bobcopeland.com
Cc: linux-karma-devel@lists.sourceforge.net
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Also change the local_to_gmt() to use time64_t instead
of time32_t.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: mikulas@artax.karlin.mff.cuni.cz
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
According to the disscussion in
https://patchwork.kernel.org/patch/8308691/ we agreed to use
unsigned 32 bit timestamps on ceph.
Update the limits accordingly.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: zyan@redhat.com
Cc: sage@redhat.com
Cc: idryomov@gmail.com
Cc: ceph-devel@vger.kernel.org
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: hch@infradead.org
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Also fix timestamp calculation to avoid overflow
while converting from days to seconds.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: dsterba@suse.com
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Some FAT variants indicate that the years after 2099 are not supported.
Since commit 7decd1cb0305 ("fat: Fix and cleanup timestamp conversion")
we support the full range of years that can be represented, up to 2107.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: hirofumi@mail.parknet.co.jp
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Also fixed cnvrtDosUnixTm calculations to avoid int overflow
while computing maximum date.
References:
http://cifs.com/
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: sfrench@samba.org
Cc: linux-cifs@vger.kernel.org
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
The time formats for various verious is detailed in the
RFCs as below:
https://tools.ietf.org/html/rfc7862(time metadata)
https://tools.ietf.org/html/rfc7530:
nfstime4
struct nfstime4 {
int64_t seconds;
uint32_t nseconds;
};
https://tools.ietf.org/html/rfc1094
struct timeval {
unsigned int seconds;
unsigned int useconds;
};
https://tools.ietf.org/html/rfc1813
struct nfstime3 {
uint32 seconds;
uint32 nseconds;
};
Use the limits as per the RFC.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: trond.myklebust@hammerspace.com
Cc: anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
|
|
ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.
The timestamp limits are calculated according to the
encoding table in
a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):
* extra msb of adjust for signed
* epoch 32-bit 32-bit tv_sec to
* bits time decoded 64-bit tv_sec 64-bit tv_sec valid time range
* 0 0 1 -0x80000000..-0x00000001 0x000000000 1901-12-13..1969-12-31
* 0 0 0 0x000000000..0x07fffffff 0x000000000 1970-01-01..2038-01-19
* 0 1 1 0x080000000..0x0ffffffff 0x100000000 2038-01-19..2106-02-07
* 0 1 0 0x100000000..0x17fffffff 0x100000000 2106-02-07..2174-02-25
* 1 0 1 0x180000000..0x1ffffffff 0x200000000 2174-02-25..2242-03-16
* 1 0 0 0x200000000..0x27fffffff 0x200000000 2242-03-16..2310-04-04
* 1 1 1 0x280000000..0x2ffffffff 0x300000000 2310-04-04..2378-04-22
* 1 1 0 0x300000000..0x37fffffff 0x300000000 2378-04-22..2446-05-10
Note that the time limits are not correct for deletion times.
Added a warn when an inode cannot be extended to incorporate an
extended timestamp.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: tytso@mit.edu
Cc: adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org
|
|
struct p9_wstat and struct p9_stat_dotl indicate that the
wire transport uses u32 and u64 fields for timestamps.
Fill in the appropriate limits to avoid inconsistencies in
the vfs cached inode times when timestamps are outside the
permitted range.
Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX.
This is because that is the upper bound supported by vfs.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: ericvh@gmail.com
Cc: lucho@ionkov.net
Cc: asmadeus@codewreck.org
Cc: v9fs-developer@lists.sourceforge.net
|
|
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.
Even though some filesystems are read-only, fill in the
timestamps to reflect the on-disk representation.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-By: Tigran Aivazian <aivazian.tigran@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: aivazian.tigran@gmail.com
Cc: al@alarsen.net
Cc: coda@cs.cmu.edu
Cc: darrick.wong@oracle.com
Cc: dushistov@mail.ru
Cc: dwmw2@infradead.org
Cc: hch@infradead.org
Cc: jack@suse.com
Cc: jaharkes@cs.cmu.edu
Cc: luisbg@kernel.org
Cc: nico@fluxnic.net
Cc: phillip@squashfs.org.uk
Cc: richard@nod.at
Cc: salah.triki@gmail.com
Cc: shaggy@kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: reiserfs-devel@vger.kernel.org
|
|
POSIX is ambiguous on the behavior of timestamps for
futimens, utimensat and utimes. Whether to return an
error or silently clamp a timestamp beyond the range
supported by the underlying filesystems is not clear.
POSIX.1 section for futimens, utimensat and utimes says:
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html)
The file's relevant timestamp shall be set to the greatest
value supported by the file system that is not greater
than the specified time.
If the tv_nsec field of a timespec structure has the special
value UTIME_NOW, the file's relevant timestamp shall be set
to the greatest value supported by the file system that is
not greater than the current time.
[EINVAL]
A new file timestamp would be a value whose tv_sec
component is not a value supported by the file system.
The patch chooses to clamp the timestamps according to the
filesystem timestamp ranges and does not return an error.
This is in line with the behavior of utime syscall also
since the POSIX page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html)
for utime does not mention returning an error or clamping like above.
Same for utimes http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
The warning reuses the uptime max of 30 years used by
settimeofday().
Note that the warning is only emitted for writable filesystem mounts
through the mount syscall. Automounts do not have the same warning.
Print out the warning in human readable format using the struct tm.
After discussion with Arnd Bergmann, we chose to print only the year number.
The raw s_time_max is also displayed, and the user can easily decode
it e.g. "date -u -d @$((0x7fffffff))". We did not want to consolidate
struct rtc_tm and struct tm just to print the date using a format specifier
as part of this series.
Given that the rtc_tm is not compiled on all architectures, this is not a
trivial patch. This can be added in the future.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
Update the inode timestamp updates to use timestamp_truncate()
instead of timespec64_trunc().
The change was mostly generated by the following coccinelle
script.
virtual context
virtual patch
@r1 depends on patch forall@
struct inode *inode;
identifier i_xtime =~ "^i_[acm]time$";
expression e;
@@
inode->i_xtime =
- timespec64_trunc(
+ timestamp_truncate(
...,
- e);
+ inode);
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: adrian.hunter@intel.com
Cc: dedekind1@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: hch@lst.de
Cc: jaegeuk@kernel.org
Cc: jlbec@evilplan.org
Cc: richard@nod.at
Cc: tj@kernel.org
Cc: yuchao0@huawei.com
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: linux-mtd@lists.infradead.org
|
|
timespec_trunc() function is used to truncate a
filesystem timestamp to the right granularity.
But, the function does not clamp tv_sec part of the
timestamps according to the filesystem timestamp limits.
The replacement api: timestamp_truncate() also alters the
signature of the function to accommodate filesystem
timestamp clamping according to flesystem limits.
Note that the tv_nsec part is set to 0 if tv_sec is not within
the range supported for the filesystem.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
Add fields to the superblock to track the min and max
timestamps supported by filesystems.
Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.
Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
|
|
cgroup foreign inode handling has quite a bit of heuristics and
internal states which sometimes makes it difficult to understand
what's going on. Add tracepoints to improve visibility.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As Joe Perches suggested [1],
err = bio_add_page(bio, page, PAGE_SIZE, 0);
- if (unlikely(err != PAGE_SIZE)) {
+ if (err != PAGE_SIZE) {
err = -EFAULT;
goto err_out;
}
The initial assignment to err is odd as it's not
actually an error value -E<FOO> but a int size
from a unsigned int len.
Here the return is either 0 or PAGE_SIZE.
This would be more legible to me as:
if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE) {
err = -EFAULT;
goto err_out;
}
[1] https://lore.kernel.org/r/74c4784319b40deabfbaea92468f7e3ef44f1c96.camel@perches.com/
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190829171741.225219-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
As Dan Carpenter suggested [1], I have to remove
all erofs likely/unlikely annotations.
[1] https://lore.kernel.org/linux-fsdevel/20190829154346.GK23584@kadam/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190829163827.203274-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use -ECANCELED to signal "stop iterating" instead of these magical
*_ITER_ABORT values, since it's duplicative.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|