Age | Commit message (Collapse) | Author |
|
Make exfat_find_location() static.
Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Fix typos in comments.
Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
Simplify is_valid_cluster().
Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
The 9P2000.L setattr method v9fs_vfs_setattr_dotl() copies struct iattr
values without checking whether they are valid causing unitialized
values to be copied. The 9P2000 setattr method v9fs_vfs_setattr() method
gets this right. Check whether struct iattr fields are valid first
before copying in v9fs_vfs_setattr_dotl() too and make sure that all
other fields are set to 0 apart from {g,u}id which should be set to
INVALID_{G,U}ID. This ensure that they can be safely sent over the wire
or printed for debugging later on.
Link: https://lkml.kernel.org/r/20211129114434.3637938-1-brauner@kernel.org
Link: https://lkml.kernel.org/r/000000000000a0d53f05d1c72a4c%40google.com
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: stable@kernel.org
Cc: v9fs-developer@lists.sourceforge.net
Reported-by: syzbot+dfac92a50024b54acaa4@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
[Dominique: do not set a/mtime with just ATTR_A/MTIME as discussed]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
This issue was detected with the help of Coccinelle.
Link: https://lkml.kernel.org/r/20211112092547.9153-1-zhang.mingyu@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
whiteout inode should be put when do_tmpfile() failed if inode has been
initialized. Otherwise we will get following warning during umount:
UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.
Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Suggested-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Since 9ec64962afb1702f75b("ubifs: Implement RENAME_EXCHANGE") and
9e0a1fff8db56eaaebb("ubifs: Implement RENAME_WHITEOUT") are applied,
ubifs_rename locks and changes 4 ubifs inodes, correct the comment
for ui_mutex in ubifs_inode.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Following hung tasks:
[ 77.028764] task:kworker/u8:4 state:D stack: 0 pid: 132
[ 77.028820] Call Trace:
[ 77.029027] schedule+0x8c/0x1b0
[ 77.029067] mutex_lock+0x50/0x60
[ 77.029074] ubifs_write_inode+0x68/0x1f0 [ubifs]
[ 77.029117] __writeback_single_inode+0x43c/0x570
[ 77.029128] writeback_sb_inodes+0x259/0x740
[ 77.029148] wb_writeback+0x107/0x4d0
[ 77.029163] wb_workfn+0x162/0x7b0
[ 92.390442] task:aa state:D stack: 0 pid: 1506
[ 92.390448] Call Trace:
[ 92.390458] schedule+0x8c/0x1b0
[ 92.390461] wb_wait_for_completion+0x82/0xd0
[ 92.390469] __writeback_inodes_sb_nr+0xb2/0x110
[ 92.390472] writeback_inodes_sb_nr+0x14/0x20
[ 92.390476] ubifs_budget_space+0x705/0xdd0 [ubifs]
[ 92.390503] do_rename.cold+0x7f/0x187 [ubifs]
[ 92.390549] ubifs_rename+0x8b/0x180 [ubifs]
[ 92.390571] vfs_rename+0xdb2/0x1170
[ 92.390580] do_renameat2+0x554/0x770
, are caused by concurrent rename whiteout and inode writeback processes:
rename_whiteout(Thread 1) wb_workfn(Thread2)
ubifs_rename
do_rename
lock_4_inodes (Hold ui_mutex)
ubifs_budget_space
make_free_space
shrink_liability
__writeback_inodes_sb_nr
bdi_split_work_to_wbs (Queue new wb work)
wb_do_writeback(wb work)
__writeback_single_inode
ubifs_write_inode
LOCK(ui_mutex)
↑
wb_wait_for_completion (Wait wb work) <-- deadlock!
Reproducer (Detail program in [Link]):
1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT)
2. Consume out of space before kernel(mdelay) doing budget for whiteout
Fix it by doing whiteout space budget before locking ubifs inodes.
BTW, it also fixes wrong goto tag 'out_release' in whiteout budget
error handling path(It should at least recover dir i_size and unlock
4 ubifs inodes).
Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214733
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
'whiteout_ui->data' will be freed twice if space budget fail for
rename whiteout operation as following process:
rename_whiteout
dev = kmalloc
whiteout_ui->data = dev
kfree(whiteout_ui->data) // Free first time
iput(whiteout)
ubifs_free_inode
kfree(ui->data) // Double free!
KASAN reports:
==================================================================
BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70
Call Trace:
kfree+0x117/0x490
ubifs_free_inode+0x4f/0x70 [ubifs]
i_callback+0x30/0x60
rcu_do_batch+0x366/0xac0
__do_softirq+0x133/0x57f
Allocated by task 1506:
kmem_cache_alloc_trace+0x3c2/0x7a0
do_rename+0x9b7/0x1150 [ubifs]
ubifs_rename+0x106/0x1f0 [ubifs]
do_syscall_64+0x35/0x80
Freed by task 1506:
kfree+0x117/0x490
do_rename.cold+0x53/0x8a [ubifs]
ubifs_rename+0x106/0x1f0 [ubifs]
do_syscall_64+0x35/0x80
The buggy address belongs to the object at ffff88810238bed8 which
belongs to the cache kmalloc-8 of size 8
==================================================================
Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete unused
assignment 'whiteout_ui->data_len = 0', process 'ubifs_evict_inode()
-> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it
(because 'inc_nlink(whiteout)' won't be excuted by 'goto out_release',
and the nlink of whiteout inode is 0).
Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
tctx_task_work() may get run after io_uring cancellation and so there
will be no one to put cached in tctx task refs that may have been added
back by tw handlers using inline completion infra, Call
io_uring_drop_tctx_refs() at the end of the main tw handler to release
them.
Cc: stable@vger.kernel.org # 5.15+
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Fixes: e98e49b2bbf7 ("io_uring: extend task put optimisations")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69f226b35fbdb996ab799a8bbc1c06bf634ccec1.1641688805.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
RTM says "If the special ONE stateid is passed to
nfs4_preprocess_stateid_op(), it returns status=0 but does not set
*cstid. nfsd4_copy_notify() depends on stid being set if status=0, and
thus can crash if the client sends the right COPY_NOTIFY RPC."
RFC 7862 says "The cna_src_stateid MUST refer to either open or locking
states provided earlier by the server. If it is invalid, then the
operation MUST fail."
The RFC doesn't specify an error, and the choice doesn't matter much as
this is clearly illegal client behavior, but bad_stateid seems
reasonable.
Simplest is just to guarantee that nfs4_preprocess_stateid_op, called
with non-NULL cstid, errors out if it can't return a stateid.
Reported-by: rtm@csail.mit.edu
Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
|
|
These functions are related to file handle processing and have
nothing to do with XDR encoding or decoding. Also they are no longer
NFSv3-specific. As a clean-up, move their definitions to a more
appropriate location. WCC is also an NFSv3-specific term, so rename
them as general-purpose helpers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes
returning a reasonable-looking value in the cinfo.before field and
zero in the cinfo.after field.
RFC 8881 Section 10.8.1 says:
> When a client is making changes to a given directory, it needs to
> determine whether there have been changes made to the directory by
> other clients. It does this by using the change attribute as
> reported before and after the directory operation in the associated
> change_info4 value returned for the operation.
and
> ... The post-operation change
> value needs to be saved as the basis for future change_info4
> comparisons.
A good quality client implementation therefore saves the zero
cinfo.after value. During a subsequent OPEN operation, it will
receive a different non-zero value in the cinfo.before field for
that directory, and it will incorrectly believe the directory has
changed, triggering an undesirable directory cache invalidation.
There are filesystem types where fs_supports_change_attribute()
returns false, tmpfs being one. On NFSv4 mounts, this means the
fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is
never invoked. Subsequently, nfsd4_change_attribute() is invoked
with an uninitialized @stat argument.
In fill_pre_wcc(), @stat contains stale stack garbage, which is
then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all
zeroes, so zero is placed on the wire. Both of these values are
meaningless.
This fix can be applied immediately to stable kernels. Once there
are more regression tests in this area, this optimization can be
attempted again.
Fixes: 428a23d2bf0c ("nfsd: skip some unnecessary stats in the v4 case")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
According to commit bbf2f098838a ("nfsd: Reset the boot verifier on
all write I/O errors"), the Linux NFS server forces all clients to
resend pending unstable writes if any server-side write or commit
operation encounters an error (say, ENOSPC). This is a rare and
quite exceptional event that could require administrative recovery
action, so it should be made trace-able. Example trace event:
nfsd-938 [002] 7174.945558: nfsd_writeverf_reset: boot_time= 61cc920d xid=0xdcd62036 error=-28 new verifier=0x08aecc6142515904
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: These functions handle what the specs call a write
verifier, which in the Linux NFS server implementation is now
divorced from the server's boot instance
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
There are two boot-time fields in struct nfsd_net: one called
boot_time and one called nfssvc_boot. The latter is used only to
form write verifiers, but its documenting comment declares:
/* Time of server startup */
Since commit 27c438f53e79 ("nfsd: Support the server resetting the
boot verifier"), this field can be reset at any time; it's no
longer tied to server restart. So that comment is stale.
Also, according to pahole, struct timespec64 is 16 bytes long on
x86_64. The nfssvc_boot field is used only to form a write verifier,
which is 8 bytes long.
Let's clarify this situation by manufacturing an 8-byte verifier
in nfs_reset_boot_verifier() and storing only that in struct
nfsd_net.
We're grabbing 128 bits of time, so compress all of those into a
64-bit verifier instead of throwing out the high-order bits.
In the future, the siphash_key can be re-used for other hashed
objects per-nfsd_net.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When vfs_iter_write() starts to fail because a file system is full,
a bunch of writes can fail at once with ENOSPC. These writes
repeatedly invoke nfsd_reset_boot_verifier() in quick succession.
Ensure that the time it grabs doesn't go backwards due to an ntp
adjustment going on at the same time.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since a clone error commit can cause the boot verifier to change,
we should trace those errors.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: Addressed a checkpatch.pl splat in fs/nfsd/vfs.h ]
|
|
Since this pointer is used repeatedly, move it to a stack variable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since this pointer is used repeatedly, move it to a stack variable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The RWF_SYNC and !RWF_SYNC arms are now exactly alike except that
the RWF_SYNC arm resets the boot verifier twice in a row. Fix that
redundancy and de-duplicate the code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The nfsd_file nf_rwsem is currently being used to separate file write
and commit instances to ensure that we catch errors and apply them to
the correct write/commit.
We can improve scalability at the expense of a little accuracy (some
extra false positives) by replacing the nf_rwsem with more careful
use of the errseq_t mechanism to track errors across the different
operations.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: rebased on zero-verifier fix ]
|
|
RFC 8881 explains the purpose of the write verifier this way:
> The final portion of the result is the field writeverf. This field
> is the write verifier and is a cookie that the client can use to
> determine whether a server has changed instance state (e.g., server
> restart) between a call to WRITE and a subsequent call to either
> WRITE or COMMIT.
But then it says:
> This cookie MUST be unchanged during a single instance of the
> NFSv4.1 server and MUST be unique between instances of the NFSv4.1
> server. If the cookie changes, then the client MUST assume that
> any data written with an UNSTABLE4 value for committed and an old
> writeverf in the reply has been lost and will need to be
> recovered.
RFC 1813 has similar language for NFSv3. NFSv2 does not have a write
verifier since it doesn't implement the COMMIT procedure.
Since commit 19e0663ff9bc ("nfsd: Ensure sampling of the write
verifier is atomic with the write"), the Linux NFS server has
returned a boot-time-based verifier for UNSTABLE WRITEs, but a zero
verifier for FILE_SYNC and DATA_SYNC WRITEs. FILE_SYNC and DATA_SYNC
WRITEs are not followed up with a COMMIT, so there's no need for
clients to compare verifiers for stable writes.
However, by returning a different verifier for stable and unstable
writes, the above commit puts the Linux NFS server a step farther
out of compliance with the first MUST above. At least one NFS client
(FreeBSD) noticed the difference, making this a potential
regression.
Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Link: https://lore.kernel.org/linux-nfs/YQXPR0101MB096857EEACF04A6DF1FC6D9BDD749@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM/T/
Fixes: 19e0663ff9bc ("nfsd: Ensure sampling of the write verifier is atomic with the write")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If we get back -EOPENSTALE from an NFSv4 open, then we either got some
unhandled error or the inode we got back was not the same as the one
associated with the dentry.
We really have no recourse in that situation other than to retry the
open, and if it fails to just return nfserr_stale back to the client.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The NFS client can occasionally return EREMOTEIO when signalling issues
with the server. ...map to NFSERR_IO.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that we have open file cache, it is possible that another client
deletes the file and DP will not know about it. Then IO to MDS would
fail with BADSTATEID and knfsd would start state recovery, which
should fail as well and then nfs read/write will fail with EBADF.
And it triggers a WARN() in nfserrno().
-----------[ cut here ]------------
WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]()
nfsd: non-standard errno: -9
modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt
pata_acpi floppy
CPU: 0 PID: 13529 Comm: nfsd Tainted: G W 4.1.5-00307-g6e6579b #7
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936
0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea
ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48
all Trace:
[<ffffffff81789936>] dump_stack+0x45/0x57
[<ffffffff810a08ea>] warn_slowpath_common+0x8a/0xc0
[<ffffffff810a0975>] warn_slowpath_fmt+0x55/0x70
[<ffffffff81252908>] ? splice_direct_to_actor+0x148/0x230
[<ffffffffa02fb8c0>] ? fsid_source+0x60/0x60 [nfsd]
[<ffffffffa02f9918>] nfserrno+0x58/0x70 [nfsd]
[<ffffffffa02fba57>] nfsd_finish_read+0x97/0xb0 [nfsd]
[<ffffffffa02fc7a6>] nfsd_splice_read+0x76/0xa0 [nfsd]
[<ffffffffa02fcca1>] nfsd_read+0xc1/0xd0 [nfsd]
[<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[<ffffffffa03073da>] nfsd3_proc_read+0xba/0x150 [nfsd]
[<ffffffffa02f7a03>] nfsd_dispatch+0xc3/0x210 [nfsd]
[<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[<ffffffffa0232913>] svc_process_common+0x453/0x6f0 [sunrpc]
[<ffffffffa0232cc3>] svc_process+0x113/0x1b0 [sunrpc]
[<ffffffffa02f740f>] nfsd+0xff/0x170 [nfsd]
[<ffffffffa02f7310>] ? nfsd_destroy+0x80/0x80 [nfsd]
[<ffffffff810bf3a8>] kthread+0xd8/0xf0
[<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff817912a2>] ret_from_fork+0x42/0x70
[<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The Linux NFS server currently responds to a zero-length NFSv3 WRITE
request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE
with NFS4_OK and count of zero.
RFC 1813 says of the WRITE procedure's @count argument:
count
The number of bytes of data to be written. If count is
0, the WRITE will succeed and return a count of 0,
barring errors due to permissions checking.
RFC 8881 has similar language for NFSv4, though NFSv4 removed the
explicit @count argument because that value is already contained in
the opaque payload array.
The synthetic client pynfs's WRT4 and WRT15 tests do emit zero-
length WRITEs to exercise this spec requirement. Commit fdec6114ee1f
("nfsd4: zero-length WRITE should succeed") addressed the same
problem there with the same fix.
But interestingly the Linux NFS client does not appear to emit zero-
length WRITEs, instead squelching them. I'm not aware of a test that
can generate such WRITEs for NFSv3, so I wrote a naive C program to
generate a zero-length WRITE and test this fix.
Fixes: 8154ef2776aa ("NFSD: Clean up legacy NFS WRITE argument XDR decoders")
Reported-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nbl allocated in nfsd4_lock can be released by a several ways:
directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs
command RELEASE_LOCKOWNER or via nfsd4_callback.
This structure should be refcounted to be used and released correctly
in all these cases.
Refcount is initialized to 1 during allocation and is incremented
when nbl is added into nbl_list/nbl_lru lists.
Usually nbl is linked into both lists together, so only one refcount
is used for both lists.
However nfsd4_lock() should keep in mind that nbl can be present
in one of lists only. This can happen if nbl was handled already
by nfs4_laundromat/nfsd4_callback/etc.
Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED,
because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc.
Refcount is not changed in find_blocked_lock() because of it reuses counter
released after removing nbl from lists.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
NFSv4.1 supports an optional lock notification feature which notifies
the client when a lock comes available. (Normally NFSv4 clients just
poll for locks if necessary.) To make that work, we need to request a
blocking lock from the filesystem.
We turned that off for NFS in commit f657f8eef3ff ("nfs: don't atempt
blocking locks on nfs reexports") [sic] because it actually blocks the
nfsd thread while waiting for the lock.
Thanks to Vasily Averin for pointing out that NFS isn't the only
filesystem with that problem.
Any filesystem that leaves ->lock NULL will use posix_lock_file(), which
does the right thing. Simplest is just to assume that any filesystem
that defines its own ->lock is not safe to request a blocking lock from.
So, this patch mostly reverts commit f657f8eef3ff ("nfs: don't atempt
blocking locks on nfs reexports") [sic] and commit b840be2f00c0 ("lockd:
don't attempt blocking locks on nfs reexports"), and instead uses a
check of ->lock (Vasily's suggestion) to decide whether to support
blocking lock notifications on a given filesystem. Also add a little
documentation.
Perhaps someday we could add back an export flag later to allow
filesystems with "good" ->lock methods to support blocking lock
notifications.
Reported-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: Description rewritten to address checkpatch nits ]
[ cel: Fixed warning when SUNRPC debugging is disabled ]
[ cel: Fixed NULL check ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
|
|
Clean up. Trond points out that xdr_stream_decode_uint32_array()
does the same thing as nfsd4_decode_bitmap4().
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The use of the bitmaps is confusing. Add a cross-reference to make it
easier to find the existing comment. Add an updated reference with URL
to make it quicker to look up. And a bit more editorializing about the
value of this.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: The garbage_args and cant_encode tracepoints report the
same information as each other, so combine them into a single
tracepoint class to reduce code duplication and slightly reduce the
size of trace.o.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull xfs fix from Darrick Wong:
- Make the old ALLOCSP ioctl behave in a consistent manner with newer
syscalls like fallocate.
* tag 'xfs-5.16-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
|
|
Since do_proc_statt was modified to return process wide values instead
of per task values the exit_code calculation has never been updated.
Update it now to return the process wide exit_code when it is requested
and available.
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: bf719d26a5c1 ("[PATCH] distinct tgid/tid CPU usage")
Link: https://lkml.kernel.org/r/20220103213312.9144-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
This helper is misleading. It tests for an ongoing exec as well as
the process having received a fatal signal.
Sometimes it is appropriate to treat an on-going exec differently than
a process that is shutting down due to a fatal signal. In particular
taking the fast path out of exit_signals instead of retargeting
signals is not appropriate during exec, and not changing the the exit
code in do_group_exit during exec.
Removing the helper makes it more obvious what is going on as both
cases must be coded for explicitly.
While removing the helper fix the two cases where I have observed
using signal_group_exit resulted in the wrong result.
In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are
retargetted during an exec.
In do_group_exit use 0 as the exit code during an exec as de_thread
does not set group_exit_code. As best as I can determine
group_exit_code has been is set to 0 most of the time during
de_thread. During a thread group stop group_exit_code is set to the
stop signal and when the thread group receives SIGCONT group_exit_code
is reset to 0.
Link: https://lkml.kernel.org/r/20211213225350.27481-8-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
The only remaining user of group_exit_task is exec. Rename the field
so that it is clear which part of the code uses it.
Update the comment above the definition of group_exec_task to document
how it is currently used.
Link: https://lkml.kernel.org/r/20211213225350.27481-7-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Currently the coredump code sets group_exit_task so that
signal_group_exit() will return true during a coredump. Now that the
coredump code always sets SIGNAL_GROUP_EXIT there is no longer a need
to set signal->group_exit_task.
Link: https://lkml.kernel.org/r/20211213225350.27481-6-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
After the previous cleanups "signal->core_state" is set whenever
SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested
whenver the code wants to know if a coredump is in progress. The
remaining tests of SIGNAL_GROUP_COREDUMP also test to see if
SIGNAL_GROUP_EXIT is set. Similarly the only place that sets
SIGNAL_GROUP_COREDUMP also sets SIGNAL_GROUP_EXIT.
Which makes SIGNAL_GROUP_COREDUMP unecessary and redundant. So stop
setting SIGNAL_GROUP_COREDUMP, stop testing SIGNAL_GROUP_COREDUMP, and
remove it's definition.
With the setting of SIGNAL_GROUP_COREDUMP gone, coredump_finish no
longer needs to clear SIGNAL_GROUP_COREDUMP out of signal->flags
by setting SIGNAL_GROUP_EXIT.
Link: https://lkml.kernel.org/r/20211213225350.27481-5-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
There are only a few places that test SIGNAL_GROUP_EXIT and
are not also already testing SIGNAL_GROUP_COREDUMP.
This will not affect the callers of signal_group_exit as zap_process
also sets group_exit_task so signal_group_exit will continue to return
true at the same times.
This does not affect wait_task_zombie as the none of the threads
wind up in EXIT_ZOMBIE state during a coredump.
This does not affect oom_kill.c:__task_will_free_mem as
sig->core_state is tested and handled before SIGNAL_GROUP_EXIT is
tested for.
This does not affect complete_signal as signal->core_state is tested
for to ensure the coredump case is handled appropriately.
Link: https://lkml.kernel.org/r/20211213225350.27481-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
The point of using set_child_tid to hold the kthread pointer was that
it already did what is necessary. There are now restrictions on when
set_child_tid can be initialized and when set_child_tid can be used in
schedule_tail. Which indicates that continuing to use set_child_tid
to hold the kthread pointer is a bad idea.
Instead of continuing to use the set_child_tid field of task_struct
generalize the pf_io_worker field of task_struct and use it to hold
the kthread pointer.
Rename pf_io_worker (which is a void * pointer) to worker_private so
it can be used to store kthreads struct kthread pointer. Update the
kthread code to store the kthread pointer in the worker_private field.
Remove the places where set_child_tid had to be dealt with carefully
because kthreads also used it.
Link: https://lkml.kernel.org/r/CAHk-=wgtFAA9SbVYg0gR1tqPMC17-NYcs0GQkaYg1bGhh1uJQQ@mail.gmail.com
Link: https://lkml.kernel.org/r/87a6grvqy8.fsf_-_@email.froward.int.ebiederm.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
We still only operate on a single page of data at a time due to using
kmap(). A more complex implementation would work on each page in a folio,
but it's not clear that such a complex implementation would be worthwhile.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
|
|
find_lock_entries() already only returned the head page of folios, so
convert it to return a folio_batch instead of a pagevec. That cascades
through converting truncate_inode_pages_range() to
delete_from_page_cache_batch() and page_cache_delete_batch().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
|
|
When a TCP connection gets reestablished by the sender in cifs_reconnect,
There is a chance for race condition with demultiplex thread waiting in
cifs_readv_from_socket on the old socket. It will now return -ECONNRESET.
This condition is handled by comparing socket pointer before and after
sock_recvmsg. If the socket pointer has changed, we should not call
cifs_reconnect again, but instead retry with new socket.
Also fixed another bug in my prev mchan commits.
We should always reestablish session (even if binding) on a channel
that needs reconnection.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If functions like cifs_negotiate_protocol, cifs_setup_session,
cifs_tree_connect are called in parallel on different channels,
each of these will be execute the requests. This maybe unnecessary
in some cases, and only the first caller may need to do the work.
This is achieved by having more states for the tcp/smb/tcon session
status fields. And tracking the state of reconnection based on the
state machine.
For example:
for tcp connections:
CifsNew/CifsNeedReconnect ->
CifsNeedNegotiate ->
CifsInNegotiate ->
CifsNeedSessSetup ->
CifsInSessSetup ->
CifsGood
for smb sessions:
CifsNew/CifsNeedReconnect ->
CifsGood
for tcon:
CifsNew/CifsNeedReconnect ->
CifsInFilesInvalidate ->
CifsNeedTcon ->
CifsInTcon ->
CifsGood
If any channel reconnect sees that it's in the middle of
transition to CifsGood, then they can skip the function.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Mount will hang if using SMB1 and DFS.
This is because every call to get_next_mid() will, unconditionally,
mark tcpStatus to CifsNeedReconnect before even establishing the
initial connect, because "reconnect" variable was not initialized.
Initializing "reconnect" to false fix this issue.
Fixes: 220c5bc25d87 ("cifs: take cifs_tcp_ses_lock for status checks")
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
While checking/updating status for tcp ses, smb ses or tcon,
we take GlobalMid_Lock. This doesn't make any sense.
Replaced it with cifs_tcp_ses_lock.
Ideally, we should take a spin lock per struct.
But since tcp ses, smb ses and tcon objects won't add up to a lot,
I think there should not be too much contention.
Also, in few other places, these are checked without locking.
Added locking for these.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If the file type changes back to being a regular file on the server
between the failed OPEN and our LOOKUP, then we need to re-run the OPEN.
Fixes: 0dd2b474d0b6 ("nfs: implement i_op->atomic_open()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
If the application sets the O_DIRECTORY flag, and tries to open a
regular file, nfs_atomic_open() will punt to doing a regular lookup.
If the server then returns a regular file, we will happily return a
file descriptor with uninitialised open state.
The fix is to return the expected ENOTDIR error in these cases.
Reported-by: Lyu Tao <tao.lyu@epfl.ch>
Fixes: 0dd2b474d0b6 ("nfs: implement i_op->atomic_open()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Don't send a truncation RPC to the server if we're only shortening data
that's in the pagecache and is beyond the server's EOF.
Also don't automatically force writeback on setattr, but do wait to store
RPCs that are in the region to be removed on a shortening truncation.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163819663275.215744.4781075713714590913.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906972600.143852.14237659724463048094.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967177522.1823006.15336589054269480601.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021571880.640689.1837025861707111004.stgit@warthog.procyon.org.uk/ # v4
|
|
When writing to the server from afs_writepage() or afs_writepages(), copy
the data to the cache object too.
To make this possible, the cookie must have its active users count
incremented when the page is dirtied and kept incremented until we manage
to clean up all the pages. This allows the writeback to take place after
the last file struct is released.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819662333.215744.7531373404219224438.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970998.143852.674420788614608063.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967176564.1823006.16666056085593949570.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021570208.640689.9193494979708031862.stgit@warthog.procyon.org.uk/ # v4
|