path: root/fs
Age | Commit message | Author
2020-03-16 | NFS:remove redundant call to nfs_do_access | Zhouyi Zhou
In function nfs_permission:
1. The rcu_read_lock() and rcu_read_unlock() around nfs_do_access() are unnecessary, because the RCU-protected data is already handled inside the subsidiary function nfs_access_get_cached_rcu(). No other data structure in nfs_do_access() needs rcu_read_lock().
2. Calling nfs_do_access() once is enough, because:
2-1. When mask has the MAY_NOT_BLOCK bit set, the second call to nfs_do_access() does not happen.
2-2. When mask does not have the MAY_NOT_BLOCK bit set, the second call to nfs_do_access() happens if res == -ECHILD, which means the first call returned at the statement if (!may_block). The second call then goes through the same procedure again, except that it continues past if (!may_block). All of this work can be performed by a single call to nfs_do_access() without mangling the mask flag.
Tested on x86_64.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
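A minimal userspace model of the argument in point 2, assuming a simplified do_access() that consults a cache and, on a miss, either returns -ECHILD (caller may not block) or falls back to a blocking path modelled as an RPC. The names do_access(), permission_old(), permission_new() and the counters are hypothetical, and MAY_NOT_BLOCK is given a model value; this is not the kernel code, only a sketch of why the retry is redundant.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAY_NOT_BLOCK 0x80  /* model value for this sketch */

/* Counters that let the model show how much work each variant does. */
static int cache_lookups, rpc_calls;

/* Hypothetical stand-in for nfs_do_access(): consult the cache first; on a
 * miss, either give up with -ECHILD (caller may not block) or fall back to
 * the blocking path, modelled here as an ACCESS RPC. */
int do_access(bool cache_hit, int mask)
{
    bool may_block = !(mask & MAY_NOT_BLOCK);

    cache_lookups++;
    if (cache_hit)
        return 0;
    if (!may_block)
        return -ECHILD;
    rpc_calls++;
    return 0;
}

/* Old shape: force a non-blocking attempt, then retry if blocking is allowed. */
int permission_old(bool cache_hit, int mask)
{
    int res = do_access(cache_hit, mask | MAY_NOT_BLOCK);

    if (res == -ECHILD && !(mask & MAY_NOT_BLOCK))
        res = do_access(cache_hit, mask);
    return res;
}

/* New shape: one call, because do_access() already honours MAY_NOT_BLOCK. */
int permission_new(bool cache_hit, int mask)
{
    return do_access(cache_hit, mask);
}
```

On a blocking cache miss, permission_old() performs two cache lookups and one RPC, while permission_new() performs one lookup and one RPC; with MAY_NOT_BLOCK set, both return -ECHILD.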
2020-03-16 | NFSv4: Add support for CB_RECALL_ANY for flexfiles layouts | Trond Myklebust
When we receive a CB_RECALL_ANY that asks us to return flexfiles layouts, we iterate through all the layouts and look at whether or not there are active open file descriptors that might need them for I/O. If there are no such descriptors, we return the layouts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
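The recall policy described above can be sketched as a single pass over the layouts, assuming a hypothetical struct layout that records how many open file descriptors might still use it. Setting the returned flag stands in for actually sending the layout back to the server; the real code's locking and list handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layout record for this sketch. */
struct layout {
    int  open_fds;   /* open file descriptors that might still need it for I/O */
    bool returned;   /* set once the layout has been handed back */
};

/* On a CB_RECALL_ANY for flexfiles layouts, return every layout that no
 * open file descriptor is using. Returns how many layouts were returned. */
size_t recall_any_idle_layouts(struct layout *layouts, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (layouts[i].returned || layouts[i].open_fds > 0)
            continue;                /* possibly still in use: keep it */
        layouts[i].returned = true;  /* stands in for issuing the return */
        count++;
    }
    return count;
}
```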
2020-03-16 | NFSv4: Clean up nfs_delegation_reap_expired() | Trond Myklebust
Convert to use nfs_client_for_each_server() for efficiency. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4: Clean up nfs_delegation_reap_unclaimed() | Trond Myklebust
Convert nfs_delegation_reap_unclaimed() to use nfs_client_for_each_server() for efficiency. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4: Clean up nfs_client_return_marked_delegations() | Trond Myklebust
Convert it to use the nfs_client_for_each_server() helper, and make it more efficient by skipping delegations for inodes we know are in the process of being freed. Also improve the efficiency of the cursor by skipping delegations that are being freed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFS: Add a helper nfs_client_for_each_server() | Trond Myklebust
Add a helper nfs_client_for_each_server() to iterate through all the filesystems that are attached to a struct nfs_client, and apply a function to all the active ones. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
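A sketch of the iterate-and-apply pattern such a helper provides, using hypothetical struct server and struct client types and a plain linked list. Stopping at the first non-zero return from the callback is an assumption of this sketch, and the RCU and reference-counting concerns a kernel version has to handle are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nfs_server / struct nfs_client. */
struct server {
    bool active;            /* false while the filesystem is going away */
    int  id;
    struct server *next;
};

struct client {
    struct server *servers; /* all filesystems attached to this client */
};

/* Apply fn to every active server; stop at the first non-zero return. */
int client_for_each_server(struct client *clp,
                           int (*fn)(struct server *, void *), void *data)
{
    for (struct server *s = clp->servers; s; s = s->next) {
        if (!s->active)
            continue;       /* skip filesystems that are not active */
        int ret = fn(s, data);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: sum the ids of the servers visited. */
int sum_ids(struct server *s, void *data)
{
    *(int *)data += s->id;
    return 0;
}
```

Callers such as the delegation reapers then only supply the per-server work as a callback.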
2020-03-16 | NFSv4/pnfs: Clean up nfs_layout_find_inode() | Trond Myklebust
Now that we can rely on just the rcu_read_lock(), remove the clp->cl_lock and clean up. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4: Ensure layout headers are RCU safe | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid() | Trond Myklebust
Make sure to test the stateid for validity so that we catch instances where the server may have been reusing stateids in nfs_layout_find_inode_by_stateid(). Fixes: 7b410d9ce460 ("pNFS: Delay getting the layout header in CB_LAYOUTRECALL handlers") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | pNFS/flexfiles: Report DELAY and GRACE errors from the DS to the server | Trond Myklebust
Ensure that if the DS is returning too many DELAY and GRACE errors, we also report that to the MDS through the layouterror mechanism. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFS: Limit the size of the access cache by default | Trond Myklebust
Currently, we have no real limit on the access cache size (we set it to ULONG_MAX). That can lead to credentials getting pinned for a very long time on lots of files if you have a system with a lot of memory. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
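To illustrate why a finite limit matters, here is a toy access cache with a fixed capacity that evicts the least recently used entry when full, so the number of pinned credentials can never exceed the cap. CACHE_MAX, the entry layout and cache_add() are hypothetical; the real NFS access cache is organised differently.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 4   /* hypothetical cap; the point is that some finite cap exists */

/* Each entry pins a credential (modelled as an int id) for one file. */
struct access_entry {
    int file, cred;
    unsigned long last_used;
};

struct access_cache {
    struct access_entry e[CACHE_MAX];
    size_t n;
    unsigned long clock;
};

/* Insert an entry, evicting the least recently used one when full.
 * Returns the cred whose reference would be dropped, or -1 if none. */
int cache_add(struct access_cache *c, int file, int cred)
{
    int evicted = -1;
    size_t slot = c->n;

    if (c->n == CACHE_MAX) {
        slot = 0;
        for (size_t i = 1; i < c->n; i++)
            if (c->e[i].last_used < c->e[slot].last_used)
                slot = i;
        evicted = c->e[slot].cred;
    } else {
        c->n++;
    }
    c->e[slot] = (struct access_entry){ file, cred, ++c->clock };
    return evicted;
}
```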
2020-03-16 | NFS: Avoid referencing the cred twice in async rename/unlink | Trond Myklebust
In both async rename and unlink, we take a reference to the cred in the call arguments. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4: Avoid unnecessary credential references in layoutget | Trond Myklebust
Layoutget is just using the credential attached to the open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O | Trond Myklebust
Avoid unnecessary references to the cred when we have already referenced it through the open context or the open owner. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFS: Assume cred is pinned by open context in I/O requests | Trond Myklebust
In read/write/commit, we should be able to assume that the cred is pinned by the open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFS: alloc_nfs_open_context() must use the file cred when available | Trond Myklebust
If we're creating a nfs_open_context() for a specific file pointer, we must use the cred assigned to that file. Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFS: Ensure we time out if a delegreturn does not complete | Trond Myklebust
We can't allow delegreturn to hold up nfs4_evict_inode() forever, since that can cause the memory shrinkers to block. This patch therefore ensures that we eventually time out, and complete the reclaim of the inode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4/pnfs: pnfs_set_layout_stateid() should update the layout cred | Trond Myklebust
If the cred assigned to the layout that we're updating differs from the one used to retrieve the new layout segment, then we need to update the layout plh_lc_cred field. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFSv4: nfs_update_inplace_delegation() should update delegation cred | Trond Myklebust
If the cred assigned to the delegation that we're updating differs from the one we're updating to, then we need to update that field too. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 | NFS: Use the 64-bit server readdir cookies when possible | Trond Myklebust
When we're running as a 64-bit architecture and are not running in 32-bit compatibility mode, it is better to use the 64-bit readdir cookies that are supplied by the server. Doing so improves the accuracy of telldir()/seekdir(), particularly when the directory is changing, for instance, when doing 'rm -rf'. We still fall back to using the 32-bit offsets on 32-bit architectures and when in compatibility mode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
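The selection rule reduces to a small decision, sketched here with hypothetical boolean parameters standing in for the architecture and compat-mode checks. The 32-bit fallback is modelled as the entry's position in the cached listing, which is an assumption about what the "32-bit offsets" are, not a description of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: choose the directory position reported to telldir()
 * for one entry. */
uint64_t dir_pos_for_entry(uint64_t server_cookie, uint32_t index,
                           bool is_64bit_kernel, bool compat_task)
{
    if (is_64bit_kernel && !compat_task)
        return server_cookie;  /* full 64-bit cookie from the server */
    return index;              /* 32-bit fallback, modelled as a position */
}
```

Because the server's cookie identifies the entry itself rather than its position, it stays meaningful when earlier entries are removed, which is the 'rm -rf' case the commit mentions.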
2020-03-15 | Merge tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull futex fix from Thomas Gleixner: "Fix for yet another subtle futex issue. The futex code used ihold() to prevent inodes from vanishing, but ihold() does not guarantee inode persistence. Replace the inode pointer with a per boot, machine wide, unique inode identifier. The second commit fixes the breakage of the hash mechanism which causes a 100% performance regression"
* tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Unbreak futex hashing
  futex: Fix inode life-time issue
2020-03-15 | xfs: xfs_dabuf_map should return ENOMEM when map allocation fails | Darrick J. Wong
If the xfs_buf_map array allocation in xfs_dabuf_map fails for whatever reason, we bail out with error code zero. This will confuse callers, so make sure that we return ENOMEM. Allocation failure should never happen with the small size of the array, but code defensively anyway. Fixes: 45feef8f50b94d ("xfs: refactor xfs_dabuf_map") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
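The bug class is easy to reproduce in userspace: an error variable initialised to 0 and a goto on allocation failure. This sketch uses a hypothetical struct buf_map and an injectable allocator hook (alloc_fn) so the failure path can be exercised; it mirrors the shape of the problem, not the actual xfs_dabuf_map() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct buf_map { long bn; int len; };   /* hypothetical map entry */

/* Allocator hook so a test can force an allocation failure. */
static void *(*alloc_fn)(size_t) = malloc;

/* Buggy shape: error is still 0 when we bail out on allocation failure. */
int dabuf_map_buggy(int nmaps, struct buf_map **out)
{
    int error = 0;
    struct buf_map *map = alloc_fn(nmaps * sizeof(*map));

    if (!map)
        goto out;            /* caller sees success but gets no map */
    *out = map;
out:
    return error;
}

/* Fixed shape: report the failure to the caller. */
int dabuf_map_fixed(int nmaps, struct buf_map **out)
{
    struct buf_map *map = alloc_fn(nmaps * sizeof(*map));

    if (!map)
        return -ENOMEM;
    *out = map;
    return 0;
}

/* Test allocator that always fails. */
void *fail_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```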
2020-03-14 | io-wq: hash dependent work | Pavel Begunkov
Enable io-wq hashing stuff for dependent works simply by re-enqueueing such requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14 | io-wq: split hashing and enqueueing | Pavel Begunkov
This is a preparation patch removing io_wq_enqueue_hashed(); the same effect is now achieved with io_wq_hash_work() followed by io_wq_enqueue(). Also, set the hash value for dependent work, and do it as late as possible, because req->file may not be available earlier. This hash will be ignored by io-wq. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14 | io-wq: don't resched if there is no work | Pavel Begunkov
This little tweak restores the behaviour from before the recent io_worker_handle_work() optimisation patches. It makes the function call cond_resched() and flush_signals() only if there is actual work to execute. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-14 | io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN} | Pavel Begunkov
When processing links, io_submit_sqe() prepares requests, drops sqes, and passes them with sqe=NULL to io_queue_sqe(). There, IOSQE_DRAIN and/or IOSQE_ASYNC requests go through the same prep, which doesn't expect sqe=NULL and fails with a NULL pointer dereference. Always do the full prepare, including io_alloc_async_ctx(), for linked requests, so that the second preparation can be skipped. Cc: stable@vger.kernel.org # 5.5 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
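A pared-down model of the fix, assuming a per-request flag that records that full preparation has already happened. struct req, struct sqe, req_prep() and queue_req() are hypothetical; the -1 return stands in for the NULL pointer dereference the real code hit when asked to prepare from a NULL sqe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down request and SQE types for the sketch. */
struct sqe { int opcode; };
struct req {
    int  opcode;
    bool prepped;   /* full preparation (incl. async context) already done */
};

/* Preparation reads the SQE, so it must not be reached with sqe == NULL. */
int req_prep(struct req *r, const struct sqe *sqe)
{
    if (!sqe)
        return -1;  /* models the crash in the unfixed code */
    r->opcode = sqe->opcode;
    r->prepped = true;
    return 0;
}

/* Queueing path for DRAIN/ASYNC requests: prepare only if not done yet. */
int queue_req(struct req *r, const struct sqe *sqe)
{
    if (!r->prepped)
        return req_prep(r, sqe);
    return 0;
}
```

A linked request that was fully prepared at submission time can later be queued with sqe == NULL safely, while an unprepared one still needs its SQE.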
2020-03-14 | ext4: remove map_from_cluster from ext4_ext_map_blocks | Eric Whitney
We can use the variable allocated_clusters rather than map_from_cluster to control reserved block/cluster accounting in ext4_ext_map_blocks. This eliminates a variable and associated code and improves readability a little. Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200311205125.25061-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: clean up ext4_ext_insert_extent() call in ext4_ext_map_blocks() | Eric Whitney
Now that the eofblocks code has been removed, we don't need to assign 0 to err before calling ext4_ext_insert_extent() since it will assign a return value to ret anyway. The variable free_on_err can be eliminated and replaced by a reference to allocated_clusters which clearly conveys the idea that newly allocated blocks should be freed when recovering from an extent insertion failure. The error handling code itself should be restructured so that it errors out immediately on an insertion failure in the case where no new blocks have been allocated (bigalloc) rather than proceeding further into the mapping code. The initializer for fb_flags can also be rearranged for improved readability. Finally, insert a missing space in nearby code. No known bugs are addressed by this patch - it's simply a cleanup. Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20200311205033.25013-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: mark block bitmap corrupted when found instead of BUGON | Dmitry Monakhov
We already have similar code in ext4_mb_complex_scan_group(), but ext4_mb_simple_scan_group() is still affected. Other reports: https://www.spinics.net/lists/linux-ext4/msg60231.html Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Link: https://lore.kernel.org/r/20200310150156.641-1-dmonakhov@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: use flexible-array member for xattr structs | Gustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
struct foo {
        int stuff;
        struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200309180813.GA3347@embeddedor Signed-off-by: Theodore Ts'o <tytso@mit.edu>
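For readers unfamiliar with the C99 construct, here is a self-contained example of a flexible array member and the usual allocation pattern. struct foo and struct boo follow the commit message's illustration; foo_alloc() is a hypothetical helper. Because sizeof(struct foo) covers only the fixed members, the trailing storage is added explicitly.

```c
#include <assert.h>
#include <stdlib.h>

struct boo { int value; };

/* C99 flexible array member: it must be the last member. */
struct foo {
    int stuff;
    struct boo array[];
};

/* Allocate a foo with room for n trailing elements and fill them in. */
struct foo *foo_alloc(size_t n)
{
    struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

    if (!f)
        return NULL;
    f->stuff = (int)n;
    for (size_t i = 0; i < n; i++)
        f->array[i].value = (int)i;
    return f;
}
```

In kernel code this size computation is usually written with the overflow-checked struct_size() helper from <linux/overflow.h>.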
2020-03-14 | ext4: use flexible-array member in struct fname | Gustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
struct foo {
        int stuff;
        struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200309154838.GA31559@embeddedor Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: move ext4_fiemap to use iomap framework | Ritesh Harjani
This patch moves ext4_fiemap to use iomap framework. For xattr a new 'ext4_iomap_xattr_ops' is added. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/b9f45c885814fcdd0631747ff0fe08886270828c.1582880246.git.riteshh@linux.ibm.com Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: make ext4_ind_map_blocks work with fiemap | Ritesh Harjani
For indirect block mapping if the i_block > max supported block in inode then ext4_ind_map_blocks() returns a -EIO error. But in case of fiemap this could be a valid query to ->iomap_begin call. So check if the offset >= s_bitmap_maxbytes in ext4_iomap_begin_report(), then simply skip calling ext4_map_blocks(). Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/87fa0ddc5967fa707656212a3b66a7233425325c.1582880246.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
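The early check can be modelled as follows. The function names echo the ext4 ones but are hypothetical simplifications: ind_map_blocks() fails with -EIO beyond the last addressable block, and the report path answers "no mapping" (modelled here as a hole, an assumption of this sketch) for offsets at or past maxbytes instead of calling into it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum map_type { MAP_HOLE, MAP_MAPPED };

/* Hypothetical model of the indirect mapper: fails for blocks beyond what
 * an indirect-mapped inode can address. */
int ind_map_blocks(uint64_t lblk, uint64_t max_lblk)
{
    return lblk > max_lblk ? -EIO : 0;
}

/* Report path for fiemap: a query past the maximum addressable offset is
 * legitimate, so answer without calling into the block mapper. */
int iomap_begin_report(uint64_t offset, uint64_t maxbytes, unsigned blkbits,
                       enum map_type *type)
{
    if (offset >= maxbytes) {
        *type = MAP_HOLE;
        return 0;
    }
    int ret = ind_map_blocks(offset >> blkbits, (maxbytes - 1) >> blkbits);
    if (ret < 0)
        return ret;
    *type = MAP_MAPPED;  /* simplified: treat any in-range block as mapped */
    return 0;
}
```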
2020-03-14 | ext4: move ext4 bmap to use iomap infrastructure | Ritesh Harjani
ext4_iomap_begin is already implemented which provides ext4_map_blocks, so just move the API from generic_block_bmap to iomap_bmap for iomap conversion. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/8bbd53bd719d5ccfecafcce93f2bf1d7955a44af.1582880246.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: optimize ext4_ext_precache for 0 depth | Ritesh Harjani
This patch avoids the memory alloc & free path when depth is 0, since anyway there is no extra caching done in that case. So on checking depth 0, simply return early. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/93da0d0f073c73358e85bb9849d8a5378d1da539.1582880246.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-14 | ext4: add IOMAP_F_MERGED for non-extent based mapping | Ritesh Harjani
IOMAP_F_MERGED needs to be set in case of non-extent based mapping. This is needed in later patches for conversion of ext4_fiemap to use iomap. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/a4764c91c08c16d4d4a4b36defb2a08625b0e9b3.1582880246.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-03-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 86 non-merge commits during the last 12 day(s) which contain a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).
The main changes are:
1) Add modify_return attach type which allows to attach to a function via BPF trampoline and is run after the fentry and before the fexit programs and can pass a return code to the original caller, from KP Singh.
2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher objects to be visible in /proc/kallsyms so they can be annotated in stack traces, from Jiri Olsa.
3) Extend BPF sockmap to allow for UDP next to existing TCP support in order to enable this for BPF based socket dispatch, from Lorenz Bauer.
4) Introduce a new bpftool 'prog profile' command which attaches to existing BPF programs via fentry and fexit hooks and reads out hardware counters during that period, from Song Liu. Example usage:
   bpftool prog profile id 337 duration 3 cycles instructions llc_misses
         4228 run_cnt
      3403698 cycles                                              (84.08%)
      3525294 instructions   #  1.04 insn per cycle               (84.05%)
           13 llc_misses     #  3.69 LLC misses per million isns  (83.50%)
5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition of a new bpf_link abstraction to keep in particular BPF tracing programs attached even when the application owning them exits, from Andrii Nakryiko.
6) New bpf_get_ns_current_pid_tgid() helper for tracing to perform PID filtering and which returns the PID as seen by the init namespace, from Carlos Neira.
7) Refactor of RISC-V JIT code to move out common pieces and addition of a new RV32G BPF JIT compiler, from Luke Nelson.
8) Add gso_size context member to __sk_buff in order to be able to know whether a given skb is GSO or not, from Willem de Bruijn.
9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 | follow_dotdot{,_rcu}(): switch to use of step_into() | Al Viro
gets the regular mount crossing on result of .. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | handle_dots(), follow_dotdot{,_rcu}(): preparation to switch to step_into() | Al Viro
Right now the tail ends of follow_dotdot{,_rcu}() are pretty much the open-coded analogues of step_into(). The differences:
* the lack of proper LOOKUP_NO_XDEV handling in the non-RCU case (arguably a bug)
* the lack of ->d_manage() handling (again, arguably a bug)
Adjust the calling conventions so that on the next step we could just switch those functions to returning step_into().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | move handle_dots(), follow_dotdot() and follow_dotdot_rcu() past step_into() | Al Viro
pure move; we are going to have step_into() called by that bunch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | follow_dotdot{,_rcu}(): lift LOOKUP_BENEATH checks out of loop | Al Viro
Behaviour change: LOOKUP_BENEATH lookup of .. in absolute root yields an error even if it's not the process' root. That's possible only if you'd managed to escape chroot jail by way of procfs symlinks, but IMO the resulting behaviour is not worse - more consistent and easier to describe: ".." in root is "stay where you are", unless LOOKUP_BENEATH has been given, in which case it's "fail with EXDEV". Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
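The rule for ".." in the root can be stated as a few lines of code. This is a toy model, assuming a walk represented only by its depth below the lookup root and a boolean standing in for LOOKUP_BENEATH; it covers only the root case discussed in the commit, not mount crossing or the RCU variant.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: depth 0 is the root of the lookup. */
struct walk {
    int depth;
};

/* One ".." step: in the root, stay put, unless the BENEATH-style flag is
 * set, in which case fail with -EXDEV. Elsewhere, move to the parent. */
int step_dotdot(struct walk *w, bool beneath)
{
    if (w->depth == 0)
        return beneath ? -EXDEV : 0;
    w->depth--;
    return 0;
}
```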
2020-03-13 | follow_dotdot{,_rcu}(): lift switching nd->path to parent out of loop | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | expand path_parent_directory() in its callers | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | path_parent_directory(): leave changing path->dentry to callers | Al Viro
Instead of returning 0, return new dentry; instead of returning -ENOENT, return NULL. Adjust the callers accordingly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
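The calling-convention change can be illustrated with a hypothetical parent-pointer node type. parent_old() mimics the previous contract as described above (update the position in place, return 0 or -ENOENT); parent_new() returns the parent or NULL and leaves updating the position to the caller, which lets a caller inspect the result before changing its own state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical node with a parent pointer; the root's parent is NULL. */
struct node {
    struct node *parent;
};

/* Old contract: move *pos to its parent in place; 0 on success, -ENOENT
 * if there is no parent (and *pos is left unchanged). */
int parent_old(struct node **pos)
{
    if (!(*pos)->parent)
        return -ENOENT;
    *pos = (*pos)->parent;
    return 0;
}

/* New contract: return the parent, or NULL if there is none; the caller
 * decides whether and when to update its own position. */
struct node *parent_new(const struct node *pos)
{
    return pos->parent;
}
```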
2020-03-13 | path_connected(): pass mount and dentry separately | Al Viro
eventually we'll want to do that check *before* mangling nd->path.dentry... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | split the lookup-related parts of do_last() into a separate helper | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case | Al Viro
... getting may_create_in_sticky() checks in FMODE_OPENED case as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): simplify the liveness analysis past finish_open_created | Al Viro
Don't mess with got_write there - it is guaranteed to be false on entry and it will be set true if and only if we decide to go for truncation and manage to get write access for that. Don't carry acc_mode through the entire thing - it's only used in that part. And don't bother with gotos in there - compiler is quite capable of optimizing that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): rejoing the common path earlier in FMODE_{OPENED,CREATED} case | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-13 | do_last(): don't bother with keeping got_write in FMODE_OPENED case | Al Viro
it's easier to drop it right after lookup_open() and regain if needed (i.e. if we will need to truncate). On the non-FMODE_OPENED path we do that anyway. In case of FMODE_CREATED we won't be needing it. And it's easier to prove correctness that way, especially since the initial failure to get write access is not always fatal; proving that we'll never end up truncating in that case is rather convoluted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>