summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-05-14afs: Fix afs_find_server search loopMarc Dionne
The code that looks up servers by addresses makes the assumption that the list of addresses for a server is sorted. It exits the loop if it finds that the target address is larger than the current candidate. As the list is not currently sorted, this can lead to a failure to find a matching server, which can cause callbacks from that server to be ignored. Remove the early exit case so that the complete list is searched. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix the handling of an unfound server in CM operationsDavid Howells
If the client cache manager operations that need the server record (CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find the server record, they abort the call from the file server with RX_CALL_DEAD when they should return okay. Fixes: c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Add a tracepoint to record callbacks from unlisted serversDavid Howells
Add a tracepoint to record callbacks from servers for which we don't have a record. Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix the handling of CB.InitCallBackState3 to find the server by UUIDDavid Howells
Fix the handling of the CB.InitCallBackState3 service call to find the record of a server that we're using by looking it up by the UUID passed as the parameter rather than by its address (of which it might have many, and which may change). Fixes: c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix VNOVOL handling in address rotationDavid Howells
If a volume location record lists multiple file servers for a volume, then it's possible that due to a misconfiguration or a changing configuration that one of the file servers doesn't know about it yet and will abort VNOVOL. Currently, the rotation algorithm will stop with EREMOTEIO. Fix this by moving on to try the next server if VNOVOL is returned. Once all the servers have been tried and the record rechecked, the algorithm will stop with EREMOTEIO or ENOMEDIUM. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibilityDavid Howells
The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug whereby if an error occurs on one of the vnodes being queried, then the errorCode field is set correctly in the corresponding status, but the interfaceVersion field is left unset. Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against the following cases when called from FS.InlineBulkStatus delivery: (1) If InterfaceVersion == 0 then: (a) If errorCode != 0 then it indicates the abort code for the corresponding vnode. (b) If errorCode == 0 then the status record is invalid. (2) If InterfaceVersion == 1 then: (a) If errorCode != 0 then it indicates the abort code for the corresponding vnode. (b) If errorCode == 0 then the status record is valid and can be parsed. (3) If InterfaceVersion is anything else then the status record is invalid. Fixes: dd9fbcb8e103 ("afs: Rearrange status mapping") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14make xattr_getsecurity() staticAl Viro
many years overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-14afs: Fix server rotation's handling of fileserver probe failureDavid Howells
The server rotation algorithm just gives up if it fails to probe a fileserver. Fix this by rotating to the next fileserver instead. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix refcounting in callback registrationDavid Howells
The refcounting on afs_cb_interest struct objects in afs_register_server_cb_interest() is wrong as it uses the server list entry's call back interest pointer without regard for the fact that it might be replaced at any time and the object thrown away. Fix this by: (1) Put a lock on the afs_server_list struct that can be used to mediate access to the callback interest pointers in the servers array. (2) Keep a ref on the callback interest that we get from the entry. (3) Dropping the old reference held by vnode->cb_interest if we replace the pointer. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix giving up callbacks on server destructionDavid Howells
When a server record is destroyed, we want to send a message to the server telling it that we're giving up all the callbacks it has promised us. Apply two fixes to this: (1) Only send the FS.GiveUpAllCallBacks message if we actually got a callback from that server. We assume this to be the case if we performed at least one successful FS operation on that server. (2) Send it to the address last used for that server rather than always picking the first address in the list (which might be unreachable). Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix address list parsingDavid Howells
The parsing of port specifiers in the address list obtained from the DNS resolution upcall doesn't work as in4_pton() and in6_pton() will fail on encountering an unexpected delimiter (in this case, the '+' marking the port number). However, in*_pton() can't be given multiple specifiers. Fix this by finding the delimiter in advance and not relying on in*_pton() to find the end of the address for us. Fixes: 8b2a464ced77 ("afs: Add an address list concept") Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix directory page lockingDavid Howells
The afs directory loading code (primarily afs_read_dir()) locks all the pages that hold a directory's content blob to defend against getdents/getdents races and getdents/lookup races where the competitors issue conflicting reads on the same data. As the reads will complete consecutively, they may retrieve different versions of the data and one may overwrite the data that the other is busy parsing. Fix this by not locking the pages at all, but rather by turning the validation lock into an rwsem and getting an exclusive lock on it whilst reading the data or validating the attributes and a shared lock whilst parsing the data. Sharing the attribute validation lock should be fine as the data fetch will retrieve the attributes also. The individual page locks aren't needed at all as the only place they're being used is to serialise data loading. Without this patch, the: if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) { ... } part of afs_read_dir() may be skipped, leaving the pages unlocked when we hit the success: clause - in which case we try to unlock the not-locked pages, leading to the following oops: page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0 flags: 0xfffe000001004(referenced|private) raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000 page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) page->mem_cgroup:ffff98156b27c000 ------------[ cut here ]------------ kernel BUG at mm/filemap.c:1205! ... RIP: 0010:unlock_page+0x43/0x50 ... Call Trace: afs_dir_iterate+0x789/0x8f0 [kafs] ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x166/0x1d0 ? afs_do_lookup+0x69/0x490 [kafs] ? afs_do_lookup+0x101/0x490 [kafs] ? key_default_cmp+0x20/0x20 ? request_key+0x3c/0x80 ? afs_lookup+0xf1/0x340 [kafs] ? __lookup_slow+0x97/0x150 ? lookup_slow+0x35/0x50 ? walk_component+0x1bf/0x490 ? path_lookupat.isra.52+0x75/0x200 ? filename_lookup.part.66+0xa0/0x170 ? afs_end_vnode_operation+0x41/0x60 [kafs] ? __check_object_size+0x9c/0x171 ? strncpy_from_user+0x4a/0x170 ? vfs_statx+0x73/0xe0 ? __do_sys_newlstat+0x39/0x70 ? __x64_sys_getdents+0xc9/0x140 ? __x64_sys_getdents+0x140/0x140 ? do_syscall_64+0x5b/0x160 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-13ext4: handle errors on ext4_commit_superJaegeuk Kim
When remounting ext4 from ro to rw, currently it allows its transition, even if ext4_commit_super() returns EIO. Even worse thing is, after that, fs/buffer complains buffer dirty bits like: Call trace: [<ffffff9750c259dc>] mark_buffer_dirty+0x184/0x1a4 [<ffffff9750cb398c>] __ext4_handle_dirty_super+0x4c/0xfc [<ffffff9750c7a9fc>] ext4_file_open+0x154/0x1c0 [<ffffff9750bea51c>] do_dentry_open+0x114/0x2d0 [<ffffff9750bea75c>] vfs_open+0x5c/0x94 [<ffffff9750bf879c>] path_openat+0x668/0xfe8 [<ffffff9750bf8088>] do_filp_open+0x74/0x120 [<ffffff9750beac98>] do_sys_open+0x148/0x254 [<ffffff9750beade0>] SyS_openat+0x10/0x18 [<ffffff9750a83ab0>] el0_svc_naked+0x24/0x28 EXT4-fs (dm-1): previous I/O error to superblock detected Buffer I/O error on dev dm-1, logical block 0, lost sync page write EXT4-fs (dm-1): re-mounted. Opts: (null) Buffer I/O error on dev dm-1, logical block 80, lost async page write Signed-off-by: Jaegeuk Kim <jaegeuk@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-05-13ext4: do not update s_last_mounted of a frozen fsAmir Goldstein
If fs is frozen after mount and before the first file open, the update of s_last_mounted bypasses freeze protection and prints out a WARNING splat: $ mount /vdf $ fsfreeze -f /vdf $ cat /vdf/foo [ 31.578555] WARNING: CPU: 1 PID: 1415 at fs/ext4/ext4_jbd2.c:53 ext4_journal_check_start+0x48/0x82 [ 31.614016] Call Trace: [ 31.614997] __ext4_journal_start_sb+0xe4/0x1a4 [ 31.616771] ? ext4_file_open+0xb6/0x189 [ 31.618094] ext4_file_open+0xb6/0x189 If fs is frozen, skip s_last_mounted update. [backport hint: to apply to stable tree, need to apply also patches vfs: add the sb_start_intwrite_trylock() helper ext4: factor out helper ext4_sample_last_mounted()] Cc: stable@vger.kernel.org Fixes: bc0b0d6d69ee ("ext4: update the s_last_mounted field in the superblock") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13ext4: factor out helper ext4_sample_last_mounted()Amir Goldstein
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13ext4: update mtime in ext4_punch_hole even if no blocks are releasedLukas Czerner
Currently in ext4_punch_hole we're going to skip the mtime update if there are no actual blocks to release. However we've actually modified the file by zeroing the partial block so the mtime should be updated. Moreover the sync and datasync handling is skipped as well, which is also wrong. Fix it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Joe Habermann <joe.habermann@quantum.com> Cc: <stable@vger.kernel.org>
2018-05-13ext4: add verifier check for symlink with append/immutable flagsLuis R. Rodriguez
The Linux VFS does not allow a way to set append/immuttable attributes to symlinks, this is just not possible. If this is detected inform the user as the filesystem must be corrupted. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-13fs: ext4: add new return type vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
2018-05-13vfat: simplify checks in vfat_lookup()Al Viro
vfat_d_anon_disconn() is called only if alias->d_parent is equal to dentry->d_parent *and* it returns false unless alias->d_parent == alias. But in that case alias is the directory we are doing lookup in, and d_splice_alias() would've done the right thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-13get rid of dead code in d_find_alias()Al Viro
All "try disconnected alias if nothing else fits" logics in d_find_alias() got accidentally disabled by Neil a while ago; for most of the callers it was the right thing to do, so fixes belong in few callers that *do* want disconnected aliases. This just takes the now-dead code in d_find_alias() out. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-12Merge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Some small SMB3 fixes for 4.17-rc5, some for stable" * tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: directory sync should not return an error cifs: smb2ops: Fix listxattr() when there are no EAs cifs: smbd: Enable signing with smbdirect cifs: Allocate validate negotiation request through kmalloc
2018-05-12ext4: fix hole length detection in ext4_ind_map_blocks()Jan Kara
When ext4_ind_map_blocks() computes a length of a hole, it doesn't count with the fact that mapped offset may be somewhere in the middle of the completely empty subtree. In such case it will return too large length of the hole which then results in lseek(SEEK_DATA) to end up returning an incorrect offset beyond the end of the hole. Fix the problem by correctly taking offset within a subtree into account when computing a length of a hole. Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8 CC: stable@vger.kernel.org Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-05-12ext4: mark block bitmap corrupted when foundWang Shilong
There are still some cases that we missed to set block bitmaps corrupted bit properly: 1) block bitmap number is wrong. 2) failed to read block bitmap due to disk errors. 3) double free block bitmaps.. 4) some mismatch check with bitmaps vs buddy information. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Wang Shilong <wshilong@ddn.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-05-12ext4: mark inode bitmap corrupted when foundWang Shilong
There are still some cases that we missed to set block bitmaps corrupted bit properly: 1)inode bitmap number is wrong. 2)failed to read block bitmap due to disk errors. 3)double allocations from bitmap Also remove a duplicated call ext4_error() afer ext4_read_inode_bitmap(), as ext4_error() have been called inside ext4_read_inode_bitmap() properly. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-05-12ext4: add new ext4_mark_group_bitmap_corrupted() helperWang Shilong
Since there are many places to set inode/block bitmap corrupt bit, add a new helper for it, which will make codes more clear. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-05-12ext4: fix wrong return value in ext4_read_inode_bitmap()Wang Shilong
The only reason that sb_getblk() could fail is out of memory, ext4 codes have returned -ENOMME for all other places except this one, let's fix it here too. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-05-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: rbtree: include rcu.h scripts/faddr2line: fix error when addr2line output contains discriminator ocfs2: take inode cluster lock before moving reflinked inode from orphan dir mm, oom: fix concurrent munlock and oom reaper unmap, v3 mm: migrate: fix double call of radix_tree_replace_slot() proc/kcore: don't bounds check against address 0 mm: don't show nr_indirectly_reclaimable in /proc/vmstat mm: sections are not offlined during memory hotremove z3fold: fix reclaim lock-ups init: fix false positives in W+X checking lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit() KASAN: prohibit KASAN+STRUCTLEAK combination MAINTAINERS: update Shuah's email address
2018-05-11ocfs2: take inode cluster lock before moving reflinked inode from orphan dirAshish Samant
While reflinking an inode, we create a new inode in orphan directory, then take EX lock on it, reflink the original inode to orphan inode and release EX lock. Once the lock is released another node could request it in EX mode from ocfs2_recover_orphans() which causes downconvert of the lock, on this node, to NL mode. Later we attempt to initialize security acl for the orphan inode and move it to the reflink destination. However, while doing this we dont take EX lock on the inode. This could potentially cause problems because we could be starting transaction, accessing journal and modifying metadata of the inode while holding NL lock and with another node holding EX lock on the inode. Fix this by taking orphan inode cluster lock in EX mode before initializing security and moving orphan inode to reflink destination. Use the __tracker variant while taking inode lock to avoid recursive locking in the ocfs2_init_security_and_acl() call chain. Link: http://lkml.kernel.org/r/1523475107-7639-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11proc/kcore: don't bounds check against address 0Laura Abbott
The existing kcore code checks for bad addresses against __va(0) with the assumption that this is the lowest address on the system. This may not hold true on some systems (e.g. arm64) and produce overflows and crashes. Switch to using other functions to validate the address range. It's currently only seen on arm64 and it's not clear if anyone wants to use that particular combination on a stable release. So this is not urgent for stable. Link: http://lkml.kernel.org/r/20180501201143.15121-1-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Tested-by: Dave Anderson <anderson@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexey Dobriyan <adobriyan@gmail.com>a Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11nfsd: Do not refuse to serve out of cacheTrond Myklebust
Currently the knfsd replay cache appears to try to refuse replying to retries that come within 200ms of the cache entry being created. That makes limited sense in today's world of high speed TCP. After a TCP disconnection, a client can very easily reconnect and retry an rpc in less than 200ms. If this logic drops that retry, however, the client may be quite slow to retry again. This logic is original to the first reply cache implementation in 2.1, and may have made more sense for UDP clients that retried much more frequently. After this patch we will still drop on finding the original request still in progress. We may want to fix that as well at some point, though it's less likely. Note that svc_check_conn_limits is often the cause of those disconnections. We may want to fix that some day. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-11fs: don't scan the inode cache before SB_BORN is setDave Chinner
We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. It produces an oops down this path during the failed mount: radix_tree_gang_lookup_tag+0xc4/0x130 xfs_perag_get_tag+0x37/0xf0 xfs_reclaim_inodes_count+0x32/0x40 xfs_fs_nr_cached_objects+0x11/0x20 super_cache_count+0x35/0xc0 shrink_slab.part.66+0xb1/0x370 shrink_node+0x7e/0x1a0 try_to_free_pages+0x199/0x470 __alloc_pages_slowpath+0x3a1/0xd20 __alloc_pages_nodemask+0x1c3/0x200 cache_grow_begin+0x20b/0x2e0 fallback_alloc+0x160/0x200 kmem_cache_alloc+0x111/0x4e0 The problem is that the superblock shrinker is running before the filesystem structures it depends on have been fully set up. i.e. the shrinker is registered in sget(), before ->fill_super() has been called, and the shrinker can call into the filesystem before fill_super() does it's setup work. Essentially we are exposed to both use-after-free and use-before-initialisation bugs here. To fix this, add a check for the SB_BORN flag in super_cache_count. In general, this flag is not set until ->fs_mount() completes successfully, so we know that it is set after the filesystem setup has completed. This matches the trylock_super() behaviour which will not let super_cache_scan() run if SB_BORN is not set, and hence will not allow the superblock shrinker from entering the filesystem while it is being set up or after it has failed setup and is being torn down. Cc: stable@kernel.org Signed-Off-By: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11do d_instantiate/unlock_new_inode combinations safelyAl Viro
For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11nfsd: make nfsd4_scsi_identify_device retry with a larger bufferScott Mayhew
nfsd4_scsi_identify_device() performs a single IDENTIFY command for the device identification VPD page using a small buffer. If the reply is too large to fit in this buffer then the GETDEVICEINFO reply will not contain any info for the SCSI volume aside from the registration key. This can happen for example if the device has descriptors using long SCSI name strings. When the initial reply from the device indicates a larger buffer is needed, retry once using the page length from that reply. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-05-10smb3: directory sync should not return an errorSteve French
As with NFS, which ignores sync on directory handles, fsync on a directory handle is a noop for CIFS/SMB3. Do not return an error on it. It breaks some database apps otherwise. Signed-off-by: Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-05-10it's SB_BORN, not MS_BORN...Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-10xfs: rename on-disk dquot counter zap functionsDarrick J. Wong
The function 'xfs_qm_dqiterate' doesn't iterate dquots at all, it iterates all dquot blocks of a quota inode and clears the counters. Therefore, change the name to something more descriptive so that we can introduce a real dquot iterator later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: replace XFS_QMOPT_DQALLOC with a simple booleanDarrick J. Wong
DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the _dqget family of functions cares about is DQALLOC. Therefore, change it to a boolean 'can alloc?' flag for the dqget interfaces where that makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-10xfs: remove direct calls to _qm_dqreadDarrick J. Wong
The quota initialization code needs an "uncached" variant of _dqget to read in default quota limits and timers before the dquot cache is fully set up. We've already split up _dqget into its component pieces so create a fourth variant to address this need, and make dqread internal to xfs_dquot.c again. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: refactor xfs_qm_dqtobp and xfs_qm_dqallocDarrick J. Wong
Separate the disk dquot read and allocation functionality into two helper functions, then refactor dqread to call them directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-10xfs: refactor incore dquot initialization functionsDarrick J. Wong
Create two incore dquot initialization functions that will help us to disentangle dqget and dqread. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: fetch dquots directly during quotacheckDarrick J. Wong
Quotacheck only runs during mount, which means that there are no other processes in the system that could be doing chown or chproj. Therefore there's no potential for racing to attach dquots to the inode so we can drop all the ILOCK and race detection bits from quotacheck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: split out dqget for inodes from regular dqgetDarrick J. Wong
There are two uses of dqget here -- one is to return the dquot for a given type and id, and the other is to return the dquot for a given type and inode. Those are two separate things, so split them into two smaller functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: remove unnecessary xfs_qm_dqattach parameterDarrick J. Wong
The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: delegate dqget input checks to helper functionDarrick J. Wong
Move the dqget input checks to a separate function in preparation for splitting up the dqget functionality. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: refactor dquot cache handlingDarrick J. Wong
Delegate the dquot cache handling (radix tree lookup and insertion) to separate helper functions so that we can continue to simplify the body of dqget. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: refactor XFS_QMOPT_DQNEXT out of existenceDarrick J. Wong
There's only one caller of DQNEXT and its semantics can be moved into a separate function, so create the function and get rid of the flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: don't spray logs when dquot flush/purge failDarrick J. Wong
When dquot flush or purge fail there's no need to spam the logs, we've already logged the IO error or fs shutdown that caused the flush failures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-10xfs: release new dquot buffer on defer_finish errorDarrick J. Wong
In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when allocating a new dquot record", we allocate a new dquot block, grab a buffer to initialize it, and return the locked initialized dquot buffer to the caller for further in-core dquot initialization. Unfortunately, if the _bmap_finish errored out, _qm_dqalloc would also error out without bothering to free the (locked) buffer. Leaking a locked buffer caused hangs in generic/388 when quotas are enabled. Furthermore, the _bmap_finish -> _defer_finish conversion in 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_*") failed to observe that the buffer was held going into _defer_finish and therefore failed to notice that the buffer lock is /not/ maintained afterwards. Now that we can bjoin a buffer to a defer_ops, use this mechanism to ensure that the buffer stays locked across the _defer_finish. Release the holds and locks on the buffer as appropriate if we have to error out. There is a subtlety here for the caller in that the buffer emerges locked and held to the transaction, so if the _trans_commit fails we have to release the buffer explicitly. This fixes the unmount hang in generic/388 when quotas are enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-10xfs: don't discard on free of unwritten extentsBrian Foster
Unwritten extents by definition have not been written to until they are converted to normal written extents. If unwritten extents are freed from a file, it is therefore guaranteed that the blocks have not been written to since allocation (note that zero range punches and reallocates blocks). To cut down on online discards generated from workloads that make use of preallocation, skip discards of extents if they are in the unwritten state when the extent is freed. Note that this optimization does not apply to log recovery, during which all freed extents are discarded if online discard is enabled. Also note that it may be possible for a filesystem crash to occur after write completion of an unwritten extent but before unwritten conversion such that the extent remains unwritten after log recovery. Since this pseudo-inconsistency may already be possible after a crash (consider writing to recently allocated blocks where the allocation transaction is lost after a crash), this change shouldn't introduce any fundamental limitations that don't already exist. In short, on storage stacks where discards are important, it's good practice to run an occasional fstrim even with online discard enabled in the filesystem, particularly after a crash. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: skip online discard during eofblocks trimsBrian Foster
We've had reports of online discard operations being sent from XFS on write-only workloads. These discards occur as a result of eofblocks trims that can occur after a large file copy completes. These discards are slightly confusing for users who might be paying close attention to online discards (i.e., vdo) due to performance sensitivity. They also happen to be spurious because freed post-eof blocks by definition have not been written to during the current allocation cycle. Update xfs_free_eofblocks() to skip discards that are purely attributed to eofblocks trims. This cuts down the number of spurious discards that may occur on write-only workloads due to normal preallocation activity. Note that discards of post-eof extents can still occur from other codepaths that do not isolate handling of post-eof blocks from those within eof. For example, file unlinks and truncates may still cause discards for any file blocks affected by the operation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>