summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-06-04xfs: btree lookup shouldn't ASSERT on empty btree nodesDarrick J. Wong
If a btree lookup encounters an empty btree node or an empty btree leaf on a multi-level btree, that's evidence of a corrupt on-disk btree. Therefore, we should return -EFSCORRUPTED to the upper levels, not an ASSERT failure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: xfs_alloc_get_rec should return EFSCORRUPTED for obvious bnobt corruptionDarrick J. Wong
Return -EFSCORRUPTED when the bnobt/cntbt return obviously corrupt values, rather than letting them bounce around in the internal code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: remove redundant ASSERT on insufficient bestfree length in _leaf_addnameDarrick J. Wong
In xfs_dir2_leaf_addname we ASSERT if the length of the unused space described by bestfree[0] is less the amount of space we wish to consume. Immediately after it is a call to xfs_dir2_data_use_free where the offset parameter is offset of the unused space and the length parameter is the amount of space we wish to consume. Both values (and the unused space pointer) are passed into xfs_dir2_data_check_free, which also validates that the region of unused space is big enough to cover the space we wish to consume. This is effectively the same check that the ASSERT covers, and since a check failure results in a corruption message being logged we can remove the ASSERT. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't assert when reporting on-disk corruption while loading btreeDarrick J. Wong
Don't bother ASSERTing when we're already going to log and return the corruption status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04xfs: don't forbid setting dax flag on directories if device doesn't daxDarrick J. Wong
On a directory, the DAX flag is merely a hint that files created in the directory should have the DAX flag set at creation time. We don't care if the underlying device supports DAX or not because directory metadata are always cached in DRAM. We don't care if new files get the flag even if the device doesn't support DAX because we always check for DAX support before setting the VFS flag (S_DAX). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-06-04Merge tag '4.18-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: - smb3 fixes for stable - addition of ftrace hooks for cifs.ko - improvements in compounding and smbdirect (rdma) * tag '4.18-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (38 commits) CIFS: Add support for direct pages in wdata CIFS: Use offset when reading pages CIFS: Add support for direct pages in rdata cifs: update multiplex loop to handle compounded responses cifs: remove header_preamble_size where it is always 0 cifs: remove struct smb2_hdr CIFS: 511c54a2f69195b28afb9dd119f03787b1625bb4 adds a check for session expiry, status STATUS_NETWORK_SESSION_EXPIRED, however the server can also respond with STATUS_USER_SESSION_DELETED in cases where the session has been idle for some time and the server reaps the session to recover resources. cifs: change smb2_get_data_area_len to take a smb2_sync_hdr as argument cifs: update smb2_calc_size to use smb2_sync_hdr instead of smb2_hdr cifs: remove struct smb2_oplock_break_rsp cifs: remove rfc1002 header from all SMB2 response structures smb3: on reconnect set PreviousSessionId field smb3: Add posix create context for smb3.11 posix mounts smb3: add tracepoints for smb2/smb3 open cifs: add debug output to show nocase mount option smb3: add define for id for posix create context and corresponding struct cifs: update smb2_check_message to handle PDUs without a 4 byte length header smb3: allow "posix" mount option to enable new SMB311 protocol extensions smb3: add support for posix negotiate context cifs: allow disabling less secure legacy dialects ...
2018-06-04Merge tag 'gfs2-4.18.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got nine more patches for this merge window. - remove sd_jheightsize to greatly simplify some code (Andreas Gruenbacher) - fix some comments (Andreas) - fix a glock recursion bug when allocation errors occur (Andreas) - improve the hole_size function so it returns the entire hole rather than figuring it out piecemeal (Andreas) - clean up gfs2_stuffed_write_end to remove a lot of redundancy (Andreas) - clarify code with regard to the way ordered writes are processed (Andreas) - a bunch of improvements and cleanups of the iomap code to pave the way for iomap writes, which is a future patch set (Andreas) - fix a bug where block reservations can run off the end of a bitmap (Bob Peterson) - add Andreas to the MAINTAINERS file (Bob Peterson)" * tag 'gfs2-4.18.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: MAINTAINERS: Add Andreas Gruenbacher as a maintainer for gfs2 gfs2: Iomap cleanups and improvements gfs2: Remove ordered write mode handling from gfs2_trans_add_data gfs2: gfs2_stuffed_write_end cleanup gfs2: hole_size improvement GFS2: gfs2_free_extlen can return an extent that is too long GFS2: Fix allocation error bug with recursive rgrp glocking gfs2: Update find_metapath comment gfs2: Remove sdp->sd_jheightsize
2018-06-04Merge tag 'dlm-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "These three commits fix and clean up the flags dlm was using on its SCTP sockets. This improves performance and fixes some bad connection delays" * tag 'dlm-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: remove O_NONBLOCK flag in sctp_connect_to_sock dlm: make sctp_connect_to_sock() return in specified time dlm: fix a clerical error when set SCTP_NODELAY
2018-06-04f2fs: fix to clear FI_VOLATILE_FILE correctlyChao Yu
Thread A Thread B - f2fs_release_file - clear_inode_flag(FI_VOLATILE_FILE) - wb_writeback - writeback_sb_inodes - __writeback_single_inode - do_writepages - f2fs_write_data_pages - __write_data_page all volatile file's pages are writebacked to storage - set_inode_flag(FI_DROP_CACHE) - filemap_fdatawrite There is a hole that mm can flush all dirty pages of volatile file as inode is not tagged with both FI_VOLATILE_FILE and FI_DROP_CACHE flags, we should never writeback the page #0 and also it's unneeded to writeback other pages. This patch adjusts to relocate clear_inode_flag(FI_VOLATILE_FILE), so that FI_VOLATILE_FILE flag can be remained before all dirty pages were dropped to avoid issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-04f2fs: let sync node IO interrupt async oneChao Yu
Although mixed sync/async IOs can have continuous LBA, as they have different IO priority, block IO scheduler will add them into different queues and commit them separately, result in splited IOs which causes wrose performance. This patch gives high priority to synchronous IO of nodes, means that once synchronous flow starts, it can interrupt asynchronous writeback flow of system flusher, so more big IOs can be expected. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-04f2fs: don't change wbc->sync_modeChao Yu
We should never falsify wbc->sync_mode passed from mm, otherwise mm can trigger writeback with wrong IO priority. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-04f2fs: fix to update mtime correctlyChao Yu
If we change system time to the past, get_mtime() will return a overflowed time, and SIT_I(sbi)->max_mtime will be udpated incorrectly, this patch fixes the two issues. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-04Merge tag 'for-4.18-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "User visible features: - added support for the ioctl FS_IOC_FSGETXATTR, per-inode flags, successor of GET/SETFLAGS; now supports only existing flags: append, immutable, noatime, nodump, sync - 3 new unprivileged ioctls to allow users to enumerate subvolumes - dedupe syscall implementation does not restrict the range to 16MiB, though it still splits the whole range to 16MiB chunks - on user demand, rmdir() is able to delete an empty subvolume, export the capability in sysfs - fix inode number types in tracepoints, other cleanups - send: improved speed when dealing with a large removed directory, measurements show decrease from 2000 minutes to 2 minutes on a directory with 2 million entries - pre-commit check of superblock to detect a mysterious in-memory corruption - log message updates Other changes: - orphan inode cleanup improved, does no keep long-standing reservations that could lead up to early ENOSPC in some cases - slight improvement of handling snapshotted NOCOW files by avoiding some unnecessary tree searches - avoid OOM when dealing with many unmergeable small extents at flush time - speedup conversion of free space tree representations from/to bitmap/tree - code refactoring, deletion, cleanups: + delayed refs + delayed iput + redundant argument removals + memory barrier cleanups + remove a redundant mutex supposedly excluding several ioctls to run in parallel - new tracepoints for blockgroup manipulation - more sanity checks of compressed headers" * tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (183 commits) btrfs: Add unprivileged version of ino_lookup ioctl btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF btrfs: Add unprivileged ioctl which returns subvolume information Btrfs: clean up error handling in btrfs_truncate() btrfs: Factor out write portion of btrfs_get_blocks_direct btrfs: Factor out read portion of btrfs_get_blocks_direct btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist btrfs: raid56: Remove VLA usage btrfs: return error value if create_io_em failed in cow_file_range btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot btrfs: drop unused parameter qgroup_reserved btrfs: balance dirty metadata pages in btrfs_finish_ordered_io btrfs: lift some btrfs_cross_ref_exist checks in nocow path btrfs: Remove fs_info argument from btrfs_uuid_tree_rem btrfs: Remove fs_info argument from btrfs_uuid_tree_add Btrfs: remove unused check of skip_locking Btrfs: remove always true check in unlock_up Btrfs: grab write lock directly if write_lock_level is the max level Btrfs: move get root out of btrfs_search_slot to a helper Btrfs: use more straightforward extent_buffer_uptodate check ...
2018-06-04Merge tag 'affs-for-4.18-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs fix from David Sterba: "A potential memory leak fix for AFFS" * tag 'affs-for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: fix potential memory leak when parsing option 'prefix'
2018-06-04Merge branch 'work.aio-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull aio updates from Al Viro: "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly. The only thing I'm holding back for a day or so is Adam's aio ioprio - his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case), but let it sit in -next for decency sake..." * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) aio: sanitize the limit checking in io_submit(2) aio: fold do_io_submit() into callers aio: shift copyin of iocb into io_submit_one() aio_read_events_ring(): make a bit more readable aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way aio: take list removal to (some) callers of aio_complete() aio: add missing break for the IOCB_CMD_FDSYNC case random: convert to ->poll_mask timerfd: convert to ->poll_mask eventfd: switch to ->poll_mask pipe: convert to ->poll_mask crypto: af_alg: convert to ->poll_mask net/rxrpc: convert to ->poll_mask net/iucv: convert to ->poll_mask net/phonet: convert to ->poll_mask net/nfc: convert to ->poll_mask net/caif: convert to ->poll_mask net/bluetooth: convert to ->poll_mask net/sctp: convert to ->poll_mask net/tipc: convert to ->poll_mask ...
2018-06-04Merge branch 'work.lookup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache lookup cleanups from Al Viro: "Cleaning ->lookup() instances up - mostly d_splice_alias() conversions" * 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (29 commits) switch the rest of procfs lookups to d_splice_alias() procfs: switch instantiate_t to d_splice_alias() don't bother with tid_fd_revalidate() in lookups proc_lookupfd_common(): don't bother with instantiate unless the file is open procfs: get rid of ancient BS in pid_revalidate() uses cifs_lookup(): switch to d_splice_alias() cifs_lookup(): cifs_get_inode_...() never returns 0 with *inode left NULL 9p: unify paths in v9fs_vfs_lookup() ncp_lookup(): use d_splice_alias() hfsplus: switch to d_splice_alias() hfs: don't allow mounting over .../rsrc hfs: use d_splice_alias() omfs_lookup(): report IO errors, use d_splice_alias() orangefs_lookup: simplify openpromfs: switch to d_splice_alias() xfs_vn_lookup: simplify a bit adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias() adfs_lookup_byname: .. *is* taken care of in fs/namei.c romfs_lookup: switch to d_splice_alias() qnx6_lookup: switch to d_splice_alias() ...
2018-06-04rxrpc: Fix handling of call quietly cancelled out on serverDavid Howells
Sometimes an in-progress call will stop responding on the fileserver when the fileserver quietly cancels the call with an internally marked abort (RX_CALL_DEAD), without sending an ABORT to the client. This causes the client's call to eventually expire from lack of incoming packets directed its way, which currently leads to it being cancelled locally with ETIME. Note that it's not currently clear as to why this happens as it's really hard to reproduce. The rotation policy implement by kAFS, however, doesn't differentiate between ETIME meaning we didn't get any response from the server and ETIME meaning the call got cancelled mid-flow. The latter leads to an oops when fetching data as the rotation partially resets the afs_read descriptor, which can result in a cleared page pointer being dereferenced because that page has already been filled. Handle this by the following means: (1) Set a flag on a call when we receive a packet for it. (2) Store the highest packet serial number so far received for a call (bearing in mind this may wrap). (3) If, when the "not received anything recently" timeout expires on a call, we've received at least one packet for a call and the connection as a whole has received packets more recently than that call, then cancel the call locally with ECONNRESET rather than ETIME. This indicates that the call was definitely in progress on the server. (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME, don't try the next server, but rather abort the call. This avoids the oops as we don't try to reuse the afs_read struct. Rather, as-yet ungotten pages will be reread at a later data. Also: (5) Add an rxrpc tracepoint to log detection of the call being reset. Without this, I occasionally see an oops like the following: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:_copy_to_iter+0x204/0x310 RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206 RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560 RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000 RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400 R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958 R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560 FS: 00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0 Call Trace: skb_copy_datagram_iter+0x14e/0x289 rxrpc_recvmsg_data.isra.0+0x6f3/0xf68 ? trace_buffer_unlock_commit_regs+0x4f/0x89 rxrpc_kernel_recv_data+0x149/0x421 afs_extract_data+0x1e0/0x798 ? afs_wait_for_call_to_complete+0xc9/0x52e afs_deliver_fs_fetch_data+0x33a/0x5ab afs_deliver_to_call+0x1ee/0x5e0 ? afs_wait_for_call_to_complete+0xc9/0x52e afs_wait_for_call_to_complete+0x12b/0x52e ? wake_up_q+0x54/0x54 afs_make_call+0x287/0x462 ? afs_fs_fetch_data+0x3e6/0x3ed ? rcu_read_lock_sched_held+0x5d/0x63 afs_fs_fetch_data+0x3e6/0x3ed afs_fetch_data+0xbb/0x14a afs_readpages+0x317/0x40d __do_page_cache_readahead+0x203/0x2ba ? ondemand_readahead+0x3a7/0x3c1 ondemand_readahead+0x3a7/0x3c1 generic_file_buffered_read+0x18b/0x62f __vfs_read+0xdb/0xfe vfs_read+0xb2/0x137 ksys_read+0x50/0x8c do_syscall_64+0x7d/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Note the weird value in RDI which is a result of trying to kmap() a NULL page pointer. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04Merge tag 'locks-v4.18-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull fasync fix from Jeff Layton: "Just a single fix for a deadlock in the fasync handling code that Kirill observed while testing. The fix is to change the fa_lock to be rwlock_t, and use a read lock in kill_fasync_rcu" * tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
2018-06-04Merge tag 'docs-4.18' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "There's been a fair amount of work in the docs tree this time around, including: - Extensive RST conversions and organizational work in the memory-management docs thanks to Mike Rapoport. - An update of Documentation/features from Andrea Parri and a script to keep it updated. - Various LICENSES updates from Thomas, along with a script to check SPDX tags. - Work to fix dangling references to documentation files; this involved a fair number of one-liner comment changes outside of Documentation/ ... and the usual list of documentation improvements, typo fixes, etc" * tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits) Documentation: document hung_task_panic kernel parameter docs/admin-guide/mm: add high level concepts overview docs/vm: move ksm and transhuge from "user" to "internals" section. docs: Use the kerneldoc comments for memalloc_no*() doc: document scope NOFS, NOIO APIs docs: update kernel versions and dates in tables docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge docs/vm: transhuge: minor updates docs/vm: transhuge: change sections order Documentation: arm: clean up Marvell Berlin family info Documentation: gpio: driver: Fix a typo and some odd grammar docs: ranoops.rst: fix location of ramoops.txt scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode docs: uio-howto.rst: use a code block to solve a warning mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback w1: w1_io.c: fix a kernel-doc warning Documentation/process/posting: wrap text at 80 cols docs: admin-guide: add cgroup-v2 documentation Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'" Documentation: refcount-vs-atomic: Update reference to LKMM doc. ...
2018-06-04NFS: Filter cache invalidation when holding a delegationTrond Myklebust
If the client holds a delegation, then ensure we filter out attempts to invalidate the size, owner, group owner, or mode unless we made the change, in which case, check that NFS_INO_REVAL_FORCED is set by the caller. Always filter out attempts to invalidate the change attribute and size, since we are authoritative for those. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes()Trond Myklebust
If we hold a delegation, we should not need to call nfs_check_inode_attributes() since we already know which attributes are valid, and which ones may still need revalidation. The state of the NFS_INO_REVAL_FORCED flag is therefore irrelevant. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFS: Improve caching while holding a delegationTrond Myklebust
Make sure that the client completely ignores change attribute and size changes on the server when it holds a delegation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFS: Fix attribute revalidationTrond Myklebust
Don't mark attributes as invalid just because they have changed. Instead, for the purposes of adjusting the attribute cache timeout, keep a separate variable that tracks whether or not a change occurred. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFS: fix up nfs_setattr_update_inodeTrond Myklebust
Always try to set the attributes, even if we don't have a valid struct nfs_fattr. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFSv4: Ensure the inode is clean when we set a delegationTrond Myklebust
If there are attributes that are still invalid when we set a delegation, then we need to set the NFS_INO_REVAL_FORCED flag. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_accessTrond Myklebust
If we hold a delegation, we don't need to care about whether or not the inode attributes are up to date. We know we can cache the results of this call regardless. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04ceph: show ino32 if the value is different with defaultChengguang Xu
In current ceph_show_options(), there is no item for showing 'ino32', so add showing mount option 'ino32' if the value is different with default. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: strengthen rsize/wsize/readdir_max_bytes validationChengguang Xu
The check (intval < PAGE_SIZE) will involve type cast, so even when specifying negative value to rsize/wsize/readdir_max_bytes, it will pass the validation check successfully. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: fix alignment of rasizeChengguang Xu
On currently logic: when I specify rasize=0~1 then it will be 4096. when I specify rasize=2~4097 then it will be 8192. Make it the same as rsize & wsize. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: fix use-after-free in ceph_statfs()Luis Henriques
KASAN found an UAF in ceph_statfs. This was a one-off bug but looking at the code it looks like the monmap access needs to be protected as it can be modified while we're accessing it. Fix this by protecting the access with the monc->mutex. BUG: KASAN: use-after-free in ceph_statfs+0x21d/0x2c0 Read of size 8 at addr ffff88006844f2e0 by task trinity-c5/304 CPU: 0 PID: 304 Comm: trinity-c5 Not tainted 4.17.0-rc6+ #172 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa5/0x11b ? show_regs_print_info+0x5/0x5 ? kmsg_dump_rewind+0x118/0x118 ? ceph_statfs+0x21d/0x2c0 print_address_description+0x73/0x2b0 ? ceph_statfs+0x21d/0x2c0 kasan_report+0x243/0x360 ceph_statfs+0x21d/0x2c0 ? ceph_umount_begin+0x80/0x80 ? kmem_cache_alloc+0xdf/0x1a0 statfs_by_dentry+0x79/0xb0 vfs_statfs+0x28/0x110 user_statfs+0x8c/0xe0 ? vfs_statfs+0x110/0x110 ? __fdget_raw+0x10/0x10 __se_sys_statfs+0x5d/0xa0 ? user_statfs+0xe0/0xe0 ? mutex_unlock+0x1d/0x40 ? __x64_sys_statfs+0x20/0x30 do_syscall_64+0xee/0x290 ? syscall_return_slowpath+0x1c0/0x1c0 ? page_fault+0x1e/0x30 ? syscall_return_slowpath+0x13c/0x1c0 ? prepare_exit_to_usermode+0xdb/0x140 ? syscall_trace_enter+0x330/0x330 ? __put_user_4+0x1c/0x30 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 130: __kmalloc+0x124/0x210 ceph_monmap_decode+0x1c1/0x400 dispatch+0x113/0xd20 ceph_con_workfn+0xa7e/0x44e0 process_one_work+0x5f0/0xa30 worker_thread+0x184/0xa70 kthread+0x1a0/0x1c0 ret_from_fork+0x35/0x40 Freed by task 130: kfree+0xb8/0x210 dispatch+0x15a/0xd20 ceph_con_workfn+0xa7e/0x44e0 process_one_work+0x5f0/0xa30 worker_thread+0x184/0xa70 kthread+0x1a0/0x1c0 ret_from_fork+0x35/0x40 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: prevent i_version from going backYan, Zheng
inode info from non-auth can be stale. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: fix wrong check for the case of updating link countYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04libceph: make abort_on_full a per-osdc settingIlya Dryomov
The intent behind making it a per-request setting was that it would be set for writes, but not for reads. As it is, the flag is set for all fs/ceph requests except for pool perm check stat request (technically a read). ceph_osdc_abort_on_full() skips reads since the previous commit and I don't see a use case for marking individual requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04ceph: flush pending works before shutdown superYan, Zheng
Pending works hold inode references, which cause "Busy inodes after unmount" warning. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: abort osd requests on force umountYan, Zheng
This avoid force umount waiting on page writeback: io_schedule+0xd/0x30 wait_on_page_bit_common+0xc6/0x130 __filemap_fdatawait_range+0xbd/0x100 filemap_fdatawait_keep_errors+0x15/0x40 sync_inodes_sb+0x1cf/0x240 sync_filesystem+0x52/0x90 generic_shutdown_super+0x1d/0x110 ceph_kill_sb+0x28/0x80 [ceph] deactivate_locked_super+0x35/0x60 cleanup_mnt+0x36/0x70 task_work_run+0x79/0xa0 exit_to_usermode_loop+0x62/0x70 do_syscall_64+0xdb/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 0xffffffffffffffff Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: fix st_nlink stat for directoriesLuis Henriques
Currently, calling stat on a cephfs directory returns 1 for st_nlink. This behaviour has recently changed in the fuse client, as some applications seem to expect this value to be either 0 (if it's unlinked) or 2 + number of subdirectories. This behaviour was changed in the fuse client with commit 67c7e4619188 ("client: use common interp of st_nlink for dirs"). This patch modifies the kernel client to have a similar behaviour. Link: https://tracker.ceph.com/issues/23873 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: support file lock on directoryYan, Zheng
Link: http://tracker.ceph.com/issues/24028 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: show wsize only if non-defaultIlya Dryomov
This is how it was before commit 95cca2b44e54 ("ceph: limit osd write size") went in. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: handle the new nfiles/nsubdirs fields in cap messageYan, Zheng
Without these new fields, stale st_size is returned in following case. 1. MDS modifies a directory 2. MDS issues CEPH_CAP_ANY_SHARED to client 3. The client satifies stat(2) by its cached metadata. set st_size to "i_files + i_subdirs". Link: http://tracker.ceph.com/issues/23855 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: define argument structure for handle_cap_grantYan, Zheng
The data structure includes the versioned feilds of cap message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: update i_files/i_subdirs only when Fs cap is issuedYan, Zheng
In MDS, file/subdir counts of a directory inode are protected by filelock. In request reply without Fs cap, nfiles/nsubdirs can be stale. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: always get rstat from auth mdsYan, Zheng
rstat is not tracked by capability. client can't know if rstat from non-auth mds is uptodate or not. Link: http://tracker.ceph.com/issues/23538 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04ceph: use bit flags to define vxattr attributesYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04fs: aio ioprio use ioprio_check_cap ret valAdam Manzanares
Previously the value was ignored. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-04Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Misc bits and pieces not fitting into anything more specific" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: delete unnecessary assignment in vfs_listxattr Documentation: filesystems: update filesystem locking documentation vfs: namei: use path_equal() in follow_dotdot() fs.h: fix outdated comment about file flags __inode_security_revalidate() never gets NULL opt_dentry make xattr_getsecurity() static vfat: simplify checks in vfat_lookup() get rid of dead code in d_find_alias() it's SB_BORN, not MS_BORN... msdos_rmdir(): kill BS comment remove rpc_rmdir() fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
2018-06-04Merge branch 'hch.procfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
2018-06-04Merge branch 'work.rmdir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rmdir update from Al Viro: "More shrink_dcache_parent()-related stuff - killing the main source of potentially contended calls of that on large subtrees" * 'work.rmdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rmdir(),rename(): do shrink_dcache_parent() only on success
2018-06-04NFSv4: Don't ask for delegated attributes when adding a hard linkTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFSv4: Don't ask for delegated attributes when revalidating the inodeTrond Myklebust
Again, when revalidating the inode, we don't need to ask for attributes for which we are authoritative. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04NFS: Pass the inode down to the getattr() callbackTrond Myklebust
Allow the getattr() callback to check things like whether or not we hold a delegation so that it can adjust the attributes that it is asking for. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>