summaryrefslogtreecommitdiff
path: root/fs/ceph
AgeCommit message (Collapse)Author
2014-08-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is a lot of refactoring and hardening of the libceph and rbd code here from Ilya that fix various smaller bugs, and a few more important fixes with clone overlap. The main fix is a critical change to the request_fn handling to not sleep that was exposed by the recent mutex changes (which will also go to the 3.16 stable series). Yan Zheng has several fixes in here for CephFS fixing ACL handling, time stamps, and request resends when the MDS restarts. Finally, there are a few cleanups from Himangi Saraogi based on Coccinelle" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly rbd: remove extra newlines from rbd_warn() messages rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC rbd: rework rbd_request_fn() ceph: fix kick_requests() ceph: fix append mode write ceph: fix sizeof(struct tYpO *) typo ceph: remove redundant memset(0) rbd: take snap_id into account when reading in parent info rbd: do not read in parent info before snap context rbd: update mapping size only on refresh rbd: harden rbd_dev_refresh() and callers a bit rbd: split rbd_dev_spec_update() into two functions rbd: remove unnecessary asserts in rbd_dev_image_probe() rbd: introduce rbd_dev_header_info() rbd: show the entire chain of parent images ceph: replace comma with a semicolon rbd: use rbd_segment_name_free() instead of kfree() ceph: check zero length in ceph_sync_read() ceph: reset r_resend_mds after receiving -ESTALE ...
2014-08-07dcache: d_obtain_alias callers don't all want DISCONNECTEDJ. Bruce Fields
There are a few d_obtain_alias callers that are using it to get the root of a filesystem which may already have an alias somewhere else. This is not the same as the filehandle-lookup case, and none of them actually need DCACHE_DISCONNECTED set. It isn't really a serious problem, but it would really be clearer if we reserved DCACHE_DISCONNECTED for those cases where it's actually needed. In the btrfs case this was causing a spurious printk from nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED manually in 3a0dfa6a12e "Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol", and this replaces that workaround. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07ceph: fix kick_requests()Yan, Zheng
__do_request() may unregister the request. So we should update iterator 'p' before calling __do_request() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-28ceph: fix append mode writeYan, Zheng
generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-28ceph: fix sizeof(struct tYpO *) typoIlya Dryomov
struct ceph_xattr -> struct ceph_inode_xattr Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-28ceph: remove redundant memset(0)Ilya Dryomov
xattrs array of pointers is allocated with kcalloc() - no need to memset() it to 0 right after that. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-24ceph: replace comma with a semicolonHimangi Saraogi
Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-07-21ceph: check zero length in ceph_sync_read()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-14ceph: reset r_resend_mds after receiving -ESTALEYan, Zheng
this makes __choose_mds() choose mds according caps Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: properly apply umask when ACL is enabledYan, Zheng
when ACL is enabled, posix_acl_create() may change inode's mode Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: pass proper page offset to copy_page_to_iter()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: include time stamp in replayed MDS requestsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: check unsupported fallocate modeYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This has a mix of bug fixes and cleanups. Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT check when a second rbd image is mapped and a couple memory leaks. Zheng fixes several issues with fragmented directories and multiple MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's patches fix setting and unsetting RBD images read-only. Naturally there are several other cleanups mixed in for good measure" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) rbd: only set disk to read-only once rbd: move calls that may sleep out of spin lock range rbd: add ioctl for rbd ceph: use truncate_pagecache() instead of truncate_inode_pages() ceph: include time stamp in every MDS request rbd: fix ida/idr memory leak rbd: use reference counts for image requests rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() rbd: make sure we have latest osdmap on 'rbd map' libceph: add ceph_monc_wait_osdmap() libceph: mon_get_version request infrastructure libceph: recognize poolop requests in debugfs ceph: refactor readpage_nounlock() to make the logic clearer mds: check cap ID when handling cap export message ceph: remember subtree root dirfrag's auth MDS ceph: introduce ceph_fill_fragtree() ceph: handle cap import atomically ceph: pre-allocate ceph_cap struct for ceph_add_cap() ceph: update inode fields according to issued caps rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ...
2014-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...
2014-06-12ceph: switch to iter_file_splice_write()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-08ceph: use truncate_pagecache() instead of truncate_inode_pages()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06fs/ceph/debugfs.c: replace seq_printf by seq_putsFabian Frederick
Replace seq_printf where possible. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06fs/ceph: replace pr_warning by pr_warnFabian Frederick
Update the last pr_warning callsites in fs branch Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06ceph: include time stamp in every MDS requestSage Weil
We recently modified the client/MDS protocol to include a timestamp in the client request. This allows ctime updates to follow the client's clock in most cases, which avoids subtle problems when clocks are out of sync and timestamps are updated sometimes by the MDS clock (for most requests) and sometimes by the client clock (for cap writeback). Signed-off-by: Sage Weil <sage@inktank.com>
2014-06-06ceph: refactor readpage_nounlock() to make the logic clearerZhang Zhen
If the return value of ceph_osdc_readpages() is not negative, it is certainly greater than or equal to zero. Remove the useless condition judgment and redundant braces. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06mds: check cap ID when handling cap export messageYan, Zheng
handle following sequence of events: - mds0 exports an inode to mds1. client receives the cap import message from mds1. caps from mds0 are removed while handling the cap import message. - mds1 exports an inode to mds0. client receives the cap export message from mds1. handle_cap_export() adds placeholder caps for mds0 - client receives the first cap export message (for exporting inode from mds0 to mds1) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: remember subtree root dirfrag's auth MDSYan, Zheng
remember dirfrag's auth MDS when it's different from its parent inode's auth MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: introduce ceph_fill_fragtree()Yan, Zheng
Move the code that update the i_fragtree into a separate function. Also add simple probabilistic test to decide whether the i_fragtree should be updated Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: handle cap import atomicallyYan, Zheng
cap import messages are processed by both handle_cap_import() and handle_cap_grant(). These two functions are not executed in the same atomic context, so they can races with cap release. The fix is make handle_cap_import() not release the i_ceph_lock when it returns. Let handle_cap_grant() release the lock after it finishes its job. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: pre-allocate ceph_cap struct for ceph_add_cap()Yan, Zheng
So that ceph_add_cap() can be used while i_ceph_lock is locked. This simplifies the code that handle cap import/export. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: update inode fields according to issued capsYan, Zheng
Cap message and request reply from non-auth MDS may carry stale information (corresponding locks are in LOCK states) even they have the newest inode version. So client should update inode fields according to issued caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: queue vmtruncate if necessary when handing cap grant/revokeYan, Zheng
cap grant/revoke message from non-auth MDS can update inode's size and truncate_seq/truncate_size. (the message arrives before auth MDS's cap trunc message) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: remove useless ACL checkZhang Zhen
posix_acl_xattr_set() already does the check, and it's the only way to feed in an ACL from userspace. So the check here is useless, remove it. Signed-off-by: zhang zhen <zhenzhang.zhang@huawei.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-06ceph: ceph_get_parent() can be staticFengguang Wu
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-02locks: ensure that fl_owner is always initialized properly in flock and ↵Jeff Layton
lease codepaths Currently, the fl_owner isn't set for flock locks. Some filesystems use byte-range locks to simulate flock locks and there is a common idiom in those that does: fl->fl_owner = (fl_owner_t)filp; fl->fl_start = 0; fl->fl_end = OFFSET_MAX; Since flock locks are generally "owned" by the open file description, move this into the common flock lock setup code. The fl_start and fl_end fields are already set appropriately, so remove the unneeded setting of that in flock ops in those filesystems as well. Finally, the lease code also sets the fl_owner as if they were owned by the process and not the open file description. This is incorrect as leases have the same ownership semantics as flock locks. Set them the same way. The lease code doesn't actually use the fl_owner value for anything, so this is more for consistency's sake than a bugfix. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@poochiereds.net> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Staging portion) Acked-by: J. Bruce Fields <bfields@fieldses.org>
2014-05-06ceph: switch to ->write_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_sync_direct_write: stop poking into iov_iter gutsAl Viro
all needed primitives are there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_sync_read: stop poking into iov_iter gutsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph: switch to ->read_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06start adding the tag to iov_iterAl Viro
For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new helper: generic_file_read_iter()Al Viro
iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_aio_read(): keep iov_iter across retriesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06pass iov_iter to ->direct_IO()Al Viro
unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06kill generic_segment_checks()Al Viro
all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06kill iov_iter_copy_from_user()Al Viro
all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry
2014-04-28ceph: reserve caps for file layout/lock MDS requestsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: avoid releasing caps that are being usedYan, Zheng
To avoid releasing caps that are being used, encode_inode_release() should send implemented caps to MDS. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: clear directory's completeness when creating fileYan, Zheng
When creating a file, ceph_set_dentry_offset() puts the new dentry at the end of directory's d_subdirs, then set the dentry's offset based on directory's max offset. The offset does not reflect the real postion of the dentry in directory. Later readdir reply from MDS may change the dentry's position/offset. This inconsistency can cause missing/duplicate entries in readdir result if readdir is partly satisfied by dcache_readdir(). The fix is clear directory's completeness after creating/renaming file. It prevents later readdir from using dcache_readdir(). Fixes: http://tracker.ceph.com/issues/8025 Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: use fpos_cmp() to compare dentry positionsYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-28ceph: check directory's completeness before emitting directory entryYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-20Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "These are regression and bug fixes for ext4. We had a number of new features in ext4 during this merge window (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so there were many more regression and bug fixes this time around. It didn't help that xfstests hadn't been fully updated to fully stress test COLLAPSE_RANGE until after -rc1" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits) ext4: disable COLLAPSE_RANGE for bigalloc ext4: fix COLLAPSE_RANGE failure with 1KB block size ext4: use EINVAL if not a regular file in ext4_collapse_range() ext4: enforce we are operating on a regular file in ext4_zero_range() ext4: fix extent merging in ext4_ext_shift_path_extents() ext4: discard preallocations after removing space ext4: no need to truncate pagecache twice in collapse range ext4: fix removing status extents in ext4_collapse_range() ext4: use filemap_write_and_wait_range() correctly in collapse range ext4: use truncate_pagecache() in collapse range ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled ext4: always check ext4_ext_find_extent result ext4: fix error handling in ext4_ext_shift_extents ext4: silence sparse check warning for function ext4_trim_extent ext4: COLLAPSE_RANGE only works on extent-based files ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches ext4: use i_size_read in ext4_unaligned_aio() fs: disallow all fallocate operation on active swapfile fs: move falloc collapse range check into the filesystem methods ...
2014-04-12ceph: fix pr_fmt() redefinitionLinus Torvalds
The vfs merge caused a latent bug to show up: In file included from fs/ceph/super.h:4:0, from fs/ceph/ioctl.c:3: include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default] #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt ^ In file included from include/linux/kernel.h:13:0, from include/linux/uio.h:12, from include/linux/socket.h:7, from include/uapi/linux/in.h:22, from include/linux/in.h:23, from fs/ceph/ioctl.c:1: include/linux/printk.h:214:0: note: this is the location of the previous definition #define pr_fmt(fmt) fmt ^ where the reason is that <linux/ceph_debug.h> is included much too late for the "pr_fmt()" define. The include of <linux/ceph_debug.h> needs to be the first include in the file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't noticeable until some unrelated header file changes brought in an indirect earlier include of <linux/kernel.h>. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...