summaryrefslogtreecommitdiff
path: root/fs/ceph/mds_client.c
AgeCommit message (Collapse)Author
2016-06-10vfs: make the string hashes salt the hashLinus Torvalds
We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-26ceph: fix wake_up_session_cb()Yan, Zheng
We should reset i_requested_max_size before waking the waiters. (zero i_requested_max_size make waiter re-request the max size) Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: using hash value to compose dentry offsetYan, Zheng
If MDS sorts dentries in dirfrag in hash order, we use hash value to compose dentry offset. dentry offset is: (0xff << 52) | ((24 bits hash) << 28) | (the nth entry hash hash collision) This offset is stable across directory fragmentation. This alos means there is no need to reset readdir offset if directory get fragmented in the middle of readdir. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: record 'offset' for each entry of readdir resultYan, Zheng
This is preparation for using hash value as dentry 'offset' Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: define 'end/complete' in readdir reply as bit flagsYan, Zheng
Set a flag in readdir request, which indicates that client interprets 'end/complete' as bit flags. So that mds can reply additional flags in readdir reply. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: define struct for dir entry in readdir replyYan, Zheng
This avoids defining multiple arrays for entries in readdir reply Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: report mount root in session metadataYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: don't call truncate_pagecache in ceph_writepages_startYan, Zheng
truncate_pagecache() may decrease inode's reference. This can cause deadlock if inode's last reference is dropped and iput_final() wants to evict the inode. (evict() calls inode_wait_for_writeback(), which waits for ceph_writepages_start() to return). The fix is use work thead to truncate dirty pages. Also add 'forced umount' check to ceph_update_writeable_page(), which prevents new pages getting dirty. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: renew caps for read/write if mds session got killed.Yan, Zheng
When mds session gets killed, read/write operation may hang. Client waits for Frw caps, but mds does not know what caps client wants. To recover this, client sends an open request to mds. The request will tell mds what caps client wants. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26libceph: DEFINE_RB_FUNCS macroIlya Dryomov
Given struct foo { u64 id; struct rb_node bar_node; }; generate insert_bar(), erase_bar() and lookup_bar() functions with DEFINE_RB_FUNCS(bar, struct foo, id, bar_node) The key is assumed to be an integer (u64, int, etc), compared with < and >. nodefld has to be initialized with RB_CLEAR_NODE(). Start using it for MDS, MON and OSD requests and OSD sessions. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-04-25libceph: make authorizer destruction independent of ceph_auth_clientIlya Dryomov
Starting the kernel client with cephx disabled and then enabling cephx and restarting userspace daemons can result in a crash: [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000 [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130 [262671.584334] PGD 0 [262671.635847] Oops: 0000 [#1] SMP [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014 [262672.268937] Workqueue: ceph-msgr con_work [libceph] [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000 [262672.428330] RIP: 0010:[<ffffffff811cd04a>] [<ffffffff811cd04a>] kfree+0x5a/0x130 [262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286 [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018 [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012 [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000 [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40 [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840 [262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000 [262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0 [262673.417556] Stack: [262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04 [262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00 [262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800 [262673.805230] Call Trace: [262673.859116] [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph] [262673.968705] [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph] [262674.078852] [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph] [262674.134249] [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph] [262674.189124] [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph] [262674.243749] [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph] [262674.297485] [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph] [262674.350813] [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph] [262674.403312] [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph] [262674.454712] [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690 [262674.505096] [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph] [262674.555104] [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0 [262674.604072] [<ffffffff810901ea>] worker_thread+0x11a/0x470 [262674.652187] [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310 [262674.699022] [<ffffffff810957a2>] kthread+0xd2/0xf0 [262674.744494] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 [262674.789543] [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70 [262674.834094] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 What happens is the following: (1) new MON session is established (2) old "none" ac is destroyed (3) new "cephx" ac is constructed ... (4) old OSD session (w/ "none" authorizer) is put ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer) osd->o_auth.authorizer in the "none" case is just a bare pointer into ac, which contains a single static copy for all services. By the time we get to (4), "none" ac, freed in (2), is long gone. On top of that, a new vtable installed in (3) points us at ceph_x_destroy_authorizer(), so we end up trying to destroy a "none" authorizer with a "cephx" destructor operating on invalid memory! To fix this, decouple authorizer destruction from ac and do away with a single static "none" authorizer by making a copy for each OSD or MDS session. Authorizers themselves are independent of ac and so there is no reason for destroy_authorizer() to be an ac op. Make it an op on the authorizer itself by turning ceph_authorizer into a real struct. Fixes: http://tracker.ceph.com/issues/15447 Reported-by: Alan Zhang <alan.zhang@linux.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-04mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25ceph: fix security xattr deadlockYan, Zheng
When security is enabled, security module can call filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by MDS' dispatch thread, while handling MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr need to send a new request to MDS and waits for the reply. This makes MDS' dispatch sleep, nobody handles later MDS replies. The fix is make sure lookup/atomic_open reply include xattrs and corresponding caps. So getxattr can be handled by cached xattrs. This requires some modification to both MDS and request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply) Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return error when called by MDS' dispatch thread. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25ceph: replace CURRENT_TIME by current_fs_time()Deepa Dinamani
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25libceph: revamp subs code, switch to SUBSCRIBE2 protocolIlya Dryomov
It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling in before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-04ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 supportYan, Zheng
Add support for the format change of MClientReply/MclientCaps. Also add code that denies access to inodes with pool_ns layouts. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2015-11-02libceph: msg signing callouts don't need con argumentIlya Dryomov
We can use msg->con instead - at the point we sign an outgoing message or check the signature on the incoming one, msg->con is always set. We wouldn't know how to sign a message without an associated session (i.e. msg->con == NULL) and being able to sign a message using an explicitly provided authorizer is of no use. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02ceph: make fsync() wait unsafe requests that created/modified inodeYan, Zheng
If we get a unsafe reply for request that created/modified inode, add the unsafe request to a list in the newly created/modified inode. So we can make fsync() wait these unsafe requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02ceph: add request to i_unsafe_dirops when getting unsafe replyYan, Zheng
Previously we add request to i_unsafe_dirops when registering request. So ceph_fsync() also waits for imcomplete requests. This is unnecessary, ceph_fsync() only needs to wait unsafe requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02ceph: don't invalidate page cache when inode is no longer usedYan, Zheng
ceph_check_caps() invalidate page cache when inode is not used by any open file. This behaviour is not friendly for workload that repeatly read files. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02ceph: fix message length computationArnd Bergmann
create_request_message() computes the maximum length of a message, but uses the wrong type for the time stamp: sizeof(struct timespec) may be 8 or 16 depending on the architecture, while sizeof(struct ceph_timespec) is always 8, and that is what gets put into the message. Found while auditing the uses of timespec for y2038 problems. Fixes: b8e69066d8af ("ceph: include time stamp in every MDS request") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08ceph: cleanup use of ceph_msg_getJianpeng Ma
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08ceph: remove redundant test of head->safe and silence static analysis warningsBrad Hubbard
Signed-off-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08ceph: EIO all operations after forced umountYan, Zheng
This patch makes try_get_cap_refs() and __do_request() check if the file system was forced umount, and return -EIO if it was. This patch also adds a helper function to drops dirty caps and wakes up blocking operation. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: switch some GFP_NOFS memory allocation to GFP_KERNELYan, Zheng
GFP_NOFS memory allocation is required for page writeback path. But there is no need to use GFP_NOFS in syscall path and readpage path Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: pre-allocate data structure that tracks caps flushingYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: re-send flushing caps (which are revoked) in reconnect stageYan, Zheng
if flushing caps were revoked, we should re-send the cap flush in client reconnect stage. This guarantees that MDS processes the cap flush message before issuing the flushing caps to other client. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: track pending caps flushing globallyYan, Zheng
So we know TID of the oldest pending caps flushing. Later patch will send this information to MDS, so that MDS can trim its completed caps flush list. Tracking pending caps flushing globally also simplifies syncfs code. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: track pending caps flushing accuratelyYan, Zheng
Previously we do not trace accurate TID for flushing caps. when MDS failovers, we have no choice but to re-send all flushing caps with a new TID. This can cause problem because MDS can has already flushed some caps and has issued the same caps to other client. The re-sent cap flush has a new TID, which makes MDS unable to detect if it has already processed the cap flush. This patch adds code to track pending caps flushing accurately. When re-sending cap flush is needed, we use its original flush TID. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: ratelimit warn messages for MDS closes sessionYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: simplify two mount_timeout sitesIlya Dryomov
No need to bifurcate wait now that we've got ceph_timeout_jiffies(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-06-25libceph: store timeouts in jiffies, verify user inputIlya Dryomov
There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25ceph: exclude setfilelock requests when calculating oldest tidYan, Zheng
setfilelock requests can block for a long time, which can prevent client from advancing its oldest tid. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: don't pre-allocate space for cap release messagesYan, Zheng
Previously we pre-allocate cap release messages for each caps. This wastes lots of memory when there are large amount of caps. This patch make the code not pre-allocate the cap release messages. Instead, we add the corresponding ceph_cap struct to a list when releasing a cap. Later when flush cap releases is needed, we allocate the cap release messages dynamically. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: make sure syncfs flushes all cap snapsYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: don't trim auth cap when there are cap snapsYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25ceph: check OSD caps before read/writeYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-22ceph: fix null pointer dereference in send_mds_reconnect()Yan, Zheng
sb->s_root can be null when umounting Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20ceph: cleanup unsafe requests when reconnecting is deniedYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20ceph: don't zero i_wrbuffer_ref when reconnecting is deniedYan, Zheng
remove_session_caps_cb() does not truncate dirty data in page cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will result negtive i_wrbuffer_ref/i_wrbuffer_ref_head Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20ceph: don't mark dirty caps when there is no auth capYan, Zheng
No i_auth_cap means reconnecting to MDS was denied. So don't add new dirty caps. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20ceph: use msecs_to_jiffies for time conversionNicholas Mc Guire
This is only an API consolidation and should make things more readable it replaces var * HZ / 1000 by msecs_to_jiffies(var). Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20ceph: drop cap releases in requests composed before cap reconnectYan, Zheng
These cap releases are stale because MDS will re-establish client caps according to the cap reconnect messages. Note: MDS can detect stale cap messages, so these stale cap releases are harmless even we don't drop them. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "On the RBD side, there is a conversion to blk-mq from Christoph, several long-standing bug fixes from Ilya, and some cleanup from Rickard Strandqvist. On the CephFS side there is a long list of fixes from Zheng, including improved session handling, a few IO path fixes, some dcache management correctness fixes, and several blocking while !TASK_RUNNING fixes. The core code gets a few cleanups and Chaitanya has added support for TCP_NODELAY (which has been used on the server side for ages but we somehow missed on the kernel client). There is also an update to MAINTAINERS to fix up some email addresses and reflect that Ilya and Zheng are doing most of the maintenance for RBD and CephFS these days. Do not be surprised to see a pull request come from one of them in the future if I am unavailable for some reason" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) MAINTAINERS: update Ceph and RBD maintainers libceph: kfree() in put_osd() shouldn't depend on authorizer libceph: fix double __remove_osd() problem rbd: convert to blk-mq ceph: return error for traceless reply race ceph: fix dentry leaks ceph: re-send requests when MDS enters reconnecting stage ceph: show nocephx_require_signatures and notcp_nodelay options libceph: tcp_nodelay support rbd: do not treat standalone as flatten ceph: fix atomic_open snapdir ceph: properly mark empty directory as complete client: include kernel version in client metadata ceph: provide seperate {inode,file}_operations for snapdir ceph: fix request time stamp encoding ceph: fix reading inline data when i_size > PAGE_SIZE ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) rbd: fix error paths in rbd_dev_refresh() ...
2015-02-19ceph: re-send requests when MDS enters reconnecting stageYan, Zheng
So that MDS can check if any request is already completed and process completed requests in clientreplay stage. When completed requests are processed in clientreplay stage, MDS can avoid sending traceless replies. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19client: include kernel version in client metadataYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: fix request time stamp encodingYan, Zheng
struct timespec uses 'long' to present second and nanosecond. 'long' is 64 bits on 64bits machine. ceph MDS expects time stamp to be encoded as struct ceph_timespec, which uses 'u32' to present second and nanosecond. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)Yan, Zheng
use an atomic variable to track number of sessions, this can avoid block operation inside wait loops. Signed-off-by: Yan, Zheng <zyan@redhat.com>