summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-06ceph: stop on-going cached readdir if mds revokes FILE_SHARED capYan, Zheng
If directory's FILE_SHARED cap get revoked, dentry in the directory can get spliced into other directory (Eg, other client move the dentry into directory B, then we do readdir on directory B). So we should stop on-going cached readdir. this can be achieved by marking dir not complete, because __dcache_readdir() checks dir completeness before emitting each dentry. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: wait on writeback after writing snapshot dataYan, Zheng
In sync mode, writepages() needs to write all dirty pages. But it can only write dirty pages associated with the oldest snapc. To write dirty pages associated with next snapc, it needs to wait until current writes complete. Without this wait, writepages() keeps looking up dirty pages, but the found dirty pages are not writeable. It wastes CPU time. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: fix capsnap dirty pages accountingYan, Zheng
writepages_finish() calls ceph_put_wrbuffer_cap_refs() once for all pages, parameter snapc is set to req->r_snapc. So writepages() shouldn't write dirty pages associated with different snapc in one OSD request. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: ignore wbc->range_{start,end} when write back snapshot dataYan, Zheng
writepages() needs to write dirty pages to OSD in strict order of snapshot context. It must first write dirty pages associated with the oldest snapshot context. In the write range case, dirty pages in the specified range can be associated with newer snapc. They are not writeable until we write all dirty pages associated with the oldest snapc. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: fix "range cyclic" mode writepagesYan, Zheng
In range cyclic mode, writepages() should first write dirty pages in range [writeback_index, (pgoff_t)-1], then write pages in range [0, writeback_index -1]. Besides, if writepages() encounters a page that beyond EOF, it should restart from the beginning. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: cleanup local variables in ceph_writepages_start()Yan, Zheng
Remove two variables and define variables of same type together. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: optimize pagevec iterating in ceph_writepages_start()Yan, Zheng
ceph_writepages_start() supports writing non-continuous pages. If it encounters a non-dirty or non-writeable page in pagevec, it can continue to check the rest pages in pagevec. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: make writepage_nounlock() invalidate page that beyonds EOFYan, Zheng
Otherwise, the page left in state that page is associated with a snapc, but (PageDirty(page) || PageWriteback(page)) is false. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: properly get capsnap's size in get_oldest_context()Yan, Zheng
capsnap's size is set by __ceph_finish_cap_snap(). If capsnap is under writing, its size is zero. In this case, get_oldest_context() should read i_size. Besides, ceph_writepages_start() should re-check capsnap's size after dirty pages get locked. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: remove stale check in ceph_invalidatepage()Yan, Zheng
Both set_page_dirty and truncate_complete_page should be called for locked page, they can't race with each other. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: queue cap snap only when snap realm's context changesYan, Zheng
If we create capsnap when snap realm's context does not change, the new capsnap's snapc is equal to ci->i_head_snapc. Page writeback code can't differentiates dirty pages associated with the new capsnap from dirty pages associated with i_head_snapc. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: handle race between vmtruncate and queuing cap snapYan, Zheng
It's possible that we create a cap snap while there is pending vmtruncate (truncate hasn't been processed by worker thread). We should truncate dirty pages beyond capsnap->size in that case. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: fix message order check in handle_cap_export()Yan, Zheng
If caps for importer mds exists, but cap id mismatch, client should have received corresponding import message. Because cap ID does not change as long as client holds the caps. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: fix NULL pointer dereference in ceph_flush_snaps()Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: adjust 36 checks for NULL pointersMarkus Elfring
The script “checkpatch.pl” pointed information out like the following. Comparison to NULL could be written ... Thus fix the affected source code places. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: delete an unnecessary return statement in update_dentry_lease()Markus Elfring
The script "checkpatch.pl" pointed information out like the following. WARNING: void function return statements are not generally useful Thus remove such a statement in the affected function. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: ENOMEM pr_err in __get_or_create_frag() is redundantMarkus Elfring
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: check negative offsets in ceph_llseek()Luis Henriques
When a user requests SEEK_HOLE or SEEK_DATA with a negative offset ceph_llseek should return -ENXIO. Currently -EINVAL is being returned for SEEK_DATA and 0 for SEEK_HOLE. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: more accurate statfsDouglas Fuller
Improve accuracy of statfs reporting for Ceph filesystems comprising exactly one data pool. In this case, the Ceph monitor can now report the space usage for the single data pool instead of the global data for the entire Ceph cluster. Include support for this message in mon_client and leverage it in ceph/super. Signed-off-by: Douglas Fuller <dfuller@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: properly set snap follows for cap reconnectYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: don't use CEPH_OSD_FLAG_ORDERSNAPYan, Zheng
Inode can be moved between snap realms. It's possible inode is moved into a snap realm whose seq number is smaller than old snap realm's. So there is no guarantee that seq number inode's snap context always increases. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: include snapc in debug message of writeYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: make sure flushsnap messages are sent in proper orderYan, Zheng
Before sending new flushsnap message, check if there are old flushsnap messages that need to be re-sent. If there are, re-send old messages first. This guarantees ordering of flushsnap messages. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: fix -EOLDSNAPC handlingYan, Zheng
Need to drop cap reference before retry. Besides, it's better to redo file write checks for each retry because we re-lock inode. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: send LSSNAP request to auth mds of directory inodeYan, Zheng
Snapdir inode has no capability. __choose_mds() should choose mds base on capabilities of snapdir's parent inode. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: don't fill readdir cache for LSSNAP replyYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: cleanup ceph_readdir_prepopulate()Yan, Zheng
In LSSNAP case, req->r_dentry is already set to snapdir dentry. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: use errseq_t for writeback error reportingJeff Layton
Ensure that when writeback errors are marked that we report those to all file descriptions that were open at the time of the error. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: new cap message flags indicate if there is pending capsnapYan, Zheng
These flags tell mds if there is pending capsnap explicitly. Without this explicit notification, mds can only conclude if client has pending capsnap. The method mds use is inefficient and error-prone. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: nuke startsync opYanhu Cao
startsync is a no-op, has been for years. Remove it. Link: http://tracker.ceph.com/issues/20604 Signed-off-by: Yanhu Cao <gmayyyha@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06rbd: silence bogus uninitialized use warning in rbd_acquire_lock()Kefeng Wang
drivers/block/rbd.c: In function 'rbd_acquire_lock': drivers/block/rbd.c:3602:44: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized] Silence the warning, found it when built old kernel(3.10) with OBS(opensuse build service). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: validate correctness of some mount optionsYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: limit osd write sizeYan, Zheng
OSD has a configurable limitation of max write size. OSD return error if write request size is larger than the limitation. For now, set max write size to CEPH_MSG_MAX_DATA_LEN. It should be small enough. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: limit osd read size to CEPH_MSG_MAX_DATA_LENYan, Zheng
libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN. Link: http://tracker.ceph.com/issues/20528 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: remove unused cap_release_safety mount optionYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06drm/i915: Re-enable GTT following a device resetChris Wilson
Ville Syrjälä spotted that PGETBL_CTL was losing its enable bit upon a reset. That was causing the display to show garbage on his 945gm. On my i915gm the effect was far more severe; re-enabling the display following the reset without PGETBL_CTL being enabled lead to an immediate hard hang. We do have a routine to re-enable PGETBL_CTL which is applicable to gen2-4, although on gen4 it is documented that a graphics reset doesn't alter the register (no such wording is given for gen3) and should be safe to call to punch back in the enable bit. However, that leaves the question of whether we need to completely re-initialise the register and the rest of the GSM. For g33/pnv/gen4+, where we do have a configurable page table, its contents do seem to be kept, and so we should be able to recover without having to reinitialise the GTT from scratch (as prior to g33, that register is configured by the BIOS and we leave alone except for the enable bit). This appears to have been broken by commit 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms"), which moved the intel_enable_gtt() from i915_gem_init_hw() (also used by reset) to add it earlier during hw init and resume, missing the reset path. v2: Find the culprit, rearrange ggtt_enable to be before gem_init_hw to match init/resume Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101852 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170906111405.27110-1-chris@chris-wilson.co.uk Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (cherry picked from commit 0db8c961209153498fe7e279b8f0d3deb81808f0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06drm/i915: Annotate user relocs with __userVille Syrjälä
Add the missing __user to the urelocs cast to fix the following sparse warning: i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces) i915_gem_execbuffer.c:1541:62: expected void const [noderef] <asn:1>*from i915_gem_execbuffer.c:1541:62: got char * Cc: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901165434.24636-1-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> #irc (cherry picked from commit 908a610557f4d8b46a0f82c01e31b30f5c998580) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-09-06loop: set physical block size to logical block sizeOmar Sandoval
Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE") caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the physical block size still makes the most sense semantically, but let's just lie and always set it to the same value as the logical block size (same goes for io_min). In the future we might want to at least bump up io_min to PAGE_SIZE but I'm sick of these stupid changes so let's play it safe. 1: https://marc.info/?l=linux-xfs&m=150459024723753&w=2 Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-06lockd: Delete an error message for a failed memory allocation in reclaimer()Markus Elfring
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06NFS: remove jiffies field from access cacheNeilBrown
This field hasn't been used since commit 57b691819ee2 ("NFS: Cache access checks more aggressively"). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06NFS: flush data when locking a file to ensure cache coherence for mmap.NeilBrown
When a byte range lock (or flock) is taken out on an NFS file, the validity of the cached data is checked and the inode is marked NFS_INODE_INVALID_DATA. However the cached data isn't flushed from the page cache. This is sufficient for future read() requests or mmap() requests as they call nfs_revalidate_mapping() which performs the flush if necessary. However an existing mapping is not affected. Accessing data through that mapping will continue to return old data even though the inode is marked NFS_INODE_INVALID_DATA. This can easily be confirmed using the 'nfs' tool in git://github.com/okirch/twopence-nfs.git and running nfs coherence FILENAME on one client, and nfs coherence -r FILENAME on another client. It appears that prior to Linux 2.6.0 this worked correctly. However commit: http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0 removed the call to inode_invalidate_pages() from nfs_zap_caches(). I haven't tested this code, but inspection suggests that prior to this commit, file locking would invalidate all inode pages. This patch adds a call to nfs_revalidate_mapping() after a successful SETLK so that invalid data is flushed. With this patch the above test passes. To minimize impact (and possibly avoid a GETATTR call) this only happens if the mapping might be mapped into userspace. Cc: Olaf Kirch <okir@suse.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06SUNRPC: remove some dead code.NeilBrown
RPC_TASK_NO_RETRANS_TIMEOUT is set when cl_noretranstimeo is set, which happens when RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT is set, which happens when NFS_CS_NO_RETRANS_TIMEOUT is set. This flag means "don't resend on a timeout, only resend if the connection gets broken for some reason". cl_discrtry is set when RPC_CLNT_CREATE_DISCRTRY is set, which happens when NFS_CS_DISCRTRY is set. This flag means "always disconnect before resending". NFS_CS_NO_RETRANS_TIMEOUT and NFS_CS_DISCRTRY are both only set in nfs4_init_client(), and it always sets both. So we will never have a situation where only one of the flags is set. So this code, which tests if timeout retransmits are allowed, and disconnection is required, will never run. So it makes sense to remove this code as it cannot be tested and could confuse people reading the code (like me). (alternately we could leave it there with a comment saying it is never actually used). Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06NFS: don't expect errors from mempool_alloc().NeilBrown
Commit fbe77c30e9ab ("NFS: move rw_mode to nfs_pageio_header") reintroduced some pointless code that commit 518662e0fcb9 ("NFS: fix usage of mempools.") had recently removed. Remove it again. Cc: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-06Merge branch 'topic/dmatest' into for-linusVinod Koul
2017-09-06Merge branch 'topic/qcom' into for-linusVinod Koul
2017-09-06Merge branch 'topic/ppc4xx' into for-linusVinod Koul
2017-09-06Merge branch 'topic/of' into for-linusVinod Koul
2017-09-06Merge branch 'topic/k3dma' into for-linusVinod Koul
2017-09-06Merge branch 'topic/ioat' into for-linusVinod Koul
2017-09-06Merge branch 'topic/bcm' into for-linusVinod Koul