Age | Commit message (Collapse) | Author |
|
dax_load_hole() will soon need to call dax_insert_mapping_entry(), so it
needs to be moved lower in dax.c so the definition exists.
dax_wake_mapping_entry_waiter() will soon be removed from dax.h and be
made static to dax.c, so we need to move its definition above all its
callers.
Link: http://lkml.kernel.org/r/20170724170616.25810-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When servicing mmap() reads from file holes the current DAX code
allocates a page cache page of all zeroes and places the struct page
pointer in the mapping->page_tree radix tree. This has three major
drawbacks:
1) It consumes memory unnecessarily. For every 4k page that is read via
a DAX mmap() over a hole, we allocate a new page cache page. This
means that if you read 1GiB worth of pages, you end up using 1GiB of
zeroed memory.
2) It is slower than using a common zero page because each page fault
has more work to do. Instead of just inserting a common zero page we
have to allocate a page cache page, zero it, and then insert it.
3) The fact that we had to check for both DAX exceptional entries and
for page cache pages in the radix tree made the DAX code more
complex.
This series solves these issues by following the lead of the DAX PMD
code and using a common 4k zero page instead. This reduces memory usage
and decreases latencies for some workloads, and it simplifies the DAX
code, removing over 100 lines in total.
This patch (of 5):
To be able to use the common 4k zero page in DAX we need to have our PTE
fault path look more like our PMD fault path where a PTE entry can be
marked as dirty and writeable as it is first inserted rather than
waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault()
call.
Right now we can rely on having a dax_pfn_mkwrite() call because we can
distinguish between these two cases in do_wp_page():
case 1: 4k zero page => writable DAX storage
case 2: read-only DAX storage => writeable DAX storage
This distinction is made by via vm_normal_page(). vm_normal_page()
returns false for the common 4k zero page, though, just as it does for
DAX ptes. Instead of special casing the DAX + 4k zero page case we will
simplify our DAX PTE page fault sequence so that it matches our DAX PMD
sequence, and get rid of the dax_pfn_mkwrite() helper. We will instead
use dax_iomap_fault() to handle write-protection faults.
This means that insert_pfn() needs to follow the lead of
insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If 'mkwrite'
is set insert_pfn() will do the work that was previously done by
wp_page_reuse() as part of the dax_pfn_mkwrite() call path.
Link: http://lkml.kernel.org/r/20170724170616.25810-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().
The parent_node() macro in METAG architecture is unnecessary.
Remove it for cleanup.
Link: http://lkml.kernel.org/r/1501076076-1974-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.14:
API:
- Defer scompress scratch buffer allocation to first use.
- Add __crypto_xor that takes separte src and dst operands.
- Add ahash multiple registration interface.
- Revamped aead/skcipher algif code to fix async IO properly.
Drivers:
- Add non-SIMD fallback code path on ARM for SVE.
- Add AMD Security Processor framework for ccp.
- Add support for RSA in ccp.
- Add XTS-AES-256 support for CCP version 5.
- Add support for PRNG in sun4i-ss.
- Add support for DPAA2 in caam.
- Add ARTPEC crypto support.
- Add Freescale RNGC hwrng support.
- Add Microchip / Atmel ECC driver.
- Add support for STM32 HASH module"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: af_alg - get_page upon reassignment to TX SGL
crypto: cavium/nitrox - Fix an error handling path in 'nitrox_probe()'
crypto: inside-secure - fix an error handling path in safexcel_probe()
crypto: rockchip - Don't dequeue the request when device is busy
crypto: cavium - add release_firmware to all return case
crypto: sahara - constify platform_device_id
MAINTAINERS: Add ARTPEC crypto maintainer
crypto: axis - add ARTPEC-6/7 crypto accelerator driver
crypto: hash - add crypto_(un)register_ahashes()
dt-bindings: crypto: add ARTPEC crypto
crypto: algif_aead - fix comment regarding memory layout
crypto: ccp - use dma_mapping_error to check map error
lib/mpi: fix build with clang
crypto: sahara - Remove leftover from previous used spinlock
crypto: sahara - Fix dma unmap direction
crypto: af_alg - consolidation of duplicate code
crypto: caam - Remove unused dentry members
crypto: ccp - select CONFIG_CRYPTO_RSA
crypto: ccp - avoid uninitialized variable warning
crypto: serpent - improve __serpent_setkey with UBSAN
...
|
|
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull writeback error handling updates from Jeff Layton:
"This pile continues the work from last cycle on better tracking
writeback errors. In v4.13 we added some basic errseq_t infrastructure
and converted a few filesystems to use it.
This set continues refining that infrastructure, adds documentation,
and converts most of the other filesystems to use it. The main
exception at this point is the NFS client"
* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
ecryptfs: convert to file_write_and_wait in ->fsync
mm: remove optimizations based on i_size in mapping writeback waits
fs: convert a pile of fsync routines to errseq_t based reporting
gfs2: convert to errseq_t based writeback error reporting for fsync
fs: convert sync_file_range to use errseq_t based error-tracking
mm: add file_fdatawait_range and file_write_and_wait
fuse: convert to errseq_t based error tracking for fsync
mm: consolidate dax / non-dax checks for writeback
Documentation: add some docs for errseq_t
errseq: rename __errseq_set to errseq_set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
"This pile just has a few file locking fixes from Ben Coddington. There
are a couple of cleanup patches + an attempt to bring sanity to the
l_pid value that is reported back to userland on an F_GETLK request.
After a few gyrations, he came up with a way for filesystems to
communicate to the VFS layer code whether the pid should be translated
according to the namespace or presented as-is to userland"
* tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: restore a warn for leaked locks on close
fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks
fs/locks: Use allocation rather than the stack in fcntl_getlk()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes a bunch of minor code cleanups that have
accumulated, probably from code analyzers people like to run. There is
one nice fix that avoids some socket leaks by switching to use
sock_create_lite()"
* tag 'dlm-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: use sock_create_lite inside tcp_accept_from_sock
uapi linux/dlm_netlink.h: include linux/dlmconstants.h
dlm: avoid double-free on error path in dlm_device_{register,unregister}
dlm: constify kset_uevent_ops structure
dlm: print log message when cluster name is not set
dlm: Delete an unnecessary variable initialisation in dlm_ls_start()
dlm: Improve a size determination in two functions
dlm: Use kcalloc() in two functions
dlm: Use kmalloc_array() in make_member_array()
dlm: Delete an error message for a failed memory allocation in dlm_recover_waiters_pre()
dlm: Improve a size determination in dlm_recover_waiters_pre()
dlm: Use kcalloc() in dlm_scan_waiters()
dlm: Improve a size determination in table_seq_start()
dlm: Add spaces for better code readability
dlm: Replace six seq_puts() calls by seq_putc()
dlm: Make dismatch error message more clear
dlm: Fix kernel memory disclosure
|
|
drm-intel-next-fixes
gvt-fixes-2017-09-06
- regression fix for gvt init failure from Jianjun
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170906035924.2225krr6snv2duvq@zhen-hp.sh.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Scalability improvements when allocating inodes, and some
miscellaneous bug fixes and cleanups"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid Y2038 overflow in recently_deleted()
ext4: fix fault handling when mounted with -o dax,ro
ext4: fix quota inconsistency during orphan cleanup for read-only mounts
ext4: fix incorrect quotaoff if the quota feature is enabled
ext4: remove useless test and assignment in strtohash functions
ext4: backward compatibility support for Lustre ea_inode implementation
ext4: remove timebomb in ext4_decode_extra_time()
ext4: use sizeof(*ptr)
ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
ext4: reduce lock contention in __ext4_new_inode
ext4: cleanup goto next group
ext4: do not unnecessarily allocate buffer in recently_deleted()
|
|
Pull XFS updates from Darrick Wong:
"Here are the changes for xfs for 4.14. Most of these are cleanups and
fixes for bad behavior, as we're mostly focusing on improving
reliablity this cycle (read: there's potentially a lot of stuff on the
horizon for 4.15 so better to spend a few weeks killing other bugs
now).
Summary:
- Write unmount record for a ro mount to avoid unnecessary log replay
- Clean up orphaned inodes when mounting fs readonly
- Resubmit inode log items when buffer writeback fails to avoid
umount hang
- Fix log recovery corruption problems when log headers wrap around
the end
- Avoid infinite loop searching for free inodes when inode counters
are wrong
- Evict inodes involved with log redo so that we don't leak them
later
- Fix a potential race between reclaim and inode cluster freeing
- Refactor the inode joining code w.r.t. transaction rolling &
deferred ops
- Fix a bug where the log doesn't properly deal with dirty buffers
that are about to become ordered buffers
- Fix the extent swap code to deal with making dirty buffers ordered
properly
- Consolidate page fault handlers
- Refactor the incore extent manipulation functions to use the iext
abstractions instead of directly modifying with extent data
- Disable crashy chattr +/-x until we fix it
- Don't allow us to set S_DAX for v2 inodes
- Various cleanups
- Clarify some documentation
- Fix a problem where fsync and a log commit race to send the disk a
flush command, resulting in a small window where power fail data
loss could occur
- Simplify some rmap operations in the fcollapse code
- Fix some use-after-free problems in async writeback"
* tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
xfs: use kmem_free to free return value of kmem_zalloc
xfs: open code end_buffer_async_write in xfs_finish_page_writeback
xfs: don't set v3 xflags for v2 inodes
xfs: fix compiler warnings
fsmap: fix documentation of FMR_OF_LAST
xfs: simplify the rmap code in xfs_bmse_merge
xfs: remove unused flags arg from xfs_file_iomap_begin_delay
xfs: fix incorrect log_flushed on fsync
xfs: disable per-inode DAX flag
xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves
xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent
xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at
xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents
xfs: move some code around inside xfs_bmap_shift_extents
xfs: use xfs_iext_get_extent in xfs_bmap_first_unused
xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert
xfs: add a xfs_iext_update_extent helper
xfs: consolidate the various page fault handlers
iomap: return VM_FAULT_* codes from iomap_page_mkwrite
xfs: relog dirty buffers during swapext bmbt owner change
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Bob Peterson:
"We've got a whopping 29 GFS2 patches for this merge window, mainly
because we held some back from the previous merge window until we
could get them perfected and well tested. We have a couple patch sets,
including my patch set for protecting glock gl_object and Andreas
Gruenbacher's patch set to fix the long-standing shrink- slab hang,
plus a bunch of assorted bugs and cleanups.
Summary:
- I fixed a bug whereby an IO error would lead to a double-brelse.
- Andreas Gruenbacher made a minor cleanup to call his relatively new
function, gfs2_holder_initialized, rather than doing it manually.
This was just missed by a previous patch set.
- Jan Kara fixed a bug whereby the SGID was being cleared when
inheriting ACLs.
- Andreas found a bug and fixed it in his previous patch, "Get rid of
flush_delayed_work in gfs2_evict_inode". A call to
flush_delayed_work was deleted from *gfs2_inode_lookup and added to
gfs2_create_inode.
- Wang Xibo found and fixed a list_add call in inode_go_lock that
specified the parameters in the wrong order.
- Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
metadata reads that were accidentally missing them.
- I submitted a 4-patch set to protect the glock gl_object field.
GFS2 was setting and checking gl_object with no locking mechanism,
so the value was occasionally stomped on, which caused file system
corruption.
- I submitted a small cleanup to function gfs2_clear_rgrpd. It was
needlessly adding rgrp glocks to the lru list, then pulling them
back off immediately. The rgrp glocks don't use the lru list
anyway, so doing so was just a waste of time.
- I submitted a patch that checks the GLOF_LRU flag on a glock before
trying to remove it from the lru_list. This avoids a lot of
unnecessary spin_lock contention.
- I submitted a patch to delete GFS2's debugfs files only after we
evict all the glocks. Before this patch, GFS2 would delete the
debugfs files, and if unmount hung waiting for a glock, there was
no way to debug the problem. Now, if a hang occurs during umount,
we can examine the debugfs files to figure out why it's hung.
- Andreas Gruenbacher submitted a patch to fix some trivial typos.
- Andreas also submitted a five-part patch set to fix the
longstanding hang involving the slab shrinker: dlm requires memory,
calls the inode shrinker, which calls gfs2's evict, which calls
back into DLM before it can evict an inode.
- Abhi Das submitted a patch to forcibly flush the active items list
to relieve memory pressure. This fixes a long-standing bug whereby
GFS2 was getting hung permanently in balance_dirty_pages.
- Thomas Tai submitted a patch to fix a slab corruption problem due
to a residual pointer left in the lock_dlm lockstruct.
- I submitted a patch to withdraw the file system if IO errors are
encountered while writing to the journals or statfs system file
which were previously not being sent back up. Before, some IO
errors were sometimes not be detected for several hours, and at
recovery time, the journal errors made journal replay impossible.
- Andreas has a patch to fix an annoying format-truncation compiler
warning so GFS2 compiles cleanly.
- I have a patch that fixes a handful of sparse compiler warnings.
- Andreas fixed up an useless gl_object warning caused by an earlier
patch.
- Arvind Yadav added a patch to properly constify our rhashtable
params declare.
- I added a patch to fix a regression caused by the non-recursive
delete and truncate patch that caused file system blocks to not be
properly freed.
- Ernesto A. Fernández added a patch to fix a place where GFS2 would
send back the wrong return code setting extended attributes.
- Ernesto also added a patch to fix a case in which GFS2 was
improperly setting an inode's i_mode, potentially granting access
to the wrong users"
* tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits)
gfs2: preserve i_mode if __gfs2_set_acl() fails
gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing
GFS2: Fix non-recursive truncate bug
gfs2: constify rhashtable_params
GFS2: Fix gl_object warnings
GFS2: Fix up some sparse warnings
gfs2: Silence gcc format-truncation warning
GFS2: Withdraw for IO errors writing to the journal or statfs
gfs2: fix slab corruption during mounting and umounting gfs file system
gfs2: forcibly flush ail to relieve memory pressure
gfs2: Clean up waiting on glocks
gfs2: Defer deleting inodes under memory pressure
gfs2: gfs2_evict_inode: Put glocks asynchronously
gfs2: Get rid of gfs2_set_nlink
gfs2: gfs2_glock_get: Wait on freeing glocks
gfs2: Fix trivial typos
GFS2: Delete debugfs files only after we evict the glocks
GFS2: Don't waste time locking lru_lock for non-lru glocks
GFS2: Don't bother trying to add rgrps to the lru list
GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
...
|
|
If directory's FILE_SHARED cap get revoked, dentry in the directory
can get spliced into other directory (Eg, other client move the
dentry into directory B, then we do readdir on directory B). So we
should stop on-going cached readdir. this can be achieved by marking
dir not complete, because __dcache_readdir() checks dir completeness
before emitting each dentry.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In sync mode, writepages() needs to write all dirty pages. But
it can only write dirty pages associated with the oldest snapc.
To write dirty pages associated with next snapc, it needs to wait
until current writes complete.
Without this wait, writepages() keeps looking up dirty pages, but
the found dirty pages are not writeable. It wastes CPU time.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
writepages_finish() calls ceph_put_wrbuffer_cap_refs() once for
all pages, parameter snapc is set to req->r_snapc. So writepages()
shouldn't write dirty pages associated with different snapc in
one OSD request.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
writepages() needs to write dirty pages to OSD in strict order of
snapshot context. It must first write dirty pages associated with
the oldest snapshot context. In the write range case, dirty pages
in the specified range can be associated with newer snapc. They
are not writeable until we write all dirty pages associated with
the oldest snapc.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In range cyclic mode, writepages() should first write dirty pages
in range [writeback_index, (pgoff_t)-1], then write pages in range
[0, writeback_index -1]. Besides, if writepages() encounters a page
that beyond EOF, it should restart from the beginning.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Remove two variables and define variables of same type together.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_writepages_start() supports writing non-continuous pages.
If it encounters a non-dirty or non-writeable page in pagevec,
it can continue to check the rest pages in pagevec.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Otherwise, the page left in state that page is associated with a
snapc, but (PageDirty(page) || PageWriteback(page)) is false.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
capsnap's size is set by __ceph_finish_cap_snap(). If capsnap is under
writing, its size is zero. In this case, get_oldest_context() should
read i_size. Besides, ceph_writepages_start() should re-check capsnap's
size after dirty pages get locked.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Both set_page_dirty and truncate_complete_page should be called
for locked page, they can't race with each other.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
If we create capsnap when snap realm's context does not change, the
new capsnap's snapc is equal to ci->i_head_snapc. Page writeback code
can't differentiates dirty pages associated with the new capsnap from
dirty pages associated with i_head_snapc.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
It's possible that we create a cap snap while there is pending
vmtruncate (truncate hasn't been processed by worker thread).
We should truncate dirty pages beyond capsnap->size in that case.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
If caps for importer mds exists, but cap id mismatch, client should
have received corresponding import message. Because cap ID does not
change as long as client holds the caps.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The script “checkpatch.pl” pointed information out like the following.
Comparison to NULL could be written ...
Thus fix the affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The script "checkpatch.pl" pointed information out like the following.
WARNING: void function return statements are not generally useful
Thus remove such a statement in the affected function.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When a user requests SEEK_HOLE or SEEK_DATA with a negative offset
ceph_llseek should return -ENXIO. Currently -EINVAL is being returned for
SEEK_DATA and 0 for SEEK_HOLE.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Improve accuracy of statfs reporting for Ceph filesystems comprising
exactly one data pool. In this case, the Ceph monitor can now report
the space usage for the single data pool instead of the global data
for the entire Ceph cluster. Include support for this message in
mon_client and leverage it in ceph/super.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Inode can be moved between snap realms. It's possible inode is moved
into a snap realm whose seq number is smaller than old snap realm's.
So there is no guarantee that seq number inode's snap context always
increases.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Before sending new flushsnap message, check if there are old
flushsnap messages that need to be re-sent. If there are, re-send
old messages first. This guarantees ordering of flushsnap messages.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Need to drop cap reference before retry. Besides, it's better to
redo file write checks for each retry because we re-lock inode.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Snapdir inode has no capability. __choose_mds() should choose mds
base on capabilities of snapdir's parent inode.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In LSSNAP case, req->r_dentry is already set to snapdir dentry.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Ensure that when writeback errors are marked that we report those to all
file descriptions that were open at the time of the error.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
These flags tell mds if there is pending capsnap explicitly.
Without this explicit notification, mds can only conclude if
client has pending capsnap. The method mds use is inefficient
and error-prone.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
startsync is a no-op, has been for years. Remove it.
Link: http://tracker.ceph.com/issues/20604
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
drivers/block/rbd.c: In function 'rbd_acquire_lock':
drivers/block/rbd.c:3602:44: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Silence the warning, found it when built old kernel(3.10) with
OBS(opensuse build service).
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
OSD has a configurable limitation of max write size. OSD return
error if write request size is larger than the limitation. For now,
set max write size to CEPH_MSG_MAX_DATA_LEN. It should be small
enough.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN.
Link: http://tracker.ceph.com/issues/20528
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Ville Syrjälä spotted that PGETBL_CTL was losing its enable bit upon a
reset. That was causing the display to show garbage on his 945gm. On my
i915gm the effect was far more severe; re-enabling the display following
the reset without PGETBL_CTL being enabled lead to an immediate hard
hang.
We do have a routine to re-enable PGETBL_CTL which is applicable to
gen2-4, although on gen4 it is documented that a graphics reset doesn't
alter the register (no such wording is given for gen3) and should be safe
to call to punch back in the enable bit. However, that leaves the question
of whether we need to completely re-initialise the register and the
rest of the GSM. For g33/pnv/gen4+, where we do have a configurable
page table, its contents do seem to be kept, and so we should be able to
recover without having to reinitialise the GTT from scratch (as prior to
g33, that register is configured by the BIOS and we leave alone except
for the enable bit).
This appears to have been broken by commit 5fbd0418eef2 ("drm/i915:
Re-enable GGTT earlier during resume on pre-gen6 platforms"), which
moved the intel_enable_gtt() from i915_gem_init_hw() (also used by
reset) to add it earlier during hw init and resume, missing the reset
path.
v2: Find the culprit, rearrange ggtt_enable to be before gem_init_hw to
match init/resume
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: 5fbd0418eef2 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101852
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170906111405.27110-1-chris@chris-wilson.co.uk
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit 0db8c961209153498fe7e279b8f0d3deb81808f0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add the missing __user to the urelocs cast to fix the following sparse
warning:
i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression
i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces)
i915_gem_execbuffer.c:1541:62: expected void const [noderef] <asn:1>*from
i915_gem_execbuffer.c:1541:62: got char *
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901165434.24636-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> #irc
(cherry picked from commit 908a610557f4d8b46a0f82c01e31b30f5c998580)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE")
caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the
physical block size still makes the most sense semantically, but let's
just lie and always set it to the same value as the logical block size
(same goes for io_min). In the future we might want to at least bump up
io_min to PAGE_SIZE but I'm sick of these stupid changes so let's play
it safe.
1: https://marc.info/?l=linux-xfs&m=150459024723753&w=2
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|