path: root/fs
Age | Commit message | Author
2017-08-16 | btrfs: Cleanup num_tolerated_disk_barrier_failures | Qu Wenruo
As we now use the per-chunk degradable check, the global num_tolerated_disk_barrier_failures is of no use, so we can remove it. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Allow barrier_all_devices to do chunk level device check | Qu Wenruo
The last user of num_tolerated_disk_barrier_failures is barrier_all_devices(). But it can be easily changed to the new per-chunk degradable check framework. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Do chunk level check for degraded remount | Qu Wenruo
Just as for the mount-time check, use btrfs_check_rw_degradable() to check if it is OK to remount rw. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Do chunk level check for degraded rw mount | Qu Wenruo
Now use btrfs_check_rw_degradable() to check if we can mount in degraded mode. With this patch, we can mount in the following case:
  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
  # wipefs -a /dev/sdc
  # mount /dev/sdb /mnt/btrfs -o degraded
As the single data chunk is only on sdb, it's OK to mount degraded, since missing one device is OK for RAID1. But it still fails, as expected, in the following case:
  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
  # wipefs -a /dev/sdb
  # mount /dev/sdc /mnt/btrfs -o degraded
As the data chunk is only on sdb, it's not OK to mount it degraded. Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reported-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Introduce a function to check if all chunks are OK for degraded rw mount | Qu Wenruo
Introduce a new function, btrfs_check_rw_degradable(), to check if all chunks in btrfs are OK for degraded rw mount. It provides the new basis for accurate btrfs mount/remount and even runtime degraded mount checks, rather than the old one-size-fits-all method. Btrfs currently uses num_tolerated_disk_barrier_failures to do a global check for tolerated missing devices. Although the one-size-fits-all solution is quite safe, it's too strict if data and metadata have different duplication levels. For example, if one uses single data and RAID1 metadata on 2 disks, any missing device makes the fs impossible to mount degraded. But in fact, sometimes all single chunks may be on the existing device, and in that case we should allow it to be mounted rw degraded. Such a case can be easily reproduced using the following script:
  # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
  # wipefs -a /dev/sdc
  # mount /dev/sdb -o degraded,rw
If using btrfs-debug-tree to check /dev/sdb, one should find that the data chunk is only on sdb, so in fact degraded mount should be allowed. This patchset introduces a new per-chunk degradable check for btrfs, allowing the above case to succeed, and it's quite small. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ copied text from cover letter with more details about the problem being solved ] Signed-off-by: David Sterba <dsterba@suse.com>
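A minimal userspace sketch of the per-chunk idea described above, not the kernel's btrfs_check_rw_degradable(): the `chunk` struct, the profile enum and the tolerated-failure table are simplified assumptions. Each chunk counts its own missing stripes and compares that against what its profile can survive, instead of applying one global limit to the whole filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: these are not the kernel's structures. */
enum profile { SINGLE, DUP, RAID1, RAID10, RAID5, RAID6 };

struct chunk {
	enum profile profile;
	int num_stripes;
	const int *stripe_dev;	/* device index holding each stripe */
};

/* Number of missing devices a chunk of this profile can survive. */
static int tolerated_failures(enum profile p)
{
	switch (p) {
	case RAID1:
	case RAID10:
	case RAID5:
		return 1;
	case RAID6:
		return 2;
	default:		/* SINGLE and DUP keep no copy on another device */
		return 0;
	}
}

/* True if every chunk still has enough devices present for a rw mount. */
bool check_rw_degradable(const struct chunk *chunks, size_t nr,
			 const bool *dev_missing)
{
	assert(chunks != NULL && dev_missing != NULL);
	for (size_t i = 0; i < nr; i++) {
		int missing = 0;

		for (int s = 0; s < chunks[i].num_stripes; s++)
			if (dev_missing[chunks[i].stripe_dev[s]])
				missing++;
		if (missing > tolerated_failures(chunks[i].profile))
			return false;
	}
	return true;
}
```

With RAID1 metadata on devices 0 and 1 and single data on device 0 only, losing device 1 passes this check while losing device 0 fails, which matches the two mount cases listed in "btrfs: Do chunk level check for degraded rw mount".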
2017-08-16 | Btrfs: report errors when checksum is not found | Liu Bo
When btrfs fails the checksum check, it fills the whole page with "1". However, if %csum_expected is 0 (which means there is no checksum), then for some unknown reason we just pretend that the read is correct, so userspace is confused: the read succeeds, but the page it gets back is filled with "1". This can happen due to a bug in btrfs-convert. This fixes it by always returning an error if the checksum doesn't match. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Prevent possible ERR_PTR() dereference | Nikolay Borisov
In btrfs_full_stripe_len/btrfs_is_parity_mirror we have similar code which gets the chunk map for a particular range via get_chunk_map. However, get_chunk_map can return an ERR_PTR value, and while the 2 callers do catch this with a WARN_ON, they then proceed to indiscriminately dereference the extent map. This of course leads to a crash. Fix the offenders by making the dereference conditional on IS_ERR. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
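A userspace sketch of the fixed caller pattern, assuming simplified re-creations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers and hypothetical trimmed `extent_map`/`map_lookup` structs; `lookup_chunk_map()` is a stand-in for get_chunk_map(), not its real signature. The map is dereferenced only when IS_ERR() is false, and a fallback value is used otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the err.h idea: the top MAX_ERRNO addresses
 * are reserved to encode negative error codes inside a pointer. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, trimmed structures. */
struct map_lookup { unsigned long stripe_len; };
struct extent_map { struct map_lookup *map_lookup; };

/* Stand-in for get_chunk_map(): an error pointer on a failed lookup. */
static struct extent_map *lookup_chunk_map(struct extent_map *em)
{
	return em ? em : ERR_PTR(-EINVAL);
}

/* Fixed caller: only dereference the map when the lookup succeeded. */
unsigned long full_stripe_len(struct extent_map *em, unsigned long fallback)
{
	struct extent_map *m = lookup_chunk_map(em);
	unsigned long len = fallback;

	if (!IS_ERR(m))
		len = m->map_lookup->stripe_len;
	return len;
}
```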
2017-08-16 | btrfs: Remove redundant checks from btrfs_alloc_data_chunk_ondemand | Nikolay Borisov
Many commits ago the data space_info in alloc_data_chunk_ondemand used to be acquired from the inode. At that point commit 33b4d47f5e24 ("Btrfs: deal with NULL space info") got introduced to deal with spurious cases where the space info could be null following a rebalance. Nowadays, however, the space info is referenced directly from the btrfs_fs_info struct, which is initialised at filesystem mount time. This makes the null checks redundant, so remove them. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Remove redundant argument of flush_space | Nikolay Borisov
All callers of flush_space pass the same number for the orig/num_bytes arguments. Let's remove one of the numbers and also modify the tracepoint to show only a single number: bytes requested. It seems the last point where the two parameters were treated differently was before the ticketed enospc rework. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: resume qgroup rescan on rw remount | Aleksa Sarai
Several distributions mount the "proper root" as ro during initrd and then remount it as rw before pivot_root(2). Thus, if a rescan had been aborted by a previous shutdown, the rescan would never be resumed. This issue would manifest itself as several btrfs ioctl(2)s causing the entire machine to hang when btrfs_qgroup_wait_for_completion was hit (due to the fs_info->qgroup_rescan_running flag being set but the rescan itself not being resumed). Notably, Docker's btrfs storage driver makes regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT (causing this problem to be manifested on boot for some machines). Cc: <stable@vger.kernel.org> # v3.11+ Cc: Jeff Mahoney <jeffm@suse.com> Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount") Signed-off-by: Aleksa Sarai <asarai@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: clean up extraneous computations in add_delayed_refs | Edmund Nadolski
Repeating the same computation in multiple places is not necessary. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: allow backref search checks for shared extents | Edmund Nadolski
When called with a struct share_check, find_parent_nodes() will detect a shared extent and immediately return with BACKREF_SHARED_FOUND. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: add cond_resched() calls when resolving backrefs | Edmund Nadolski
Since backref resolution is CPU-intensive, the cond_resched() calls should help alleviate soft lockup occurrences. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, add tracepoints for prelim_ref insertion and merging | Jeff Mahoney
This patch adds a tracepoint event for prelim_ref insertion and merging. For each, the ref being inserted or merged and the count of tree nodes are emitted. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: add a node counter to each of the rbtrees | Jeff Mahoney
This patch adds counters to each of the rbtrees so that we can tell how large they are growing for a given workload. These counters will be exported by tracepoints in the next patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: convert preliminary reference tracking to use rbtrees | Edmund Nadolski
It's been known for a while that the use of multiple lists that are periodically merged was an algorithmic problem within btrfs. There are several workloads that don't complete in any reasonable amount of time (e.g. btrfs/130) and others that cause soft lockups. The solution is to use a set of rbtrees that do insertion merging for both indirect and direct refs, with the former converting refs into the latter. As a result, the btrfs/130 workload, which used to take several hours, now takes about half as long. This runtime still isn't acceptable, and a future patch will address that by moving the rbtrees higher in the stack so the lookups can be shared across multiple calls to find_parent_nodes. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
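A minimal sketch of insertion merging, with an unbalanced binary search tree standing in for the kernel's rbtree and a hypothetical `prelim_ref` keyed only by (root_id, objectid); the real refs carry more fields and the real tree rebalances. An insert that finds an equal key folds its count into the existing node instead of adding a new one, so no separate pass that merges lists is needed. The `nodes` counter mirrors the per-rbtree node counters added in "btrfs: add a node counter to each of the rbtrees".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed preliminary ref keyed by (root_id, objectid). */
struct prelim_ref {
	unsigned long long root_id, objectid;
	int count;
	struct prelim_ref *left, *right;
};

static int ref_cmp(const struct prelim_ref *a, const struct prelim_ref *b)
{
	if (a->root_id != b->root_id)
		return a->root_id < b->root_id ? -1 : 1;
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	return 0;
}

/*
 * Insert @ref into the tree rooted at *root, or merge it into an equal
 * node (a merged @ref is freed). @nodes counts distinct nodes in the tree.
 */
void insert_or_merge(struct prelim_ref **root, struct prelim_ref *ref,
		     int *nodes)
{
	struct prelim_ref **p = root;

	while (*p) {
		int c = ref_cmp(ref, *p);

		if (c == 0) {
			(*p)->count += ref->count;	/* merge on insertion */
			free(ref);
			return;
		}
		p = c < 0 ? &(*p)->left : &(*p)->right;
	}
	ref->left = ref->right = NULL;
	*p = ref;
	(*nodes)++;
}
```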
2017-08-16 | reiserfs: fix spelling mistake: "tranasction" -> "transaction" | Colin Ian King
Trivial fix to spelling mistake in reiserfs_warning message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-16 | btrfs: remove ref_tree implementation from backref.c | Edmund Nadolski
Commit afce772e87c3 ("btrfs: fix check_shared for fiemap ioctl") added the ref_tree code in backref.c to reduce backref searching for shared extents under the FIEMAP ioctl. This code will not be compatible with the upcoming rbtree changes for improved backref searching, so this patch removes the ref_tree code. The rbtree changes will provide the equivalent functionality for FIEMAP. The above commit also introduced transaction semantics around calls to btrfs_check_shared() in order to accurately account for delayed refs. This functionality needs to be retained, so a complete revert of the above commit is not desirable. This patch therefore removes the ref_tree portion of the commit as above; however, it does not remove the transaction portion. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: btrfs_check_shared should manage its own transaction | Edmund Nadolski
Commit afce772e87c3 ("btrfs: fix check_shared for fiemap ioctl") added transaction semantics around calls to btrfs_check_shared() in order to provide accurate accounting of delayed refs. The transaction management should be done inside btrfs_check_shared(), so that callers do not need to manage transactions individually. Signed-off-by: Edmund Nadolski <enadolski@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, cleanup __ namespace abuse | Jeff Mahoney
We typically use __ to indicate a helper routine that shouldn't be called directly without understanding the proper context required to do so. We use static functions to indicate that a function is private to a particular C file. The backref code uses static functions and __ prefixes on nearly everything, which makes the code difficult to read and establishes a pattern for future code that shouldn't be followed. This patch drops all the unnecessary prefixes. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, add unode_aux_to_inode_list helper | Jeff Mahoney
Replacing the double cast and ternary conditional with a helper makes the code easier on the eyes. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: backref, constify some arguments | Jeff Mahoney
This constifies a few buffers used in the backref code. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: constify tracepoint arguments | Jeff Mahoney
Tracepoint arguments are all read-only. If we mark the arguments as const, we're able to keep or convert those arguments to const where appropriate. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: struct-funcs, constify readers | Jeff Mahoney
Most of the on-disk structures have read-only reader helpers that take an extent_buffer and a pointer used as an offset into the buffer. We should mark them as const and, in turn, allow consumers of these interfaces to mark the buffers const as well. No impact on code, but this serves as documentation that a buffer is not intended to be modified. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
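A sketch of what a const-qualified reader looks like, assuming a hypothetical flat `extent_buffer` (the kernel's is page-backed) and a hypothetical `get_le64()` helper rather than the generated btrfs accessors. Because the parameter is `const`, a caller holding only a const pointer can still use it, and the compiler rejects writes through it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed buffer; the kernel's extent_buffer is page-backed. */
struct extent_buffer {
	unsigned char data[4096];
};

/*
 * Read-only helper: `const struct extent_buffer *` documents that the
 * buffer is not modified. On-disk btrfs fields are little-endian, so the
 * byte at @offset is the least significant one.
 */
uint64_t get_le64(const struct extent_buffer *eb, unsigned long offset)
{
	uint64_t v = 0;

	assert(offset + 8 <= sizeof(eb->data));
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | eb->data[offset + i];
	return v;
}
```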
2017-08-16 | btrfs: remove unused sectorsize member | Nikolay Borisov
The sectorsize member of btrfs_block_group_cache is unused. Remove it; this also reduces the number of holes in the struct.
With patch:
  /* size: 856, cachelines: 14, members: 40 */
  /* sum members: 837, holes: 4, sum holes: 19 */
  /* bit holes: 1, sum bit holes: 29 bits */
  /* last cacheline: 24 bytes */
Without patch:
  /* size: 864, cachelines: 14, members: 41 */
  /* sum members: 841, holes: 5, sum holes: 23 */
  /* bit holes: 1, sum bit holes: 29 bits */
  /* last cacheline: 32 bytes */
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: Be explicit about usage of min() | Nikolay Borisov
__btrfs_alloc_chunk contains code which boils down to:
  ndevs = min(ndevs, devs_max)
It's conditional upon devs_max not being 0. However, it cannot really be 0, since it's always set to either BTRFS_MAX_DEVS_SYS_CHUNK or BTRFS_MAX_DEVS(fs_info->chunk_root). So eliminate the condition check and use min explicitly. This has no functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
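A sketch of the before/after shape described above, assuming the pre-patch check had roughly the form `if (devs_max && ndevs > devs_max)` and using plain `int` counts with a simplified `min()` macro (the kernel's min() additionally type-checks its arguments). The two forms agree whenever devs_max is non-zero, which is the invariant the commit relies on.

```c
#include <assert.h>

/* Simplified userspace stand-in for the kernel's type-checked min(). */
#define min(a, b) ((a) < (b) ? (a) : (b))

/* Before (assumed shape): clamp only when devs_max is non-zero. */
int clamp_ndevs_before(int ndevs, int devs_max)
{
	if (devs_max && ndevs > devs_max)
		ndevs = devs_max;
	return ndevs;
}

/* After: devs_max is never 0 here, so clamp unconditionally. */
int clamp_ndevs_after(int ndevs, int devs_max)
{
	assert(devs_max != 0);
	return min(ndevs, devs_max);
}
```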
2017-08-16 | btrfs: Use explicit round_down call rather than open-coding it | Nikolay Borisov
No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-16 | btrfs: convert while loop to list_for_each_entry | Nikolay Borisov
No functional changes; just make the loop a bit more readable. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
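For readers unfamiliar with the idiom, a self-contained userspace re-creation of the kernel's intrusive list helpers (`list_head`, `container_of()`, `list_for_each_entry()`), assuming GCC or Clang for `__typeof__`; `struct item` is hypothetical and unrelated to the loop this commit touched. Both functions walk the same list; the second hides the container_of() step inside the iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace re-creation of the kernel's intrusive list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each_entry(pos, head, member)                        \
	for (pos = list_entry((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                   \
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

struct item {			/* hypothetical list element */
	int value;
	struct list_head list;
};

/* Open-coded walk: manual cursor plus an explicit list_entry() per node. */
int sum_while(struct list_head *head)
{
	struct list_head *cur = head->next;
	int sum = 0;

	while (cur != head) {
		sum += list_entry(cur, struct item, list)->value;
		cur = cur->next;
	}
	return sum;
}

/* The same walk with list_for_each_entry(). */
int sum_for_each(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	list_for_each_entry(it, head, list)
		sum += it->value;
	return sum;
}
```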
2017-08-15 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
2017-08-15 | f2fs: fix potential overflow when adjusting GC cycle | Chao Yu
When comparing signed and unsigned variables, the compiler converts the signed value to an unsigned one; for this reason, {in,de}crease_sleep_time may return an overflowed result. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
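An illustrative sketch of the conversion the commit describes, not the actual f2fs code: the function names and the simplified clamp are hypothetical. In `wait <= min_time`, the usual arithmetic conversions turn the signed `wait` into `unsigned int`, so a negative value compares as a very large one and the clamp is skipped; widening both sides to a signed 64-bit type avoids that.

```c
#include <assert.h>

/*
 * Buggy: `wait` is int and `min_time` is unsigned int, so in the
 * comparison a negative `wait` is converted to a huge unsigned value.
 */
int decrease_wait_buggy(int wait, unsigned int min_time)
{
	wait -= (int)min_time;
	if (wait <= min_time)	/* int converted to unsigned int here */
		wait = (int)min_time;
	return wait;
}

/* Fixed: compare both sides as a wider signed type. */
int decrease_wait_fixed(int wait, unsigned int min_time)
{
	wait -= (int)min_time;
	if ((long long)wait <= (long long)min_time)
		wait = (int)min_time;
	return wait;
}
```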
2017-08-15 | f2fs: avoid unneeded sync on quota file | Chao Yu
In f2fs_quota_{on,off}, we only need to sync the quota file of the specified quota type instead of all types. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-15 | f2fs: introduce gc_urgent mode for background GC | Jaegeuk Kim
This patch adds a sysfs entry to control urgent mode for background GC. If this is set, the background GC thread always runs GC with gc_urgent_sleep_time. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-15 | f2fs: use IPU for cold files | Jaegeuk Kim
We expect cold files to be written sequentially, but sometimes small pieces of data are updated, which causes fragmentation. Let's avoid that. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-15 | f2fs: fix the size value in __check_sit_bitmap | Yunlong Song
The current size value is incorrect, which causes the bitmap check to be missed. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-15 | gfs2: fix slab corruption during mounting and umounting gfs file system | Thomas Tai
When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and unmounting a GFS2 file system would cause the kernel to hang. The slab allocator suggests that it is likely a double-free memory corruption. The issue is traced back to v3.9-rc6, where a patch was submitted to use kzalloc() for storing a bitmap instead of using a local variable. The intention is to allocate memory during mount and to free memory during unmount. The original patch missed a code path that had already freed the memory, which caused the memory corruption. This patch sets the memory pointer to NULL after the memory is freed, so that the double free cannot happen.
  gdlm_mount()
    '-- set_recover_size() which uses kzalloc()
    '-- if dlm does not support ops callbacks then
        '--- free_recover_size() which uses kfree()
  gdlm_unmount()
    '-- free_recover_size() which uses kfree()
The previous patch which introduced the double-free issue is commit 57c7310b8eb9 ("GFS2: use kmalloc for lvb bitmap"). Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
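A userspace sketch of the fix pattern, assuming a hypothetical `lockstruct` with a single bitmap pointer and simplified versions of set_recover_size()/free_recover_size() (calloc()/free() stand in for kzalloc()/kfree()). Clearing the pointer after freeing makes a second free_recover_size() call harmless, because free(NULL), like kfree(NULL), does nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed lock-module state holding one bitmap pointer. */
struct lockstruct {
	unsigned long *recover_bits;
};

/* Simplified analogue of set_recover_size(): allocate a zeroed bitmap. */
int set_recover_size(struct lockstruct *ls, size_t words)
{
	ls->recover_bits = calloc(words, sizeof(*ls->recover_bits));
	return ls->recover_bits ? 0 : -1;
}

/* Simplified analogue of free_recover_size() with the fix applied. */
void free_recover_size(struct lockstruct *ls)
{
	free(ls->recover_bits);		/* free(NULL) is a no-op */
	ls->recover_bits = NULL;	/* a second call can no longer double free */
}
```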
2017-08-15 | btrfs: Add zstd support | Nick Terrell
Add zstd compression and decompression support to BtrFS. zstd at its fastest level compresses almost as well as zlib, while offering much faster compression and decompression, approaching lzo speeds. I benchmarked btrfs with zstd compression against no compression, lzo compression, and zlib compression. I benchmarked two scenarios:
  - Copying a set of files to btrfs, and then reading the files.
  - Copying a tarball to btrfs, extracting it to btrfs, and then reading the extracted files.
After every operation, I call `sync` and include the sync time. Between every pair of operations I unmount and remount the filesystem to avoid caching. The benchmark files can be found in the upstream zstd source repository under `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}` [1] [2]. I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and an SSD. The first compression benchmark is copying 10 copies of the unzipped Silesia corpus [3] into a BtrFS filesystem mounted with `-o compress-force=Method`. The decompression benchmark times how long it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is measured by comparing the output of `df` and `du`. See the benchmark file [1] for details. I benchmarked multiple zstd compression levels, although the patch uses zstd level 1.

| Method  | Ratio | Compression MB/s | Decompression speed |
|---------|-------|------------------|---------------------|
| None    | 0.99  | 504              | 686                 |
| lzo     | 1.66  | 398              | 442                 |
| zlib    | 2.58  | 65               | 241                 |
| zstd 1  | 2.57  | 260              | 383                 |
| zstd 3  | 2.71  | 174              | 408                 |
| zstd 6  | 2.87  | 70               | 398                 |
| zstd 9  | 2.92  | 43               | 406                 |
| zstd 12 | 2.93  | 21               | 408                 |
| zstd 15 | 3.01  | 11               | 354                 |

The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it measures the compression ratio, extracts the tar, and deletes the tar. Then it measures the compression ratio again, and `tar`s the extracted files into `/dev/null`. See the benchmark file [2] for details.

| Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s) | Read (s) |
|--------|-----------|---------------|----------|-------------|----------|
| None   | 0.97      | 0.78          | 0.981    | 5.501       | 8.807    |
| lzo    | 2.06      | 1.38          | 1.631    | 8.458       | 8.585    |
| zlib   | 3.40      | 1.86          | 7.750    | 21.544      | 11.744   |
| zstd 1 | 3.57      | 1.85          | 2.579    | 11.479      | 9.389    |

[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
zstd source repository: https://github.com/facebook/zstd
Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-08-15 | NFS: Wait for requests that are locked on the commit list | Trond Myklebust
If a request is on the commit list but is locked, we currently skip it, which can lead to a livelock when the commit count doesn't drop to zero. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg() | Trond Myklebust
Now that we no longer hold the inode->i_lock when manipulating the commit lists, it is safe to call pnfs_put_lseg() again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Switch to using mapping->private_lock for page writeback lookups. | Trond Myklebust
Switch from using the inode->i_lock for this to avoid contention with other metadata manipulation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Use an atomic_long_t to count the number of commits | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Use an atomic_long_t to count the number of requests | Trond Myklebust
This avoids forcing us to take the inode->i_lock just to bump the number. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
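A minimal sketch of the counting pattern, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's atomic_long_t and a hypothetical `inode_stats` struct with hypothetical helper names. The counter is changed with atomic read-modify-write operations, so no lock is needed just to bump the number.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-inode statistics; the kernel would use atomic_long_t. */
struct inode_stats {
	atomic_long nrequests;
};

void stats_request_added(struct inode_stats *s)
{
	atomic_fetch_add(&s->nrequests, 1);	/* lock-free increment */
}

void stats_request_removed(struct inode_stats *s)
{
	long old = atomic_fetch_sub(&s->nrequests, 1);

	assert(old > 0);	/* the count must never go negative */
	(void)old;
}

long stats_request_count(struct inode_stats *s)
{
	return atomic_load(&s->nrequests);
}
```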
2017-08-15 | NFSv4: Use a mutex to protect the per-inode commit lists | Trond Myklebust
The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Refactor nfs_page_find_head_request() | Trond Myklebust
Split out the 2 cases so that we can treat the locking differently. The issue is that the locking in the page swapcache case is tightly linked to the commit list locking. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request() | Trond Myklebust
Hide the locking from nfs_lock_and_join_requests() so that we can separate out the requirements for swapcache pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Fix up nfs_page_group_covers_page() | Trond Myklebust
Fix up the test in nfs_page_group_covers_page(). The simplest implementation is to check that we have a set of intersecting or contiguous subrequests that connect page offset 0 to nfs_page_length(req->wb_page). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
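A simplified sketch of the coverage test described above, assuming a hypothetical `subreq` array instead of the kernel's circular list of nfs_page subrequests and leaving out the page group lock. Starting at offset 0, it repeatedly finds a subrequest whose range contains the current offset and jumps to that subrequest's end; the page is covered if this reaches the page length. Each jump strictly increases the offset, so the loop terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed subrequest: byte range [pgbase, pgbase + bytes). */
struct subreq {
	unsigned int pgbase, bytes;
};

/* Find a subrequest whose byte range contains @offset. */
static const struct subreq *search(const struct subreq *reqs, size_t n,
				   unsigned int offset)
{
	for (size_t i = 0; i < n; i++)
		if (offset >= reqs[i].pgbase &&
		    offset < reqs[i].pgbase + reqs[i].bytes)
			return &reqs[i];
	return NULL;
}

/* True if intersecting or contiguous subrequests connect 0 to @len. */
bool group_covers_page(const struct subreq *reqs, size_t n, unsigned int len)
{
	const struct subreq *r;
	unsigned int pos = 0;

	assert(reqs != NULL || n == 0);
	while ((r = search(reqs, n, pos)) != NULL)
		pos = r->pgbase + r->bytes;
	return pos >= len;
}
```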
2017-08-15 | NFS: Remove unused parameter from nfs_page_group_lock() | Trond Myklebust
nfs_page_group_lock() is now always called with the 'nonblock' parameter set to 'false'. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Remove unused function nfs_page_group_lock_wait() | Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Remove nfs_page_group_clear_bits() | Trond Myklebust
At this point, the only bits we ever expect to potentially see set on the subrequests are PG_REMOVE and PG_TEARDOWN. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 | NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases | Trond Myklebust
Since nfs_page_group_destroy() does not take any locks on the requests to be freed, we need to ensure that we don't inadvertently free the request in nfs_destroy_unlinked_subrequests() while the last reference is being released elsewhere. Do this by:
  1) Taking a reference to the request unless it is already being freed
  2) Checking (under the page group lock) if PG_TEARDOWN is already set before freeing an unreferenced request in nfs_destroy_unlinked_subrequests()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
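Step 1 in the commit above is the "get unless zero" idiom, for which the kernel's kref API provides kref_get_unless_zero(). A userspace sketch with C11 atomics and a hypothetical `req` struct, not the NFS code itself: a reference is taken only while the count is non-zero, so an object whose last reference is already being dropped is never revived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request with a reference count. */
struct req {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero. Once it has hit
 * zero, the object is being freed and must not gain new references.
 */
bool req_get_unless_zero(struct req *r)
{
	int old = atomic_load(&r->refcount);

	while (old != 0) {
		/* On failure, `old` is reloaded with the current value; retry. */
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```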
2017-08-15 | NFS: Further optimise nfs_lock_and_join_requests() | Trond Myklebust
When locking the entire group in order to remove subrequests, the locks are always taken in order, with the page group lock being taken after the page head is locked. The intention is that:
  1) The lock on the group head guarantees that requests may not be removed from the group (although new entries could be appended if we're not holding the group lock).
  2) It is safe to drop and retake the page group lock while iterating through the list, in particular when waiting for a subrequest lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>