summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-09-03gfs2: Make statistics unsigned, suitable for use with do_div()Ben Hutchings
None of these statistics can meaningfully be negative, and the numerator for do_div() must have the type u64. The generic implementation of do_div() used on some 32-bit architectures asserts that, resulting in a compiler error in gfs2_rgrp_congested(). Fixes: 0166b197c2ed ("GFS2: Average in only non-zero round-trip times ...") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Andreas Gruenbacher <agruenba@redhat.com>
2015-09-03GFS2: Use resizable hash table for glocksBob Peterson
This patch changes the glock hash table from a normal hash table to a resizable hash table, which scales better. This also simplifies a lot of code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-09-03GFS2: Move glock superblock pointer to field gl_nameBob Peterson
What uniquely identifies a glock in the glock hash table is not gl_name, but gl_name and its superblock pointer. This patch makes the gl_name field correspond to a unique glock identifier. That will allow us to simplify hashing with a future patch, since the hash algorithm can then take the gl_name and hash its components in one operation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-09-03gfs2: Simplify the seq file code for "sbstats"Andreas Gruenbacher
Don't use struct gfs2_glock_iter as the helper data structure for iterating through "sbstats"; we are not iterating through glocks here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-09-02NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cbTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()Trond Myklebust
When I/O cannot complete due to a fatal error on the DS, ensure that we invalidate the corresponding layout segment and return it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "This first core part of the block IO changes contains: - Cleanup of the bio IO error signaling from Christoph. We used to rely on the uptodate bit and passing around of an error, now we store the error in the bio itself. - Improvement of the above from myself, by shrinking the bio size down again to fit in two cachelines on x86-64. - Revert of the max_hw_sectors cap removal from a revision again, from Jeff Moyer. This caused performance regressions in various tests. Reinstate the limit, bump it to a more reasonable size instead. - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me. Most devices have huge trim limits, which can cause nasty latencies when deleting files. Enable the admin to configure the size down. We will look into having a more sane default instead of UINT_MAX sectors. - Improvement of the SGP gaps logic from Keith Busch. - Enable the block core to handle arbitrarily sized bios, which enables a nice simplification of bio_add_page() (which is an IO hot path). From Kent. - Improvements to the partition io stats accounting, making it faster. From Ming Lei. - Also from Ming Lei, a basic fixup for overflow of the sysfs pending file in blk-mq, as well as a fix for a blk-mq timeout race condition. - Ming Lin has been carrying Kents above mentioned patches forward for a while, and testing them. Ming also did a few fixes around that. - Sasha Levin found and fixed a use-after-free problem introduced by the bio->bi_error changes from Christoph. - Small blk cgroup cleanup from Viresh Kumar" * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits) blk: Fix bio_io_vec index when checking bvec gaps block: Replace SG_GAPS with new queue limits mask block: bump BLK_DEF_MAX_SECTORS to 2560 Revert "block: remove artifical max_hw_sectors cap" blk-mq: fix race between timeout and freeing request blk-mq: fix buffer overflow when reading sysfs file of 'pending' Documentation: update notes in biovecs about arbitrarily sized bios block: remove bio_get_nr_vecs() fs: use helper bio_add_page() instead of open coding on bi_io_vec block: kill merge_bvec_fn() completely md/raid5: get rid of bio_fits_rdev() md/raid5: split bio for chunk_aligned_read block: remove split code in blkdev_issue_{discard,write_same} btrfs: remove bio splitting and merge_bvec_fn() calls bcache: remove driver private bio splitting code block: simplify bio_add_page() block: make generic_make_request handle arbitrarily sized bios blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL) block: don't access bio->bi_error after bio_put() block: shrink struct bio down to 2 cache lines again ...
2015-09-02nfsd: deal with DELEGRETURN racing with CB_RECALLAndrew Elble
We have observed the server sending recalls for delegation stateids that have already been successfully returned. Change nfsd4_cb_recall_done() to return success if the client has returned the delegation. While this does not completely eliminate the sending of recalls for delegations that have already been returned, this does prevent unnecessarily declaring the callback path to be down. Reported-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-01Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree for 4.3 (kerneldoc updates, printk() fixes, Documentation and MAINTAINERS updates)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits) MAINTAINERS: update my e-mail address mod_devicetable: add space before */ scsi: a100u2w: trivial typo in printk i2c: Fix typo in i2c-bfin-twi.c treewide: fix typos in comment blocks Doc: fix trivial typo in SubmittingPatches proportions: Spelling s/consitent/consistent/ dm: Spelling s/consitent/consistent/ aic7xxx: Fix typo in error message pcmcia: Fix typo in locking documentation scsi/arcmsr: Fix typos in error log drm/nouveau/gr: Fix typo in nv10.c [SCSI] Fix printk typos in drivers/scsi staging: comedi: Grammar s/Enable support a/Enable support for a/ Btrfs: Spelling s/consitent/consistent/ README: GTK+ is a acronym ASoC: omap: Fix typo in config option description mm: tlb.c: Fix error message ntfs: super.c: Fix error log fix typo in Documentation/SubmittingPatches ...
2015-09-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace updates from Eric Biederman: "This finishes up the changes to ensure proc and sysfs do not start implementing executable files, as the there are application today that are only secure because such files do not exist. It akso fixes a long standing misfeature of /proc/<pid>/mountinfo that did not show the proper source for files bind mounted from /proc/<pid>/ns/*. It also straightens out the handling of clone flags related to user namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER) when files such as /proc/<pid>/environ are read while <pid> is calling unshare. This winds up fixing a minor bug in unshare flag handling that dates back to the first version of unshare in the kernel. Finally, this fixes a minor regression caused by the introduction of sysfs_create_mount_point, which broke someone's in house application, by restoring the size of /sys/fs/cgroup to 0 bytes. Apparently that application uses the directory size to determine if a tmpfs is mounted on /sys/fs/cgroup. The bind mount escape fixes are present in Al Viros for-next branch. and I expect them to come from there. The bind mount escape is the last of the user namespace related security bugs that I am aware of" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: fs: Set the size of empty dirs to 0. userns,pidns: Force thread group sharing, not signal handler sharing. unshare: Unsharing a thread does not require unsharing a vm nsfs: Add a show_path method to fix mountinfo mnt: fs_fully_visible enforce noexec and nosuid if !SB_I_NOEXEC vfs: Commit to never having exectuables on proc and sysfs.
2015-09-01nfs: Remove unneeded checking of the return value from scnprintfKinglong Mee
The return value from scnprintf always less than the buffer length. So, result >= len always false. This patch removes those checking. int vscnprintf(char *buf, size_t size, const char *fmt, va_list args) { int i; i = vsnprintf(buf, size, fmt, args); if (likely(i < size)) return i; if (size != 0) return size - 1; return 0; } Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01nfs: Fix truncated client owner id without proto typeKinglong Mee
The length of "Linux NFSv4.0 " is 14, not 10. Without this patch, I get a truncated client owner id as, "Linux NFSv4.0 ::1/::1" With this patch, "Linux NFSv4.0 ::1/::1 tcp" Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalidTrond Myklebust
If a read-write layout has an invalid mirror, then we should mark it as invalid, and return it. If a read-only layout has an invalid mirror, then mark it as invalid and check if there is still at least one valid mirror before we return it. Note: Also fix incorrect use of pnfs_generic_mark_devid_invalid(). We really want nfs4_mark_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are validTrond Myklebust
Unlike read layouts, the writeable layout cannot fall back to using only one of the mirrors. It need to write to all of them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()Trond Myklebust
Unlike the files layout, flexfiles does not test for the NFS_DEVICEID_INVALID flag. Instead it relies on NFS_DEVICEID_UNAVAILABLE. Fix is to replace with nfs4_mark_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01f2fs: upset segment_info repairYunlei He
upset segment_info like this: 276000|161 0|0 4|70 3|0 3|0 0|0 0|91 4|0 4|232 4|39 276104|0 4|0 4|1 4|0 4|0 4|280 4|0 4|42 4|262 4|38 276204|179 4|89 4|39 4|24 4|0 4|96 4|3 4|428 4|0 4|118 276304|112 4|97 4|0 4|0 4|0 4|68 4|0 4|0 4|86 4|138 276404|0 4|0 0|166 5|39 4|101 0|111 Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-09-01NFSv4.1/flexfiles: Fix freeing of mirrorsTrond Myklebust
Mirrors are now shared objects, so we should not be freeing them directly inside ff_layout_free_lseg(). We should already be doing the right thing in _ff_layout_free_lseg(), so just let it handle things. Also ensure that ff_layout_free_mirror() frees the RPC credential if it is set. Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-01nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM caseJ. Bruce Fields
Somebody with a Solaris client was hitting this case. We haven't figured out why yet, and don't have a reproducer. Meanwhile Frank noticed that RFC 7530 actually recommends CLID_INUSE for this case. Unlikely to help the original reporter, but may as well fix it. Reported-by: Frank Filz <ffilzlnx@mindspring.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-01Merge branch 'xfs-misc-fixes-for-4.3-4' into for-nextDave Chinner
2015-08-31nfsd: ensure that delegation stateid hash references are only put onceJeff Layton
It's possible that a DELEGRETURN could race with (e.g.) client expiry, in which case we could end up putting the delegation hash reference more than once. Have unhash_delegation_locked return a bool that indicates whether it was already unhashed. In the case of destroy_delegation we only conditionally put the hash reference if that returns true. The other callers of unhash_delegation_locked call it while walking list_heads that shouldn't yet be detached. If we find that it doesn't return true in those cases, then throw a WARN_ON as that indicates that we have a partially hashed delegation, and that something is likely very wrong. Tested-by: Andrew W Elble <aweits@rit.edu> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31nfsd: ensure that the ol stateid hash reference is only put onceJeff Layton
When an open or lock stateid is hashed, we take an extra reference to it. When we unhash it, we drop that reference. The code however does not properly account for the case where we have two callers concurrently trying to unhash the stateid. This can lead to list corruption and the hash reference being put more than once. Fix this by having unhash_ol_stateid use list_del_init on the st_perfile list_head, and then testing to see if that list_head is empty before releasing the hash reference. This means that some of the unhashing wrappers now become bool return functions so we can test to see whether the stateid was unhashed before we put the reference. Reported-by: Andrew W Elble <aweits@rit.edu> Tested-by: Andrew W Elble <aweits@rit.edu> Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com> Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31nfsd: allow more than one laundry job to run at a timeJeff Layton
We can potentially have several nfs4_laundromat jobs running if there are multiple namespaces running nfsd on the box. Those are effectively separated from one another though, so I don't see any reason to serialize them. Also, create_singlethread_workqueue automatically adds the WQ_MEM_RECLAIM flag. Since we run this job on a timer, it's not really involved in any reclaim paths. I see no need for a rescuer thread. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31nfsd: don't WARN/backtrace for invalid container deployment.Paul Gortmaker
These messages, combined with the backtrace they trigger, makes it seem like a serious problem, though a quick search shows distros marking it as a "won't fix" non-issue when the problem is reported by users. The backtrace is overkill, and only really manages to show that if you follow the code path, you can't really avoid it with bootargs or configuration settings in the container. Given that, lets tone it down a bit and get rid of the WARN severity, and the associated backtrace, so people aren't needlessly alarmed. Also, lets drop the split printk line, since they are grep unfriendly. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31fs: fix fs/locks.c kernel-doc warningRandy Dunlap
Fix kernel-doc warnings in fs/locks.c: Warning(..//fs/locks.c:1577): No description found for parameter 'flags' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-08-31NFSD: Return word2 bitmask if setting security label in OPEN/CREATEKinglong Mee
Security label can be set in OPEN/CREATE request, nfsd should set the bitmask in word2 if setting success. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1Kinglong Mee
According to rfc5661 18.16.4, "If EXCLUSIVE4_1 was used, the client determines the attributes used for the verifier by comparing attrset with cva_attrs.attrmask;" So, EXCLUSIVE4_1 also needs those bitmask used to store the verifier. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.Kinglong Mee
The encode order should be as the bitmask defined order. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bugKinglong Mee
Currently we'll respond correctly to a request for either FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both attributes simultaneously. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31NFSD: Store parent's stat in a separate valueKinglong Mee
After commit ae7095a7c4 (nfsd4: helper function for getting mounted_on ino) we ignore the return value from get_parent_attributes(). Also, the following FATTR4_WORD2_LAYOUT_BLKSIZE uses stat.blksize, so to avoid overwriting that, use an independent value for the parent's attributes. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-31Btrfs: cleanup: remove unnecessary check before btrfs_free_path is calledTsutomu Itoh
We need not check path before btrfs_free_path() is called because path is checked in btrfs_free_path(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31btrfs: async_thread: Fix workqueue 'max_active' value when initializingQu Wenruo
At initializing time, for threshold-able workqueue, it's max_active of kernel workqueue should be 1 and grow if it hits threshold. But due to the bad naming, there is both 'max_active' for kernel workqueue and btrfs workqueue. So wrong value is given at workqueue initialization. This patch fixes it, and to avoid further misunderstanding, change the member name of btrfs_workqueue to 'current_active' and 'limit_active'. Also corresponding comment is added for readability. Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31btrfs: Add raid56 support for updatingZhao Lei
num_tolerated_disk_barrier_failures in btrfs_balance Code for updating fs_info->num_tolerated_disk_barrier_failures in btrfs_balance() lacks raid56 support. Reason: Above code was wroten in 2012-08-01, together with btrfs_calc_num_tolerated_disk_barrier_failures()'s first version. Then, btrfs_calc_num_tolerated_disk_barrier_failures() got updated later to support raid56, but code in btrfs_balance() was not updated together. Fix: Merge above similar code to a common function: btrfs_get_num_tolerated_disk_barrier_failures() and make it support both case. It can fix this bug with a bonus of cleanup, and make these code never in above no-sync state from now on. Suggested-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failuresZhao Lei
1: Use ARRAY_SIZE(types) to replace a static-value variant: int num_types = 4; 2: Use 'continue' on condition to reduce one level tab if (!XXX) { code; ... } -> if (XXX) continue; code; ... 3: Put setting 'num_tolerated_disk_barrier_failures = 2' to (num_tolerated_disk_barrier_failures > 2) condition to make make logic neat. if (num_tolerated_disk_barrier_failures > 0 && XXX) num_tolerated_disk_barrier_failures = 0; else if (num_tolerated_disk_barrier_failures > 1) { if (XXX) num_tolerated_disk_barrier_failures = 1; else if (XXX) num_tolerated_disk_barrier_failures = 2; -> if (num_tolerated_disk_barrier_failures > 0 && XXX) num_tolerated_disk_barrier_failures = 0; if (num_tolerated_disk_barrier_failures > 1 && XXX) num_tolerated_disk_barrier_failures = ; if (num_tolerated_disk_barrier_failures > 2 && XXX) num_tolerated_disk_barrier_failures = 2; 4: Remove comment of: num_mirrors - 1: if RAID1 or RAID10 is configured and more than 2 mirrors are used. which is not fit with code. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31btrfs: Remove noused chunk_tree and chunk_objectid from ↵Zhao Lei
scrub_enumerate_chunks and scrub_chunk These variables are not used from introduced version, remove them. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31btrfs: Update out-of-date "skip parity stripe" commentZhao Lei
Because btrfs support scrub raid56 parity stripe now. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-31Merge tag 'char-misc-4.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver patches from Greg KH: "Here's the "big" char/misc driver update for 4.3-rc1. Not much really interesting here, just a number of little changes all over the place, and some nice consolidation of the nvmem drivers to a common framework. As usual, the mei drivers stand out as the largest "churn" to handle new devices and features in their hardware. All have been in linux-next for a while with no issues" * tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) auxdisplay: ks0108: initialize local parport variable extcon: palmas: Fix build break due to devm_gpiod_get_optional API change extcon: palmas: Support GPIO based USB ID detection extcon: Fix signedness bugs about break error handling extcon: Drop owner assignment from i2c_driver extcon: arizona: Simplify pdata symantics for micd_dbtime extcon: arizona: Declare 3-pole jack if we detect open circuit on mic extcon: Add exception handling to prevent the NULL pointer access extcon: arizona: Ensure variables are set for headphone detection extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio extcon: arizona: Add basic microphone detection DT/ACPI bindings extcon: arizona: Update to use the new device properties API extcon: palmas: Remove the mutually_exclusive array extcon: Remove optional print_state() function pointer of struct extcon_dev extcon: Remove duplicate header file in extcon.h extcon: max77843: Clear IRQ bits state before request IRQ toshiba laptop: replace ioremap_cache with ioremap misc: eeprom: max6875: clean up max6875_read() misc: eeprom: clean up eeprom_read() misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write ...
2015-08-31NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of fileTrond Myklebust
If we have a read layout, then sanity check the minimal layout length so that it does not extend beyond the end of file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1/pnfs: Handle LAYOUTGET return values correctlyTrond Myklebust
According to RFC5661 section 18.43.3, if the server cannot satisfy the loga_minlength argument to LAYOUTGET, there are 2 cases: 1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER 2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1/pnfs: Don't ask for a read layout for an empty file.Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-30NFSv4.1: Fix a protocol issue with CLOSE stateidsTrond Myklebust
According to RFC5661 Section 18.2.4, CLOSE is supposed to return the zero stateid. This means that nfs_clear_open_stateid_locked() cannot assume that the result stateid will always match the 'other' field of the existing open stateid when trying to determine a race with a parallel OPEN. Instead, we look at the argument, and check for matches. Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-30NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errorsTrond Myklebust
If the file was fenced and/or has been deleted on the DS, then we want to retry pNFS after a layoutreturn with error report. If the server cannot fix the problem, then we rely on it to tell us so in the response to the LAYOUTGET. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-28f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extentChao Yu
If extent cache is disable, we will encounter oops when triggering direct IO as below: BUG: unable to handle kernel NULL pointer dereference at 0000000c IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] *pdpt = 000000002bb9a001 *pde = 0000000000000000 Oops: 0000 [#1] SMP Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache usbhid hid e1000 CPU: 3 PID: 3608 Comm: dd Tainted: G O 4.2.0-rc4 #12 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: ef161600 ti: ebd5e000 task.ti: ebd5e000 EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3 EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs] EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000 ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0 Stack: ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000 ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000 00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000 Call Trace: [<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs] [<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs] [<c160bb00>] ? down_read+0x30/0x50 [<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs] [<c113e115>] generic_file_direct_write+0xa5/0x260 [<c10b53f8>] ? current_fs_time+0x18/0x50 [<c113e38b>] __generic_file_write_iter+0xbb/0x210 [<c113e50f>] ? generic_file_write_iter+0x2f/0x320 [<c113e63c>] generic_file_write_iter+0x15c/0x320 [<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs] [<c11984d9>] __vfs_write+0xa9/0xe0 [<c1199227>] vfs_write+0x97/0x180 [<c119955b>] SyS_write+0x5b/0xd0 [<c160dcd0>] sysenter_do_call+0x12/0x12 Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00 00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00 EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58 CR2: 000000000000000c ---[ end trace a38c07026a1afffd ]--- This is because when extent cache is disable, extent_tree pointer in struct f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access this NULL pointer directly without checking state of extent cache, then, the oops occurs. Let's fix it by checking state of extent cache before accessing. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-28xfs: fix error gotos in xfs_setattr_nonsizeEric Sandeen
As the code stands today, if xfs_trans_reserve() fails, we goto out_dqrele, which does not free the allocated transaction. Fix up the goto targets to undo everything properly. Addresses-Coverity-Id: 145571 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28xfs: add mssing inode cache attempts counter incrementLucas Stach
Increasing the inode cache attempt counter was apparently dropped while refactoring the cache code and so stayed at the initial 0 value. Add the increment back to make the runtime stats more useful. Signed-off-by: Lucas Stach <dev@lynxeye.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28xfs: return errors from partial I/O failures to filesDavid Jeffery
There is an issue with xfs's error reporting in some cases of I/O partially failing and partially succeeding. Calls like fsync() can report success even though not all I/O was successful in partial-failure cases such as one disk of a RAID0 array being offline. The issue can occur when there are more than one bio per xfs_ioend struct. Each call to xfs_end_bio() for a bio completing will write a value to ioend->io_error. If a successful bio completes after any failed bio, no error is reported do to it writing 0 over the error code set by any failed bio. The I/O error information is now lost and when the ioend is completed only success is reported back up the filesystem stack. xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE being clear. ioend->io_error is initialized to 0 at allocation so only needs to be updated by a failed bio. Also check that ioend->io_error is 0 so that the first error reported will be the error code returned. Cc: stable@vger.kernel.org Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-28libxfs: bad magic number should set da block buffer errorDarrick J. Wong
If xfs_da3_node_read_verify() doesn't recognize the magic number of a buffer it's just read, set the buffer error to -EFSCORRUPTED so that the error can be sent up to userspace. Without this patch we'll notice the bad magic eventually while trying to traverse or change the block, but we really ought to fail early in the verifier. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-27NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payloadTrond Myklebust
The "FIXME" is outdated. Flexfiles does add a payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFSv4.1/flexfiles: Fix a protocol error in layoutreturnTrond Myklebust
According to the flexfiles protocol, the layoutreturn should specify an array of errors in the following format: struct ff_ioerr4 { offset4 ffie_offset; length4 ffie_length; stateid4 ffie_stateid; device_error4 ffie_errors<>; }; This patch fixes up the code to ensure that our ffie_errors is indeed encoded as an array (albeit with only a single entry). Reported-by: Tom Haynes <thomas.haynes@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1Kinglong Mee
Client sends a SETATTR request after OPEN for updating attributes. For create file with S_ISGID is set, the S_ISGID in SETATTR will be ignored at nfs server as chmod of no PERMISSION. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-27NFS: Get suppattr_exclcreat when getting server capabilitiesKinglong Mee
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode depends on suppattr_exclcreat attribut. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>