2016-05-18cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never failsRafael J. Wysocki
None of the cpufreq governors currently in the tree will ever fail an invocation of the ->governor() callback with the event argument equal to CPUFREQ_GOV_POLICY_EXIT (unless invoked with incorrect arguments which doesn't matter anyway) and it wouldn't really make sense to fail it, because the caller won't be able to handle that failure in a meaningful way. Accordingly, rearrange the code in the core to make it clear that this call never fails. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
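The idea of making "cannot fail" visible in the code can be sketched in plain userspace C. All names below (`demo_governor`, `demo_gov_init`, `demo_gov_exit`) are hypothetical and are not the cpufreq core's actual functions; the only point is that an operation which may fail keeps an `int` return, while the exit path returns `void` so no caller can be tempted to handle an error that never occurs.

```c
#include <errno.h>

/* Hypothetical governor state, for illustration only. */
struct demo_governor {
    int initialized;
};

/* Initialization may legitimately fail, so it reports an error code. */
static int demo_gov_init(struct demo_governor *gov)
{
    if (gov->initialized)
        return -EBUSY;
    gov->initialized = 1;
    return 0;
}

/* Teardown cannot fail in a way the caller could act on,
 * so the signature says so by returning void. */
static void demo_gov_exit(struct demo_governor *gov)
{
    gov->initialized = 0;
}
```

With this shape a caller simply invokes `demo_gov_exit()` and moves on; there is no return value to check or to ignore.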
2016-05-18intel_pstate: Simplify conditional in intel_pstate_set_policy()Rafael J. Wysocki
One of the if () statements in intel_pstate_set_policy() causes another if () to be evaluated if the condition is true and it doesn't do anything else, so merge the two if () statements into one. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
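The transformation described here is the usual merge of a nested if () into a single condition. A minimal sketch, using made-up field names rather than the real intel_pstate_set_policy() code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not the real intel_pstate fields. */
struct demo_policy {
    bool performance_mode;   /* outer condition */
    bool max_is_full_range;  /* inner condition */
    int  limit_set;          /* side effect being guarded */
};

/* Before: the outer if () exists only to evaluate the inner one. */
static void set_policy_nested(struct demo_policy *p)
{
    if (p->performance_mode) {
        if (p->max_is_full_range)
            p->limit_set = 1;
    }
}

/* After: one if () with both conditions. && short-circuits, so the
 * second condition is still evaluated only when the first is true. */
static void set_policy_merged(struct demo_policy *p)
{
    if (p->performance_mode && p->max_is_full_range)
        p->limit_set = 1;
}
```

Both functions behave identically for every combination of the two flags, which is what "no functional changes" means in practice.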
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - removal of our own implementation of architecture-specific relocation code, leveraging existing code in the module loader to perform arch-dependent work, from Jessica Yu. The relevant patches have been acked by Rusty (for module.c) and Heiko (for s390). - live patching support for ppc64le, which is joint work of Michael Ellerman and Torsten Duwe. This is coming from a topic branch that is shared between livepatching.git and the ppc tree. - addition of livepatching documentation from Petr Mladek * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: make object/func-walking helpers more robust livepatch: Add some basic livepatch documentation powerpc/livepatch: Add live patching support on ppc64le powerpc/livepatch: Add livepatch stack to struct thread_info powerpc/livepatch: Add livepatch header livepatch: Allow architectures to specify an alternate ftrace location ftrace: Make ftrace_location_range() global livepatch: robustify klp_register_patch() API error checking Documentation: livepatch: outline Elf format and requirements for patch modules livepatch: reuse module loader code to write relocations module: s390: keep mod_arch_specific for livepatch modules module: preserve Elf information for livepatch modules Elf: add livepatch-specific Elf constants
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...
2016-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 
19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ...
2016-05-17btrfs: Switch to generic xattr handlersAndreas Gruenbacher
The btrfs_{set,remove}xattr inode operations check for a read-only root (btrfs_root_readonly) before calling into generic_{set,remove}xattr. If this check is moved into __btrfs_setxattr, we can get rid of btrfs_{set,remove}xattr. This patch applies to mainline, I would like to keep it together with the other xattr cleanups if possible, though. Could you please review? Thanks, Andreas Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-17ubifs: Switch to generic xattr handlersAndreas Gruenbacher
Ubifs internally uses special inodes for storing xattrs. Those inodes had NULL {get,set,remove}xattr inode operations before this change, so xattr operations on them would fail. The super block's s_xattr field would also apply to those special inodes. However, the inodes are not visible outside of ubifs, and so no xattr operations will ever be carried out on them anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-17nvme/host: Add missing blk_integrity tag_size + flags assignmentsNicholas Bellinger
While doing recent bring-up of nvme/host with target-core T10-PI, I noticed /sys/block/nvme*/integrity/device_is_integrity_capable was false, and /sys/block/nvme*/integrity/tag_size contained a bogus value. AFAICT outside of blk_integrity_compare() for DM + MD these are informational values, but go ahead and add the missing assignments for nvme/host to match what SCSI does within sd_dif_config_host() for consistency's sake. Cc: Keith Busch <keith.busch@intel.com> Cc: Jay Freyensee <james.p.freyensee@intel.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Sagi Grimberg <sagig@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Sagi Grimberg <sagi at grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Add device ID's with stripe quirkKeith Busch
Adds two Intel controllers that have the "stripe" quirk. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Short-cut removal on surprise hot-unplugKeith Busch
This patch adds a new state that, when set, has the core automatically kill request queues prior to removing namespaces. If the PCI device is not present at the time the nvme driver's remove is called, we can kill all IO queues immediately instead of waiting for the watchdog thread to do that at its polling interval. This improves scenarios where multiple hot plug events occur at the same time since it doesn't block the PCI enumeration for as long. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
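The control flow described above can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in (`demo_ctrl`, `DEMO_CTRL_DEAD`, `demo_remove`), not the nvme driver's real types: the remove path marks the controller dead when the device is already gone, and the namespace-removal path kills the queues first whenever it sees that state.

```c
#include <stdbool.h>

/* Hypothetical controller states and bookkeeping, for illustration. */
enum demo_ctrl_state { DEMO_CTRL_LIVE, DEMO_CTRL_DEAD };

struct demo_ctrl {
    enum demo_ctrl_state state;
    bool queues_killed;
    bool namespaces_removed;
};

/* Core side: a dead controller gets its request queues killed
 * before the namespaces are torn down, so pending IO fails fast. */
static void demo_remove_namespaces(struct demo_ctrl *ctrl)
{
    if (ctrl->state == DEMO_CTRL_DEAD)
        ctrl->queues_killed = true;
    ctrl->namespaces_removed = true;
}

/* Driver side: if the device has already disappeared, there is
 * nothing to wait for, so flag the controller as dead up front. */
static void demo_remove(struct demo_ctrl *ctrl, bool pci_present)
{
    if (!pci_present)
        ctrl->state = DEMO_CTRL_DEAD;
    demo_remove_namespaces(ctrl);
}
```

In the orderly-removal case (`pci_present` true) the queues are left alone and the namespaces are removed normally.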
2016-05-17NVMe: Allow user initiated rescanKeith Busch
This exposes ioctl and sysfs methods a user can invoke to request the driver rescan a controller and its namespaces. This is less harsh than doing a controller reset, which temporarily halts all IO, just to surface a newly attached namespace. This is mainly useful for controllers that implement the namespace management command, but do not support the namespace notify change asynchronous event notification. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Reduce driver log spammingKeith Busch
Reduce error logging when no corrective action is required. Suggested-by: Chris Petersen <cpetersen@fb.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Unbind driver on failureKeith Busch
Instead of removing the PCI device from the kernel's topology on controller failure, this patch simply requests unbinding the device from the driver. This avoids concurrently running pci removal with the hot plug event, which has been reported to be problematic when multiple surprise events occur near simultaneously. The other benefit is that we will have PCI config and memory space available to poke around for debugging a failed controller, assuming the device was not physically removed. The down side occurs if the platform and/or kernel do not support any type of surprise hot removal. The device will remain visible through sysfs (and therefore lspci), and some manual work is necessary to get the logical topology corrected. But if your platform and/or kernel don't support surprise removal, you probably shouldn't be doing that anyway. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Delete only created queuesKeith Busch
Use the online queue count instead of the number of allocated queues. The controller should just return an invalid queue identifier error to the commands if a queue wasn't created. While it's not harmful, it's still not correct. Reported-by: Saar Gross <saar@annapurnalabs.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Allocate queues only for online cpusKeith Busch
The driver previously requested allocating queues for the total possible number of CPUs so that blk-mq could rebalance these if CPUs were added after initialization. The number of hardware contexts can now be changed at runtime, so we only need to allocate the number of online queues since we can add more later. Suggested-by: Jeff Lien <jeff.lien@hgst.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
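As a rough illustration of the sizing change, the sketch below contrasts the two policies. The function names and the `hw_max_queues` cap are assumptions made for this example; the real driver negotiates the queue count with the controller.

```c
/* Hypothetical sizing helpers; not the nvme driver's real code. */

/* Old policy: reserve a queue for every CPU that could ever appear. */
static unsigned int demo_queues_for_possible(unsigned int possible_cpus,
                                             unsigned int hw_max_queues)
{
    return possible_cpus < hw_max_queues ? possible_cpus : hw_max_queues;
}

/* New policy: size for the CPUs that are online right now, since
 * the number of hardware contexts can be raised later at runtime. */
static unsigned int demo_queues_for_online(unsigned int online_cpus,
                                           unsigned int hw_max_queues)
{
    return online_cpus < hw_max_queues ? online_cpus : hw_max_queues;
}
```

On a machine with 4 online CPUs out of 64 possible, the first helper would ask for up to 64 queues and the second for 4.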
2016-05-17Merge tag 'dm-4.7-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - based on Jens' 'for-4.7/core' to have DM thinp's discard support use bio_inc_remaining() and the block core's new async __blkdev_issue_discard() interface - make DM multipath's fast code-paths lockless, using lockless_dereference, to significantly improve large NUMA performance when using blk-mq. The m->lock spinlock contention was a serious bottleneck. - a few other small code cleanups and Documentation fixes * tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: unroll issue_discard() to create longer discard bio chains dm thin: use __blkdev_issue_discard for async discard support dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining() dm raid: make sure no feature flags are set in metadata dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call dm stats: fix spelling mistake in Documentation dm cache: update cache-policies.txt now that mq is an alias for smq dm mpath: eliminate use of spinlock in IO fast-paths dm mpath: move trigger_event member to the end of 'struct multipath' dm mpath: use atomic_t for counting members of 'struct multipath' dm mpath: switch to using bitops for state flags dm thin: Remove return statement from void function dm: remove unused mapped_device argument from free_tio()
2016-05-17Merge branch 'for-4.7/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "On top of the core pull request, this is the drivers pull request for this merge window. This contains: - Switch drivers to the new write back cache API, and kill off the flush flags. From me. - Kill the discard support for the STEC pci-e flash driver. It's trivially broken, and apparently unmaintained, so it's safer to just remove it. From Jeff Moyer. - A set of lightnvm updates from the usual suspects (Matias/Javier, and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei Tao. - A set of updates for NVMe: - Turn the controller state management into a proper state machine. From Christoph. - Shuffling of code in preparation for NVMe-over-fabrics, also from Christoph. - Cleanup of the command prep part from Ming Lin. - Rewrite of the discard support from Ming Lin. - Deadlock fix for namespace removal from Ming Lin. - Use the now exported blk-mq tag helper for IO termination. From Sagi. - Various little fixes from Christoph, Guilherme, Keith, Ming Lin, Wang Sheng-Hui. - Convert mtip32xx to use the now exported blk-mq tag iter function, from Keith" * 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits) lightnvm: reserved space calculation incorrect lightnvm: rename nr_pages to nr_ppas on nvm_rq lightnvm: add is_cached entry to struct ppa_addr lightnvm: expose gennvm_mark_blk to targets lightnvm: remove mgt targets on mgt removal lightnvm: pass dma address to hardware rather than pointer lightnvm: do not assume sequential lun alloc. 
nvme/lightnvm: Log using the ctrl named device lightnvm: rename dma helper functions lightnvm: enable metadata to be sent to device lightnvm: do not free unused metadata on rrpc lightnvm: fix out of bound ppa lun id on bb tbl lightnvm: refactor set_bb_tbl for accepting ppa list lightnvm: move responsibility for bad blk mgmt to target lightnvm: make nvm_set_rqd_ppalist() aware of vblks lightnvm: remove struct factory_blks lightnvm: refactor device ops->get_bb_tbl() lightnvm: introduce nvm_for_each_lun_ppa() macro lightnvm: refactor dev->online_target to global nvm_targets lightnvm: rename nvm_targets to nvm_tgt_type ...
2016-05-17Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block layer updates from Jens Axboe: "These are the core block IO changes for this merge window. Nothing earth shattering in here, it's mostly just fixes. In detail: - Fix for a long standing issue where wrong ordering in blk-mq caused order_to_size() to spew a warning. From Bart. - Async discard support from Christoph. Basically just splitting our sync interface into a submit + wait part. - Add a cleaner interface for flagging whether a device has a write back cache or not. We've previously overloaded blk_queue_flush() with this, but let's make it more explicit. Drivers cleaned up and updated in the drivers pull request. From me. - Fix for a double check for whether IO accounting is enabled or not. From Michael Callahan. - Fix for the async discard from Mike Snitzer, reinstating the early EOPNOTSUPP return if the device doesn't support discards. - Also from Mike, export bio_inc_remaining() so dm can drop its private copy of it. - From Ming Lin, add support for passing in an offset for request payloads. - Tag function export from Sagi, which will be used in NVMe in the drivers pull. - Two blktrace related fixes from Shaohua. - Propagate NOMERGE flag when making a request from a bio, also from Shaohua. - An optimization to not parse cgroup paths in blk-throttle, if we don't need to. 
From Shaohua" * 'for-4.7/core' of git://git.kernel.dk/linux-block: blk-mq: fix undefined behaviour in order_to_size() blk-throttle: don't parse cgroup path if trace isn't enabled blktrace: add missed mask name blktrace: delete garbage for message trace block: make bio_inc_remaining() interface accessible again block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard block: Minor blk_account_io_start usage cleanup block: add __blkdev_issue_discard block: remove struct bio_batch block: copy NOMERGE flag from bio to request block: add ability to flag write back caching on a device blk-mq: Export tagset iter function block: add offset in blk_add_request_payload() writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-17doc: self-protection: provide initial detailsKees Cook
This document attempts to codify the intent around kernel self-protection along with discussion of both existing and desired technologies, with attention given to the rationale behind them, and the expectations of their usage. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> [jc: applied fixes suggested by Randy] Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-05-17Merge branch 'work.preadv2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs cleanups from Al Viro: "More cleanups from Christoph" * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: use RWF_SYNC fs: add RWF_DSYNC aand RWF_SYNC ceph: use generic_write_sync fs: simplify the generic_write_sync prototype fs: add IOCB_SYNC and IOCB_DSYNC direct-io: remove the offset argument to dio_complete direct-io: eliminate the offset argument to ->direct_IO xfs: eliminate the pos variable in xfs_file_dio_aio_write filemap: remove the pos argument to generic_file_direct_write filemap: remove pos variables in generic_file_read_iter
2016-05-17Merge branch 'for-chris-4.7' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7 Signed-off-by: Chris Mason <clm@fb.com>
2016-05-17Merge branch 'work.const-path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct path' constification update from Al Viro: "'struct path' is passed by reference to a bunch of Linux security methods; in theory, there's nothing to stop them from modifying the damn thing and LSM community being what it is, sooner or later some enterprising soul is going to decide that it's a good idea. Let's remove the temptation and constify all of those..." * 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify ima_d_path() constify security_sb_pivotroot() constify security_path_chroot() constify security_path_{link,rename} apparmor: remove useless checks for NULL ->mnt constify security_path_{mkdir,mknod,symlink} constify security_path_{unlink,rmdir} apparmor: constify common_perm_...() apparmor: constify aa_path_link() apparmor: new helper - common_path_perm() constify chmod_common/security_path_chmod constify security_sb_mount() constify chown_common/security_path_chown tomoyo: constify assorted struct path * apparmor_path_truncate(): path->mnt is never NULL constify vfs_truncate() constify security_path_truncate() [apparmor] constify struct path * in a bunch of helpers
2016-05-17Merge branch 'for-cifs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull cifs xattr updates from Al Viro: "This is the remaining parts of the xattr work - the cifs bits" * 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cifs: Switch to generic xattr handlers cifs: Fix removexattr for os2.* xattrs cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT cifs: Fix xattr name checks
2016-05-17Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF fixes from Jan Kara: "A fix for UDF crash on corrupted media and one UDF header fixup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Export superblock magic to userspace udf: Prevent stack overflow on corrupted filesystem mount
2016-05-17Merge branch 'for-chris-4.7' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7
2016-05-17Merge tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs updates from Dave Kleikamp: "Some jfs logging cleanups from Joe Perches" * tag 'jfs-4.7' of git://github.com/kleikamp/linux-shaggy: jfs: Coalesce some formats jfs: Remove unnecessary line continuations and terminating newlines jfs: Remove terminating newlines from jfs_info, jfs_warn, jfs_err uses
2016-05-17exec: clarify reasoning for euid/egid resetKees Cook
This section of code initially looks redundant, but is required. This improves the comment to explain more clearly why the reset is needed. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-17pnfs: make pnfs_layout_process more robustJeff Layton
It can return NULL if layoutgets are blocked currently. Fix it to return -EAGAIN in that case, so we can properly handle it in pnfs_update_layout. Also, clean up and simplify the error handling -- eliminate "status" and just use "lseg". Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
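Returning an error encoded in the pointer, rather than a bare NULL, lets the caller tell "try again" apart from other outcomes. The sketch below re-creates the kernel's error-pointer idiom in userspace under `demo_` names (the helpers are modelled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); `demo_layout_process` and `demo_lseg` are hypothetical and are not the pNFS code).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes live in the top 4095 values of the address space. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical layout segment and a single static instance. */
struct demo_lseg { int id; };
static struct demo_lseg demo_segment = { 1 };

/* Blocked layoutgets are reported as -EAGAIN instead of NULL,
 * so the caller knows a retry is the right response. */
static struct demo_lseg *demo_layout_process(int layoutgets_blocked)
{
    if (layoutgets_blocked)
        return demo_err_ptr(-EAGAIN);
    return &demo_segment;
}
```

A caller would check `demo_is_err(lseg)` first and, on `demo_ptr_err(lseg) == -EAGAIN`, go around its retry loop instead of treating the result as "no layout available".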
2016-05-17pnfs: rework LAYOUTGET retry handlingJeff Layton
There are several problems in the way a stateid is selected for a LAYOUTGET operation: We pick a stateid to use in the RPC prepare op, but that makes it difficult to serialize LAYOUTGETs that use the open stateid. That serialization is done in pnfs_update_layout, which occurs well before the rpc_prepare operation. Between those two events, the i_lock is dropped and reacquired. pnfs_update_layout can find that the list has lsegs in it and not do any serialization, but then later pnfs_choose_layoutget_stateid ends up choosing the open stateid. This patch changes the client to select the stateid to use in the LAYOUTGET earlier, when we're searching for a usable layout segment. This way we can do it all while holding the i_lock the first time, and ensure that we serialize any LAYOUTGET call that uses a non-layout stateid. This also means a rework of how LAYOUTGET replies are handled, as we must now get the latest stateid if we want to retransmit in response to a retryable error. Most of those errors boil down to the fact that the layout state has changed in some fashion. Thus, what we really want to do is to re-search for a layout when it fails with a retryable error, so that we can avoid reissuing the RPC at all if possible. While the LAYOUTGET RPC is async, the initiating thread always waits for it to complete, so it's effectively synchronous anyway. Currently, when we need to retry a LAYOUTGET because of an error, we drive that retry via the rpc state machine. This means that once the call has been submitted, it runs until it completes. So, we must move the error handling for this RPC out of the rpc_call_done operation and into the caller. In order to handle errors like NFS4ERR_DELAY properly, we must also pass a pointer to the sliding timeout, which is now moved to the stack in pnfs_update_layout. The complicating errors are -NFS4ERR_RECALLCONFLICT and -NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give up and return NULL back to the caller. 
So, there is some special handling for those errors to ensure that the layers driving the retries can handle that appropriately. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: lift retry logic from send_layoutget to pnfs_update_layoutJeff Layton
If we get back something like NFS4ERR_OLD_STATEID, that will be translated into -EAGAIN, and the do/while loop in send_layoutget will drive the call again. This is not quite what we want, I think. An error like that is a sign that something has changed. That something could have been a concurrent LAYOUTGET that would give us a usable lseg. Lift the retry logic into pnfs_update_layout instead. That allows us to redo the layout search, and may spare us from having to issue an RPC. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
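The benefit of lifting the retry loop into the caller can be shown with a toy model. All names and fields below are invented for this sketch (`demo_layout`, `demo_send_layoutget`, `demo_update_layout`); the real functions carry far more state. The send function makes exactly one attempt, and the caller repeats its search for a usable segment before deciding whether another RPC is needed.

```c
#include <errno.h>

/* Toy model of a layout header; every field is hypothetical. */
struct demo_layout {
    int have_usable_lseg;  /* a usable segment is already installed */
    int rpcs_sent;         /* LAYOUTGET RPCs issued so far */
    int fail_first_n;      /* simulate this many -EAGAIN replies */
    int concurrent_fill;   /* simulate a concurrent LAYOUTGET landing */
};

/* One LAYOUTGET attempt with no internal retry loop. */
static int demo_send_layoutget(struct demo_layout *lo)
{
    lo->rpcs_sent++;
    if (lo->fail_first_n > 0) {
        lo->fail_first_n--;
        /* "Something changed": another thread may have got a layout. */
        if (lo->concurrent_fill)
            lo->have_usable_lseg = 1;
        return -EAGAIN;
    }
    lo->have_usable_lseg = 1;
    return 0;
}

/* Caller-driven retry: redo the search before issuing another RPC. */
static int demo_update_layout(struct demo_layout *lo)
{
    int err;

    for (;;) {
        if (lo->have_usable_lseg)
            return 0;              /* search hit, no RPC needed */
        err = demo_send_layoutget(lo);
        if (err != -EAGAIN)
            return err;
    }
}
```

When a concurrent LAYOUTGET supplies a segment during the failed attempt, the second pass through the loop finds it and no second RPC is sent.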
2016-05-17pnfs: fix bad error handling in send_layoutgetJeff Layton
Currently, the code will clear the fail bit if we get back a fatal error. I don't think that's correct -- we want to clear that bit if we do not get a fatal error. Fixes: 0bcbf039f6 (nfs: handle request add failure properly) Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_dsJeff Layton
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTEDJeff Layton
Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything, unless there are lsegs that are also being marked for return. At the point where that happens this flag is also set, so these set_bit calls don't do anything useful. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN argsJeff Layton
LAYOUTRETURN is "special" in that servers and clients are expected to work with old stateids. When the client sends a LAYOUTRETURN with an old stateid in it then the server is expected to only tear down layout segments that were present when that seqid was current. Ensure that the client handles its accounting accordingly. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
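A small model of the accounting rule: only segments whose recorded sequence number is not newer than the seqid sent in LAYOUTRETURN are torn down. The structure and function names are hypothetical, and the wraparound-safe comparison is an assumption of this sketch (it relies on the usual two's-complement conversion to `int32_t`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout segment: the seqid current when it was handed out. */
struct demo_seg {
    uint32_t seq;
    int valid;
};

/* Wraparound-safe "a is newer than b" for 32-bit sequence numbers. */
static int demo_seq_is_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Invalidate only the segments that existed when `seqid` was current.
 * Returns the number of segments torn down. */
static size_t demo_return_up_to(struct demo_seg *segs, size_t n,
                                uint32_t seqid)
{
    size_t i, torn_down = 0;

    for (i = 0; i < n; i++) {
        if (!segs[i].valid)
            continue;
        if (demo_seq_is_newer(segs[i].seq, seqid))
            continue;   /* handed out after that seqid: keep it */
        segs[i].valid = 0;
        torn_down++;
    }
    return torn_down;
}
```

With segments recorded at seqids 1, 2 and 3, a LAYOUTRETURN carrying seqid 2 tears down the first two and leaves the third in place.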
2016-05-17pnfs: keep track of the return sequence number in pnfs_layout_hdrJeff Layton
When we want to selectively do a LAYOUTRETURN, we need to specify a stateid that represents the most recent layout acquisition that is to be returned. When we mark a layout stateid to be returned, we update the return sequence number in the layout header with that value, if it's newer than the existing one. Then, when we go to do a LAYOUTRETURN on layout header put, we overwrite the seqid in the stateid with the saved one, and then zero it out. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
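The bookkeeping described above might look roughly like this in isolation. The structure and function names are hypothetical, and two details are assumptions of the sketch rather than statements from the commit: a saved value of 0 means "nothing pending", and the current seqid is used when nothing was saved.

```c
#include <stdint.h>

/* Hypothetical layout header with a saved return sequence number. */
struct demo_layout_hdr {
    uint32_t cur_seq;     /* seqid of the current layout stateid */
    uint32_t return_seq;  /* 0 means no return is pending (assumption) */
};

/* Wraparound-safe "a is newer than b". */
static int demo_seq_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Marking for return: remember the newest seqid seen so far. */
static void demo_mark_for_return(struct demo_layout_hdr *lo, uint32_t seq)
{
    if (lo->return_seq == 0 || demo_seq_newer(seq, lo->return_seq))
        lo->return_seq = seq;
}

/* Preparing LAYOUTRETURN: use the saved seqid, then zero it out. */
static uint32_t demo_prepare_layoutreturn(struct demo_layout_hdr *lo)
{
    uint32_t seq = lo->return_seq ? lo->return_seq : lo->cur_seq;

    lo->return_seq = 0;
    return seq;
}
```

Marking seqids 3, 2 and 4 in that order leaves 4 saved; the next LAYOUTRETURN uses 4 and clears the saved value.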
2016-05-17pnfs: record sequence in pnfs_layout_segment when it's createdJeff Layton
In later patches, we're going to teach the client to be more selective about how it returns layouts. This means keeping a record of what the stateid's seqid was at the time that the server handed out a layout segment. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit setJeff Layton
Otherwise, we'll end up returning layouts that we've just received if the client issues a new LAYOUTGET prior to the LAYOUTRETURN. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pNFS/flexfiles: When initing reads or writes, we might have to retry ↵Tom Haynes
connecting to DSes If we are initializing reads or writes and can not connect to a DS, then check whether or not IO is allowed through the MDS. If it is allowed, reset to the MDS. Else, fail the layout segment and force a retry of a new layout segment. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
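The decision described in this commit reduces to a three-way choice, sketched here with invented names (`demo_io_action`, `demo_pg_init_action`). The `no_io_thru_mds` parameter stands in for the layout's "no IO through the MDS" flag.

```c
#include <stdbool.h>

/* Hypothetical outcomes when initializing a read or write. */
enum demo_io_action {
    DEMO_IO_USE_DS,        /* connected: do the IO against the data server */
    DEMO_IO_RESET_TO_MDS,  /* fall back to IO through the metadata server */
    DEMO_IO_RETRY_LAYOUT   /* fail this segment and ask for a new layout */
};

static enum demo_io_action demo_pg_init_action(bool ds_connected,
                                               bool no_io_thru_mds)
{
    if (ds_connected)
        return DEMO_IO_USE_DS;
    if (!no_io_thru_mds)
        return DEMO_IO_RESET_TO_MDS;
    return DEMO_IO_RETRY_LAYOUT;
}
```

Only the last case forces a new layout segment; when MDS IO is permitted, the client simply redirects the request.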
2016-05-17pNFS/flexfiles: When checking for available DSes, conditionally check for MDS ioTom Haynes
Whenever we check to see if we have the needed number of DSes for the action, we may also have to check to see whether IO is allowed to go to the MDS or not. [jlayton: fix merge conflict due to lack of localio patches here] Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17pNFS/flexfile: Fix erroneous fall back to read/write through the MDSTrond Myklebust
This patch fixes a problem whereby the pNFS client falls back to doing reads and writes through the metadata server even when the layout flag FF_FLAGS_NO_IO_THRU_MDS is set. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17NFS: Reclaim writes via writepage are opportunisticTrond Myklebust
No need to make them a priority any more, or to make them succeed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17NFSv4: Use the right stateid for delegations in setattr, read and writeTrond Myklebust
When we're using a delegation to represent our open state, we should ensure that we use the stateid that was used to create that delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17NFSv4: Label stateids with the typeTrond Myklebust
In order to more easily distinguish what kind of stateid we are dealing with, introduce a type that can be used to label the stateid structure. The label will be useful both for debugging and when dealing with operations like SETATTR, READ and WRITE that can take several different types of stateid as arguments. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17SUNRPC: Ensure get_rpccred() and put_rpccred() can take NULL argumentsTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
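NULL-tolerant get/put helpers follow a common pattern, sketched below with a hypothetical `demo_cred` type. This is a simplified, non-atomic model; the real SUNRPC credential code uses atomic reference counts and frees the object when the count drops to zero.

```c
#include <stddef.h>

/* Hypothetical reference-counted credential (non-atomic sketch). */
struct demo_cred {
    int refcount;
};

/* Taking a reference on NULL is a no-op that returns NULL. */
static struct demo_cred *demo_get_cred(struct demo_cred *cred)
{
    if (cred != NULL)
        cred->refcount++;
    return cred;
}

/* Dropping a reference on NULL is likewise a no-op. */
static void demo_put_cred(struct demo_cred *cred)
{
    if (cred == NULL)
        return;
    cred->refcount--;
}
```

Callers can then pass through an optional credential without wrapping every get/put pair in its own NULL check.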
2016-05-17pNFS: Fix a leaked layoutstats flagTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Remove qplockChuck Lever
Clean up. After "xprtrdma: Remove ro_unmap() from all registration modes", there are no longer any sites that take rpcrdma_ia::qplock for read. The one site that takes it for write is always single-threaded. It is safe to remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Faster server reboot recoveryChuck Lever
In a cluster failover scenario, it is desirable for the client to attempt to reconnect quickly, as an alternate NFS server is already waiting to take over for the down server. The client can't see that a server IP address has moved to a new server until the existing connection is gone. For fabrics and devices where it is meaningful, set a definite upper bound on the amount of time before it is determined that a connection is no longer valid. This allows the RPC client to detect connection loss in a timely manner, then perform a fresh resolution of the server GUID in case it has changed (cluster failover). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Remove ro_unmap() from all registration modesChuck Lever
Clean up: The ro_unmap method is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Add ro_unmap_safe memreg methodChuck Lever
There needs to be a safe method of releasing registered memory resources when an RPC terminates. Safe can mean a number of things:
 + Doesn't have to sleep
 + Doesn't rely on having a QP in RTS
ro_unmap_safe will be that safe method. It can be used in cases where synchronous memory invalidation can deadlock, or needs to have an active QP. The important case is fencing an RPC's memory regions after it is signaled (^C) and before it exits. If this is not done, there is a window where the server can write an RPC reply into memory that the client has released and re-used for some other purpose. Note that this is a full solution for FRWR, but FMR and physical still have some gaps where a particularly bad server can wreak some havoc on the client. These gaps are not made worse by this patch and are expected to be exceptionally rare and timing-based. They are noted in documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Refactor __fmr_dma_unmap()Chuck Lever
Separate the DMA unmap operation from freeing the MW. In a subsequent patch they will not always be done at the same time, and they are not related operations (except by order; freeing the MW must be the last step during invalidation). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>