summaryrefslogtreecommitdiff
path: root/drivers/nvme/host/fabrics.c
AgeCommit message (Collapse)Author
2019-02-20nvme-fabrics: convert to SPDX identifiersChristoph Hellwig
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2019-02-20nvme-fabrics: document the poll function argumentBart Van Assche
This patch avoids that the kernel-doc tool reports a warning when building with W=1. Fixes: 26c682274e0a ("nvme-fabrics: allow nvmf_connect_io_queue to poll") # v5.0-rc1 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-09nvme-fabrics: unset write/poll queues for discovery controllersSagi Grimberg
Even if user-space sent it to us, it got it wrong so lets help by disallowing it. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-fabrics: allow user to pass in nr_poll_queuesSagi Grimberg
This argument will specify how many polling I/O queues to connect when creating the controller. These I/O queues will host I/O that is set with REQ_HIPRI. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-fabrics: allow nvmf_connect_io_queue to pollSagi Grimberg
Preparation for polling support for fabrics. Polling support means that our completion queues are not generating any interrupts which means we need to poll for the nvmf io queue connect as well. Reviewed by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-18nvme-core: optionally poll sync commandsSagi Grimberg
Pass poll bool to indicate that we need it to poll. This prepares us for polling support in nvmf since connect is an I/O that will be queued and has to be polled in order to complete. If poll is passed, we call nvme_execute_rq_polled which sends the requests and polls for its completion. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13nvme-fabrics: allow user to set nr_write_queues for separate queue mapsSagi Grimberg
This argument will specify how many I/O queues will be connected in create_ctrl in addition to nr_io_queues. With this configuration, I/O that carries payload from the host to the target, will use the default hctx queue map, and I/O that involves target to host transfers will use the read hctx queue map. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13nvme-fabrics: allow user passing data digestSagi Grimberg
Data digest is a nvme-tcp specific feature, but nothing prevents other transports reusing the concept so do not associate with tcp transport solely. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13nvme-fabrics: allow user passing header digestSagi Grimberg
Header digest is a nvme-tcp specific feature, but nothing prevents other transports reusing the concept so do not associate with tcp transport solely. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-07nvme: disable fabrics SQ flow control when asked by the userSagi Grimberg
As for now, we don't care about sq_head pointer updates anyway, so at least allow the controller to micro-optimize by omiting this update. Note that we will probably need to support it when a controller that requires this comes along. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-19nvme-fabrics: move controller options matching to fabricsSagi Grimberg
IP transports will most likely use the same controller options matching when detecting a duplicate connect. Move it to fabrics. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/OJames Smart
When an io is rejected by nvmf_check_ready() due to validation of the controller state, the nvmf_fail_nonready_command() will normally return BLK_STS_RESOURCE to requeue and retry. However, if the controller is dying or the I/O is marked for NVMe multipath, the I/O is failed so that the controller can terminate or so that the io can be issued on a different path. Unfortunately, as this reject point is before the transport has accepted the command, blk-mq ends up completing the I/O and never calls nvme_complete_rq(), which is where multipath may preserve or re-route the I/O. The end result is, the device user ends up seeing an EIO error. Example: single path connectivity, controller is under load, and a reset is induced. An I/O is received: a) while the reset state has been set but the queues have yet to be stopped; or b) after queues are started (at end of reset) but before the reconnect has completed. The I/O finishes with an EIO status. This patch makes the following changes: - Adds the HOST_PATH_ERROR pathing status from TP4028 - Modifies the reject point such that it appears to queue successfully, but actually completes the io with the new pathing status and calls nvme_complete_rq(). - nvme_complete_rq() recognizes the new status, avoids resetting the controller (likely was already done in order to get this new status), and calls the multipather to clear the current path that errored. This allows the next command (retry or new command) to select a new path if there is one. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-14Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...
2018-08-08nvme-fabrics: fix ctrl_loss_tmo < 0 to reconnect foreverTal Shorer
When the user supplies a ctrl_loss_tmo < 0, we warn them that this will cause the fabrics layer to attempt reconnection forever. However, in reality the fabrics layer never attempts to reconnect because the condition to test whether we should reconnect is backwards in this case. Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme: if_ready checks to fail io to deleting controllerJames Smart
The revised if_ready checks skipped over the case of returning error when the controller is being deleted. Instead it was returning BUSY, which caused the ios to retry, which caused the ns delete to hang waiting for the ios to drain. Stack trace of hang looks like: kworker/u64:2 D 0 74 2 0x80000000 Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core] Call Trace: ? __schedule+0x26d/0x820 schedule+0x32/0x80 blk_mq_freeze_queue_wait+0x36/0x80 ? remove_wait_queue+0x60/0x60 blk_cleanup_queue+0x72/0x160 nvme_ns_remove+0x106/0x140 [nvme_core] nvme_remove_namespaces+0x7e/0xa0 [nvme_core] nvme_delete_ctrl_work+0x4d/0x80 [nvme_core] process_one_work+0x160/0x350 worker_thread+0x1c3/0x3d0 kthread+0xf5/0x130 ? process_one_work+0x350/0x350 ? kthread_bind+0x10/0x10 ret_from_fork+0x1f/0x30 Extend nvmf_fail_nonready_command() to supply the controller pointer so that the controller state can be looked at. Fail any io to a controller that is deleting. Fixes: 3bc32bb1186c ("nvme-fabrics: refactor queue ready check") Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready") Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com>
2018-06-15nvme-fabrics: fix and refine state checks in __nvmf_check_readyChristoph Hellwig
- make sure we only allow internally generates commands in any non-live state - only allow connect commands on non-live queues when actually in the new or connecting states - treat all other non-live, non-dead states the same as a default cach-all This fixes a regression where we could not shutdown a controller orderly as we didn't allow the internal generated Property Set command, and also ensures we don't accidentally let a Connect command through in the wrong state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com>
2018-06-15nvme-fabrics: refactor queue ready checkChristoph Hellwig
Move the is_connected check to the fibre channel transport, as it has no meaning for other transports. To facilitate this split out a new nvmf_fail_nonready_command helper that is called by the transport when it is asked to handle a command on a queue that is not ready. Also avoid a function call for the queue live fast path by inlining the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com>
2018-06-08nvme: don't hold nvmf_transports_rwsem for more than transport lookupsJohannes Thumshirn
Only take nvmf_transports_rwsem when doing a lookup of registered transports, so that a blocking ->create_ctrl doesn't prevent other actions on /dev/nvme-fabrics. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> [hch: increased lock hold time a bit to be safe, added a comment and updated the changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-31nvme-fabrics: allow internal passthrough command on deleting controllersChristoph Hellwig
Without this we can't cleanly shut down. Based on analysis an an earlier patch from Hannes Reinecke. Fixes: bb06ec31452f ("nvme: expand nvmf_check_if_ready checks") Reported-by: Hannes Reinecke <hare@suse.de> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com>
2018-05-25nvme: fix KASAN warning when parsing host nqnHannes Reinecke
The host nqn actually is smaller than the space reserved for it, so we should be using strlcpy to keep KASAN happy. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fabrics: allow duplicate connections to the discovery controllerHannes Reinecke
The whole point of the discovery controller is that it can accept multiple connections. Additionally the cmic field is not even defined for the discovery controller identify page. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fabrics: centralize discovery controller defaultsHannes Reinecke
When connecting to the discovery controller we have certain defaults to observe, so centralize them to avoid inconsistencies due to argument ordering. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fabrics: remove unnecessary controller subnqn validationJames Smart
After creating the nvme controller, nvmf_create_ctrl() validates the newly created subsysnqn vs the one specified by the options. In general, this is an unnecessary check as the Connect message should implicitly ensure this value matches. With the change to the FC transport to do an asynchronous connect for the first association create, the transport will return to nvmf_create_ctrl() before that first association has been established, thus the subnqn will not yet be set. Remove the unnecessary validation. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-03nvme: fix potential memory leak in option parsingChengguang Xu
When specifying same string type option several times, current option parsing may cause memory leak. Hence, call kfree for previous one in this case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12nvme: expand nvmf_check_if_ready checksJames Smart
The nvmf_check_if_ready() checks that were added are very simplistic. As such, the routine allows a lot of cases to fail ios during windows of reset or re-connection. In cases where there are not multi-path options present, the error goes back to the callee - the filesystem or application. Not good. The common routine was rewritten and calling syntax slightly expanded so that per-transport is_ready routines don't need to be present. The transports now call the routine directly. The routine is now a fabrics routine rather than an inline function. The routine now looks at controller state to decide the action to take. Some states mandate io failure. Others define the condition where a command can be accepted. When the decision is unclear, a generic queue-or-reject check is made to look for failfast or multipath ios and only fails the io if it is so marked. Otherwise, the io will be queued and wait for the controller state to resolve. Admin commands issued via ioctl share a live admin queue with commands from the transport for controller init. The ioctls could be intermixed with the initialization commands. It's possible for the ioctl cmd to be issued prior to the controller being enabled. To block this, the ioctl admin commands need to be distinguished from admin commands used for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to reflect this division and set it on ioctls requests. As the nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(), ensure that commands allocated by the ioctl path (actually anything in core.c) preps the nvme_req(req) before starting the io. This will preserve the USERCMD flag during execution and/or retry. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.e> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-12nvme: don't send keep-alives to the discovery controllerJohannes Thumshirn
NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and Command Support" Figure 31 "Discovery Controller – Admin Commands" explicitly listst all commands but "Get Log Page" and "Identify" as reserved, but NetApp report the Linux host is sending Keep Alive commands to the discovery controller, which is a violation of the Spec. We're already checking for discovery controllers when configuring the keep alive timeout but when creating a discovery controller we're not hard wiring the keep alive timeout to 0 and thus remain on NVME_DEFAULT_KATO for the discovery controller. This can be easily remproduced when issuing a direct connect to the discovery susbsystem using: 'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery' Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: 07bfcd09a288 ("nvme-fabrics: add a generic NVMe over Fabrics library") Reported-by: Martin George <marting@netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-08nvme-fabrics: Ignore nr_io_queues option for discovery controllersRoland Dreier
This removes a dependency on the order options are passed when creating a fabrics controller. With the old code, if "nr_io_queues" appears before an "nqn" option specifying the discovery controller, then nr_io_queues is overridden with zero. If "nr_io_queues" appears after specifying the discovery controller, then the nr_io_queues option is used to set the number of queues, and the driver attempts to establish IO connections to the discovery controller (which doesn't work). It seems better to ignore (and warn about) the "nr_io_queues" option if userspace has already asked to connect to the discovery controller. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-02-22nvme-fabrics: don't check for non-NULL module in nvmf_register_transportChristoph Hellwig
THIS_MODULE evaluates to NULL when used from code built into the kernel, thus breaking built-in transport modules. Remove the bogus check. Fixes: 0de5cd36 ("nvme-fabrics: protect against module unload during create_ctrl") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-01-29Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "This is the main pull request for block IO related changes for the 4.16 kernel. Nothing major in this pull request, but a good amount of improvements and fixes all over the map. This contains: - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and Paolo. - Support for SMR zones for deadline and mq-deadline from Damien and Christoph. - Set of fixes for bcache by way of Michael Lyle, including fixes from himself, Kent, Rui, Tang, and Coly. - Series from Matias for lightnvm with fixes from Hans Holmberg, Javier, and Matias. Mostly centered around pblk, and the removing rrpc 1.2 in preparation for supporting 2.0. - A couple of NVMe pull requests from Christoph. Nothing major in here, just fixes and cleanups, and support for command tracing from Johannes. - Support for blk-throttle for tracking reads and writes separately. From Joseph Qi. A few cleanups/fixes also for blk-throttle from Weiping. - Series from Mike Snitzer that enables dm to register its queue more logically, something that's alwways been problematic on dm since it's a stacked device. - Series from Ming cleaning up some of the bio accessor use, in preparation for supporting multipage bvecs. - Various fixes from Ming closing up holes around queue mapping and quiescing. - BSD partition fix from Richard Narron, fixing a problem where we can't mount newer (10/11) FreeBSD partitions. - Series from Tejun reworking blk-mq timeout handling. The previous scheme relied on atomic bits, but it had races where we would think a request had timed out if it to reused at the wrong time. - null_blk now supports faking timeouts, to enable us to better exercise and test that functionality separately. From me. - Kill the separate atomic poll bit in the request struct. After this, we don't use the atomic bits on blk-mq anymore at all. From me. - sgl_alloc/free helpers from Bart. - Heavily contended tag case scalability improvement from me. - Various little fixes and cleanups from Arnd, Bart, Corentin, Douglas, Eryu, Goldwyn, and myself" * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits) block: remove smart1,2.h nvme: add tracepoint for nvme_complete_rq nvme: add tracepoint for nvme_setup_cmd nvme-pci: introduce RECONNECTING state to mark initializing procedure nvme-rdma: remove redundant boolean for inline_data nvme: don't free uuid pointer before printing it nvme-pci: Suspend queues after deleting them bsg: use pr_debug instead of hand crafted macros blk-mq-debugfs: don't allow write on attributes with seq_operations set nvme-pci: Fix queue double allocations block: Set BIO_TRACE_COMPLETION on new bio during split blk-throttle: use queue_is_rq_based block: Remove kblockd_schedule_delayed_work{,_on}() blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly() lib/scatterlist: Fix chaining support in sgl_alloc_order() blk-throttle: track read and write request individually block: add bdev_read_only() checks to common helpers block: fail op_is_write() requests to read-only partitions blk-throttle: export io_serviced_recursive, io_service_bytes_recursive ...
2018-01-25nvme: don't free uuid pointer before printing itJohannes Thumshirn
Commit df351ef73789 ("nvme-fabrics: fix memory leak when parsing host ID option") fixed the leak of 'p' but in case uuid_parse() fails the memory is freed before the error print that is using it. Free it after printing eventual errors. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: df351ef73789 ("nvme-fabrics: fix memory leak when parsing host ID option") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-15nvme-fabrics: fix memory leak when parsing host ID optionRoland Dreier
We use match_strdup() to get a copy of the option string for host ID string, but we just pass it to uuid_parse() and don't store the string pointer, so we need to kfree() the string after parsing it. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-08nvme-fabrics: protect against module unload during create_ctrlRoy Shterman
NVMe transport driver module unload may (and usually does) trigger iteration over the active controllers and delete them all (sometimes under a mutex). However, a controller can be created concurrently with module unload which can lead to leakage of resources (most important char device node leakage) in case the controller creation occured after the unload delete and drain sequence. To protect against this, we take a module reference to guarantee that the nvme transport driver is not unloaded while creating a controller. Signed-off-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-01-08nvme-fabrics: initialize default host->id in nvmf_host_default()Ewan D. Milne
The field was uninitialized before use. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-10nvme: track subsystemsChristoph Hellwig
This adds a new nvme_subsystem structure so that we can track multiple controllers that belong to a single subsystem. For now we only use it to store the NQN, and to check that we don't have duplicate NQNs unless the involved subsystems support multiple controllers. Includes code originally from Hannes Reinecke to expose the subsystems in sysfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-01nvme: move controller deletion to common codeChristoph Hellwig
Move the ->delete_work and the associated helpers to common code instead of duplicating them in every driver. This also adds the missing reference get/put for the loop driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27nvme: add duplicate_connect optionJames Smart
Add the "duplicate_connect" boolean option (presence means true). Default is false. When false, the transport should validate whether a new controller request is targeted for the same host transport addressing and target transport addressing as an existing controller. If so, the new controller request should be rejected. When true, the callee is explicitly requesting a duplicate controller connection to be made and the new request should be attempted. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04nvme-fabrics: request transport moduleSagi Grimberg
Help userspace to make sure transport module is loaded. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-09-25nvme-fabrics: Allow 0 as KATO valueGuilherme G. Piccoli
Currently, driver code allows user to set 0 as KATO (Keep Alive TimeOut), but this is not being respected. This patch enforces the expected behavior. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-01nvme-fabrics: generate spec-compliant UUID NQNsDaniel Verkamp
The default host NQN, which is generated based on the host's UUID, does not follow the UUID-based NQN format laid out in the NVMe 1.3 specification. Remove the "NVMf:" portion of the NQN to match the spec. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-08-30nvme-fabrics: Convert nvmf_transports_mutex to an rwsemRoland Dreier
The mutex protects against the list of transports changing while a controller is being created, but using a plain old mutex means that it also serializes controller creation. This unnecessarily slows down creating multiple controllers - for example for the RDMA transport, creating a controller involves establishing one connection for every IO queue, which involves even more network/software round trips, so the delay can become significant. The simplest way to fix this is to change the mutex to an rwsem and only hold it for writing when the list is being mutated. Since we can take the rwsem for reading while creating a controller, we can create multiple controllers in parallel. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-08-28nvme-fabrics: log a warning if hostid is invalidGuan Junxiong
This helps users to quickly spot the reason of why connection fails if the hostid is not compliant with the uuid format. Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-08-17nvme-fabrics: fix reporting of unrecognized optionsChristoph Hellwig
Only print the specified options that are not recognized, instead of the whole list of options. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2017-06-28nvme-fabrics: verify that a controller returns the correct NQNChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: read the subsystem NQN from Identify ControllerChristoph Hellwig
NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the Identify controller data structures. Use this NQN for the subsysnqn sysfs attribute by storing it in the nvme_ctrl structure after verifying it. For older controllers we generate a "fake" NQN per non-normative text in the NVMe 1.3 spec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: use a single NVME_AQ_DEPTH and relax it to 32Sagi Grimberg
No need to differentiate fabrics from pci/loop, also lower it to 32 as we don't really need 256 inflight admin commands. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: add hostid token to fabric optionsJohannes Thumshirn
Currently we have no way to define a stable host-id but always use the one which is randomly generated when we add the host or use the default host. Provide a "hostid=%s" for user-space to pass in a persistent host-id which overrides the randomly generated one. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-15nvmf: keep track of nvmet connect error statusGuan Junxiong
To let the host know what happends to the connection establishment, adjust the behavior of nvmf_log_connect_error to make more connect specifig error codes human-readble. Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-15nvme: move nr_reconnects to nvme_ctrlSagi Grimberg
It is not a user option but rather a variable controller attribute. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-05nvme: switch to uuid_tChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-04-04nvme-fabrics: Allow ctrl loss timeout configurationSagi Grimberg
When a host sense that its controller session is damaged, it tries to re-establish it periodically (reconnect every reconnect_delay). It may very well be that the controller is gone and never coming back, in this case the host will try to reconnect forever. Add a ctrl_loss_tmo to bound the number of reconnect attempts to a specific controller (default to a reasonable 10 minutes). The timeout configuration is actually translated into number of reconnect attempts and not a schedule on its own but rather divided with reconnect_delay. This is useful to prevent racing flows of remove and reconnect, and it doesn't really matter if we remove slightly sooner than what the user requested. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>