summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2018-05-31nvme-fabrics: allow internal passthrough command on deleting controllersChristoph Hellwig
Without this we can't cleanly shut down. Based on analysis an an earlier patch from Hannes Reinecke. Fixes: bb06ec31452f ("nvme: expand nvmf_check_if_ready checks") Reported-by: Hannes Reinecke <hare@suse.de> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com>
2018-05-30nvme-loop: add support for multiple portsChristoph Hellwig
This is useful at least for multipath testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2018-05-30nvme-pci: simplify __nvme_submit_cmdChristoph Hellwig
With recent CQ handling improvements we can now move the locking into __nvme_submit_cmd. Also remove the local tail variable to make the code more obvious, remove the __ prefix in the name, and fix the comments describing the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2018-05-30nvme-pci: Rate limit the nvme timeout warningsKeith Busch
The block layer's timeout handling currently prevents drivers from completing commands outside the timeout callback once blk-mq decides they've expired. If a device breaks, this could potentially create many thousands of timed out commands. There's nothing of value to be gleaned from observing each of those messages, so this patch adds a rate limit on them. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-30nvme: allow duplicate controller if prior controller being deletedJames Smart
The current checks for whether a new controller request "matches" an existing controller ignores controller state and checks identity strings. There are cases where an existing controller may be in its last steps of deletion when they are "matched" by a new connection. Change the behavior so that the new connection ignores controllers that are deleted. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-29block: remove parent device reference from struct bsg_class_deviceChristoph Hellwig
Bsg holding a reference to the parent device may result in a crash if a bsg file handle is closed after the parent device driver has unloaded. Holding a reference is not really needed: the parent device must exist between bsg_register_queue and bsg_unregister_queue. Before the device goes away the caller does blk_cleanup_queue so that all in-flight requests to the device are gone and all new requests cannot pass beyond the queue. The queue itself is a refcounted object and it will stay alive with a bsg file. Based on analysis, previous patch and changelog from Anatoliy Glagolev. Reported-by: Anatoliy Glagolev <glagolig@gmail.com> Reviewed-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29Merge branch 'nvme-4.18-2' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe
Pull NVMe changes from Christoph: "Here is the current batch of nvme updates for 4.18, we have a few more patches in the queue, but I'd like to get this pile into your tree and linux-next ASAP. The biggest item is support for file-backed namespaces in the NVMe target from Chaitanya, in addition to that we mostly small fixes from all the usual suspects." * 'nvme-4.18-2' of git://git.infradead.org/nvme: nvme: fixup memory leak in nvme_init_identify() nvme: fix KASAN warning when parsing host nqn nvmet-loop: use nr_phys_segments when map rq to sgl nvmet-fc: increase LS buffer count per fc port nvmet: add simple file backed ns support nvmet: remove duplicate NULL initialization for req->ns nvmet: make a few error messages more generic nvme-fabrics: allow duplicate connections to the discovery controller nvme-fabrics: centralize discovery controller defaults nvme-fabrics: remove unnecessary controller subnqn validation nvme-fc: remove setting DNR on exception conditions nvme-rdma: stop admin queue before freeing it nvme-pci: Fix AER reset handling nvme-pci: set nvmeq->cq_vector after alloc cq/sq nvme: host: core: fix precedence of ternary operator nvme: fix lockdep warning in nvme_mpath_clear_current_path
2018-05-29libiscsi: don't try to bypass SCSI EHChristoph Hellwig
libiscsi is the only SCSI code that return BLK_EH_HANDLED, thus trying to bypass the normal SCSI EH code. We are going to remove this return value at the block layer, and at least from a quick look it doesn't look too harmful to try to send an abort for these cases, especially as the first one should not actually be possible. If this doesn't work out iscsi will probably need its own eh_strategy_handler instead to just do the right thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29mmc: complete requests from ->timeoutChristoph Hellwig
By completing the request entirely in the driver we can remove the BLK_EH_HANDLED return value and thus the split responsibility between the driver and the block layer that has been causing trouble. [While this keeps existing behavior it seems to mismatch the comment, maintainers please chime in!] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29scsi_transport_fc: complete requests from ->timeoutChristoph Hellwig
By completing the request entirely in the driver we can remove the BLK_EH_HANDLED return value and thus the split responsibility between the driver and the block layer that has been causing trouble. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29null_blk: complete requests from ->timeoutChristoph Hellwig
By completing the request entirely in the driver we can remove the BLK_EH_HANDLED return value and thus the split responsibility between the driver and the block layer that has been causing trouble. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29mtip32xx: complete requests from ->timeoutChristoph Hellwig
By completing the request entirely in the driver we can remove the BLK_EH_HANDLED return value and thus the split responsibility between the driver and the block layer that has been causing trouble. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29nbd: complete requests from ->timeoutChristoph Hellwig
By completing the request entirely in the driver we can remove the BLK_EH_HANDLED return value and thus the split responsibility between the driver and the block layer that has been causing trouble. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29nvme: return BLK_EH_DONE from ->timeoutChristoph Hellwig
NVMe always completes the request before returning from ->timeout, either by polling for it, or by disabling the controller. Return BLK_EH_DONE so that the block layer doesn't even try to complete it again. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONEChristoph Hellwig
The BLK_EH_NOT_HANDLED implies nothing happen, but very often that is not what is happening - instead the driver already completed the command. Fix the symbolic name to reflect that a little better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29libata: remove ata_scsi_timed_outChristoph Hellwig
As far as I can tell this function can't even be called any more, given that ATA implements its own eh_strategy_handler with ata_scsi_error, which never calls ->eh_timed_out. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-28bcache: Replace bch_read_string_list() by __sysfs_match_string()Andy Shevchenko
Kernel library has a common function to match user input from sysfs against an array of strings. Thus, replace bch_read_string_list() by __sysfs_match_string(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-28bcache: Move couple of functions to sysfs.cAndy Shevchenko
There is couple of functions that are used exclusively in sysfs.c. Move it to there and make them static. Besides above, it will allow further clean up. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-28bcache: Move couple of string arrays to sysfs.cAndy Shevchenko
There is couple of string arrays that are used exclusively in sysfs.c. Move it to there and make them static. Besides above, it will allow further clean up. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-28bcache: stop bcache device when backing device is offlineColy Li
Currently bcache does not handle backing device failure, if backing device is offline and disconnected from system, its bcache device can still be accessible. If the bcache device is in writeback mode, I/O requests even can success if the requests hit on cache device. That is to say, when and how bcache handles offline backing device is undefined. This patch tries to handle backing device offline in a rather simple way, - Add cached_dev->status_update_thread kernel thread to update backing device status in every 1 second. - Add cached_dev->offline_seconds to record how many seconds the backing device is observed to be offline. If the backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set dc->io_disable to 1 and call bcache_device_stop() to stop the bache device which linked to the offline backing device. Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT seconds, its bcache device will be removed, then user space application writing on it will get error immediately, and handler the device failure in time. This patch is quite simple, does not handle more complicated situations. Once the bcache device is stopped, users need to recovery the backing device, register and attach it manually. Changelog: v3: call wait_for_kthread_stop() before exits kernel thread. v2: remove "bcache: " prefix when calling pr_warn(). v1: initial version. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Michael Lyle <mlyle@lyle.org> Cc: Junhui Tang <tang.junhui@zte.com.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-25nvme: fixup memory leak in nvme_init_identify()Hannes Reinecke
If nvme_get_effects_log() failed the 'id' buffer from the previous nvme_identify_ctrl() call will never be freed. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme: fix KASAN warning when parsing host nqnHannes Reinecke
The host nqn actually is smaller than the space reserved for it, so we should be using strlcpy to keep KASAN happy. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvmet-loop: use nr_phys_segments when map rq to sglChaitanya Kulkarni
Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check if a command contains data to me mapped. This fixes the case where a struct requests contains LBAs, but no data will actually be send, e.g. the pending Write Zeroes support. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvmet-fc: increase LS buffer count per fc portJames Smart
Todays limit on concurrent LS's is very small - 4 buffers. With large subsystem counts or large numbers of initiators connecting, the limit may be exceeded. Raise the LS buffer count to 256. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvmet: add simple file backed ns supportChaitanya Kulkarni
This patch adds simple file backed namespace support for NVMeOF target. The new file io-cmd-file.c is responsible for handling the code for I/O commands when ns is file backed. Also, we introduce mempools based slow path using sync I/Os for file backed ns to ensure forward progress under reclaim. The old block device based implementation is moved to io-cmd-bdev.c and use a "nvmet_bdev_" symbol prefix. The enable/disable calls are also move into the respective files. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: updated changelog, fixed double req->ns lookup in bdev case] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvmet: remove duplicate NULL initialization for req->nsChaitanya Kulkarni
Remove the duplicate NULL initialization for req->ns. req->ns is always initialized to NULL in nvmet_req_init(), so there is no need to reset it later on failures unless we have previously assigned a value to it. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvmet: make a few error messages more genericChaitanya Kulkarni
"nvmet_check_ctrl_status()" is called from admin-cmd.c along with io-cmd.c, make the error message more generic. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fabrics: allow duplicate connections to the discovery controllerHannes Reinecke
The whole point of the discovery controller is that it can accept multiple connections. Additionally the cmic field is not even defined for the discovery controller identify page. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fabrics: centralize discovery controller defaultsHannes Reinecke
When connecting to the discovery controller we have certain defaults to observe, so centralize them to avoid inconsistencies due to argument ordering. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fabrics: remove unnecessary controller subnqn validationJames Smart
After creating the nvme controller, nvmf_create_ctrl() validates the newly created subsysnqn vs the one specified by the options. In general, this is an unnecessary check as the Connect message should implicitly ensure this value matches. With the change to the FC transport to do an asynchronous connect for the first association create, the transport will return to nvmf_create_ctrl() before that first association has been established, thus the subnqn will not yet be set. Remove the unnecessary validation. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-fc: remove setting DNR on exception conditionsJames Smart
Current code will set DNR if the controller is deleting or there is an error during controller init. None of this is necessary. Remove the code that sets DNR Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-rdma: stop admin queue before freeing itJianchao Wang
For any failure after nvme_rdma_start_queue in nvme_rdma_configure_admin_queue, the admin queue will be freed with the NVME_RDMA_Q_LIVE flag still set. Once nvme_rdma_stop_queue is invoked, that will cause a use-after-free. BUG: KASAN: use-after-free in rdma_disconnect+0x1f/0xe0 [rdma_cm] To fix it, call nvme_rdma_stop_queue for all the failed cases after nvme_rdma_start_queue. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-pci: Fix AER reset handlingKeith Busch
The nvme timeout handling doesn't do anything if the pci channel is offline, which is the case when recovering from PCI error event, so it was a bad idea to sync the controller reset in this state. This patch flushes the reset work in the error_resume callback instead when the channel is back to online. This keeps AER handling serialized and can recover from timeouts. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199757 Fixes: cc1d5e749a2e ("nvme/pci: Sync controller reset for AER slot_reset") Reported-by: Alex Gagniuc <mr.nuke.me@gmail.com> Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25nvme-pci: set nvmeq->cq_vector after alloc cq/sqJianchao Wang
Set cq_vector after alloc cq/sq, otherwise nvme_suspend_queue will invoke free_irq for it and cause a 'Trying to free already-free IRQ xxx' warning if the create CQ/SQ command times out. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Keith Busch <keith.busch@intel.com> [hch: fixed to pass a s16 and clean up the comment] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-24block drivers/block: Use octal not symbolic permissionsJoe Perches
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-23nbd: set discard granularity properlyJosef Bacik
For some reason we had discard granularity set to 512 always even when discards were disabled. Fix this by having the default be 0, and then if we turn it on set the discard granularity to the blocksize. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-23nvme: host: core: fix precedence of ternary operatorIvan Bornyakov
Ternary operator have lower precedence then bitwise or, so 'cdw10' was calculated wrong. Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-05-23nvme: fix lockdep warning in nvme_mpath_clear_current_pathJohannes Thumshirn
When running blktest's nvme/005 with a lockdep enabled kernel the test case fails due to the following lockdep splat in dmesg: ============================= WARNING: suspicious RCU usage 4.17.0-rc5 #881 Not tainted ----------------------------- drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/u32:5/1102: #0: (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0 #1: (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0 #2: (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core] The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove() which holds the subsys lock so it's likely a false positive, but when using rcu_access_pointer(), we're telling rcu and lockdep that we're only after the pointer falue. Fixes: 32acab3181c7 ("nvme: implement multipath access to nvme subsystems") Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-05-22block/ndb: add WQ_UNBOUND to the knbd-recv workqueueDan Melnic
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound to a single CPU that is selected at device creation time. Signed-off-by: Dan Melnic <dmm@fb.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-21nvme-pci: fix race between poll and IRQ completionsJens Axboe
If polling completions are racing with the IRQ triggered by a completion, the IRQ handler will find no work and return IRQ_NONE. This can trigger complaints about spurious interrupts: [ 560.169153] irq 630: nobody cared (try booting with the "irqpoll" option) [ 560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65 [ 560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018 [ 560.175991] Call Trace: [ 560.175994] <IRQ> [ 560.176005] dump_stack+0x5c/0x7b [ 560.176010] __report_bad_irq+0x30/0xc0 [ 560.176013] note_interrupt+0x235/0x280 [ 560.176020] handle_irq_event_percpu+0x51/0x70 [ 560.176023] handle_irq_event+0x27/0x50 [ 560.176026] handle_edge_irq+0x6d/0x180 [ 560.176031] handle_irq+0xa5/0x110 [ 560.176036] do_IRQ+0x41/0xc0 [ 560.176042] common_interrupt+0xf/0xf [ 560.176043] </IRQ> [ 560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0 [ 560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f [ 560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000 [ 560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008 [ 560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358 [ 560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593 [ 560.176065] ? cpuidle_enter_state+0x8b/0x2b0 [ 560.176071] do_idle+0x1f4/0x260 [ 560.176075] cpu_startup_entry+0x6f/0x80 [ 560.176080] start_secondary+0x184/0x1d0 [ 560.176085] secondary_startup_64+0xa5/0xb0 [ 560.176088] handlers: [ 560.178387] [<00000000efb612be>] nvme_irq [nvme] [ 560.183019] Disabling IRQ #630 A previous commit removed ->cqe_seen that was handling this case, but we need to handle this a bit differently due to completions now running outside the queue lock. Return IRQ_HANDLED from the IRQ handler, if the completion ring head was moved since we last saw it. Fixes: 5cb525c8315f ("nvme-pci: handle completions outside of the queue lock") Reported-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Tested-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-21Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe
Pull NVMe changes from Keith: "This is just the first nvme pull request for 4.18. There are several fabrics and target patches that I missed, so there will be more to come." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme-pci: drop IRQ disabling on submission queue lock nvme-pci: split the nvme queue lock into submission and completion locks nvme-pci: handle completions outside of the queue lock nvme-pci: move ->cq_vector == -1 check outside of ->q_lock nvme-pci: remove cq check after submission nvme-pci: simplify nvme_cqe_valid nvme: mark the result argument to nvme_complete_async_event volatile nvme/pci: Sync controller reset for AER slot_reset nvme/pci: Hold controller reference during async probe nvme: only reconfigure discard if necessary nvme/pci: Use async_schedule for initial reset work nvme: lightnvm: add granby support NVMe: Add Quirk Delay before CHK RDY for Seagate Nytro Flash Storage nvme: change order of qid and cmdid in completion trace nvme: fc: provide a descriptive error
2018-05-18nvme-pci: drop IRQ disabling on submission queue lockJens Axboe
Since we aren't sharing the lock for completions now, we don't have to make it IRQ safe. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18nvme-pci: split the nvme queue lock into submission and completion locksJens Axboe
This is now feasible. We protect the submission queue ring with ->sq_lock, and the completion side with ->cq_lock. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18nvme-pci: handle completions outside of the queue lockJens Axboe
Split the completion of events into a two part process: 1) Reap the events inside the queue lock 2) Complete the events outside the queue lock Since we never wrap the queue, we can access it locklessly after we've updated the completion queue head. This patch started off with batching events on the stack, but with this trick we don't have to. Keith Busch <keith.busch@intel.com> came up with that idea. Note that this kills the ->cqe_seen as well. I haven't been able to trigger any ill effects of this. If we do race with polling every so often, it should be rare enough NOT to trigger any issues. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: refactored, restored poll early exit optimization] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18nvme-pci: move ->cq_vector == -1 check outside of ->q_lockJens Axboe
We only clear it dynamically in nvme_suspend_queue(). When we do, ensure to do a full flush so that any nvme_queue_rq() invocation will see it. Ideally we'd kill this check completely, but we're using it to flush requests on a dying queue. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18nvme-pci: remove cq check after submissionJens Axboe
We always check the completion queue after submitting, but in my testing this isn't a win even on DRAM/xpoint devices. In some cases it's actually worse. Kill it. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-18nvme-pci: simplify nvme_cqe_validChristoph Hellwig
We always look at the current CQ head and phase, so don't pass these as separate arguments, and rename the function to nvme_cqe_pending. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-18nvme: mark the result argument to nvme_complete_async_event volatileChristoph Hellwig
We'll need that in the PCIe driver soon as we'll read it straight off the CQ. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16nbd: call nbd_bdev_reset instead of bd_set_size on disconnectJosef Bacik
We need to make sure we don't just set the size of the bdev to 0 while it's being used by a file system. We have the appropriate check in nbd_bdev_reset, simply use that helper instead. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16nbd: fix how we set bd_invalidatedJosef Bacik
bd_invalidated is kind of a pain wrt partitions as it really only triggers the partition rescan if it is set after bd_ops->open() runs, so setting it when we reset the device isn't useful. We also sporadically would still have partitions left over in some disconnect cases, so fix this by always setting bd_invalidated on open if there's no configuration or if we've had a disconnect action happen, that way the partition table gets invalidated and rescanned properly. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>