summaryrefslogtreecommitdiff
path: root/drivers/nvme
AgeCommit message (Collapse)Author
2018-10-22Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the main pull request for block changes for 4.20. This contains: - Series enabling runtime PM for blk-mq (Bart). - Two pull requests from Christoph for NVMe, with items such as; - Better AEN tracking - Multipath improvements - RDMA fixes - Rework of FC for target removal - Fixes for issues identified by static checkers - Fabric cleanups, as prep for TCP transport - Various cleanups and bug fixes - Block merging cleanups (Christoph) - Conversion of drivers to generic DMA mapping API (Christoph) - Series fixing ref count issues with blkcg (Dennis) - Series improving BFQ heuristics (Paolo, et al) - Series improving heuristics for the Kyber IO scheduler (Omar) - Removal of dangerous bio_rewind_iter() API (Ming) - Apply single queue IPI redirection logic to blk-mq (Ming) - Set of fixes and improvements for bcache (Coly et al) - Series closing a hotplug race with sysfs group attributes (Hannes) - Set of patches for lightnvm: - pblk trace support (Hans) - SPDX license header update (Javier) - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0 specs behind a common core interface. (Javier, Matias) - Enable pblk to use a common interface to retrieve chunk metadata (Matias) - Bug fixes (Various) - Set of fixes and updates to the blk IO latency target (Josef) - blk-mq queue number updates fixes (Jianchao) - Convert a bunch of drivers from the old legacy IO interface to blk-mq. This will conclude with the removal of the legacy IO interface itself in 4.21, with the rest of the drivers (me, Omar) - Removal of the DAC960 driver. The SCSI tree will introduce two replacement drivers for this (Hannes)" * tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits) block: setup bounce bio_sets properly blkcg: reassociate bios when make_request() is called recursively blkcg: fix edge case for blk_get_rl() under memory pressure nvme-fabrics: move controller options matching to fabrics nvme-rdma: always have a valid trsvcid mtip32xx: fully switch to the generic DMA API rsxx: switch to the generic DMA API umem: switch to the generic DMA API sx8: switch to the generic DMA API sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code skd: switch to the generic DMA API ubd: remove use of blk_rq_map_sg nvme-pci: remove duplicate check drivers/block: Remove DAC960 driver nvme-pci: fix hot removal during error handling nvmet-fcloop: suppress a compiler warning nvme-core: make implicit seed truncation explicit nvmet-fc: fix kernel-doc headers nvme-fc: rework the request initialization code nvme-fc: introduce struct nvme_fcp_op_w_sgl ...
2018-10-19nvme-fabrics: move controller options matching to fabricsSagi Grimberg
IP transports will most likely use the same controller options matching when detecting a duplicate connect. Move it to fabrics. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-19nvme-rdma: always have a valid trsvcidSagi Grimberg
If not passed, we set the default trsvcid. We can rely on having trsvcid and can simplify the controller matching logic. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-18nvme-pci: remove duplicate checkChaitanya Kulkarni
This is a cleanup patch doesn't change any functionality. It removes the duplicate call to the blk_integrity_rq() in the nvme_map_data(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-pci: fix hot removal during error handlingKeith Busch
A removal waits for the reset_work to complete. If a surprise removal occurs around the same time as an error triggered controller reset, and reset work happened to dispatch a command to the removed controller, the command won't be recovered since the timeout work doesn't do anything during error recovery. We wouldn't want to wait for timeout handling anyway, so this patch fixes this by disabling the controller and killing admin queues prior to syncing with the reset_work. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet-fcloop: suppress a compiler warningBart Van Assche
Building with W=1 enables the compiler warning -Wimplicit-fallthrough=3. That option does not recognize the fall-through comment in the fcloop driver. Add a fall-through comment that is recognized for -Wimplicit-fallthrough=3. This patch avoids that the compiler reports the following warning when building with W=1: drivers/nvme/target/fcloop.c:647:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (op == NVMET_FCOP_READDATA) ^ Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-core: make implicit seed truncation explicitBart Van Assche
The nvme_user_io.slba field is 64 bits wide. That value is copied into the 32-bit bio_integrity_payload.bip_iter.bi_sector field. Make that truncation explicit to avoid that Coverity complains about implicit truncation. See also Coverity ID 1056486 on http://scan.coverity.com/projects/linux. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet-fc: fix kernel-doc headersBart Van Assche
This patch avoids that the kernel-doc tool complains about two function headers when building with W=1. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-fc: rework the request initialization codeBart Van Assche
Instead of setting and then clearing the first_sgl pointer for AEN requests, leave that pointer zero. This patch does not change how requests are initialized but avoids that Coverity reports the following complaint for nvme_fc_init_aen_ops(): CID 1418400 (#1 of 1): Out-of-bounds access (OVERRUN) 4. overrun-buffer-val: Overrunning buffer pointed to by aen_op of 312 bytes by passing it to a function which accesses it at byte offset 312. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-fc: introduce struct nvme_fcp_op_w_sglBart Van Assche
This patch does not change any functionality but makes the intent of the code more clear. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-fc: fix kernel-doc headersBart Van Assche
This patch avoids that the kernel-doc tool complains about several multiple function headers when building with W=1. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet: avoid integer overflow in the discard codeBart Van Assche
Although I'm not sure whether it is a good idea to support large discard commands, I think integer overflow for discard ranges larger than 4 GB should be avoided. This patch avoids that smatch reports the following: drivers/nvme/target/io-cmd-file.c:249:1 nvmet_file_execute_discard() warn: should '((range.nlb)) << req->ns->blksize_shift' be a 64 bit type? Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet-rdma: declare local symbols staticBart Van Assche
This patch avoids that sparse complains about missing declarations. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet: use strlcpy() instead of strcpy()Bart Van Assche
Although the code modified by this patch looks fine to me, this patch avoids that Coverity reports the following complaint (ID 1364971 and ID 1364973): "You might overrun the 256-character fixed-size string id->subnqn". Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-pci: fix nvme_suspend_queue() kernel-doc headerBart Van Assche
This patch avoids that the kernel-doc tool complains about the nvme_suspend_queue() function header when building with W=1. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-core: rework a NQN copying operationBart Van Assche
Although it is easy to see that the code in nvme_init_subnqn() guarantees that the subsys->nqn string is '\0'-terminated, apparently Coverity is not smart enough to see this. Make it easier for Coverity to analyze this code by changing the strncpy() call into a strlcpy() call. This patch does not change the behavior of the code but fixes Coveritiy ID 1423720. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme-core: declare local symbols staticBart Van Assche
This patch avoids that sparse complains about missing declarations. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet-rdma: check for timeout in nvme_rdma_wait_for_cm()Bart Van Assche
Check whether queue->cm_error holds a value before reading it. This patch addresses Coverity ID 1373774: unchecked return value. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet: use strcmp() instead of strncmp() for subsystem lookupBart Van Assche
strncmp() stops comparing when either the end of one of the first two arguments is reached or when 'n' characters have been compared, whichever comes first. That means that strncmp(s1, s2, n) is equivalent to strcmp(s1, s2) if n exceeds the length of s1 or the length of s2. Since that is the case in nvmet_find_get_subsys(), change strncmp() into strcmp(). This patch avoids that the following warning is reported by smatch: drivers/nvme/target/core.c:940:1 nvmet_find_get_subsys() error: strncmp() '"nqn.2014-08.org.nvmexpress.discovery"' too small (37 vs 223) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvmet: remove unreachable codeChaitanya Kulkarni
Get rid of the unreachable code in the nvmet_parse_discovery_cmd(). Keep the error message identical to the admin-cmd.c and io-cmd*.c Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-17nvme: update node paths after adding new pathKeith Busch
The nvme namespace paths were being updated only when the current path was not set or nonoptimized. If a new path comes online that is a better path for its NUMA node, the multipath selector may continue using the previously set path on a potentially further node. This patch re-runs the path assignment after successfully adding a new optimized path. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-09lightnvm: do no update csecs and sos on 1.2Javier González
1.2 devices exposes their data and metadata size through the separate identify command. Make sure that the NVMe LBA format does not override these values. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: use internal allocation for chunk log pageJavier González
The lightnvm subsystem provides helpers to retrieve chunk metadata, where the target needs to provide a buffer to store the metadata. An implicit assumption is that this buffer is contiguous and can be used to retrieve the data from the device. If the device exposes too many chunks, then kmalloc might fail, thus failing instance creation. This patch removes this assumption by implementing an internal buffer in the lightnvm subsystem to retrieve chunk metadata. Targets can then use virtual memory allocations. Since this is a target API change, adapt pblk accordingly. Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: move bad block and chunk state logic to coreMatias Bjørling
pblk implements two data paths for recovery line state. One for 1.2 and another for 2.0, instead of having pblk implement these, combine them in the core to reduce complexity and make available to other targets. The new interface will adhere to the 2.0 chunk definition, including managing open chunks with an active write pointer. To provide this interface, a 1.2 device recovers the state of the chunks by manually detecting if a chunk is either free/open/close/offline, and if open, scanning the flash pages sequentially to find the next writeable page. This process takes on average ~10 seconds on a device with 64 dies, 1024 blocks and 60us read access time. The process can be parallelized but is left out for maintenance simplicity, as the 1.2 specification is deprecated. For 2.0 devices, the logic is maintained internally in the drive and retrieved through the 2.0 interface. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-08nvme: remove ns sibling before clearing pathKeith Busch
The code had been clearing a namespace being deleted as the current path while that namespace was still in the path siblings list. It is possible a new IO could set that namespace back to the current path since it appeared to be an eligable path to select, which may result in a use-after-free error. This patch ensures a namespace being removed is not eligable to be reset as a current path prior to clearing it as the current path. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-05nvmet-rdma: use a private workqueue for deleteSagi Grimberg
Queue deletion is done asynchronous when the last reference on the queue is dropped. Thus, in order to make sure we don't over allocate under a connect/disconnect storm, we let queue deletion complete before making forward progress. However, given that we flush the system_wq from rdma_cm context which runs from a workqueue context, we can have a circular locking complaint [1]. Fix that by using a private workqueue for queue deletion. [1]: ====================================================== WARNING: possible circular locking dependency detected 4.19.0-rc4-dbg+ #3 Not tainted ------------------------------------------------------ kworker/5:0/39 is trying to acquire lock: 00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm] but task is already holding lock: 00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ((work_completion)(&queue->release_work)){+.+.}: process_one_work+0x474/0xa20 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #2 ((wq_completion)"events"){+.+.}: flush_workqueue+0xf3/0x970 nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma] cma_ib_req_handler+0x72f/0xf90 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_req_handler+0x135b/0x1c30 [ib_cm] cm_work_handler+0x2b7/0x38cd [ib_cm] process_one_work+0x4ae/0xa20 nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 nvme nvme0: Reconnecting in 10 seconds... -> #1 (&id_priv->handler_mutex/1){+.+.}: __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 cma_ib_req_handler+0x6aa/0xf90 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_req_handler+0x135b/0x1c30 [ib_cm] cm_work_handler+0x2b7/0x38cd [ib_cm] process_one_work+0x4ae/0xa20 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 (&id_priv->handler_mutex){+.+.}: lock_acquire+0xc5/0x200 __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 rdma_destroy_id+0x6f/0x440 [rdma_cm] nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma] process_one_work+0x4ae/0xa20 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Fixes: 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown") Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme: take node locality into account when selecting a pathChristoph Hellwig
Make current_path an array with an entry for every possible node, and cache the best path on a per-node basis. Take the node distance into account when selecting it. This is primarily useful for dual-ported PCIe devices which are connected to PCIe root ports on different sockets. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
2018-10-01nvmet: don't split large I/Os unconditionallySagi Grimberg
If we know that the I/O size exceeds our inline bio vec, no point using it and split the rest to begin with. We could in theory reuse the inline bio and only allocate the bio_vec, but its really not worth optimizing for. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/OJames Smart
When an io is rejected by nvmf_check_ready() due to validation of the controller state, the nvmf_fail_nonready_command() will normally return BLK_STS_RESOURCE to requeue and retry. However, if the controller is dying or the I/O is marked for NVMe multipath, the I/O is failed so that the controller can terminate or so that the io can be issued on a different path. Unfortunately, as this reject point is before the transport has accepted the command, blk-mq ends up completing the I/O and never calls nvme_complete_rq(), which is where multipath may preserve or re-route the I/O. The end result is, the device user ends up seeing an EIO error. Example: single path connectivity, controller is under load, and a reset is induced. An I/O is received: a) while the reset state has been set but the queues have yet to be stopped; or b) after queues are started (at end of reset) but before the reconnect has completed. The I/O finishes with an EIO status. This patch makes the following changes: - Adds the HOST_PATH_ERROR pathing status from TP4028 - Modifies the reject point such that it appears to queue successfully, but actually completes the io with the new pathing status and calls nvme_complete_rq(). - nvme_complete_rq() recognizes the new status, avoids resetting the controller (likely was already done in order to get this new status), and calls the multipather to clear the current path that errored. This allows the next command (retry or new command) to select a new path if there is one. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme-core: add async event trace helperChaitanya Kulkarni
This patch adds a new event for nvme async event notification. We print the async event in the decoded format when we recognize the event otherwise we just dump the result. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport deviceJames Smart
The fc transport device should allow for a rediscovery, as userspace might have lost the events. Example is udev events not handled during system startup. This patch add a sysfs entry 'nvme_discovery' on the fc class to have it replay all udev discovery events for all local port/remote port address pairs. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvmet_fc: support target port removal with nvmet layerJames Smart
Currently, if a targetport has been connected to via the nvmet config (in other words, the add_port() transport routine called, and the nvmet port pointer stored for using in upcalls on new io), and if the targetport is then removed (say the lldd driver decides to unload or fully reset its hardware) and then re-added (the lldd driver reloads or reinits its hardware), the port pointer has been lost so there's no way to continue to post commands up to nvmet via the transport port. Correct by allocating a small "port context" structure that will be linked to by the targetport. The context will save the targetport WWN's and the nvmet port pointer to use for it. Initial allocation will occur when the targetport is bound to via add_port. The context will be deallocated when remove_port() is called. If a targetport is removed while nvmet has the active port context, the targetport will be unlinked from the port context before removal. If a new targetport is registered, the port contexts without a binding are looked through and if the WWN's match (so it's the same as nvmet's port context) the port context is linked to the new target port. Thus new io can be received on the new targetport and operation resumes with nvmet. Additionally, this also resolves nvmet configuration changing out from underneath of the nvme-fc target port (for example: a nvmetcli clear). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme-fc: fix for a minor typosMilan P. Gandhi
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvmet: remove redundant module prefixChaitanya Kulkarni
This patch removes the redundant module prefix used in the pr_err() when nvmet_get_smart_log_nsid() failed to find the namespace provided as a part of smart-log command. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01nvme: fix typo in nvme_identify_ns_descsMilan P. Gandhi
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01Merge tag 'v4.19-rc6' into for-4.20/blockJens Axboe
Merge -rc6 in, for two reasons: 1) Resolve a trivial conflict in the blk-mq-tag.c documentation 2) A few important regression fixes went into upstream directly, so they aren't in the 4.20 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk> * tag 'v4.19-rc6': (780 commits) Linux 4.19-rc6 MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c cpufreq: qcom-kryo: Fix section annotations perf/core: Add sanity check to deal with pinned event failure xen/blkfront: correct purging of persistent grants Revert "xen/blkfront: When purging persistent grants, keep them in the buffer" selftests/powerpc: Fix Makefiles for headers_install change blk-mq: I/O and timer unplugs are inverted in blktrace dax: Fix deadlock in dax_lock_mapping_entry() x86/boot: Fix kexec booting failure in the SEV bit detection code bcache: add separate workqueue for journal_write to avoid deadlock drm/amd/display: Fix Edid emulation for linux drm/amd/display: Fix Vega10 lightup on S3 resume drm/amdgpu: Fix vce work queue was not cancelled when suspend Revert "drm/panel: Add device_link from panel device to DRM device" xen/blkfront: When purging persistent grants, keep them in the buffer clocksource/drivers/timer-atmel-pit: Properly handle error cases block: fix deadline elevator drain for zoned block devices ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-28nvme: register ns_id attributes as default sysfs groupsHannes Reinecke
We should be registering the ns_id attribute as default sysfs attribute groups, otherwise we have a race condition between the uevent and the attributes appearing in sysfs. Suggested-by: Bart van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-28block: genhd: add 'groups' argument to device_add_diskHannes Reinecke
Update device_add_disk() to take an 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups(). Signed-off-by: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-25nvme: properly propagate errors in nvme_mpath_initSusobhan Dey
Signed-off-by: Susobhan Dey <susobhan.dey@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-17nvme: count all ANA groups for ANA Log pageHannes Reinecke
When issuing a short read on the ANA log page the number of groups should not change, even though the final returned data might contain less groups than that number. Signed-off-by: Hannes Reinecke <hare@suse.com> [switched to a for loop] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-05nvmet-rdma: fix possible bogus dereference under heavy loadSagi Grimberg
Currently we always repost the recv buffer before we send a response capsule back to the host. Since ordering is not guaranteed for send and recv completions, it is posible that we will receive a new request from the host before we got a send completion for the response capsule. Today, we pre-allocate 2x rsps the length of the queue, but in reality, under heavy load there is nothing that is really preventing the gap to expand until we exhaust all our rsps. To fix this, if we don't have any pre-allocated rsps left, we dynamically allocate a rsp and make sure to free it when we are done. If under memory pressure we fail to allocate a rsp, we silently drop the command and wait for the host to retry. Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [hch: dropped a superflous assignment] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-28nvmet: free workqueue object if module init failsChaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-28nvme-fcloop: Fix dropped LS's to removed target portJames Smart
When a targetport is removed from the config, fcloop will avoid calling the LS done() routine thinking the targetport is gone. This leaves the initiator reset/reconnect hanging as it waits for a status on the Create_Association LS for the reconnect. Change the filter in the LS callback path. If tport null (set when failed validation before "sending to remote port"), be sure to call done. This was the main bug. But, continue the logic that only calls done if tport was set but there is no remoteport (e.g. case where remoteport has been removed, thus host doesn't expect a completion). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-28nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_eventMichal Wnukowski
In many architectures loads may be reordered with older stores to different locations. In the nvme driver the following two operations could be reordered: - Write shadow doorbell (dbbuf_db) into memory. - Read EventIdx (dbbuf_ei) from memory. This can result in a potential race condition between driver and VM host processing requests (if given virtual NVMe controller has a support for shadow doorbell). If that occurs, then the NVMe controller may decide to wait for MMIO doorbell from guest operating system, and guest driver may decide not to issue MMIO doorbell on any of subsequent commands. This issue is purely timing-dependent one, so there is no easy way to reproduce it. Currently the easiest known approach is to run "Oracle IO Numbers" (orion) that is shipped with Oracle DB: orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \ concat -write 40 -duration 120 -matrix row -testname nvme_test Where nvme_test is a .lun file that contains a list of NVMe block devices to run test against. Limiting number of vCPUs assigned to given VM instance seems to increase chances for this bug to occur. On test environment with VM that got 4 NVMe drives and 1 vCPU assigned the virtual NVMe controller hang could be observed within 10-20 minutes. That correspond to about 400-500k IO operations processed (or about 100GB of IO read/writes). Orion tool was used as a validation and set to run in a loop for 36 hours (equivalent of pushing 550M IO operations). No issues were observed. That suggest that the patch fixes the issue. Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") Signed-off-by: Michal Wnukowski <wnukowski@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: updated changelog and comment a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-16Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe
rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe
Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...
2018-08-08nvme-fabrics: fix ctrl_loss_tmo < 0 to reconnect foreverTal Shorer
When the user supplies a ctrl_loss_tmo < 0, we warn them that this will cause the fabrics layer to attempt reconnection forever. However, in reality the fabrics layer never attempts to reconnect because the condition to test whether we should reconnect is backwards in this case. Signed-off-by: Tal Shorer <tal.shorer@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08nvmet: add ns write protect supportChaitanya Kulkarni
This patch implements the Namespace Write Protect feature described in "NVMe TP 4005a Namespace Write Protect". In this version, we implement No Write Protect and Write Protect states for target ns which can be toggled by set-features commands from the host side. For write-protect state transition, we need to flush the ns specified as a part of command so we also add helpers for carrying out synchronous flush operations. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: fixed an incorrect endianess conversion, minor cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08nvme: set gendisk read only based on nsattrChaitanya Kulkarni
NVMe 1.3 TP 4005 introduces new filed (NSATTR). This field indicates whether given namespace is write protected or not. This patch sets the gendisk associated with the namespace to read only based on the identify namespace nsattr field. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>