summaryrefslogtreecommitdiff
path: root/drivers/nvme/host/core.c
AgeCommit message (Collapse)Author
2016-10-09Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box" * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block: blk_mq: linux/blk-mq.h does not include all the headers it depends on blk-mq: kill unused blk_mq_create_mq_map() blk-mq: get rid of the cpumask in struct blk_mq_tags nvme: remove the post_scan callout nvme: switch to use pci_alloc_irq_vectors blk-mq: provide a default queue mapping for PCI device blk-mq: allow the driver to pass in a queue mapping blk-mq: remove ->map_queue blk-mq: only allocate a single mq_map per tag_set blk-mq: don't redistribute hardware queues on a CPU hotplug event
2016-09-24nvme: Pass pointers, not dma addresses, to nvme_get/set_features()Andy Lutomirski
Any user I can imagine that needs a buffer at all will want to pass a pointer directly. There are no currently callers that use buffers, so this change is painless, and it will make it much easier to start using features that use buffers (e.g. APST). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21lightnvm: expose device geometry through sysfsSimon A. F. Lund
For a host to access an Open-Channel SSD, it has to know its geometry, so that it writes and reads at the appropriate device bounds. Currently, the geometry information is kept within the kernel, and not exported to user-space for consumption. This patch exposes the configuration through sysfs and enables user-space libraries, such as liblightnvm, to use the sysfs implementation to get the geometry of an Open-Channel SSD. The sysfs entries are stored within the device hierarchy, and can be found using the "lightnvm" device type. An example configuration looks like this: /sys/class/nvme/ └── nvme0n1 ├── capabilities: 3 ├── device_mode: 1 ├── erase_max: 1000000 ├── erase_typ: 1000000 ├── flash_media_type: 0 ├── media_capabilities: 0x00000001 ├── media_type: 0 ├── multiplane: 0x00010101 ├── num_blocks: 1022 ├── num_channels: 1 ├── num_luns: 4 ├── num_pages: 64 ├── num_planes: 1 ├── page_size: 4096 ├── prog_max: 100000 ├── prog_typ: 100000 ├── read_max: 10000 ├── read_typ: 10000 ├── sector_oob_size: 0 ├── sector_size: 4096 ├── media_manager: gennvm ├── ppa_format: 0x380830082808001010102008 ├── vendor_opcode: 0 ├── max_phys_secs: 64 └── version: 1 Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21lightnvm: control life of nvm_dev in driverMatias Bjørling
LightNVM compatible device drivers does not have a method to expose LightNVM specific sysfs entries. To enable LightNVM sysfs entries to be exposed, lightnvm device drivers require a struct device to attach it to. To allow both the actual device driver and lightnvm sysfs entries to coexist, the device driver tracks the lifetime of the nvm_dev structure. This patch refactors NVMe and null_blk to handle the lifetime of struct nvm_dev, which eliminates the need for struct gendisk when a lightnvm compatible device is provided. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21nvme: refactor namespaces to support non-gendisk devicesMatias Bjørling
With LightNVM enabled namespaces, the gendisk structure is not exposed to the user. This prevents LightNVM users from accessing the NVMe device driver specific sysfs entries, and LightNVM namespace geometry. Refactor the revalidation process, so that a namespace, instead of a gendisk, is revalidated. This later allows patches to wire up the sysfs entries up to a non-gendisk namespace. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15nvme: remove the post_scan calloutChristoph Hellwig
No need now that we don't have to reverse engineer the irq affinity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-24nvme: Fix nvme_get/set_features() with a NULL result pointerAndy Lutomirski
nvme_set_features() callers seem to expect that passing NULL as the result pointer is acceptable. Teach nvme_set_features() not to try to write to the NULL address. For symmetry, make the same change to nvme_get_features(), despite the fact that all current callers pass a valid result pointer. I assume that this bug hasn't been reported in practice because the callers that pass NULL are all in the SCSI translation layer and no one uses the relevant operations. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-15nvme: Prevent controller state invalid transitionGabriel Krisman Bertazi
Acquiring the nvme_ctrl lock before reading ctrl->state in nvme_change_ctrl_state() should prevent a theoretical invalid state transition, in the event of two threads racing inside that function. I haven't been able to observe this happening with the current code, and the current state machine seems to be simple enough to not be affected by these invalid transitions, but future modifications could make it more likely to happen. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sag@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-26Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...
2016-07-26Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-07-20nvme: initialize variable before logical OR'ing itJay Freyensee
It is typically not good coding or secure coding practice to logical OR a variable without an initialization value first. Here on this line: integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE; BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable never set to an initial value. This patch fixes that. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20block: ensure bios return from blk_get_request are properly initializedChristoph Hellwig
blk_get_request is used for BLOCK_PC and similar passthrough requests. Currently we always need to call blk_rq_set_block_pc or an open coded version of it to allow appending bios using the request mapping helpers later on, which is a somewhat awkward API. Instead move the initialization part of blk_rq_set_block_pc into blk_get_request, so that we always have a safe to use request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-14NVMe: don't allocate unused nvme_majorNeilBrown
When alloc_disk(0) is used, the ->major number is ignored. All device numbers are allocated with a major of BLOCK_EXT_MAJOR. So remove all references to nvme_major. [akpm@linux-foundation.org: one unregister_blkdev() was missed] Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@noble Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-14nvme: Remove RCU namespace protectionKeith Busch
We can't sleep with RCU read lock held, but we need to do potentially blocking stuff to namespace queues when iterating the list. This patch removes the RCU locking and holds a mutex instead. To prevent deadlocks, this patch removes holding the mutex during namespace scanning and removal. The unlocked namespace scanning is made safe by holding a reference to the namespace being scanned. List iteration that does IO has to be unlocked to allow error recovery. The caller must ensure the list can not be manipulated during such an event, so this patch adds a comment explaining this requirement to the only function that iterates an unlocked list. All callers currently meet this requirement, so no further changes required. List iterations that do not do IO can safely use the lock since it couldn't block recovery from missing forced IO completions. Reported-by: Ming Lin <mlin at kernel.org> [fixes 0bf77e9 nvme: switch to RCU freeing the namespace] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme: Limit command retriesKeith Busch
Many controller implementations will return errors to commands that will not succeed, but without the DNR bit set. The driver previously retried these commands an unlimited number of times until the command timeout has exceeded, which takes an unnecessarilly long period of time. This patch limits the number of retries a command can have, defaulting to 5, but is user tunable at load or runtime. The struct request's 'retries' field is used to track the number of retries attempted. This is in contrast with scsi's use of this field, which indicates how many retries are allowed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-12nvme/quirk: Add a delay before checking for adapter readinessGuilherme G. Piccoli
When disabling the controller, the specification says the register NVME_REG_CC should be written and then driver needs to wait the adapter to be ready, which is checked by reading another register bit (NVME_CSTS_RDY). There's a timeout validation in this checking, so in case this timeout is reached the driver gives up and removes the adapter from the system. After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003) (HGST adapter) end up being removed if we issue a reset_controller, because driver keeps verifying the NVME_REG_CSTS until the timeout is reached. This patch adds a necessary quirk for this adapter, by introducing a delay before nvme_wait_ready(), so the reset procedure is able to be completed. This quirk is needed because just increasing the timeout is not enough in case of this adapter - the driver must wait before start reading NVME_REG_CSTS register on this specific device. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-08Merge branch 'for-4.8/block' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm into for-4.8/drivers Dan writes: "The removal of ->driverfs_dev in favor of just passing the parent device in as a parameter to add_disk(). See below, it has received a "Reviewed-by" from Christoph, Bart, and Johannes. It is also a pre-requisite for Fam Zheng's work to cleanup gendisk uevents vs attribute visibility [1]. We would extend device_add_disk() to take an attribute_group list. This is based off a branch of block.git/for-4.8/drivers and has received a positive build success notification from the kbuild robot across several configs. [1]: "gendisk: Generate uevent after attribute available" http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"
2016-07-08nvme: add new reconnecting controller stateChristoph Hellwig
The nvme fabric (RDMA, FC, etc...) can introduce port, link or node failures that may require a reconnect to re-establish the connection. Add a new reconnecting state that will initially be used by the RDMA driver. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: add keep-alive supportSagi Grimberg
Periodic keep-alive is a mandatory feature in NVMe over Fabrics, and optional in NVMe 1.2.1 for PCIe. This patch adds periodic keep-alive sent from the host to verify that the controller is still responsive and vice-versa. The keep-alive timeout is user-defined (with keep_alive_tmo connection parameter) and defaults to 5 seconds. In order to avoid a race condition where the host sends a keep-alive competing with the target side keep-alive timeout expiration, the host adds a grace period of 10 seconds when publishing the keep-alive timeout to the target. In case a keep-alive failed (or timed out), a transport specific error recovery kicks in. For now only NVMe over Fabrics is wired up to support keep alive, but we can add PCIe support easily once controllers actually supporting it become available. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@chelsio.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme-fabrics: add a generic NVMe over Fabrics libraryChristoph Hellwig
The NVMe over Fabrics library provides an interface for both transports and the nvme core to handle fabrics specific commands and attributes independent of the underlying transport. In addition, the fabrics library adds a misc device interface that allow actually creating a fabrics controller, as we can't just autodiscover it like in the PCI case. The nvme-cli utility has been enhanced to use this interface to support fabric connect and discovery. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>, Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>, Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme.h: add NVMe over Fabrics definitionsChristoph Hellwig
The NVMe over Fabrics specification defines a protocol interface and related extensions to NVMe that enable operation over network protocols. The NVMe over Fabrics specification has an NVMe Transport binding for each NVMe Transport. This patch adds the fabrics related definitions: - fabric specific command set and error codes - transport addressing and binding definitions - fabrics sgl extensions - controller identification fabrics enhancements - discovery log page definition Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: add fabrics sysfs attributesMing Lin
- delete_controller: This attribute allows to delete a controller. A driver is not obligated to support it (pci doesn't) so it is created only if the driver supports it. The new fabrics drivers will support it (essentialy a disconnect operation). Usage: echo > /sys/class/nvme/nvme0/delete_controller - subsysnqn: This attribute shows the subsystem nqn of the configured device. If a driver does not implement the get_subsysnqn method, the file will not appear in sysfs. - transport: This attribute shows the transport name. Added a "name" field to struct nvme_ctrl_ops. For loop, cat /sys/class/nvme/nvme0/transport loop For RDMA, cat /sys/class/nvme/nvme0/transport rdma For PCIe, cat /sys/class/nvme/nvme0/transport pcie - address: This attributes shows the controller address. The fabrics drivers that will implement get_address can show the address of the connected controller. example: cat /sys/class/nvme/nvme0/address traddr=192.168.2.2,trsvcid=1023 Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: Modify and export sync command submission for fabricsChristoph Hellwig
NVMe over fabrics will use __nvme_submit_sync_cmd in the the transport and require a few tweaks to it. For that we export it and add a few more paramters: 1. allow passing a queue ID to the block layer For the NVMe over Fabrics connect command we need to able to specify a queue ID that we want to send the command on. Add a qid parameter to the relevant functions to enable this behavior. 2. allow submitting at_head commands In cases where we want to (re)connect to a controller where we have inflight queued commands we want to first connect and only then allow the other queued commands to be kicked. This will prevents failures in controller resets and reconnects. 3. allow passing flags to blk_mq_allocate_request Both for Fabrics connect the the keep-alive feature in NVMe 1.2.1 we want to be able to use reserved requests. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05nvme: allow transitioning from NEW to LIVE stateChristoph Hellwig
For Fabrics we're not going through an intermediate reset state (at least for now). Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-27block: convert to device_add_disk()Dan Williams
For block drivers that specify a parent device, convert them to use device_add_disk(). This conversion was done with the following semantic patch: @@ struct gendisk *disk; expression E; @@ - disk->driverfs_dev = E; ... - add_disk(disk); + device_add_disk(E, disk); @@ struct gendisk *disk; expression E1, E2; @@ - disk->driverfs_dev = E1; ... E2 = disk; ... - add_disk(E2); + device_add_disk(E1, E2); ...plus some manual fixups for a few missed conversions. Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-12nvme: move the workaround for I/O queue-less controllers from PCIe to coreChristoph Hellwig
We want to apply this to Fabrics drivers as well, so move it to common code. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12nvme: factor out a add nvme_is_write helperChristoph Hellwig
Centralize the check if a given NVMe command reads or writes data. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12nvme: allow for size limitations from transport driversChristoph Hellwig
Some transport drivers may have a lower transfer size than the controller. So allow the transport to set it in the controller max_hw_sectors. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-08NVMe: Fix removal in case of active namespace list scanning methodSunad Bhandary
In case of the active namespace list scanning method, a namespace that is detached is not removed from the host if it was the last entry in the list. Fix this by adding a scan to validate namespaces greater than the value of prev. This also handles the case of removing namespaces whose value exceed the device's reported number of namespaces. Signed-off-by: Sunad Bhandary S <sunad.s@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07nvme: use UINT_MAX for max discard sectorsMinfei Huang
It's more elegant to use UINT_MAX to represent the max value of type unsigned int. So replace the actual value by using this define. Signed-off-by: Minfei Huang <mnghuan@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07nvme: move nvme_cancel_request() to common codeMing Lin
So it can be used by fabrics driver also. Signed-off-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.bsuch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07drivers: use req op accessorMike Christie
The req operation REQ_OP is separated from the rq_flag_bits definition. This converts the block layer drivers to use req_op to get the op from the request struct. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17nvme/host: Add missing blk_integrity tag_size + flags assignmentsNicholas Bellinger
While doing recent bring-up of nvme/host with target-core T10-PI, I noticed /sys/block/nvme*/integrity/device_is_integrity_capable was false, and /sys/block/nvme*/integrity/tag_size contained a bogus value. AFAICT outside of blk_integrity_compare() for DM + MD these are informational values, but go ahead and add the missing assignments for nvme/host to match what SCSI does within sd_dif_config_host() for consistency's sake. Cc: Keith Busch <keith.busch@intel.com> Cc: Jay Freyensee <james.p.freyensee@intel.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Sagi Grimberg <sagig@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Sagi Grimberg <sagi at grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Short-cut removal on surprise hot-unplugKeith Busch
This patch adds a new state that when set has the core automatically kill request queues prior to removing namespaces. If PCI device is not present at the time the nvme driver's remove is called, we can kill all IO queues immediately instead of waiting for the watchdog thread to do that at its polling interval. This improves scenarios where multiple hot plug events occur at the same time since it doesn't block the pci enumeration for as long. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-17NVMe: Allow user initiated rescanKeith Busch
This exposes ioctl and sysfs methods a user can invoke to request the driver rescan a controller and its namespaces. This is less harsh than doing a controller reset, which temporarilly halts all IO, just to surface a newly attached namespace. This is mainly useful for controllers that implement the namespace management command, but do not support the namespace notify change asynchronous event notification. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02nvme: fix nvme_ns_remove() deadlockMing Lin
On receipt of a namespace attribute changed AER, we acquire the namespace mutex lock before proceeding to scan and validate the namespace list. In case of namespace detach/delete command, nvme_ns_remove function deadlocks trying to acquire the already held lock. All callers, except nvme_remove_namespaces(), of nvme_ns_remove() already held namespaces_mutex. So we can simply fix the deadlock by not acquiring the mutex in nvme_ns_remove() and acquiring it in nvme_remove_namespaces(). Reported-by: Sunad Bhandary S <sunad.s@samsung.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimerg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02nvme: switch to RCU freeing the namespaceMing Lin
Switch to RCU freeing the namespace structure so that nvme_start_queues, nvme_stop_queues and nvme_kill_queues would be able to get away with only a RCU read side critical section. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimerg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02nvme: move AER handling to common codeChristoph Hellwig
The transport driver still needs to do the actual submission, but all the higher level code can be shared. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02nvme: move namespace scanning to coreChristoph Hellwig
Move the scan work item and surrounding code to the common code. For now we need a new finish_scan method to allow the PCI driver to set the irq affinity hints, but I have plans in the works to obsolete this as well. Note that this moves the namespace scanning from nvme_wq to the system workqueue, but as we don't rely on namespace scanning to finish from reset or I/O this should be fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02nvme: introduce a controller state machineChristoph Hellwig
Replace the adhoc flags in the PCI driver with a state machine in the core code. Based on code from Sagi Grimberg for the Fabrics driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02NVMe: nvme_core_exit() should do cleanup in the reverse order as ↵Wang Sheng-Hui
nvme_core_init does nvme_core_init does: 1) register_blkdev 2) __register_chrdev 3) class_create nvme_core_exit should do cleanup in the reverse order. Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12NVMe: switch to using blk_queue_write_cache()Jens Axboe
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12nvme: add helper nvme_setup_cmd()Ming Lin
This moves nvme_setup_{flush,discard,rw} calls into a common nvme_setup_cmd() helper. So we can eventually hide all the command setup in the core module and don't even need to update the fabrics drivers for any specific command type. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-18Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "This is the block driver pull request for this merge window. It sits on top of for-4.6/core, that was just sent out. This contains: - A set of fixes for lightnvm. One from Alan, fixing an overflow, and the rest from the usual suspects, Javier and Matias. - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd for correct usage of the signed 64-bit divider. - A set of bug fixes for the Micron mtip32xx, from Asai. - A fix for the brd discard handling from Bart. - Update the maintainers entry for cciss, since that hardware has transferred ownership. - Three bug fixes for bcache from Eric Wheeler. - Set of fixes for xen-blk{back,front} from Jan and Konrad. - Removal of the cpqarray driver. It has been disabled in Kconfig since 2013, and we were initially scheduled to remove it in 3.15. - Various updates and fixes for NVMe, with the most important being: - Removal of the per-device NVMe thread, replacing that with a watchdog timer instead. From Christoph. - Exposing the namespace WWID through sysfs, from Keith. - Set of cleanups from Ming Lin. - Logging the controller device name instead of the underlying PCI device name, from Sagi. - And a bunch of fixes and optimizations from the usual suspects in this area" * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits) NVMe: Expose ns wwid through single sysfs entry drivers:block: cpqarray clean up brd: Fix discard request processing cpqarray: remove it from the kernel cciss: update MAINTAINERS NVMe: Remove unused sq_head read in completion path bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: cleaned up error handling around register_cache() bcache: fix race of writeback thread starting before complete initialization NVMe: Create discard zero quirk white list nbd: use correct div_s64 helper mtip32xx: remove unneeded variable in mtip_cmd_timeout() lightnvm: generalize rrpc ppa calculations lightnvm: remove struct nvm_dev->total_blocks lightnvm: rename ->nr_pages to ->nr_sects lightnvm: update closed list outside of intr context xen/blback: Fit the important information of the thread in 17 characters lightnvm: fold get bb tbl when using dual/quad plane mode lightnvm: fix up nonsensical configure overrun checking xen-blkback: advertise indirect segment support earlier ...
2016-03-16NVMe: Expose ns wwid through single sysfs entryKeith Busch
The method to uniquely identify a namespace depends on the controller's specification revision level and implemented capabilities. This patch has the driver figure this out and exports the unique string through a single 'wwid' attribute so the user doesn't have this burden. The longest namespace unique identifier is used if available. If not available, the driver will concat the controller's vendor, serial, and model with the namespace ID. The specification provides this as a unique indentifier. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-08NVMe: Create discard zero quirk white listKeith Busch
The NVMe specification does not require discarded blocks return zeroes on read, but provides that behavior as a possibility. Some applications more efficiently use an SSD if reads on discarded blocks were deterministically zero, based on the "discard_zeroes_data" queue attribute. There is no specification defined way to determine device behavior on discarded blocks, so the driver always left the queue setting disabled. We can only know behavior based on individual device models, so this patch adds a flag to the NVMe "quirk" list that vendors may set if they know their controller works that way. The patch also sets the new flag for one such known device. Signed-off-by: Keith Busch <keith.busch@intel.com> Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03nvme: fix max_segments integer truncationChristoph Hellwig
The block layer uses an unsigned short for max_segments. The way we calculate the value for NVMe tends to generate very large 32-bit values, which after integer truncation may lead to a zero value instead of the desired outcome. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jeff Lien <Jeff.Lien@hgst.com> Tested-by: Jeff Lien <Jeff.Lien@hgst.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03nvme: set queue limits for the admin queueChristoph Hellwig
Factor out a helper to set all the device specific queue limits and apply them to the admin queue in addition to the I/O queues. Without this the command size on the admin queue is arbitrarily low, and the missing other limitations are just minefields waiting for victims. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jeff Lien <Jeff.Lien@hgst.com> Tested-by: Jeff Lien <Jeff.Lien@hgst.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03NVMe: Fix 0-length integrity payloadKeith Busch
A user could send a passthrough IO command with a metadata pointer to a namespace without metadata. With metadata length of 0, kmalloc returns ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as the bio's integrity payload, which causes an access fault on completion. This patch ignores the users metadata buffer if the namespace format does not support separate metadata. Reported-by: Stephen Bates <stephen.bates@microsemi.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>