summaryrefslogtreecommitdiff
path: root/include/linux/nvme.h
AgeCommit message (Collapse)Author
2024-03-11Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - MD pull requests via Song: - Cleanup redundant checks (Yu Kuai) - Remove deprecated headers (Marc Zyngier, Song Liu) - Concurrency fixes (Li Lingfeng) - Memory leak fix (Li Nan) - Refactor raid1 read_balance (Yu Kuai, Paul Luse) - Clean up and fix for md_ioctl (Li Nan) - Other small fixes (Gui-Dong Han, Heming Zhao) - MD atomic limits (Christoph) - NVMe pull request via Keith: - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith) - Improvement and cleanup for cached request handling (Christoph) - Moving towards atomic queue limits. Core changes and driver bits so far (Christoph) - Fix UAF issues in aoeblk (Chun-Yi) - Zoned fix and cleanups (Damien) - s390 dasd cleanups and fixes (Jan, Miroslav) - Block issue timestamp caching (me) - noio scope guarding for zoned IO (Johannes) - block/nvme PI improvements (Kanchan) - Ability to terminate long running discard loop (Keith) - bdev revalidation fix (Li) - Get rid of old nr_queues hack for kdump kernels (Ming) - Support for async deletion of ublk (Ming) - Improve IRQ bio recycling (Pavel) - Factor in CPU capacity for remote vs local completion (Qais) - Add shared_tags configfs entry for null_blk (Shin'ichiro - Fix for a regression in page refcounts introduced by the folio unification (Tony) - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid, Ricardo, Roman, Tang, Uwe) * tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits) block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC block/swim: Convert to platform remove callback returning void cdrom: gdrom: Convert to platform remove callback returning void block: remove disk_stack_limits md: remove mddev->queue md: don't initialize queue limits md/raid10: use the atomic queue limit update APIs md/raid5: use the atomic queue limit update APIs md/raid1: use the atomic queue limit update APIs md/raid0: use the atomic queue limit update APIs md: add queue limit helpers md: add a mddev_is_dm helper md: add a mddev_add_trace_msg helper md: add a mddev_trace_remap helper bcache: move calculation of stripe_size and io_opt into bcache_device_init virtio_blk: Do not use disk_set_max_open/active_zones() aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts block: move capacity validation to blkpg_do_ioctl() block: prevent division by zero in blk_rq_stat_sum() drbd: atomically update queue limits in drbd_reconsider_queue_parameters ...
2024-03-02nvme-rdma: move NVME_RDMA_IP_PORT from common fileMax Gurtovoy
The correct place for this definition is the nvme rdma header file and not the common nvme header file. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-02-13nvme: implement support for relaxed effectsKeith Busch
NVM Express TP4167 provides a way for controllers to report a relaxed execution constraint. Specifically, it notifies of exclusivity for IO vs. admin commands instead of grouping these together. If set, then we don't need to freeze IO in order to execute that admin command. The freezing distrupts IO processes, so it's nice to avoid that if the controller tells us it's not necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-31nvme: take const cmd pointer in read-only helpersCaleb Sander
nvme_is_fabrics() and nvme_is_write() only read struct nvme_command, so take it by const pointer. This allows callers to pass a const pointer and communicates that these functions don't modify the command. Signed-off-by: Caleb Sander <csander@purestorage.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-23nvmet: unify aer type enumGuixin Liu
The host and target use two definition of aer type, unify them into a single one. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-01-03nvme: remove unused definitionMax Gurtovoy
There is no users for NVMF_AUTH_HASH_LEN macro. Reviewed-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-11-06nvme-auth: auth success1 msg always includes respMark O'Donovan
In cases where RVALID is false, the response is still transmitted, but is cleared to zero. Relevant extract from the spec: Response R2, if valid (i.e., if the RVALID field is set to 01h), cleared to 0h otherwise Signed-off-by: Mark O'Donovan <shiftee@posteo.net> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-11nvme: add TCP TSAS definitionsHannes Reinecke
Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-10nvme: fix the NVME_ID_NS_NVM_STS_MASK definitionAnkit Kumar
As per NVMe command set specification 1.0c Storage tag size is 7 bits. Fixes: 4020aad85c67 ("nvme: add support for enhanced metadata") Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-05-22Merge patch series "Use block pr_ops in LIO"Martin K. Petersen
Mike Christie <michael.christie@oracle.com> says: The patches in this thread allow us to use the block pr_ops with LIO's target_core_iblock module to support cluster applications in VMs. They were built over Linus's tree. They also apply over linux-next and Martin's tree and Jens's trees. Currently, to use windows clustering or linux clustering (pacemaker + cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you have to use tcmu or pscsi or use a cluster aware FS/framework for the LIO pr file. Setting up a cluster FS/framework is pain and waste when your real backend device is already a distributed device, and pscsi and tcmu are nice for specific use cases, but iblock gives you the best performance and allows you to use stacked devices like dm-multipath. So these patches allow iblock to work like pscsi/tcmu where they can pass a PR command to the backend module. And then iblock will use the pr_ops to pass the PR command to the real devices similar to what we do for unmap today. The patches are separated in the following groups: Patch 1 - 2: - Add block layer callouts for reading reservations and rename reservation error code. Patch 3 - 5: - SCSI support for new callouts. Patch 6: - DM support for new callouts. Patch 7 - 13: - NVMe support for new callouts. Patch 14 - 18: - LIO support for new callouts. This patchset has been tested with the libiscsi PGR ops and with window's failover cluster verification test. Note that for scsi backend devices we need this patchset: https://lore.kernel.org/linux-scsi/20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7 to handle UAs. To reduce the size of this patchset that's being done separately to make reviewing easier. And to make merging easier this patchset and the one above do not have any conflicts so can be merged in different trees. Link: https://lore.kernel.org/r/20230407200551.12660-1-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11nvme: Add a nvme_pr_type enumMike Christie
The next patch adds support to report the reservation type, so we need to be able to convert from the NVMe PR value we get from the device to the linux block layer PR value that will be returned to callers. To prepare for that, this patch adds a nvme_pr_type enum and renames the nvme_pr_type function. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-13-michael.christie@oracle.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11nvme: Add pr_ops read_keys supportMike Christie
This patch adds support for the pr_ops read_keys callout by calling the NVMe Reservation Report helper, then parsing that info to get the controller's registered keys. Because the callout is only used in the kernel where the callers, like LIO, do not know about controller/host IDs, the callout just returns the registered keys which is required by the SCSI PR in READ KEYS command. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-12-michael.christie@oracle.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11nvme: Fix reservation status related structsMike Christie
This fixes the following issues with the reservation status structs: 1. resv10 is bytes 23:10 so it should be 14 bytes. 2. regctl_ds only supports 64 bit host IDs. These are not currently used, but will be in this patchset which adds support for the reservation report command. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-8-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-15nvme-trace: show more opcode namesMinwoo Im
We have more commands to show in the trace. Sync up. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-28nvme: consult the CSE log page for unprivileged passthroughChristoph Hellwig
Commands like Write Zeros can change the contents of a namespaces without actually transferring data. To protect against this, check the Commands Supported and Effects log is supported by the controller for any unprivileg command passthrough and refuse unprivileged passthrough if the command has any effects that can change data or metadata. Note: While the Commands Support and Effects log page has only been mandatory since NVMe 2.0, it is widely supported because Windows requires it for any command passthrough from userspace. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definitionChristoph Hellwig
3 << 16 does not generate the correct mask for bits 16, 17 and 18. Use the GENMASK macro to generate the correct mask instead. Fixes: 84fef62d135b ("nvme: check admin passthru command effects") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-11-15nvme: implement the DEAC bit for the Write Zeroes commandChristoph Hellwig
While the specification allows devices to either deallocate data or to actually write zeroes on any Write Zeroes command, many SSDs only do the sensible thing and deallocate data when the DEAC bit is specific. Set it when it is supported and the caller doesn't explicitly opt out of deallocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-15nvme: fine-granular CAP_SYS_ADMIN for nvme io commandsKanchan Joshi
Currently both io and admin commands are kept under a coarse-granular CAP_SYS_ADMIN check, disregarding file mode completely. $ ls -l /dev/ng* crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1 crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2 In the example above, ng0n1 appears as if it may allow unprivileged read/write operation but it does not and behaves same as ng0n2. This patch implements a shift from CAP_SYS_ADMIN to more fine-granular control for io-commands. If CAP_SYS_ADMIN is present, nothing else is checked as before. Otherwise, following rules are in place - any admin-cmd is not allowed - vendor-specific and fabric commmand are not allowed - io-commands that can write are allowed if matching FMODE_WRITE permission is present - io-commands that read are allowed Add a helper nvme_cmd_allowed that implements above policy. Change all the callers of CAP_SYS_ADMIN to go through nvme_cmd_allowed for any decision making. Since file open mode is counted for any approval/denial, change at various places to keep file-mode information handy. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: improve the NVME_CONNECT_AUTHREQ* definitionsChristoph Hellwig
Mark them as unsigned so that we don't need extra casts, and define them relative to cdword0 instead of requiring extra shifts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-08-02nvme: add definitions for NVMe In-Band authenticationHannes Reinecke
Add new definitions for NVMe In-band authentication as defined in the NVMe Base Specification v2.0. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02nvme: handle the persistent internal error AERMichael Kelley
In the NVM Express Revision 1.4 spec, Figure 145 describes possible values for an AER with event type "Error" (value 000b). For a Persistent Internal Error (value 03h), the host should perform a controller reset. Add support for this error using code that already exists for doing a controller reset. As part of this support, introduce two utility functions for parsing the AER type and subtype. This new support was tested in a lab environment where we can generate the persistent internal error on demand, and observe both the Linux side and NVMe controller side to see that the controller reset has been done. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06nvme: use struct group for generic command dwordsKeith Busch
This will allow the trace event to know the full size of the data intended to be copied and silence read overflow checks. Reported-by: John Garry <john.garry@huawei.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-23nvme: fix the CRIMS and CRWMS definitions to match the specJoel Granados
Adjust the values of NVME_CAP_CRMS_CRIMS and NVME_CAP_CRMS_CRWMS masks as they are different from the ones in TP4084 - Time-to-ready. Fixes: 354201c53e61 ("nvme: add support for TP4084 - Time-to-Ready Enhancements"). Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-05-18nvme: add support for TP4084 - Time-to-Ready EnhancementsChristoph Hellwig
Add support for using longer timeouts during controller initialization and letting the controller come up with namespaces that are not ready for I/O yet. We skip these not ready namespaces during scanning and only bring them online once anoter scan is kicked off by the AEN that is set when the NRDY bit gets set in the I/O Command Set Independent Identify Namespace Data Structure. This asynchronous probing avoids blocking the kernel boot when controllers take a very long time to recover after unclean shutdowns (up to minutes). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-05-17nvme: split the enum used for various register constantsChristoph Hellwig
Instead of having one big enum add one for each register or field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-05-16nvme: add missing status values to verbose loggingMax Gurtovoy
Log a few more path related status codes. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-04-01Merge tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver fixes from Jens Axboe: "Followup block driver updates and fixes for the 5.18-rc1 merge window. In detail: - NVMe pull request - Fix multipath hang when disk goes live over reconnect (Anton Eidelman) - fix RCU hole that allowed for endless looping in multipath round robin (Chris Leech) - remove redundant assignment after left shift (Colin Ian King) - add quirks for Samsung X5 SSDs (Monish Kumar R) - fix the read-only state for zoned namespaces with unsupposed features (Pankaj Raghav) - use a private workqueue instead of the system workqueue in nvmet (Sagi Grimberg) - allow duplicate NSIDs for private namespaces (Sungup Moon) - expose use_threaded_interrupts read-only in sysfs (Xin Hao)" - nbd minor allocation fix (Zhang) - drbd fixes and maintainer addition (Lars, Jakob, Christoph) - n64cart build fix (Jackie) - loop compat ioctl fix (Carlos) - misc fixes (Colin, Dongli)" * tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block: drbd: remove check of list iterator against head past the loop body drbd: remove usage of list iterator variable after loop nbd: fix possible overflow on 'first_minor' in nbd_dev_add() MAINTAINERS: add drbd co-maintainer drbd: fix potential silent data corruption loop: fix ioctl calls using compat_loop_info nvme-multipath: fix hang when disk goes live over reconnect nvme: fix RCU hole that allowed for endless looping in multipath round robin nvme: allow duplicate NSIDs for private namespaces nvmet: remove redundant assignment after left shift nvmet: use a private workqueue instead of the system workqueue nvme-pci: add quirks for Samsung X5 SSDs nvme-pci: expose use_threaded_interrupts read-only in sysfs nvme: fix the read-only state for zoned namespaces with unsupposed features n64cart: convert bi_disk to bi_bdev->bd_disk fix build xen/blkfront: fix comment for need_copy xen-blkback: remove redundant assignment to variable i
2022-03-29nvme: allow duplicate NSIDs for private namespacesSungup Moon
A NVMe subsystem with multiple controller can have private namespaces that use the same NSID under some conditions: "If Namespace Management, ANA Reporting, or NVM Sets are supported, the NSIDs shall be unique within the NVM subsystem. If the Namespace Management, ANA Reporting, and NVM Sets are not supported, then NSIDs: a) for shared namespace shall be unique; and b) for private namespace are not required to be unique." Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec. Make sure this specific setup is supported in Linux. Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head") Signed-off-by: Sungup Moon <sungup.moon@samsung.com> [hch: refactored and fixed the controller vs subsystem based naming conflict] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-03-07nvme: add support for enhanced metadataKeith Busch
NVM Express ratified TP 4068 defines new protection information formats. Implement support for the CRC64 guard tags. Since the block layer doesn't support variable length reference tags, driver support for the Storage Tag space is not supported at this time. Cc: Hannes Reinecke <hare@suse.de> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Klaus Jensen <its@irrelevant.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220303201312.3255347-9-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28nvme: expose cntrltype and dctype through sysfsMartin Belanger
TP8010 introduces the Discovery Controller Type attribute (dctype). The dctype is returned in the response to the Identify command. This patch exposes the dctype through the sysfs. Since the dctype depends on the Controller Type (cntrltype), another attribute of the Identify response, the patch also exposes the cntrltype as well. The dctype will only be displayed for discovery controllers. A note about the naming of this attribute: Although TP8010 calls this attribute the Discovery Controller Type, note that the dctype is now part of the response to the Identify command for all controller types. I/O, Discovery, and Admin controllers all share the same Identify response PDU structure. Non-discovery controllers as well as pre-TP8010 discovery controllers will continue to set this field to 0 (which has always been the default for reserved bytes). Per TP8010, the value 0 now means "Discovery controller type is not reported" instead of "Reserved". One could argue that this definition is correct even for non-discovery controllers, and by extension, exposing it in the sysfs for non-discovery controllers is appropriate. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28nvme: add verbose error loggingAlan Adamson
Improves logging of NVMe errors. If NVME_VERBOSE_ERRORS is configured, a verbose description of the error is logged, otherwise only status codes/bits is logged. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> [kch]: fix several nits, cosmetics, and trim down code. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Alan Adamson <alan.adamson@oracle.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme: add new discovery log page entry definitionsHannes Reinecke
TP8014 adds a new SUBTYPE value and a new field EFLAGS for the discovery log page entry. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-20nvmet: use macro definitions for setting cmic valueMax Gurtovoy
This makes the code more readable. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-20nvme: add CNTRLTYPE definitions for 'identify controller'Hannes Reinecke
Update the 'identify controller' structure to define the newly added CNTRLTYPE field. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme.h: add missing nvme_lba_range_type endianness annotationsWesley Sheng
Signed-off-by: Wesley Sheng <wesley.sheng@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add ZBD over ZNS backend supportChaitanya Kulkarni
NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to communicate with a non-volatile memory subsystem using zones for NVMe protocol-based controllers. NVMeOF already support the ZNS NVMe Protocol compliant devices on the target in the passthru mode. There are generic zoned block devices like Shingled Magnetic Recording (SMR) HDDs that are not based on the NVMe protocol. This patch adds ZNS backend support for non-ZNS zoned block devices as NVMeOF targets. This support includes implementing the new command set NVME_CSI_ZNS, adding different command handlers for ZNS command set such as NVMe Identify Controller, NVMe Identify Namespace, NVMe Zone Append, NVMe Zone Management Send and NVMe Zone Management Receive. With the new command set identifier, we also update the target command effects logs to reflect the ZNS compliant commands. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvmet: add Command Set Identifier supportChaitanya Kulkarni
NVMe TP 4056 allows controllers to support different command sets. NVMeoF target currently only supports namespaces that contain traditional logical blocks that may be randomly read and written. In some applications there is a value in exposing namespaces that contain logical blocks that have special access rules (e.g. sequentially write required namespace such as Zoned Namespace (ZNS)). In order to support the Zoned Block Devices (ZBD) backend, controllers need to have support for ZNS Command Set Identifier (CSI). In this preparation patch, we adjust the code such that it can now support the default command set identifier. We update the namespace data structure to store the CSI value which defaults to NVME_CSI_NVM that represents traditional logical blocks namespace type. The CSI support is required to implement the ZBD backend for NVMeOF with host side NVMe ZNS interface, since ZNS commands belong to the different command set than the default one. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06nvme: implement non-mdts command limitsKeith Busch
Commands that access LBA contents without a data transfer between the host historically have not had a spec defined upper limit. The driver set the queue constraints for such commands to the max data transfer size just to be safe, but this artificial constraint frequently limits devices below their capabilities. The NVMe Workgroup ratified TP4040 defines how a controller may advertise their non-MDTS limits. Use these if provided and default to the current constraints if not. Since the Dataset Management command limits are defined in logical blocks, but without a namespace to tell us the logical block size, the code defaults to the safe 512b size. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02nvme: add tracing of zns commandsJohannes Thumshirn
When support for the NVMe ZNS commands was merged, tracing of these has been omitted. Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as nvme_cmd_zone_append to the nvme driver's tracing facility. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02nvme: update enumerations for status codesMax Gurtovoy
All the updates are mentioned in the ratified NVMe 1.4 spec. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-18nvme-pci: allow use of cmb on v1.4 controllersKlaus Jensen
Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled by the host. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> [hch: avoid a local variable and add a comment] Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-29nvmet: add passthru code to process commandsLogan Gunthorpe
Add passthru command handling capability for the NVMeOF target and export passthru APIs which are used to integrate passthru code with nvmet-core. The new file passthru.c handles passthru cmd parsing and execution. In the passthru mode, we create a block layer request from the nvmet request and map the data on to the block layer request. Admin commands and features are on an allow list as there are a number of each that don't make too much sense with passthrough. We use an allow list such that new commands can be considered before being blindly passed through. In both cases, vendor specific commands are always allowed. We also reject reservation IO commands as the underlying device cannot differentiate between multiple hosts behind a fabric. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08nvme: support for zoned namespacesKeith Busch
Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined in NVM Express TP4053. Zoned namespaces are discovered based on their Command Set Identifier reported in the namespaces Namespace Identification Descriptor list. A successfully discovered Zoned Namespace will be registered with the block layer as a host managed zoned block device with Zone Append command support. A namespace that does not support append is not supported by the driver. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <keith.busch@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08nvme: support for multiple Command Sets Supported and Effects log pagesKeith Busch
The Commands Supported and Effects log page was extended with a CSI field that enables the host to query the log page for each command set supported. Retrieve this log page for each command set that an attached namespace supports, and save a pointer to that log in the namespace head. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <keith.busch@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-07-08nvme: implement multiple I/O Command Set supportNiklas Cassel
Implements support for multiple I/O Command Sets. NVMe TP 4056 introduces a method to enumerate multiple command sets per namespace. If the command set is exposed, this method for enumeration will be used instead of the traditional method that uses the CC.CSS register command set register for command set identification. For namespaces where the Command Set Identifier is not supported or recognized, the specific namespace will not be created. Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27nvme: add Metadata Capabilities enumerationsIsrael Rukshin
The enumerations will be used to expose the namespace metadata format by the target. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-27nvme: replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-09nvme: define constants for identification valuesKeith Busch
Improve code readability by defining the specification's constants that the driver is using when decoding identification payloads. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Bart van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09nvmet: align addrfam list to specChaitanya Kulkarni
With reference to the NVMeOF Specification (page 44, Figure 38) discovery log page entry provides address family field. We do set the transport type field but the adrfam field is not set when using loop transport and also it doesn't have support in the nvme-cli. So when reading discovery log page with a loop transport it leads to confusing output. As per the spec for adrfam value 254 is reserved for Intra Host Transport i.e. loopback), we add a required macro in the protocol header file, set default port disc addr entry's adrfam to NVMF_ADDR_FAMILY_MAX, and update nvmet_addr_family configfs array for show/store attribute. Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") we get following output for nvme discover command from nvme-cli which is confusing. trtype: loop adrfam: ipv4 trtype: loop adrfam: ipv6 trtype: loop adrfam: infiniband trtype: loop adrfam: fibre-channel trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop adrfam: pci # <----- pci for loop trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = " " adrfam: pci # <----- pci for unrecognized This patch fixes above output :- trtype: loop adrfam: ipv4 trtype: loop adrfam: ipv6 trtype: loop adrfam: infiniband trtype: loop adrfam: fibre-channel trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop adrfam: loop # <----- loop for loop trtype: loop # ${CFGFS_HOME}/config/nvmet/ports/adrfam = " " adrfam: unrecognized # <----- unrecognized when invalid value Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-22nvme: hwmon: provide temperature min and max values for each sensorAkinobu Mita
According to the NVMe specification, the over temperature threshold and under temperature threshold features shall be implemented for Composite Temperature if a non-zero WCTEMP field value is reported in the Identify Controller data structure. The features are also implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value). This provides the over temperature threshold and under temperature threshold for each sensor as temperature min and max values of hwmon sysfs attributes. The WCTEMP is already provided as a temperature max value for Composite Temperature, but this change isn't incompatible. Because the default value of the over temperature threshold for Composite Temperature is the WCTEMP. Now the alarm attribute for Composite Temperature indicates one of the temperature is outside of a temperature threshold. Because there is only a single bit in Critical Warning field that indicates a temperature is outside of a threshold. Example output from the "sensors" command: nvme-pci-0100 Adapter: PCI adapter Composite: +33.9°C (low = -273.1°C, high = +69.8°C) (crit = +79.8°C) Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C) Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C) Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C) This also adds helper macros for kelvin from/to milli Celsius conversion, and replaces the repeated code in hwmon.c. Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jean Delvare <jdelvare@suse.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>