summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2018-08-05Merge tag 'v4.18-rc6' into for-4.19/block2Jens Axboe
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>
2018-08-02scsi: Check sense buffer size at build timeKees Cook
To avoid introducing problems like those fixed in commit f7068114d45e ("sr: pass down correctly sized SCSI sense buffer"), this creates a macro wrapper for scsi_execute() that verifies the size of the sense buffer similar to what was done for command string sizes in commit 3756f6401c30 ("exec: avoid gcc-8 warning for get_task_comm"). Another solution could be to add a length argument to scsi_execute(), but this function already takes a lot of arguments and Jens was not fond of that approach. Additionally, this moves the SCSI_SENSE_BUFFERSIZE definition into scsi_device.h, and removes a redundant include for scsi_device.h from scsi_cmnd.h. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02libata-scsi: Move sense buffers onto stackKees Cook
To support future compile-time sizeof() checks that will be able to validate the length of sense buffers, this removes the only dynamically allocated sense buffers in the tree by putting the 96 byte sense buffers on the stack. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02cdrom: Use struct scsi_sense_hdr internallyKees Cook
This removes more casts of struct request_sense and uses the standard struct scsi_sense_hdr instead. This also fixes any possible stale values since the prior code did not check the sense length. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02ide-cd: Remove redundant sense bufferKees Cook
This is already able to process the sense buffer, so remove the redundant parsing during the failure path. This also fixes any possible stale values since the prior code did not check the sense length. Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02block: Switch struct packet_command to use struct scsi_sense_hdrKees Cook
There is a lot of needless struct request_sense usage in the CDROM code. These can all be struct scsi_sense_hdr instead, to avoid any confusion over their respective structure sizes. This patch is a lot of noise changing "sense" to "sshdr", but the final code is more readable to distinguish between "sense" meaning "struct request_sense" and "sshdr" meaning "struct scsi_sense_hdr". Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02target: don't depend on SCSIChristoph Hellwig
The core target code only needs code from scsi_common.c, which is now separately selectable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02scsi: build scsi_common.o for all scsi passthrough request usersChristoph Hellwig
Split scsi_common.o out of SCSI so that non-SCSI users can pull it in easily for future sense buffer helper usage. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02scsi: cxlflash: Drop unused sense buffersKees Cook
This removes the unused sense buffer in read_cap16() and write_same16(). Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02ide-cd: Drop unused sense buffersKees Cook
This drops unused sense buffers from: cdrom_eject() cdrom_read_capacity() cdrom_read_tocentry() ide_cd_lockdoor() ide_cd_read_toc() Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02aoe: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 114722 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-30nvme: use blk API to remap ref tags for IOs with metadataMax Gurtovoy
Also moved the logic of the remapping to the nvme core driver instead of implementing it in the nvme pci driver. This way all the other nvme transport drivers will benefit from it (in case they'll implement metadata support). Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-30block: move dif_prepare/dif_complete functions to block layerMax Gurtovoy
Currently these functions are implemented in the scsi layer, but their actual place should be the block layer since T10-PI is a general data integrity feature that is used in the nvme protocol as well. Also, use the tuple size from the integrity profile since it may vary between integrity types. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-30block: move ref_tag calculation func to the block layerMax Gurtovoy
Currently this function is implemented in the scsi layer, but it's actual place should be the block layer since T10-PI is a general data integrity feature that is used in the nvme protocol as well. Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-28pktcdvd: Fix possible Spectre-v1 for pkt_devsJinbum Park
User controls @dev_minor which to be used as index of pkt_devs. So, It can be exploited via Spectre-like attack. (speculative execution) This kind of attack leaks address of pkt_devs, [1] It leads an attacker to bypass security mechanism such as KASLR. So sanitize @dev_minor before using it to prevent attack. [1] https://github.com/jinb-park/linux-exploit/ tree/master/exploit-remaining-spectre-gadget/leak_pkt_devs.c Signed-off-by: Jinbum Park <jinb.park7@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: stop using the deprecated get_seconds()Arnd Bergmann
The get_seconds function is deprecated now since it returns a 32-bit value that will eventually overflow, and we are replacing it throughout the kernel with ktime_get_seconds() or ktime_get_real_seconds() that return a time64_t. bcache uses get_seconds() to read the current system time and store it in the superblock as well as in uuid_entry structures that are user visible. Unfortunately, the two structures in are still limited to 32 bits, so this won't fix any real problems but will still overflow in year 2106. Let's at least document that properly, in case we get an updated format in the future it can be fixed. We still have a long time before the overflow and checking the tools at https://github.com/koverstreet/bcache-tools reveals no access to any of them. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: do not assign in if condition in bcache_device_init()Florian Schmaus
Fixes an error condition reported by checkpatch.pl which is caused by assigning a variable in an if condition. Signed-off-by: Florian Schmaus <flo@geekplace.eu> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: do not assign in if condition in bcache_init()Florian Schmaus
Fixes an error condition reported by checkpatch.pl which is caused by assigning a variable in an if condition. Signed-off-by: Florian Schmaus <flo@geekplace.eu> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: free heap cache_set->flush_btree in bch_journal_freeShenghui Wang
Free the cache_set->flush_bree heap memory on journal free. Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: do not assign in if condition register_bcache()Florian Schmaus
Fixes an error condition reported by checkpatch.pl which is caused by assigning a variable in an if condition. Signed-off-by: Florian Schmaus <flo@geekplace.eu> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: fix I/O significant decline while backend devices registeringTang Junhui
I attached several backend devices in the same cache set, and produced lots of dirty data by running small rand I/O writes in a long time, then I continue run I/O in the others cached devices, and stopped a cached device, after a mean while, I register the stopped device again, I see the running I/O in the others cached devices dropped significantly, sometimes even jumps to zero. In currently code, bcache would traverse each keys and btree node to count the dirty data under read locker, and the writes threads can not get the btree write locker, and when there is a lot of keys and btree node in the registering device, it would last several seconds, so the write I/Os in others cached device are blocked and declined significantly. In this patch, when a device registering to a ache set, which exist others cached devices with running I/Os, we get the amount of dirty data of the device in an incremental way, and do not block other cached devices all the time. Patch v2: Rename some variables and macros name as Coly suggested. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: calculate the number of incremental GC nodes according to the total ↵Tang Junhui
of btree nodes This patch base on "[PATCH] bcache: finish incremental GC". Since incremental GC would stop 100ms when front side I/O comes, so when there are many btree nodes, if GC only processes constant (100) nodes each time, GC would last a long time, and the front I/Os would run out of the buckets (since no new bucket can be allocated during GC), and I/Os be blocked again. So GC should not process constant nodes, but varied nodes according to the number of btree nodes. In this patch, GC is divided into constant (100) times, so when there are many btree nodes, GC can process more nodes each time, otherwise GC will process less nodes each time (but no less than MIN_GC_NODES). Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: finish incremental GCTang Junhui
In GC thread, we record the latest GC key in gc_done, which is expected to be used for incremental GC, but in currently code, we didn't realize it. When GC runs, front side IO would be blocked until the GC over, it would be a long time if there is a lot of btree nodes. This patch realizes incremental GC, the main ideal is that, when there are front side I/Os, after GC some nodes (100), we stop GC, release locker of the btree node, and go to process the front side I/Os for some times (100 ms), then go back to GC again. By this patch, when we doing GC, I/Os are not blocked all the time, and there is no obvious I/Os zero jump problem any more. Patch v2: Rename some variables and macros name as Coly suggested. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-27bcache: simplify the calculation of the total amount of flash dirty dataTang Junhui
Currently we calculate the total amount of flash only devices dirty data by adding the dirty data of each flash only device under registering locker. It is very inefficient. In this patch, we add a member flash_dev_dirty_sectors in struct cache_set to record the total amount of flash only devices dirty data in real time, so we didn't need to calculate the total amount of dirty data any more. Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-26scsi: virtio_scsi: fix pi_bytes{out,in} on 4 KiB block size devicesGreg Edwards
When the underlying device is a 4 KiB logical block size device with a protection interval exponent of 0, i.e. 4096 bytes data + 8 bytes PI, the driver miscalculates the pi_bytes{out,in} by a factor of 8x (64 bytes). This leads to errors on all reads and writes on 4 KiB logical block size devices when CONFIG_BLK_DEV_INTEGRITY is enabled and the VIRTIO_SCSI_F_T10_PI feature bit has been negotiated. Fixes: e6dc783a38ec0 ("virtio-scsi: Enable DIF/DIX modes in SCSI host LLD") Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-25xen/blkfront: remove unused macrosJuergen Gross
Remove some macros not used anywhere. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-25Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/blockJens Axboe
Pull NVMe updates from Christoph: "Highlights: - massively improved tracepoints (Keith Busch) - support for larger inline data in the RDMA host and target (Steve Wise) - RDMA setup/teardown path fixes and refactor (Sagi Grimberg) - Command Supported and Effects log support for the NVMe target (Chaitanya Kulkarni) - buffered I/O support for the NVMe target (Chaitanya Kulkarni) plus the usual set of cleanups and small enhancements." * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet: don't use uuid_le type nvmet: check fileio lba range access boundaries nvmet: fix file discard return status nvme-rdma: centralize admin/io queue teardown sequence nvme-rdma: centralize controller setup sequence nvme-rdma: unquiesce queues when deleting the controller nvme-rdma: mark expected switch fall-through nvme: add disk name to trace events nvme: add controller name to trace events nvme: use hw qid in trace events nvme: cache struct nvme_ctrl reference to struct nvme_request nvmet-rdma: add an error flow for post_recv failures nvmet-rdma: add unlikely check in the fast path nvmet-rdma: support max(16KB, PAGE_SIZE) inline data nvme-rdma: support up to 4 segments of inline data nvmet: add buffered I/O support for file backed ns nvmet: add commands supported and effects log page nvme: move init of keep_alive work item to controller initialization nvme.h: resync with nvme-cli
2018-07-24md: remove a bogus commentChristoph Hellwig
The function name mentioned doesn't exist, and the code next to it doesn't match the description either. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-24bcache: don't clone bio in bch_data_verifyChristoph Hellwig
We immediately overwrite the biovec array, so instead just allocate a new bio and copy over the disk, setor and size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-24block: Rename the null_blk_mod kernel module back into null_blkBart Van Assche
Commit ca4b2a011948 ("null_blk: add zone support") breaks several blktests scripts because it renamed the null_blk kernel module into null_blk_mod. Hence rename null_blk_mod back into null_blk. Fixes: ca4b2a011948 ("null_blk: add zone support") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Matias Bjorling <matias.bjorling@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-24nvmet: don't use uuid_le typeAndy Shevchenko
Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le. Since both arguments are u8 [16], use size of destination there. Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t already. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvmet: check fileio lba range access boundariesSagi Grimberg
Fail out-of-bounds with a proper status code. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvmet: fix file discard return statusSagi Grimberg
If nvmet_copy_from_sgl failed, we falsly return successful completion status. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme-rdma: centralize admin/io queue teardown sequenceSagi Grimberg
We follow the same queue teardown sequence in delete, reset and error recovery. Centralize the logic. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme-rdma: centralize controller setup sequenceSagi Grimberg
Centralize controller sequence to a single routine that correctly cleans up after failures instead of having multiple apperances in several flows (create, reset, reconnect). One thing that we also gain here are the sanity/boundary checks also when connecting back to a dynamic controller. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme-rdma: unquiesce queues when deleting the controllerSagi Grimberg
If the controller is going away, we need to unquiesce the IO queues so that all pending request can fail gracefully before moving forward with controller deletion. Do that before we destroy the IO queues so blk_cleanup_queue won't block in freeze. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme-rdma: mark expected switch fall-throughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme: add disk name to trace eventsKeith Busch
This will print the disk name to the nvme event trace for io requests so a user can better distinguish traffic to different disks. This can be used to create disk based filters. For example, to see only nvme0n2 traffic: echo "disk == \"nvme0n2\"" > /sys/kernel/debug/tracing/events/nvme/filter Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> [hch: turned __assign_disk_name into an inline function] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-24nvme: add controller name to trace eventsKeith Busch
This appends the controller instance to the nvme trace buffer to distinguish which controller is dispatching and completing a command. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvme: use hw qid in trace eventsKeith Busch
We can not match a command to its completion based on the command id alone. We need the submitting queue identifier to pair with the completion, so this patch adds that to the trace buffer. This patch is also collapsing the admin and IO submission traces into a single one so we don't need to duplicate this and creating unnecessary code branches: we know if the command is an admin vs IO based on the qid. And since we're here, the patch fixes code formatting in the area. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> [hch: move the qid helper to nvme.h and made it an inline function] Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvme: cache struct nvme_ctrl reference to struct nvme_requestSagi Grimberg
We will need to reference the controller in the setup and completion time for tracing and future traffic based keep alive support. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvmet-rdma: add an error flow for post_recv failuresMax Gurtovoy
Posting receive buffer operation can fail, thus we should make sure to have an error flow during initialization phase. While we're here, add a debug print in case of a failure. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvmet-rdma: add unlikely check in the fast pathMax Gurtovoy
ib_post_send operation should succeed unless something unusual happened to the ib device. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvmet-rdma: support max(16KB, PAGE_SIZE) inline dataSteve Wise
The patch enables inline data sizes using up to 4 recv sges, and capping the size at 16KB or at least 1 page size. So on a 4K page system, up to 16KB is supported, and for a 64K page system 1 page of 64KB is supported. We avoid > 0 order page allocations for the inline buffers by using multiple recv sges, one for each page. If the device cannot support the configured inline data size due to lack of enough recv sges, then log a warning and reduce the inline size. Add a new configfs port attribute, called param_inline_data_size, to allow configuring the size of inline data for a given nvmf port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes for small page systems. If the configured inline data size exceeds NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is reduced. If param_inline_data_size is set to 0, then inline data is disabled for that nvmf port. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvme-rdma: support up to 4 segments of inline dataSteve Wise
Allow up to 4 segments of inline data for NVMF WRITE operations. This reduces latency for small WRITEs by removing the need for the target to issue a READ WR for IB, or a REG_MR + READ WR chain for iWarp. Also cap the inline segments used based on the limitations of the device. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvmet: add buffered I/O support for file backed nsChaitanya Kulkarni
Add a new "buffered_io" attribute, which disabled direct I/O and thus enables page cache based caching when enabled. The attribute can only be changed when the namespace is disabled as the file has to be reopend for the change to take effect. The possibly blocking read/write are deferred to a newly introduced global workqueue. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvmet: add commands supported and effects log pageChaitanya Kulkarni
This patch adds support for Commands Supported and Effects log page (Log Identifier 05h) for NVMeOF. This also makes it easier to find which commands are supported, e.g. :- subnqn : testnqn1 Admin Command Set ACS2 [Get Log Page ] 00000001 ACS6 [Identify ] 00000001 ACS8 [Abort ] 00000001 ACS9 [Set Features ] 00000001 ACS10 [Get Features ] 00000001 ACS12 [Asynchronous Event Request ] 00000001 ACS24 [Keep Alive ] 00000001 NVM Command Set IOCS0 [Flush ] 00000001 IOCS1 [Write ] 00000001 IOCS2 [Read ] 00000001 IOCS8 [Write Zeroes ] 00000001 IOCS9 [Dataset Management ] 00000001 This partticular functionality can be used from the host side to examine the NVMeOF ctrl commands supported. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-23nvme: move init of keep_alive work item to controller initializationJames Smart
Currently, the code initializes the keep alive work item whenever nvme_start_keep_alive() is called. However, this routine is called several times while reconnecting, etc. Although it's hoped that keep alive is always disabled and not scheduled when start is called, re-initing if it were scheduled or completing can have very bad side effects. There's no need for re-initialization. Move the keep_alive work item and cmd struct initialization to controller init. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-07-22Merge tag 'nvme-for-4.18' of git://git.infradead.org/nvmeLinus Torvalds
Pull NVMe fixes from Christoph Hellwig: - fix a regression in 4.18 that causes a memory leak on probe failure (Keith Bush) - fix a deadlock in the passthrough ioctl code (Scott Bauer) - don't enable AENs if not supported (Weiping Zhang) - fix an old regression in metadata handling in the passthrough ioctl code (Roland Dreier) * tag 'nvme-for-4.18' of git://git.infradead.org/nvme: nvme: fix handling of metadata_len for NVME_IOCTL_IO_CMD nvme: don't enable AEN if not supported nvme: ensure forward progress during Admin passthru nvme-pci: fix memory leak on probe failure
2018-07-22Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Fix several places that screw up cleanups after failures halfway through opening a file (one open-coding filp_clone_open() and getting it wrong, two misusing alloc_file()). That part is -stable fodder from the 'work.open' branch. And Christoph's regression fix for uapi breakage in aio series; include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel definition of sigset_t, the reason for doing so in the first place had been bogus - there's no need to expose struct __aio_sigset in aio_abi.h at all" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: don't expose __aio_sigset in uapi ocxlflash_getfile(): fix double-iput() on alloc_file() failures cxl_getfile(): fix double-iput() on alloc_file() failures drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()