summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-10-18block: add a struct io_comp_batch argument to fops->iopoll()Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: provide helpers for rq_list manipulationJens Axboe
Instead of open-coding the list additions, traversal, and removal, provide a basic set of helpers. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove some blk_mq_hw_ctx debugfs entriesJens Axboe
Just like the blk_mq_ctx counterparts, we've got a bunch of counters in here that are only for debugfs and are of questionnable value. They are: - dispatched, index of how many requests were dispatched in one go - poll_{considered,invoked,success}, which track poll sucess rates. We're confident in the iopoll implementation at this point, don't bother tracking these. As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes, dropping a whole cacheline. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18treewide: Replace 0-element memcpy() destinations with flexible arraysKees Cook
The 0-element arrays that are used as memcpy() destinations are actually flexible arrays. Adjust their structures accordingly so that memcpy() can better reason able their destination size (i.e. they need to be seen as "unknown" length rather than "zero"). In some cases, use of the DECLARE_FLEX_ARRAY() helper is needed when a flexible array is alone in a struct. Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Nilesh Javali <njavali@marvell.com> Cc: Manish Rangankar <mrangankar@marvell.com> Cc: GR-QLogic-Storage-Upstream@marvell.com Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Florian Schilhabel <florian.c.schilhabel@googlemail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Fabio Aiuto <fabioaiuto83@gmail.com> Cc: Ross Schmidt <ross.schm.dev@gmail.com> Cc: Marco Cesati <marcocesati@gmail.com> Cc: ath10k@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-staging@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18treewide: Replace open-coded flex arrays in unionsKees Cook
In support of enabling -Warray-bounds and -Wzero-length-bounds and correctly handling run-time memcpy() bounds checking, replace all open-coded flexible arrays (i.e. 0-element arrays) in unions with the DECLARE_FLEX_ARRAY() helper macro. This fixes warnings such as: fs/hpfs/anode.c: In function 'hpfs_add_sector_to_btree': fs/hpfs/anode.c:209:27: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct bplus_internal_node[0]' [-Wzero-length-bounds] 209 | anode->btree.u.internal[0].down = cpu_to_le32(a); | ~~~~~~~~~~~~~~~~~~~~~~~^~~ In file included from fs/hpfs/hpfs_fn.h:26, from fs/hpfs/anode.c:10: fs/hpfs/hpfs.h:412:32: note: while referencing 'internal' 412 | struct bplus_internal_node internal[0]; /* (internal) 2-word entries giving | ^~~~~~~~ drivers/net/can/usb/etas_es58x/es58x_fd.c: In function 'es58x_fd_tx_can_msg': drivers/net/can/usb/etas_es58x/es58x_fd.c:360:35: warning: array subscript 65535 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[]'} [-Wzero-length-bounds] 360 | tx_can_msg = (typeof(tx_can_msg))&es58x_fd_urb_cmd->raw_msg[msg_len]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/can/usb/etas_es58x/es58x_core.h:22, from drivers/net/can/usb/etas_es58x/es58x_fd.c:17: drivers/net/can/usb/etas_es58x/es58x_fd.h:231:6: note: while referencing 'raw_msg' 231 | u8 raw_msg[0]; | ^~~~~~~ Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ayush Sawal <ayush.sawal@chelsio.com> Cc: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Cc: Rohit Maheshwari <rohitm@chelsio.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Stanislaw Gruszka <stf_xl@wp.pl> Cc: Luca Coelho <luciano.coelho@intel.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Mordechay Goodstein <mordechay.goodstein@intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Arunachalam Santhanam <arunachalam.santhanam@in.bosch.com> Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: linux-crypto@vger.kernel.org Cc: ath10k@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-can@vger.kernel.org Cc: bpf@vger.kernel.org Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # drivers/net/can/usb/etas_es58x/* Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18stddef: Introduce DECLARE_FLEX_ARRAY() helperKees Cook
There are many places where kernel code wants to have several different typed trailing flexible arrays. This would normally be done with multiple flexible arrays in a union, but since GCC and Clang don't (on the surface) allow this, there have been many open-coded workarounds, usually involving neighboring 0-element arrays at the end of a structure. For example, instead of something like this: struct thing { ... union { struct type1 foo[]; struct type2 bar[]; }; }; code works around the compiler with: struct thing { ... struct type1 foo[0]; struct type2 bar[]; }; Another case is when a flexible array is wanted as the single member within a struct (which itself is usually in a union). For example, this would be worked around as: union many { ... struct { struct type3 baz[0]; }; }; These kinds of work-arounds cause problems with size checks against such zero-element arrays (for example when building with -Warray-bounds and -Wzero-length-bounds, and with the coming FORTIFY_SOURCE improvements), so they must all be converted to "real" flexible arrays, avoiding warnings like this: fs/hpfs/anode.c: In function 'hpfs_add_sector_to_btree': fs/hpfs/anode.c:209:27: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct bplus_internal_node[0]' [-Wzero-length-bounds] 209 | anode->btree.u.internal[0].down = cpu_to_le32(a); | ~~~~~~~~~~~~~~~~~~~~~~~^~~ In file included from fs/hpfs/hpfs_fn.h:26, from fs/hpfs/anode.c:10: fs/hpfs/hpfs.h:412:32: note: while referencing 'internal' 412 | struct bplus_internal_node internal[0]; /* (internal) 2-word entries giving | ^~~~~~~~ drivers/net/can/usb/etas_es58x/es58x_fd.c: In function 'es58x_fd_tx_can_msg': drivers/net/can/usb/etas_es58x/es58x_fd.c:360:35: warning: array subscript 65535 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[]'} [-Wzero-length-bounds] 360 | tx_can_msg = (typeof(tx_can_msg))&es58x_fd_urb_cmd->raw_msg[msg_len]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/can/usb/etas_es58x/es58x_core.h:22, from drivers/net/can/usb/etas_es58x/es58x_fd.c:17: drivers/net/can/usb/etas_es58x/es58x_fd.h:231:6: note: while referencing 'raw_msg' 231 | u8 raw_msg[0]; | ^~~~~~~ However, it _is_ entirely possible to have one or more flexible arrays in a struct or union: it just has to be in another struct. And since it cannot be alone in a struct, such a struct must have at least 1 other named member -- but that member can be zero sized. Wrap all this nonsense into the new DECLARE_FLEX_ARRAY() in support of having flexible arrays in unions (or alone in a struct). As with struct_group(), since this is needed in UAPI headers as well, implement the core there, with a non-UAPI wrapper. Additionally update kernel-doc to understand its existence. https://github.com/KSPP/linux/issues/137 Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18string.h: Introduce memset_startat() for wiping trailing members and paddingKees Cook
A common idiom in kernel code is to wipe the contents of a structure starting from a given member. These open-coded cases are usually difficult to read and very sensitive to struct layout changes. Like memset_after(), introduce a new helper, memset_startat() that takes the target struct instance, the byte to write, and the member name where zeroing should start. Note that this doesn't zero padding preceding the target member. For those cases, memset_after() should be used on the preceding member. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Francis Laniel <laniel_francis@privacyrequired.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Daniel Axtens <dja@axtens.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18string.h: Introduce memset_after() for wiping trailing members/paddingKees Cook
A common idiom in kernel code is to wipe the contents of a structure after a given member. This is especially useful in places where there is trailing padding. These open-coded cases are usually difficult to read and very sensitive to struct layout changes. Introduce a new helper, memset_after() that takes the target struct instance, the byte to write, and the member name after which the zeroing should start. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Francis Laniel <laniel_francis@privacyrequired.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Daniel Axtens <dja@axtens.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18block: store elevator state in requestJens Axboe
Add an rq private RQF_ELV flag, which tells the block layer that this request was initialized on a queue that has an IO scheduler attached. This allows for faster checking in the fast path, rather than having to deference rq->q later on. Elevator switching does full quiesce of the queue before detaching an IO scheduler, so it's safe to cache this in the request itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: improve layout of struct requestJens Axboe
It's been a while since this was analyzed, move some members around to better flow with the use case. Initial state up top, and queued state after that. This improves my peak case by about 1.5%, from 7750K to 7900K IOPS. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: don't bother iter advancing a fully done bioJens Axboe
If we're completing nbytes and nbytes is the size of the bio, don't bother with calling into the iterator increment helpers. Just clear the bio size and we're done. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS for net-next: 1) Add new run_estimation toggle to IPVS to stop the estimation_timer logic, from Dust Li. 2) Relax superfluous dynset check on NFT_SET_TIMEOUT. 3) Add egress hook, from Lukas Wunner. 4) Nowadays, almost all hook functions in x_table land just call the hook evaluation loop. Remove remaining hook wrappers from iptables and IPVS. From Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18Merge tag 'v5.15-rc6' into regulator-5.16Mark Brown
Linux 5.15-rc6
2021-10-18block: cache request queue in bdevPavel Begunkov
There are tons of places where we need to get a request_queue only having bdev, which turns into bdev->bd_disk->queue. There are probably a hundred of such places considering inline helpers, and enough of them are in hot paths. Cache queue pointer in struct block_device and make use of it in bdev_get_queue(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a3bfaecdd28956f03629d0ca5c63ebc096e1c809.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: define 'struct bvec_iter' as packedMing Lei
'struct bvec_iter' is embedded into 'struct bio', define it as packed so that we can get one extra 4bytes for other uses without expanding bio. 'struct bvec_iter' is often allocated on stack, so making it packed doesn't affect performance. Also I have run io_uring on both nvme/null_blk, and not observe performance effect in this way. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: rename REQ_HIPRI to REQ_POLLEDChristoph Hellwig
Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18io_uring: don't sleep when polling for I/OChristoph Hellwig
There is no point in sleeping for the expected I/O completion timeout in the io_uring async polling model as we never poll for a specific I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: replace the spin argument to blk_iopoll with a flags argumentChristoph Hellwig
Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow to control polling behavior in a more fine grained way. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de [axboe: adapt to changed io_uring iopoll] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq: remove blk_qc_t_validChristoph Hellwig
Move the trivial check into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq: remove blk_qc_t_to_tag and blk_qc_t_is_internalChristoph Hellwig
Merge both functions into their only caller to keep the blk-mq tag to blk_qc_t mapping as private as possible in blk-mq.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq: factor out a blk_qc_to_hctx helperChristoph Hellwig
Add a helper to get the hctx from a request_queue and cookie, and fold the blk_qc_t_to_queue_num helper into it as no other callers are left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18sbitmap: add __sbitmap_queue_get_batch()Jens Axboe
The block layer tag allocation batching still calls into sbitmap to get each tag, but we can improve on that. Add __sbitmap_queue_get_batch(), which returns a mask of tags all at once, along with an offset for those tags. An example return would be 0xff, where bits 0..7 are set, with tag_offset == 128. The valid tags in this case would be 128..135. A batch is specific to an individual sbitmap_map, hence it cannot be larger than that. The requested number of tags is automatically reduced to the max that can be satisfied with a single map. On failure, 0 is returned. Caller should fall back to single tag allocation at that point/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: mark bio_truncate staticChristoph Hellwig
bio_truncate is only used in bio.c, so mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move bio_get_{first,last}_bvec out of bio.hChristoph Hellwig
bio_get_first_bvec and bio_get_last_bvec are only used in blk-merge.c, so move them there. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: mark __bio_try_merge_page staticChristoph Hellwig
Mark __bio_try_merge_page static and move it up a bit to avoid the need for a forward declaration. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move bio_full out of bio.hChristoph Hellwig
bio_full is only used in bio.c, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: fold bio_cur_bytes into blk_rq_cur_bytesChristoph Hellwig
Fold bio_cur_bytes into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move bio_mergeable out of bio.hChristoph Hellwig
bio_mergeable is only needed by I/O schedulers, so move it to blk-mq-sched.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: don't include <linux/ioprio.h> in <linux/bio.h>Christoph Hellwig
bio.h doesn't need any of the definitions from ioprio.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove BIO_BUG_ONChristoph Hellwig
BIO_DEBUG is always defined, so just switch the two instances to use BUG_ON directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move the *blkdev_ioctl declarations out of blkdev.hChristoph Hellwig
These are only used inside of block/. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: pre-allocate requests if plug is started and is a batchJens Axboe
The caller typically has a good (or even exact) idea of how many requests it needs to submit. We can make the request/tag allocation a lot more efficient if we just allocate N requests/tags upfront when we queue the first bio from the batch. Provide a new plug start helper that allows the caller to specify how many IOs are expected. This sets plug->nr_ios, and we can use that for smarter request allocation. The plug provides a holding spot for requests, and request allocation will check it before calling into the normal request allocation path. The blk_finish_plug() is called, check if there are unused requests and free them. This should not happen in normal operations. The exception is if we get merging, then we may be left with requests that need freeing when done. This raises the per-core performance on my setup from ~5.8M to ~6.1M IOPS. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: bump max plugged deferred size from 16 to 32Jens Axboe
Particularly for NVMe with efficient deferred submission for many requests, there are nice benefits to be seen by bumping the default max plug count from 16 to 32. This is especially true for virtualized setups, where the submit part is more expensive. But can be noticed even on native hardware. Reduce the multiple queue factor from 4 to 2, since we're changing the default size. While changing it, move the defines into the block layer private header. These aren't values that anyone outside of the block layer uses, or should use. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq: Change shared sbitmap naming to shared tagsJohn Garry
Now that shared sbitmap support really means shared tags, rename symbols to match that. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq: Use shared tags for shared sbitmap supportJohn Garry
Currently we use separate sbitmap pairs and active_queues atomic_t for shared sbitmap support. However a full sets of static requests are used per HW queue, which is quite wasteful, considering that the total number of requests usable at any given time across all HW queues is limited by the shared sbitmap depth. As such, it is considerably more memory efficient in the case of shared sbitmap to allocate a set of static rqs per tag set or request queue, and not per HW queue. So replace the sbitmap pairs and active_queues atomic_t with a shared tags per tagset and request queue, which will hold a set of shared static rqs. Since there is now no valid HW queue index to be passed to the blk_mq_ops .init and .exit_request callbacks, pass an invalid index token. This changes the semantics of the APIs, such that the callback would need to validate the HW queue index before using it. Currently no user of shared sbitmap actually uses the HW queue index (as would be expected). Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQJohn Garry
It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as the name BLKDEV_MAX_RQ would imply the max requests always, which it is not. Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being the default number of requests assigned when allocating a request queue. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move struct request to blk-mq.hChristoph Hellwig
struct request is only used by blk-mq drivers, so move it and all related declarations to blk-mq.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move integrity handling out of <linux/blkdev.h>Christoph Hellwig
Split the integrity/metadata handling definitions out into a new header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move a few merge helpers out of <linux/blkdev.h>Christoph Hellwig
These are block-layer internal helpers, so move them to block/blk.h and block/blk-merge.c. Also update a comment a bit to use better grammar. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: drop unused includes in <linux/genhd.h>Christoph Hellwig
Drop various include not actually used in genhd.h itself, and move the remaning includes closer together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: drop unused includes in <linux/blkdev.h>Christoph Hellwig
Drop various include not actually used in blkdev.h itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move elevator.h to block/Christoph Hellwig
Except for the features passed to blk_queue_required_elevator_features, elevator.h is only needed internally to the block layer. Move the ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to block/, dropping all the spurious includes outside of that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove the struct blk_queue_ctx forward declarationChristoph Hellwig
This type doesn't exist at all, so no need to forward declare it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove the cmd_size field from struct request_queueChristoph Hellwig
Entirely unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove the unused blk_queue_state enumChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove the unused rq_end_sector macroChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18mm: don't include <linux/blkdev.h> in <linux/backing-dev.h>Christoph Hellwig
Move inode_to_bdi out of line to avoid having to include blkdev.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>Christoph Hellwig
There is no need to pull blk-cgroup.h and thus blkdev.h in here, so break the include chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>Christoph Hellwig
blk-cgroup.h pulls in blkdev.h and thus pretty much all the block headers. Break this dependency chain by turning wbc_blkcg_css into a macro and dropping the blk-cgroup.h include. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>