summaryrefslogtreecommitdiff
path: root/drivers/md/dm-log-writes.c
AgeCommit message (Collapse)Author
2024-05-20dm: always manage discard support in terms of max_hw_discard_sectorsMike Snitzer
Commit 4f563a64732d ("block: add a max_user_discard_sectors queue limit") changed block core to set max_discard_sectors to: min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors) Since commit 1c0e720228ad ("dm: use queue_limits_set") it was reported dm-thinp was failing in a few fstests (generic/347 and generic/405) with the first WARN_ON_ONCE in dm_cell_key_has_valid_range() being reported, e.g.: WARNING: CPU: 1 PID: 30 at drivers/md/dm-bio-prison-v1.c:128 dm_cell_key_has_valid_range+0x3d/0x50 blk_set_stacking_limits() sets max_user_discard_sectors to UINT_MAX, so given how block core now sets max_discard_sectors (detailed above) it follows that blk_stack_limits() stacks up the underlying device's max_hw_discard_sectors and max_discard_sectors is set to match it. If max_hw_discard_sectors exceeds dm's BIO_PRISON_MAX_RANGE, then dm_cell_key_has_valid_range() will trigger the warning with: WARN_ON_ONCE(key->block_end - key->block_begin > BIO_PRISON_MAX_RANGE) Aside from this warning, the discard will fail. Fix this and other DM issues by governing discard support in terms of max_hw_discard_sectors instead of max_discard_sectors. Reported-by: Theodore Ts'o <tytso@mit.edu> Fixes: 1c0e720228ad ("dm: use queue_limits_set") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-04-11dm: add helper macro for simple DM target module init and exitYangtao Li
Eliminate duplicate boilerplate code for simple modules that contain a single DM target driver without any additional setup code. Add a new module_dm() macro, which replaces the module_init() and module_exit() with template functions that call dm_register_target() and dm_unregister_target() respectively. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-04-11dm: remove unnecessary (void*) conversionsYu Zhe
Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-04-11dm: push error reporting down to dm_register_target()Yangtao Li
Simplifies each DM target's init method by making dm_register_target() responsible for its error reporting (on behalf of targets). Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: change "unsigned" to "unsigned int"Heinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: prefer kmap_local_page() instead of deprecated kmap_atomic()Heinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: add missing SPDX-License-IndentifiersHeinz Mauelshagen
'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has deprecated its use. Suggested-by: John Wiele <jwiele@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-11-16dm-log-writes: set dma_alignment limit in io_hintsKeith Busch
This device mapper needs bio vectors to be sized and memory aligned to the logical block size. Set the minimum required queue limit accordingly. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20221110184501.2451620-6-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27Merge tag 'libnvdimm-for-5.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and DAX updates from Dan Williams: "New support for clearing memory errors when a file is in DAX mode, alongside with some other fixes and cleanups. Previously it was only possible to clear these errors using a truncate or hole-punch operation to trigger the filesystem to reallocate the block, now, any page aligned write can opportunistically clear errors as well. This change spans x86/mm, nvdimm, and fs/dax, and has received the appropriate sign-offs. Thanks to Jane for her work on this. Summary: - Add support for clearing memory error via pwrite(2) on DAX - Fix 'security overwrite' support in the presence of media errors - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)" * tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: pmem: implement pmem_recovery_write() pmem: refactor pmem_clear_poison() dax: add .recovery_write dax_operation dax: introduce DAX_RECOVERY_WRITE dax access mode mce: fix set_mce_nospec to always unmap the whole page x86/mce: relocate set{clear}_mce_nospec() functions acpi/nfit: rely on mce->misc to determine poison granularity testing: nvdimm: asm/mce.h is not needed in nfit.c testing: nvdimm: iomap: make __nfit_test_ioremap a macro nvdimm: Allow overwrite in the presence of disabled dimms tools/testing/nvdimm: remove unneeded flush_workqueue
2022-05-16dax: add .recovery_write dax_operationJane Chu
Introduce dax_recovery_write() operation. The function is used to recover a dax range that contains poison. Typical use case is when a user process receives a SIGBUS with si_code BUS_MCEERR_AR indicating poison(s) in a dax range, in response, the user process issues a pwrite() to the page-aligned dax range, thus clears the poison and puts valid data in the range. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16dax: introduce DAX_RECOVERY_WRITE dax access modeJane Chu
Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce enum dax_access_mode { DAX_ACCESS, DAX_RECOVERY_WRITE, } where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-04-17block: remove QUEUE_FLAG_DISCARDChristoph Hellwig
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: pass a block_device and opf to bio_allocChristoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02dm: bio_alloc can't fail if it is allowed to sleepChristoph Hellwig
Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-18dax: remove the copy_from_iter and copy_to_iter methodsChristoph Hellwig
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm-log-writes: add a log_writes_dax_pgoff helperChristoph Hellwig
Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-11-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm: make the DAX support depend on CONFIG_FS_DAXChristoph Hellwig
The device mapper DAX support is all hanging off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-09Merge tag 'for-5.16/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM core support for emitting audit events through the audit subsystem. Also enhance both the integrity and crypt targets to emit events to via dm-audit. - Various other simple code improvements and cleanups. * tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: log table creation error code dm: make workqueue names device-specific dm writecache: Make use of the helper macro kthread_run() dm crypt: Make use of the helper macro kthread_run() dm verity: use bvec_kmap_local in verity_for_bv_block dm log writes: use memcpy_from_bvec in log_writes_map dm integrity: use bvec_kmap_local in __journal_read_write dm integrity: use bvec_kmap_local in integrity_metadata dm: add add_disk() error handling dm: Remove redundant flush_workqueue() calls dm crypt: log aead integrity violations to audit subsystem dm integrity: log audit events for dm-integrity target dm: introduce audit event module for device mapper
2021-11-01dm log writes: use memcpy_from_bvec in log_writes_mapChristoph Hellwig
Use memcpy_from_bvec instead of open coding the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-18dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding themChristoph Hellwig
Use the proper helpers to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-10dm: update target status functions to support IMA measurementTushar Sugandhi
For device mapper targets to take advantage of IMA's measurement capabilities, the status functions for the individual targets need to be updated to handle the status_type_t case for value STATUSTYPE_IMA. Update status functions for the following target types, to log their respective attributes to be measured using IMA. 01. cache 02. crypt 03. integrity 04. linear 05. mirror 06. multipath 07. raid 08. snapshot 09. striped 10. verity For rest of the targets, handle the STATUSTYPE_IMA case by setting the measurement buffer to NULL. For IMA to measure the data on a given system, the IMA policy on the system needs to be updated to have the following line, and the system needs to be restarted for the measurements to take effect. /etc/ima/ima-policy measure func=CRITICAL_DATA label=device-mapper template=ima-buf The measurements will be reflected in the IMA logs, which are located at: /sys/kernel/security/integrity/ima/ascii_runtime_measurements /sys/kernel/security/integrity/ima/binary_runtime_measurements These IMA logs can later be consumed by various attestation clients running on the system, and send them to external services for attesting the system. The DM target data measured by IMA subsystem can alternatively be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with DM_TABLE_STATUS_CMD. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-26block: Add bio_max_segsMatthew Wilcox (Oracle)
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-20dm: replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-04-02dm,dax: Add dax zero_page_range operationVivek Goyal
This patch adds support for dax zero_page_range operation to dm targets. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-07-09dm log writes: fix incorrect comment about the logged sequence exampleQu Wenruo
dm-log-writes records the sequence of completion, not submission, thus for the following sequence (W=write, C=complete): Wa,Wb,Wc,Cc,Ca,FLUSH,FUAd,Cb,CFLUSH,CFUAd The logged results in log device should be: c,a,b,flush,fua Fix the comment to give a better example. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-09dm log writes: use struct_size() to calculate size of pending_blockZhengyuan Liu
Use struct_size() to avoid open-coded equivalent that is prone to a type mistake. Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-06-25dm log writes: make sure super sector log updates are written in orderzhangyi (F)
Currently, although we submit super bios in order (and super.nr_entries is incremented by each logged entry), submit_bio() is async so each super sector may not be written to log device in order and then the final nr_entries may be smaller than it should be. This problem can be reproduced by the xfstests generic/455 with ext4: QA output created by 455 -Silence is golden +mark 'end' does not exist Fix this by serializing submission of super sectors to make sure each is written to the log disk in order. Fixes: 0e9cebe724597 ("dm: add log writes target") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Suggested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-05-22dax: Introduce a ->copy_to_iter dax operationDan Williams
Similar to the ->copy_from_iter() operation, a platform may want to deploy an architecture or device specific routine for handling reads from a dax_device like /dev/pmemX. On x86 this routine will point to a machine check safe version of copy_to_iter(). For now, add the plumbing to device-mapper and the dax core. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-04dm: remove fmode_t argument from .prepare_ioctl hookMike Snitzer
Use the fmode_t that is passed to dm_blk_ioctl() rather than inconsistently (varies across targets) drop it on the floor by overriding it with the fmode_t stored in 'struct dm_dev'. All the persistent reservation functions weren't using the fmode_t they got back from .prepare_ioctl so remove them. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03dm log writes: record metadata flag for better flags recordQu Wenruo
So developer could distinguish data and metadata bios easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03dm: allow targets to return output from messages they are sentMike Snitzer
Could be useful for a target to return stats or other information. If a target does DMEMIT() anything to @result from its .message method then it must return 1 to the caller. Signed-off-By: Mike Snitzer <snitzer@redhat.com>
2018-04-03dax, dm: allow device-mapper to operate without dax supportDan Williams
Change device-mapper's DAX dependency to require the presence of at least one DAX_DRIVER. This allows device-mapper to be built without bringing the DAX core along which is especially wasteful when there are no DAX drivers, like BLK_DEV_PMEM, configured. Cc: Alasdair Kergon <agk@redhat.com> Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-17dm log writes: fix max length used for kstrndupMa Shimiao
If source string is longer than max, kstrndup will allocate max+1 space. So make sure the result will not exceed max. Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10dm log writes: add support for DAXRoss Zwisler
Now that we have the ability log filesystem writes using a flat buffer, add support for DAX. The motivation for this support is the need for an xfstest that can test the new MAP_SYNC DAX flag. By logging the filesystem activity with dm-log-writes we can show that the MAP_SYNC page faults are writing out their metadata as they happen, instead of requiring an explicit msync/fsync. Unfortunately we can't easily track data that has been written via mmap() now that the dax_flush() abstraction was removed by commit c3ca015fab6d ("dax: remove the pmem_dax_ops->flush abstraction"). Otherwise we could just treat each flush as a big write, and store the data that is being synced to media. It may be worthwhile to add the dax_flush() entry point back, just as a notifier so we can do this logging. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10dm log writes: add support for inline data buffersRoss Zwisler
Currently dm-log-writes supports writing filesystem data via BIOs, and writing internal metadata from a flat buffer via write_metadata(). For DAX writes, though, we won't have a BIO, but will instead have an iterator that we'll want to use to fill a flat data buffer. So, create write_inline_data() which allows us to write filesystem data using a flat buffer as a source, and wire it up in log_one_block(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-14Merge tag 'for-4.14/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Some request-based DM core and DM multipath fixes and cleanups - Constify a few variables in DM core and DM integrity - Add bufio optimization and checksum failure accounting to DM integrity - Fix DM integrity to avoid checking integrity of failed reads - Fix DM integrity to use init_completion - A couple DM log-writes target fixes - Simplify DAX flushing by eliminating the unnecessary flush abstraction that was stood up for DM's use. * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dax: remove the pmem_dax_ops->flush abstraction dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK dm integrity: make blk_integrity_profile structure const dm integrity: do not check integrity for failed read operations dm log writes: fix >512b sectorsize support dm log writes: don't use all the cpu while waiting to log blocks dm ioctl: constify ioctl lookup table dm: constify argument arrays dm integrity: count and display checksum failures dm integrity: optimize writing dm-bufio buffers that are partially changed dm rq: do not update rq partially in each ending bio dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior dm mpath: complain about unsupported __multipath_map_bio() return values dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
2017-09-11dm log writes: fix >512b sectorsize supportJosef Bacik
512b sectors vs device's physical sectorsize was not maintained consistently and as such the support for >512b sector devices has bugs. The log metadata expects native sectorsize but 512b sectors were being stored. Also, device's sectorsize was assumed when assigning the bi_sector for blocks that were being logged. Fix this up by adding two helpers to convert between bio and dev sectors, and use these in the appropriate places to fix the problem and make it clear which units go where. Doing so allows dm-log-writes use with 4k devices. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-11dm log writes: don't use all the cpu while waiting to log blocksJosef Bacik
The check to see if the logging kthread needs to go to sleep is wrong, it checks lc->pending_blocks, which will be non-0 if there are any blocks that are pending, whether they are ready to be logged or not. What we really want is to go to sleep until it's time to log blocks, so change this check so we do actually go to sleep in between flushes. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09dm: change ->end_io calling conventionChristoph Hellwig
Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09dm: don't return errnos from ->mapChristoph Hellwig
Instead use the special DM_MAPIO_KILL return value to return -EIO just like we do for the request based path. Note that dm-log-writes returned -ENOMEM in a few places, which now becomes -EIO instead. No consumer treats -ENOMEM special so this shouldn't be an issue (and it should use a mempool to start with to make guaranteed progress). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-07Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the main pull request for block layer changes in 4.9. As mentioned at the last merge window, I've changed things up and now do just one branch for core block layer changes, and driver changes. This avoids dependencies between the two branches. Outside of this main pull request, there are two topical branches coming as well. This pull request contains: - A set of fixes, and a conversion to blk-mq, of nbd. From Josef. - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd. Followup dependency fix from Geert. - General fixes from Bart, Baoyou, Guoqing, and Linus W. - CFQ async write starvation fix from Glauber. - Add supprot for delayed kick of the requeue list, from Mike. - Pull out the scalable bitmap code from blk-mq-tag.c and make it generally available under the name of sbitmap. Only blk-mq-tag uses it for now, but the blk-mq scheduling bits will use it as well. From Omar. - bdev thaw error progagation from Pierre. - Improve the blk polling statistics, and allow the user to clear them. From Stephen. - Set of minor cleanups from Christoph in block/blk-mq. - Set of cleanups and optimizations from me for block/blk-mq. - Various nvme/nvmet/nvmeof fixes from the various folks" * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits) fs/block_dev.c: return the right error in thaw_bdev() nvme: Pass pointers, not dma addresses, to nvme_get/set_features() nvme/scsi: Remove power management support nvmet: Make dsm number of ranges zero based nvmet: Use direct IO for writes admin-cmd: Added smart-log command support. nvme-fabrics: Add host_traddr options field to host infrastructure nvme-fabrics: revise host transport option descriptions nvme-fabrics: rework nvmf_get_address() for variable options nbd: use BLK_MQ_F_BLOCKING blkcg: Annotate blkg_hint correctly cfq: fix starvation of asynchronous writes blk-mq: add flag for drivers wanting blocking ->queue_rq() blk-mq: remove non-blocking pass in blk_mq_map_request blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() block: export bio_free_pages to other modules lightnvm: propagate device_add() error code lightnvm: expose device geometry through sysfs lightnvm: control life of nvm_dev in driver blk-mq: register device instead of disk ...
2016-09-22block: export bio_free_pages to other modulesGuoqing Jiang
bio_free_pages is introduced in commit 1dfa0f68c040 ("block: add a helper to free bio bounce buffer pages"), we can reuse the func in other modules after it was imported. Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@fb.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-30dm log writes: fix check of kthread_run() return valueVladimir Zapolskiy
The kthread_run() function returns either a valid task_struct or ERR_PTR() value, check for NULL is invalid. This change fixes potential for oops, e.g. in OOM situation. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-08-30dm log writes: fix bug with too large biosMikulas Patocka
bio_alloc() can allocate a bio with at most BIO_MAX_PAGES (256) vector entries. However, the incoming bio may have more vector entries if it was allocated by other means. For example, bcache submits bios with more than BIO_MAX_PAGES entries. This results in bio_alloc() failure. To avoid the failure, change the code so that it allocates bio with at most BIO_MAX_PAGES entries. If the incoming bio has more entries, bio_add_page() will fail and a new bio will be allocated - the code that handles bio_add_page() failure already exists in the dm-log-writes target. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Josef Bacik <jbacik@fb,com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v4.1+
2016-08-30dm log writes: move IO accounting earlier to fix error pathMikulas Patocka
Move log_one_block()'s atomic_inc(&lc->io_blocks) before bio_alloc() to fix a bug that the target hangs if bio_alloc() fails. The error path does put_io_block(lc), so atomic_inc(&lc->io_blocks) must occur before invoking the error path to avoid underflow of lc->io_blocks. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Josef Bacik <jbacik@fb,com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSHMike Christie
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>