summaryrefslogtreecommitdiff
path: root/drivers/dax/super.c
AgeCommit message (Collapse)Author
2018-01-19dax: require 'struct page' by default for filesystem daxDan Williams
If a dax buffer from a device that does not map pages is passed to read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If gdb attempts to examine the contents of a dax buffer from a device that does not map pages it triggers SIGBUS. If fork(2) is called on a process with a dax mapping from a device that does not map pages it triggers SIGBUS. 'struct page' is required otherwise several kernel code paths break in surprising ways. Disable filesystem-dax on devices that do not map pages. In addition to needing pfn_to_page() to be valid we also require devmap pages. We need this to detect dax pages in the get_user_pages_fast() path and so that we can stop managing the VM_MIXEDMAP flag. For DAX drivers that have not supported get_user_pages() to date we allow them to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration option which requires ->direct_access() to return pfn_t_special() pfns. This leaves DAX support in brd disabled and scheduled for removal. Note that when the initial dax support was being merged a few years back there was concern that struct page was unsuitable for use with next generation persistent memory devices. The theoretical concern was that struct page access, being such a hotly used data structure in the kernel, would lead to media wear out. While that was a reasonable conservative starting position it has not held true in practice. We have long since committed to using devm_memremap_pages() to support higher order kernel functionality that needs get_user_pages() and pfn_to_page(). Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-14dax: fix general protection fault in dax_alloc_inodeMikulas Patocka
Don't crash in case of allocation failure in dax_alloc_inode. syzkaller hit the following crash on e4880bc5dfb1 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access [..] RIP: 0010:dax_alloc_inode+0x3b/0x70 drivers/dax/super.c:348 Call Trace: alloc_inode+0x65/0x180 fs/inode.c:208 new_inode_pseudo+0x69/0x190 fs/inode.c:890 new_inode+0x1c/0x40 fs/inode.c:919 mount_pseudo_xattr+0x288/0x560 fs/libfs.c:261 mount_pseudo include/linux/fs.h:2137 [inline] dax_mount+0x2e/0x40 drivers/dax/super.c:388 mount_fs+0x66/0x2d0 fs/super.c:1223 Cc: <stable@vger.kernel.org> Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider...") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-14dax: stop requiring a live device for dax_flush()Dan Williams
Now that dax_flush() is no longer a driver callback (commit c3ca015fab6d "dax: remove the pmem_dax_ops->flush abstraction"), stop requiring the dax_read_lock() to be held and the device to be alive. This is in preparation for switching filesystem-dax to store pfns instead of sectors in the radix. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-14dax: quiet bdev_dax_supported()Dan Williams
Before we add another failure reason, quiet the existing log messages. Leave it to the caller to decide if bdev_dax_supported() failures are errors worth emitting to the log. Reported-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-09-14Merge tag 'for-4.14/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Some request-based DM core and DM multipath fixes and cleanups - Constify a few variables in DM core and DM integrity - Add bufio optimization and checksum failure accounting to DM integrity - Fix DM integrity to avoid checking integrity of failed reads - Fix DM integrity to use init_completion - A couple DM log-writes target fixes - Simplify DAX flushing by eliminating the unnecessary flush abstraction that was stood up for DM's use. * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dax: remove the pmem_dax_ops->flush abstraction dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK dm integrity: make blk_integrity_profile structure const dm integrity: do not check integrity for failed read operations dm log writes: fix >512b sectorsize support dm log writes: don't use all the cpu while waiting to log blocks dm ioctl: constify ioctl lookup table dm: constify argument arrays dm integrity: count and display checksum failures dm integrity: optimize writing dm-bufio buffers that are partially changed dm rq: do not update rq partially in each ending bio dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior dm mpath: complain about unsupported __multipath_map_bio() return values dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
2017-09-11dax: remove the pmem_dax_ops->flush abstractionMikulas Patocka
Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is buggy. A DM device may be composed of multiple underlying devices and all of them need to be flushed. That commit just routes the flush request to the first device and ignores the other devices. It could be fixed by adding more complex logic to the device mapper. But there is only one implementation of the method pmem_dax_ops->flush - that is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we don't need the pmem_dax_ops->flush abstraction at all, we can call arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush can't ever reach anything different from arch_wb_cache_pmem(). It should be also pointed out that for some uses of persistent memory it is needed to flush only a very small amount of data (such as 1 cacheline), and it would be overkill if we go through that device mapper machinery for a single flushed cache line. Fix this by removing the pmem_dax_ops->flush abstraction and call arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device mapper code that forwards the flushes. Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-03dax: fix FS_DAX=n BLOCK=y compilationDan Williams
The 0day kbuild robot reports: >> drivers//dax/super.c:64:20: error: redefinition of 'fs_dax_get_by_bdev' struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev) ^~~~~~~~~~~~~~~~~~ In file included from drivers//dax/super.c:22:0: include/linux/dax.h:76:34: note: previous definition of 'fs_dax_get_by_bdev' was here static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev) ^~~~~~~~~~~~~~~~~~ Protect the definition of fs_dax_get_by_bdev() in drivers/dax/super.c with an ifdef. Fixes: 78f354735081 ("dax: introduce a fs_dax_get_by_bdev() helper") Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-30dax: introduce a fs_dax_get_by_bdev() helperDan Williams
Add a helper that can replace the following common pattern: if (blk_queue_dax(bdev->bd_queue)) fs_dax_get_by_host(bdev->bd_disk->disk_name); This will be used to move dax_device lookup from iomap-operation time to fs-mount time. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-07-26dm, dax: Make sure dm_dax_flush() is called if device supports itVivek Goyal
Currently dm_dax_flush() is not being called, even if underlying dax device supports write cache, because DAXDEV_WRITE_CACHE is not being propagated up to the DM dax device. If the underlying dax device supports write cache, set DAXDEV_WRITE_CACHE on the DM dax device. This will cause dm_dax_flush() to be called. Fixes: abebfbe2f7 ("dm: add ->flush() dax operation support") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-07-07Merge tag 'libnvdimm-for-4.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "libnvdimm updates for the latest ACPI and UEFI specifications. This pull request also includes new 'struct dax_operations' enabling to undo the abuse of copy_user_nocache() for copy operations to pmem. The dax work originally missed 4.12 to address concerns raised by Al. Summary: - Introduce the _flushcache() family of memory copy helpers and use them for persistent memory write operations on x86. The _flushcache() semantic indicates that the cache is either bypassed for the copy operation (movnt) or any lines dirtied by the copy operation are written back (clwb, clflushopt, or clflush). - Extend dax_operations with ->copy_from_iter() and ->flush() operations. These operations and other infrastructure updates allow all persistent memory specific dax functionality to be pushed into libnvdimm and the pmem driver directly. It also allows dax-specific sysfs attributes to be linked to a host device, for example: /sys/block/pmem0/dax/write_cache - Add support for the new NVDIMM platform/firmware mechanisms introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace label format, extensions to the address-range-scrub command set, new error injection commands, and a new BTT (block-translation-table) layout. These updates support inter-OS and pre-OS compatibility. - Fix a longstanding memory corruption bug in nfit_test. - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2) capable. - Miscellaneous fixes and small updates across libnvdimm and the nfit driver. Acknowledgements that came after the branch was pushed: commit 6aa734a2f38e ("libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani <toshi.kani@hpe.com>" * tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits) libnvdimm, namespace: record 'lbasize' for pmem namespaces acpi/nfit: Issue Start ARS to retrieve existing records libnvdimm: New ACPI 6.2 DSM functions acpi, nfit: Show bus_dsm_mask in sysfs libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru. acpi, nfit: Enable DSM pass thru for root functions. libnvdimm: passthru functions clear to send libnvdimm, btt: convert some info messages to warn/err libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime libnvdimm: fix the clear-error check in nsio_rw_bytes libnvdimm, btt: fix btt_rw_page not returning errors acpi, nfit: quiet invalid block-aperture-region warnings libnvdimm, btt: BTT updates for UEFI 2.7 format acpi, nfit: constify *_attribute_group libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region libnvdimm, pmem, dax: export a cache control attribute dax: convert to bitmask for flags dax: remove default copy_from_iter fallback libnvdimm, nfit: enable support for volatile ranges libnvdimm, pmem: fix persistence warning ...
2017-06-29libnvdimm, pmem, dax: export a cache control attributeDan Williams
The dax_flush() operation can be turned into a nop on platforms where firmware arranges for cpu caches to be flushed on a power-fail event. The ACPI 6.2 specification defines a mechanism for the platform to indicate this capability so the kernel can select the proper default. However, for other platforms, the administrator must toggle this setting manually. Given this flush setting is a dax-specific mechanism we advertise it through a 'dax' attribute group hanging off a host device. For example, a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a 'write_cache' attribute to control response to dax cache flush requests. This is similar to the 'queue/write_cache' attribute that appears under block devices. Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-29dax: convert to bitmask for flagsDan Williams
In preparation for adding more flags, convert the existing flag to a bit-flag. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-27dax: remove default copy_from_iter fallbackDan Williams
Require all dax-drivers to register a ->copy_from_iter() operation so that it is clear which dax_operations are optional and which must be implemented for filesystem-dax to operate. Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15dm: add ->flush() dax operation supportDan Williams
Allow device-mapper to route flush operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying flush implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09dm: add ->copy_from_iter() dax operation supportDan Williams
Allow device-mapper to route copy_from_iter operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying copy_from_iter implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09device-dax: fix 'dax' device filesystem inode destruction crashDan Williams
The inode destruction path for the 'dax' device filesystem incorrectly assumes that the inode was initialized through 'alloc_dax()'. However, if someone attempts to directly mount the dax filesystem with 'mount -t dax dax mnt' that will bypass 'alloc_dax()' and the following failure signatures may occur as a result: kill_dax() must be called before final iput() WARNING: CPU: 2 PID: 1188 at drivers/dax/super.c:243 dax_destroy_inode+0x48/0x50 RIP: 0010:dax_destroy_inode+0x48/0x50 Call Trace: destroy_inode+0x3b/0x60 evict+0x139/0x1c0 iput+0x1f9/0x2d0 dentry_unlink_inode+0xc3/0x160 __dentry_kill+0xcf/0x180 ? dput+0x37/0x3b0 dput+0x3a3/0x3b0 do_one_tree+0x36/0x40 shrink_dcache_for_umount+0x2d/0x90 generic_shutdown_super+0x1f/0x120 kill_anon_super+0x12/0x20 deactivate_locked_super+0x43/0x70 deactivate_super+0x4e/0x60 general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC RIP: 0010:kfree+0x6d/0x290 Call Trace: <IRQ> dax_i_callback+0x22/0x60 ? dax_destroy_inode+0x50/0x50 rcu_process_callbacks+0x298/0x740 ida_remove called for id=0 which is not allocated. WARNING: CPU: 0 PID: 0 at lib/idr.c:383 ida_remove+0x110/0x120 [..] Call Trace: <IRQ> ida_simple_remove+0x2b/0x50 ? dax_destroy_inode+0x50/0x50 dax_i_callback+0x3c/0x60 rcu_process_callbacks+0x298/0x740 Add missing initialization of the 'struct dax_device' and inode so that the destruction path does not kfree() or ida_simple_remove() uninitialized data. Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances") Reported-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-13dax: fix false CONFIG_BLOCK dependencyDan Williams
In the BLOCK=n case the dax core does not need to / must not emit the block-device-dax helpers. Otherwise it leads to compile errors. Cc: Arnd Bergmann <arnd@arndb.de> Reported-by: Fabian Frederick <fabf@skynet.be> Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-09device-dax: kill NR_DEV_DAXDan Williams
There is no point to ask how many device-dax instances the kernel should support. Since we are already using a dynamic major number, just allow the max number of minors by default and be done. This also fixes the fact that the proposed max for the NR_DEV_DAX range was larger than what could be supported by alloc_chrdev_region(). Fixes: ba09c01d2fa8 ("dax: convert to the cdev api") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-08block, dax: move "select DAX" from BLOCK to FS_DAXDan Williams
For configurations that do not enable DAX filesystems or drivers, do not require the DAX core to be built. Given that the 'direct_access' method has been removed from 'block_device_operations', we can also go ahead and remove the block-related dax helper functions from fs/block_dev.c to drivers/dax/super.c. This keeps dax details out of the block layer and lets the DAX core be built as a module in the FS_DAX=n case. Filesystems need to include dax.h to call bdev_dax_supported(). Cc: linux-xfs@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-20dax: introduce dax_direct_access()Dan Williams
Replace bdev_direct_access() with dax_direct_access() that uses dax_device and dax_operations instead of a block_device and block_device_operations for dax. Once all consumers of the old api have been converted bdev_direct_access() will be deleted. Given that block device partitioning decisions can cause dax page alignment constraints to be violated this also introduces the bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative to the dax_device and also checks for page alignment. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-19dax: introduce dax_operationsDan Williams
Track a set of dax_operations per dax_device that can be set at alloc_dax() time. These operations will be used to stop the abuse of block_device_operations for communicating dax capabilities to filesystems. It will also be used to replace the "pmem api" and move pmem-specific cache maintenance, and other dax-driver-specific filesystem-dax operations, to dax device methods. In particular this allows us to stop abusing __copy_user_nocache(), via memcpy_to_pmem(), with a driver specific replacement. This is a standalone introduction of the operations. Follow on patches convert each dax-driver and teach fs/dax.c to use ->direct_access() from dax_operations instead of block_device_operations. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-19dax: add a facility to lookup a dax device by 'host' device nameDan Williams
For the current block_device based filesystem-dax path, we need a way for it to lookup the dax_device associated with a block_device. Add a 'host' property of a dax_device that can be used for this purpose. It is a free form string, but for a dax_device associated with a block device it is the bdev name. This is a stop-gap until filesystems are able to mount on a dax-inode directly. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-12dax: refactor dax-fs into a generic provider of 'struct dax_device' instancesDan Williams
We want dax capable drivers to be able to publish a set of dax operations [1]. However, we do not want to further abuse block_devices to advertise these operations. Instead we will attach these operations to a dax device and add a lookup mechanism to go from block device path to a dax device. A dax capable driver like pmem or brd is responsible for registering a dax device, alongside a block device, and then a dax capable filesystem is responsible for retrieving the dax device by path name if it wants to call dax_operations. For now, we refactor the dax pseudo-fs to be a generic facility, rather than an implementation detail, of the device-dax use case. Where a "dax device" is just an inode + dax infrastructure, and "Device DAX" is a mapping service layered on top of that base 'struct dax_device'. "Filesystem DAX" is then a mapping service that layers a filesystem on top of that same base device. Filesystem DAX is associated with a block_device for now, but perhaps directly to a dax device in the future, or for new pmem-only filesystems. [1]: https://lkml.org/lkml/2017/1/19/880 Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>