path: root/drivers/nvdimm
Age | Commit message | Author
2022-03-30 | Merge tag 'libnvdimm-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds
Pull libnvdimm updates from Dan Williams: "The update for this cycle includes the deprecation of block-aperture mode and a new perf events interface for the papr_scm nvdimm driver. The perf events approach was acked by PeterZ. - Add perf support for nvdimm events, initially only for 'papr_scm' devices. - Deprecate the 'block aperture' support in libnvdimm, it only ever existed in the specification, not in shipping product" * tag 'libnvdimm-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/blk: Fix title level MAINTAINERS: remove section LIBNVDIMM BLK: MMIO-APERTURE DRIVER powerpc/papr_scm: Fix build failure when drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set nvdimm/region: Delete nd_blk_region infrastructure ACPI: NFIT: Remove block aperture support nvdimm/namespace: Delete nd_namespace_blk nvdimm/namespace: Delete blk namespace consideration in shared paths nvdimm/blk: Delete the block-aperture window driver nvdimm/region: Fix default alignment for small regions docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for nvdimm pmu powerpc/papr_scm: Add perf interface support drivers/nvdimm: Add perf interface to expose nvdimm performance stats drivers/nvdimm: Add nvdimm pmu structure
2022-03-24 | Merge tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds
Pull CXL (Compute Express Link) updates from Dan Williams: "This development cycle extends the subsystem to discover CXL resources throughout a CXL/PCIe switch topology and respond to hot add/remove events anywhere in that topology. This is more foundational infrastructure in preparation for dynamic memory region provisioning support. Recall that CXL memory regions, as the new "Theory of Operation" section of Documentation/driver-api/cxl/memory-devices.rst describes, bring storage volume striping semantics to memory. The hot add/remove behavior is validated with extensions to the cxl_test unit test environment and this test in the cxl-cli test suite: https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh Summary: - Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem operation as distinct from 'cxl_pci' mailbox operations. Its primary responsibility is enumerating an endpoint 'struct cxl_port' and all the 'struct cxl_port' instances between an endpoint and the CXL platform root. - Add a driver for 'struct cxl_port' objects responsible for enumerating and operating all Host-managed Device Memory (HDM) decoder resources between the platform-level CXL memory description, all intervening host bridges / switches, and the HDM resources in endpoints. - Update the cxl_pci driver to validate CXL.mem operation precursors to HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC based CXL.mem configuration. - Add basic lockdep coverage for usage of device_lock() on CXL subsystem objects similar to what exists for LIBNVDIMM. Include a compile-time switch for which subsystem to validate at run-time. - Update cxl_test to emulate a one level switch topology. - Document a "Theory of Operation" for the subsystem.
- Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs - Include miscellaneous fixes for spec / QEMU CXL emulation compatibility and static analysis reports" * tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits) cxl/core/port: Fix NULL but dereferenced coccicheck error cxl/port: Hold port reference until decoder release cxl/port: Fix endpoint refcount leak cxl/core: Fix cxl_device_lock() class detection cxl/core/port: Fix unregister_port() lock assertion cxl/regs: Fix size of CXL Capability Header Register cxl/core/port: Handle invalid decoders cxl/core/port: Fix / relax decoder target enumeration tools/testing/cxl: Add a physical_node link tools/testing/cxl: Enumerate mock decoders tools/testing/cxl: Mock one level of switches tools/testing/cxl: Fix root port to host bridge assignment tools/testing/cxl: Mock dvsec_ranges() cxl/core/port: Add endpoint decoders cxl/core: Move target_list out of base decoder attributes cxl/mem: Add the cxl_mem driver cxl/core/port: Add switch port enumeration cxl/memdev: Add numa_node attribute cxl/pci: Emit device serial number cxl/pci: Implement wait for media active ...
2022-03-23 | drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set | Kajol Jain
The following build failure occurs when CONFIG_PERF_EVENTS is not set as generic pmu functions are not visible in that scenario. |-- s390-randconfig-r044-20220313 | |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_migrate_context | |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_register | `-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_unregister Similar build failure in nds32 architecture: nd_perf.c:(.text+0x21e): undefined reference to `perf_pmu_migrate_context' nd_perf.c:(.text+0x434): undefined reference to `perf_pmu_register' nd_perf.c:(.text+0x57c): undefined reference to `perf_pmu_unregister' Fix this issue by adding a check for the CONFIG_PERF_EVENTS config option and disabling the nvdimm perf interface in case this config is not set. Also remove the function declarations of perf_pmu_migrate_context, perf_pmu_register and perf_pmu_unregister from nd.h, as these are common pmu functions which are part of perf_event.h, and since we are disabling the nvdimm perf interface in case the CONFIG_PERF_EVENTS option is not set, we need not declare them in nd.h. Also move the platform_device header file addition part from nd.h to nd_perf.c and add stub functions for register_nvdimm_pmu and unregister_nvdimm_pmu functions to handle the CONFIG_PERF_EVENTS=n case. Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats") (Commit id based on libnvdimm-for-next tree) Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Link: https://lore.kernel.org/all/62317124.YBQFU33+s%2FwdvWGj%25lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20220323164550.109768-1-kjain@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
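The stub approach described in this commit can be sketched in userspace C as follows. This is only an illustration of the pattern, not the kernel code: `struct nvdimm_pmu_sketch` and the `*_sketch` functions are hypothetical stand-ins, and the `-ENXIO` return value of the stub is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nvdimm pmu object. */
struct nvdimm_pmu_sketch {
	const char *name;
	bool registered;
};

#ifdef CONFIG_PERF_EVENTS
/* Real path: the kernel version would call perf_pmu_register() here. */
static inline int register_nvdimm_pmu_sketch(struct nvdimm_pmu_sketch *pmu)
{
	pmu->registered = true;
	return 0;
}

static inline void unregister_nvdimm_pmu_sketch(struct nvdimm_pmu_sketch *pmu)
{
	pmu->registered = false;
}
#else
/*
 * Stubs for CONFIG_PERF_EVENTS=n: callers still compile and link, but
 * registration reports failure. The error code is an assumption.
 */
static inline int register_nvdimm_pmu_sketch(struct nvdimm_pmu_sketch *pmu)
{
	(void)pmu;
	return -ENXIO;
}

static inline void unregister_nvdimm_pmu_sketch(struct nvdimm_pmu_sketch *pmu)
{
	(void)pmu;
}
#endif
```

Because the stubs have the same signatures as the real functions, a caller needs no `#ifdef` of its own; it only has to tolerate a failed registration.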
2022-03-22 | Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache | Linus Torvalds
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-11 | nvdimm/region: Delete nd_blk_region infrastructure | Dan Williams
Now that the nd_namespace_blk infrastructure is removed, delete all the region machinery to coordinate provisioning aliased capacity between PMEM and BLK. Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/164688418803.2879318.1302315202397235855.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-11 | nvdimm/namespace: Delete nd_namespace_blk | Dan Williams
Now that none of the configuration paths consider BLK namespaces, delete the BLK namespace data and supporting code. Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/164688417727.2879318.11691110761800109662.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-11 | nvdimm/namespace: Delete blk namespace consideration in shared paths | Dan Williams
Given is_namespace_blk() is never true outside of the NVDIMM unit tests, delete the support from namespace device management. Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/164688417214.2879318.4698377272678028573.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-11 | nvdimm/blk: Delete the block-aperture window driver | Dan Williams
Block Aperture Window support was an attempt to layer an error model over PMEM for platforms that did not support machine-check-recovery. However, it was abandoned before it ever shipped, and only ever existed in the ACPI specification. Meanwhile Linux has carried a large pile of dead code for non-shipping infrastructure. For years it has been off to the side out of the way, but now CXL and recent directions with DAX support have the potential to collide with this code. In preparation for adding discontiguous namespace support, a pre-requisite for the nvdimm subsystem to replace device-mapper for striping + concatenation use cases, delete BLK aperture support. On the obscure chance that some hardware vendor shipped support for this mode, note that the driver will still keep BLK space reserved in the label area. So an end user in this case would still have the opportunity to report the regression to get BLK-mode support restored without risking the data they have on that device. Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/164688416668.2879318.16903178375774275120.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-11 | nvdimm/region: Fix default alignment for small regions | Dan Williams
In preparation for removing BLK aperture support the NVDIMM unit tests discovered that the default alignment can be set higher than the capacity of the region. Fall back to PAGE_SIZE in that case. Given this has not been seen in the wild, elide notifying -stable. Fixes: 2522afb86a8c ("libnvdimm/region: Introduce an 'align' attribute") Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/164688416128.2879318.17890707310125575258.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
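The fallback described in this commit amounts to a simple clamp. The following is a minimal userspace sketch, assuming a 4 KiB page size; `default_align_sketch()` and `SKETCH_PAGE_SIZE` are hypothetical names, and how the kernel computes the preferred alignment in the first place is not shown.

```c
#include <stdint.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Return the alignment to use for a region: the preferred default,
 * unless that default exceeds the region's capacity, in which case
 * fall back to the page size.
 */
static uint64_t default_align_sketch(uint64_t preferred_align,
				     uint64_t region_size)
{
	if (preferred_align > region_size)
		return SKETCH_PAGE_SIZE;
	return preferred_align;
}
```

For example, a 1 MiB region with a 16 MiB preferred alignment would end up page-aligned, while a 64 MiB region with a 2 MiB preferred alignment keeps the 2 MiB value.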
2022-03-09 | drivers/nvdimm: Add perf interface to expose nvdimm performance stats | Kajol Jain
A common interface is added to provide performance stats reporting support for nvdimm devices. The added interface defines the supported event list, config fields for the event attributes and their corresponding bit values, which are exported via sysfs. The interface also adds support for pmu register/unregister functions and the cpu hotplug feature, along with macros for handling event addition via sysfs. It adds attribute groups for format, cpumask and events to the pmu structure. Users can use the standard perf tool to access perf events exposed via the nvdimm pmu. [Declare pmu functions in nd.h file to resolve implicit-function-declaration warning and make hotplug function static as reported by kernel test robot] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Link: https://lore.kernel.org/all/202202241242.zqzGkguy-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Madhavan Srinivasan <maddy@in.ibm.com> Link: https://lore.kernel.org/r/20220225143024.47947-3-kjain@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-04 | nvdimm-btt: use bvec_kmap_local in btt_rw_integrity | Christoph Hellwig
Using local kmaps slightly reduces the chances of stray writes, and the bvec interface cleans up the code a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20220303111905.321089-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 | nvdimm-blk: use bvec_kmap_local in nd_blk_rw_integrity | Christoph Hellwig
Using local kmaps slightly reduces the chances of stray writes, and the bvec interface cleans up the code a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20220303111905.321089-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-03 | mm: don't include <linux/memremap.h> in <linux/mm.h> | Christoph Hellwig
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-08 | cxl: Prove CXL locking | Dan Williams
When CONFIG_PROVE_LOCKING is enabled the 'struct device' definition gets an additional mutex that is not clobbered by lockdep_set_novalidate_class() like the typical device_lock(). This allows for local annotation of subsystem locks with mutex_lock_nested() per the subsystem's object/lock hierarchy. For CXL, this primarily needs the ability to lock ports by depth and child objects of ports by their parent-port lock. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/164365853422.99383.1052399160445197427.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-02 | block: pass a block_device and opf to bio_alloc | Christoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
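The calling convention described in this commit (device and operation passed up front, gfp mask after the vector count) can be illustrated with a userspace mock. Everything here is a hypothetical stand-in: `bio_alloc_sketch()`, the `*_sketch` types, and the exact argument order shown are assumptions drawn from the commit text, not the kernel's definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for kernel types. */
struct block_device_sketch { int id; };
typedef unsigned int gfp_sketch_t;

struct bio_sketch {
	struct block_device_sketch *bdev; /* target device, may be NULL */
	unsigned int opf;                 /* operation and flags, may be 0 */
	unsigned short nr_vecs;
};

/*
 * Mock allocator: the device and operation are assigned at allocation
 * time instead of being filled in by the caller afterwards.
 */
static struct bio_sketch *bio_alloc_sketch(struct block_device_sketch *bdev,
					   unsigned short nr_vecs,
					   unsigned int opf,
					   gfp_sketch_t gfp_mask)
{
	struct bio_sketch *bio = calloc(1, sizeof(*bio));

	(void)gfp_mask; /* allocation flags are ignored in this mock */
	if (!bio)
		return NULL;
	bio->bdev = bdev;
	bio->opf = opf;
	bio->nr_vecs = nr_vecs;
	return bio;
}
```

A passthrough-style caller, as the commit message notes, could pass `NULL` and `0` for the device and operation and fill them in later.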
2022-02-02 | block: remove genhd.h | Christoph Hellwig
There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost | Linus Torvalds
Pull virtio updates from Michael Tsirkin: "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes. - partial support for < MAX_ORDER - 1 granularity for virtio-mem - driver_override for vdpa - sysfs ABI documentation for vdpa - multiqueue config support for mlx5 vdpa - and misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits) vdpa/mlx5: Fix tracking of current number of VQs vdpa/mlx5: Fix is_index_valid() to refer to features vdpa: Protect vdpa reset with cf_mutex vdpa: Avoid taking cf_mutex lock on get status vdpa/vdpa_sim_net: Report max device capabilities vdpa: Use BIT_ULL for bit operations vdpa/vdpa_sim: Configure max supported virtqueues vdpa/mlx5: Report max device capabilities vdpa: Support reporting max device capabilities vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps() vdpa: Add support for returning device configuration information vdpa/mlx5: Support configuring max data virtqueue vdpa/mlx5: Fix config_attr_mask assignment vdpa: Allow to configure max data virtqueues vdpa: Read device configuration only if FEATURES_OK vdpa: Sync calls set/get config/status with cf_mutex vdpa/mlx5: Distribute RX virtqueues in RQT object vdpa: Provide interface to read driver features vdpa: clean up get_config_size ret value handling virtio_ring: mark ring unused on error ...
2022-01-14 | virtio: wrap config->reset calls | Michael S. Tsirkin
This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-12-18 | dax: remove the copy_from_iter and copy_to_iter methods | Christoph Hellwig
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants, fuse and dcssblk use plain ones, and device mapper redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used, removing the indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
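The idea of replacing a per-driver copy method with a flag that selects the copy routine can be sketched in userspace C. This is an illustration of the general technique only: the flag name, `struct dax_dev_sketch`, and the `*_sketch` functions are hypothetical, and the mapping of the real set_dax_nocache() and set_dax_nomc() flags onto specific kernel copy routines is not modeled here.

```c
#include <stddef.h>
#include <string.h>

/* Which copy variant was taken; recorded only for illustration. */
enum copy_variant_sketch { COPY_PLAIN, COPY_FLUSHCACHE };

#define SKETCH_FLAG_NOCACHE (1u << 0)

struct dax_dev_sketch {
	unsigned int flags;
	enum copy_variant_sketch last_variant;
};

/* A driver opts in once at setup time instead of supplying a callback. */
static void set_nocache_sketch(struct dax_dev_sketch *dev)
{
	dev->flags |= SKETCH_FLAG_NOCACHE;
}

/*
 * Fast path: a plain branch on a flag picks the variant, so no
 * indirect function call is needed.
 */
static size_t copy_from_sketch(struct dax_dev_sketch *dev, void *dst,
			       const void *src, size_t len)
{
	memcpy(dst, src, len);
	if (dev->flags & SKETCH_FLAG_NOCACHE)
		dev->last_variant = COPY_FLUSHCACHE; /* a real variant would also flush caches */
	else
		dev->last_variant = COPY_PLAIN;
	return len;
}
```

The design trade-off is that the set of copy variants becomes fixed in the core code, in exchange for a cheaper fast path and less boilerplate in each driver.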
2021-12-18 | dax: remove the DAXDEV_F_SYNC flag | Christoph Hellwig
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18 | uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() | Christoph Hellwig
These two wrappers are never used. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211215084508.435401-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04 | dax: remove dax_capable | Christoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04 | dax: simplify the dax_device <-> gendisk association | Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicit calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04 | dax: remove CONFIG_DAX_DRIVER | Christoph Hellwig
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-10 | Merge tag 'libnvdimm-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds
Pull libnvdimm update from Dan Williams: "A single cleanup that precedes some deeper PMEM/DAX reworks that did not settle in time for v5.16: - Continue the cleanup of the dax api in preparation for a dax-device block-device divorce" * tag 'libnvdimm-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/pmem: move dax_attribute_group from dax to pmem
2021-11-09 | Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull more block driver updates from Jens Axboe: - Last series adding error handling support for add_disk() in drivers. After this one, and once the SCSI side has been merged, we can finally annotate add_disk() as must_check. (Luis) - bcache fixes (Coly) - zram fixes (Ming) - ataflop locking fix (Tetsuo) - nbd fixes (Ye, Yu) - MD merge via Song - Cleanup (Yang) - sysfs fix (Guoqing) - Misc fixes (Geert, Wu, luo) * tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits) bcache: Revert "bcache: use bvec_virt" ataflop: Add missing semicolon to return statement floppy: address add_disk() error handling on probe ataflop: address add_disk() error handling on probe block: update __register_blkdev() probe documentation ataflop: remove ataflop_probe_lock mutex mtd/ubi/block: add error handling support for add_disk() block/sunvdc: add error handling support for add_disk() z2ram: add error handling support for add_disk() nvdimm/pmem: use add_disk() error handling nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned nvdimm/blk: add error handling support for add_disk() nvdimm/blk: avoid calling del_gendisk() on early failures nvdimm/btt: add error handling support for add_disk() nvdimm/btt: use goto error labels on btt_blk_init() loop: Remove duplicate assignments drbd: Fix double free problem in drbd_create_device nvdimm/btt: do not call del_gendisk() if not needed bcache: fix use-after-free problem in bcache_device_free() zram: replace fsync_bdev with sync_blockdev ...
2021-11-08 | Merge tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds
Pull cxl updates from Dan Williams: "More preparation and plumbing work in the CXL subsystem. From an end user perspective the highlight here is lighting up the CXL Persistent Memory related commands (label read / write) with the generic ioctl() front-end in LIBNVDIMM. Otherwise, the ability to instantiate new persistent and volatile memory regions is still on track for v5.17. Summary: - Fix support for platforms that do not enumerate every ACPI0016 (CXL Host Bridge) in the CHBS (ACPI Host Bridge Structure). - Introduce a common pci_find_dvsec_capability() helper, clean up open coded implementations in various drivers. - Add 'cxl_test' for regression testing CXL subsystem ABIs. 'cxl_test' is a module built from tools/testing/cxl/ that mocks up a CXL topology to augment the nascent support for emulation of CXL devices in QEMU. - Convert libnvdimm to use the uuid API. - Complete the definition of CXL namespace labels in libnvdimm. - Tunnel libnvdimm label operations from nd_ioctl() back to the CXL mailbox driver. Enable 'ndctl {read,write}-labels' for CXL. - Continue to sort and refactor functionality into distinct driver and core-infrastructure buckets.
For example, mailbox handling is now a generic core capability consumed by the PCI and cxl_test drivers" * tag 'cxl-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits) ocxl: Use pci core's DVSEC functionality cxl/pci: Use pci core's DVSEC functionality PCI: Add pci_find_dvsec_capability to find designated VSEC cxl/pci: Split cxl_pci_setup_regs() cxl/pci: Add @base to cxl_register_map cxl/pci: Make more use of cxl_register_map cxl/pci: Remove pci request/release regions cxl/pci: Fix NULL vs ERR_PTR confusion cxl/pci: Remove dev_dbg for unknown register blocks cxl/pci: Convert register block identifiers to an enum cxl/acpi: Do not fail cxl_acpi_probe() based on a missing CHBS cxl/pci: Disambiguate cxl_pci further from cxl_mem Documentation/cxl: Add bus internal docs cxl/core: Split decoder setup into alloc + add tools/testing/cxl: Introduce a mock memory device + driver cxl/mbox: Move command definitions to common location cxl/bus: Populate the target list at decoder create tools/testing/cxl: Introduce a mocked-up CXL port hierarchy cxl/pmem: Add support for multiple nvdimm-bridge objects cxl/pmem: Translate NVDIMM label commands to CXL label commands ...
2021-11-04 | nvdimm/pmem: use add_disk() error handling | Luis Chamberlain
Now that device_add_disk() supports returning an error, use that. We must unwind alloc_dax() on error. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-7-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-04 | nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned | Luis Chamberlain
Prior to devm being able to use pmem_release_disk() there are other failures that can occur, which we must account for by releasing the disk. Address those few cases. Fixes: 3dd60fb9d95d ("nvdimm/pmem: stop using q_usage_count as external pgmap refcount") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-6-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-04 | nvdimm/blk: add error handling support for add_disk() | Luis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Since nvdimm/blk uses devm we just need to move the devm registration towards the end. And in hindsight, that seems to also provide a fix given del_gendisk() should not be called unless the disk was already added via add_disk(). The probability of that issue happening is low though, like OOM while calling devm_add_action(), so the fix is minor. We manually unwind in case of add_disk() failure prior to the devm registration. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-5-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-04 | nvdimm/blk: avoid calling del_gendisk() on early failures | Luis Chamberlain
If nd_integrity_init() fails we'd get del_gendisk() called, but that's not correct as we should only call that if we're done with device_add_disk(). Fix this by providing unwinding prior to the devm call being registered and moving the devm registration to the very end. This should fix calling del_gendisk() if nd_integrity_init() fails. I only spotted this issue through code inspection. It does not fix any real world bug. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-4-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
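The ordering that the two nvdimm/blk commits above describe (unwind manually on early failures, register the automatic cleanup only once the disk has been added) can be sketched in userspace C. All names here are hypothetical mocks; the real driver uses the block layer and devm APIs, whose exact behavior is not reproduced.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock of the disk state a driver tracks during attach. */
struct disk_sketch {
	bool allocated;
	bool added;
	bool cleanup_registered;
};

static int alloc_disk_sketch(struct disk_sketch *d)
{
	d->allocated = true;
	return 0;
}

static void free_disk_sketch(struct disk_sketch *d)
{
	d->allocated = false;
}

/* Mock of an add_disk()-style call that can now report failure. */
static int add_disk_sketch(struct disk_sketch *d, bool fail)
{
	if (fail)
		return -ENOMEM;
	d->added = true;
	return 0;
}

static void del_disk_sketch(struct disk_sketch *d)
{
	d->added = false;
}

/* Mock of registering a deferred cleanup action. */
static int register_cleanup_sketch(struct disk_sketch *d, bool fail)
{
	if (fail)
		return -ENOMEM;
	d->cleanup_registered = true;
	return 0;
}

static int attach_disk_sketch(struct disk_sketch *d, bool fail_add,
			      bool fail_register)
{
	int rc;

	rc = alloc_disk_sketch(d);
	if (rc)
		return rc;

	rc = add_disk_sketch(d, fail_add);
	if (rc)
		goto out_free; /* disk never added: do not delete it */

	/* Registered last, so the cleanup only ever sees an added disk. */
	rc = register_cleanup_sketch(d, fail_register);
	if (rc)
		goto out_del;

	return 0;

out_del:
	del_disk_sketch(d);
out_free:
	free_disk_sketch(d);
	return rc;
}
```

The labels are ordered so that each failure point undoes only the steps that actually completed, which is the same reason the btt commits in this log move to goto-based error paths.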
2021-11-04 | nvdimm/btt: add error handling support for add_disk() | Luis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-3-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-04 | nvdimm/btt: use goto error labels on btt_blk_init() | Luis Chamberlain
This will make it easier to share common error paths. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-2-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-03 | nvdimm/btt: do not call del_gendisk() if not needed | Luis Chamberlain
del_gendisk() should not be called if the disk has not been added. Fix this. Fixes: 41cd8b70c37a ("libnvdimm, btt: add support for blk integrity") Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103165843.1402142-1-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-01 | Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
2021-10-25 | nvdimm/pmem: stop using q_usage_count as external pgmap refcount | Christoph Hellwig
Originally all DAX access went through block_device operations and thus needed a queue reference. But since commit cccbce671582 ("filesystem-dax: convert to dax_direct_access()") all this happens at the DAX device level which uses its own refcounting. Having the external refcount thus wasn't needed but has otherwise been harmless for a long time. But now that "block: drain file system I/O on del_gendisk" waits for q_usage_count to reach 0 in del_gendisk this whole scheme can't work anymore (and pmem is the only driver abusing q_usage_count like that). So switch to the internal reference and remove the unbalanced blk_freeze_queue_start that is taken care of by del_gendisk. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019073641.2323410-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can be made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio makes it trivial to support polling of BIOs remapped by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move integrity handling out of <linux/blkdev.h>Christoph Hellwig
Split the integrity/metadata handling definitions out into a new header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-27nvdimm/pmem: fix creating the dax groupChristoph Hellwig
The recent block layer refactoring broke the way the pmem driver abused device_add_disk. Fix this by properly passing the attribute groups to device_add_disk. Fixes: 52b85909f85d ("block: fold register_disk into device_add_disk") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210922173431.2454024-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-27nvdimm/pmem: move dax_attribute_group from dax to pmemChristoph Hellwig
dax_attribute_group is only used by the pmem driver, and can avoid the completely pointless lookup by the disk name if moved there. This leaves just a single caller of dax_get_by_host, so move dax_get_by_host into the same ifdef block as that caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210922173431.2454024-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21libnvdimm/labels: Introduce CXL labelsDan Williams
Now that all use sites of label data have been converted to nsl_* helpers, introduce the CXL label format. The ->cxl flag in nvdimm_drvdata indicates the label format the device expects. A follow-on patch allows a bus provider to select the label style. Note that the EFI definition of the labels represents the Linux "claim class" with a GUID. The CXL definition of the labels stores the same identifier in UUID byte order. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116432405.2460985.5547867384570123403.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21libnvdimm/label: Define CXL region labelsDan Williams
Add a definition of the CXL 2.0 region label format. Note this is done as a separate patch to make the next patch that adds namespace label support easier to read. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116431893.2460985.4003511000574373922.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21libnvdimm/labels: Fix kernel-doc for label.hDan Williams
Clean up existing kernel-doc warnings before adding new CXL label data structures.
drivers/nvdimm/label.h:66: warning: Function parameter or member 'labelsize' not described in 'nd_namespace_index'
drivers/nvdimm/label.h:66: warning: Function parameter or member 'free' not described in 'nd_namespace_index'
drivers/nvdimm/label.h:103: warning: Function parameter or member 'align' not described in 'nd_namespace_label'
drivers/nvdimm/label.h:103: warning: Function parameter or member 'reserved' not described in 'nd_namespace_label'
drivers/nvdimm/label.h:103: warning: Function parameter or member 'type_guid' not described in 'nd_namespace_label'
drivers/nvdimm/label.h:103: warning: Function parameter or member 'abstraction_guid' not described in 'nd_namespace_label'
drivers/nvdimm/label.h:103: warning: Function parameter or member 'reserved2' not described in 'nd_namespace_label'
drivers/nvdimm/label.h:103: warning: Function parameter or member 'checksum' not described in 'nd_namespace_label'
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116431381.2460985.6990754901097922099.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21libnvdimm/labels: Introduce the concept of multi-range namespace labelsDan Williams
The CXL specification defines a mechanism for namespaces to be comprised of multiple discontiguous ranges. Introduce that concept to the legacy NVDIMM namespace implementation with a new nsl_set_nrange() helper that sets the number of ranges to 1. Once the NVDIMM subsystem supports CXL labels and updates its namespace capacity provisioning for discontiguous support, nsl_set_nrange() can be updated, but in the meantime CXL label validation requires nrange be non-zero. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116430804.2460985.5482188351381597529.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
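The behaviour described in this message can be modelled in a few lines of userspace C. This is only a sketch: `drvdata_model`, `nrange_label`, and the `model_*` functions are hypothetical stand-ins and not the driver's own definitions, and the on-media byte order handling is left out. It shows a setter that is a no-op for legacy labels and a validity check that insists on a non-zero range count for CXL labels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-device driver data. */
struct drvdata_model {
	bool cxl;		/* true: the device expects CXL-format labels */
};

/* Hypothetical label; only the range count is modelled here. */
struct nrange_label {
	uint16_t nrange;	/* only meaningful for CXL-format labels */
};

/* Record the range count only when the label format has such a field. */
static void model_set_nrange(const struct drvdata_model *ndd,
			     struct nrange_label *label, uint16_t nrange)
{
	if (!ndd->cxl)
		return;		/* legacy labels carry no range count */
	label->nrange = nrange;
}

/* Per the message above, CXL label validation needs a non-zero nrange. */
static bool model_nrange_valid(const struct drvdata_model *ndd,
			       const struct nrange_label *label)
{
	return !ndd->cxl || label->nrange != 0;
}

/* Initialise a fresh label with a single range, as the commit describes. */
static void model_init_label(const struct drvdata_model *ndd,
			     struct nrange_label *label)
{
	assert(ndd && label);
	label->nrange = 0;
	model_set_nrange(ndd, label, 1);
}
```

Keeping the format check inside the setter means callers do not need to know which label format is in use, which appears to be the point of the nsl_* helper family.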
2021-09-21libnvdimm/label: Add a helper for nlabel validationDan Williams
In the CXL namespace label there is no need for nlabel since that is inferred from the region. Add a helper that moves nsl_get_label() behind a helper that validates the number of labels relative to the region. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116430293.2460985.12693942353621355232.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21libnvdimm/labels: Add uuid helpersDan Williams
In preparation for CXL labels that move the uuid to a different offset in the label, add nsl_{ref,get,validate}_uuid(). These helpers use the proper uuid_t type. That type definition predated the libnvdimm subsystem, so now is as good a time as any to convert all the uuid handling in the subsystem to uuid_t to match the helpers. Note that the uuid fields in the label data and superblocks are not replaced per Andy's expectation that uuid_t is a kernel internal type not to appear in external ABI interfaces. So, in those cases {import,export}_uuid() is used to go between the 2 types. Also note that this rework uncovered some unnecessary copies for label comparisons, those are cleaned up with nsl_uuid_equal(). As for the whitespace changes, all new code is clang-format compliant. Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116429748.2460985.15659993454313919977.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
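The split this message describes, raw bytes in the external format and a typed value inside the code, can be illustrated with a small userspace model. Everything below is an assumption made for illustration: `model_uuid_t`, `uuid_label`, and the `model_*` functions are hypothetical and only mimic the roles of uuid_t, the label structure, {import,export}_uuid(), and nsl_uuid_equal().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_SIZE 16

/* Typed in-memory UUID, standing in for the kernel-internal uuid_t. */
typedef struct {
	uint8_t b[MODEL_UUID_SIZE];
} model_uuid_t;

/* External format: the UUID stays a raw byte array in the label. */
struct uuid_label {
	uint8_t uuid[MODEL_UUID_SIZE];
};

/* Raw bytes -> typed value. */
static void model_import_uuid(model_uuid_t *dst, const uint8_t *src)
{
	memcpy(dst->b, src, MODEL_UUID_SIZE);
}

/* Typed value -> raw bytes. */
static void model_export_uuid(uint8_t *dst, const model_uuid_t *src)
{
	memcpy(dst, src->b, MODEL_UUID_SIZE);
}

/* Getter: copy the label's UUID into caller-provided typed storage. */
static const model_uuid_t *model_label_get_uuid(const struct uuid_label *label,
						 model_uuid_t *uuid)
{
	assert(label && uuid);
	model_import_uuid(uuid, label->uuid);
	return uuid;
}

/* Compare in place, avoiding the intermediate copy a getter would need. */
static bool model_label_uuid_equal(const struct uuid_label *label,
				   const model_uuid_t *uuid)
{
	return memcmp(label->uuid, uuid->b, MODEL_UUID_SIZE) == 0;
}
```

Routing every access through such helpers means a format that stores the uuid at a different offset only has to change the helpers, not their callers.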
2021-09-09Merge tag 'cxl-for-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL (Compute Express Link) updates from Dan Williams: - Fix detection of CXL host bridges to filter out disabled ACPI0016 devices in the ACPI DSDT. - Fix kernel lockdown integration to disable raw commands when raw PCI access is disabled. - Fix a broken debug message. - Add support for "Get Partition Info". I.e. enumerate the split between volatile and persistent capacity on bi-modal CXL memory expanders. - Re-factor the core by subject area. This is a work in progress. - Prepare libnvdimm to understand CXL labels in addition to EFI labels. This is a work in progress. * tag 'cxl-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (25 commits) cxl/registers: Fix Documentation warning cxl/pmem: Fix Documentation warning cxl/uapi: Fix defined but not used warnings cxl/pci: Fix debug message in cxl_probe_regs() cxl/pci: Fix lockdown level cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge ports libnvdimm/labels: Add claim class helpers libnvdimm/labels: Add type-guid helpers libnvdimm/labels: Add blk special cases for nlabel and position helpers libnvdimm/labels: Add blk isetcookie set / validation helpers libnvdimm/labels: Add a checksum calculation helper libnvdimm/labels: Introduce label setter helpers libnvdimm/labels: Add isetcookie validation helper libnvdimm/labels: Introduce getters for namespace label fields cxl/mem: Adjust ram/pmem range to represent DPA ranges cxl/mem: Account for partitionable space in ram/pmem ranges cxl/pci: Store memory capacity values cxl/pci: Simplify register setup cxl/pci: Ignore unknown register block types cxl/core: Move memdev management to core ...
2021-09-09Merge tag 'libnvdimm-for-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Fix a race condition in the teardown path of raw mode pmem namespaces. - Cleanup the code that filesystems use to detect filesystem-dax capabilities of their underlying block device. * tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: remove bdev_dax_supported xfs: factor out a xfs_buftarg_is_dax helper dax: stub out dax_supported for !CONFIG_FS_DAX dax: remove __generic_fsdax_supported dax: move the dax_read_lock() locking into dax_supported dax: mark dax_get_by_host static dm: use fs_dax_get_by_bdev instead of dax_get_by_host dax: stop using bdevname fsdax: improve the FS_DAX Kconfig description and help text libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
2021-09-01Merge tag 'driver-core-5.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core patches for 5.15-rc1. These do change a number of different things across different subsystems, and because of that, there were 2 stable tags created that might have already come into your tree from different pulls that did the following - changed the bus remove callback to return void - sysfs iomem_get_mapping rework Other than those two things, there's only a few small things in here: - kernfs performance improvements for huge numbers of sysfs users at once - tiny api cleanups - other minor changes All of these have been in linux-next for a while with no reported problems, other than the before-mentioned merge issue" * tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits) MAINTAINERS: Add dri-devel for component.[hc] driver core: platform: Remove platform_device_add_properties() ARM: tegra: paz00: Handle device properties with software node API bitmap: extend comment to bitmap_print_bitmask/list_to_buf drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI topology: use bin_attribute to break the size limitation of cpumap ABI lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list sysfs: Rename struct bin_attribute member to f_mapping sysfs: Invoke iomem_get_mapping() from the sysfs open callback debugfs: Return error during {full/open}_proxy_open() on rmmod zorro: Drop useless (and hardly used) .driver member in struct zorro_dev zorro: Simplify remove callback sh: superhyway: Simplify check in remove callback nubus: Simplify check in remove callback nubus: Make struct nubus_driver::remove return void kernfs: dont call d_splice_alias() under kernfs node lock kernfs: use i_lock to protect concurrent inode updates kernfs: switch kernfs to use an rwsem 
kernfs: use VFS negative dentry caching ...
2021-08-24libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbindsumiyawang
There is a use after free crash when the pmem driver tears down its mapping while I/O is still inbound. This is triggered by driver unbind, "ndctl destroy-namespace", while I/O is in flight. Fix the sequence of blk_cleanup_queue() vs memunmap(). The crash signature is of the form:
BUG: unable to handle page fault for address: ffffc90080200000
CPU: 36 PID: 9606 Comm: systemd-udevd
Call Trace:
? pmem_do_bvec+0xf9/0x3a0
? xas_alloc+0x55/0xd0
pmem_rw_page+0x4b/0x80
bdev_read_page+0x86/0xb0
do_mpage_readpage+0x5d4/0x7a0
? lru_cache_add+0xe/0x10
mpage_readpages+0xf9/0x1c0
? bd_link_disk_holder+0x1a0/0x1a0
blkdev_readpages+0x1d/0x20
read_pages+0x67/0x1a0
ndctl Call Trace in vmcore:
PID: 23473 TASK: ffff88c4fbbe8000 CPU: 1 COMMAND: "ndctl"
__schedule
schedule
blk_mq_freeze_queue_wait
blk_freeze_queue
blk_cleanup_queue
pmem_release_queue
devm_action_release
release_nodes
devres_release_all
device_release_driver_internal
device_driver_detach
unbind_store
Cc: <stable@vger.kernel.org> Signed-off-by: sumiyawang <sumiyawang@tencent.com> Reviewed-by: yongduan <yongduan@tencent.com> Link: https://lore.kernel.org/r/1629632949-14749-1-git-send-email-sumiyawang@tencent.com Fixes: 50f44ee7248a ("mm/devm_memremap_pages: fix final page put race") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
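The ordering problem in this message can be sketched as a single-threaded userspace model. It assumes the intended order is to stop I/O first and release the mapping second; `pmem_model` and the `model_*` functions are hypothetical, `free()` stands in for memunmap(), and the wait for in-flight requests that a real queue cleanup performs is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a device backed by a mapped memory range. */
struct pmem_model {
	unsigned char *virt;	/* stands in for the mapped range */
	size_t size;
	bool queue_alive;	/* false once the queue refuses new I/O */
};

static int model_setup(struct pmem_model *p, size_t size)
{
	p->virt = calloc(1, size);
	if (!p->virt)
		return -1;
	p->size = size;
	p->queue_alive = true;
	return 0;
}

/* I/O entry point: only touches the mapping while the queue is alive. */
static int model_write(struct pmem_model *p, size_t off,
		       const void *buf, size_t len)
{
	if (!p->queue_alive)
		return -1;	/* I/O rejected after queue shutdown */
	if (off > p->size || len > p->size - off)
		return -1;	/* out of range */
	assert(p->virt);	/* a live queue implies a live mapping */
	memcpy(p->virt + off, buf, len);
	return 0;
}

/* Teardown in the safe order: shut the queue, then drop the mapping. */
static void model_teardown(struct pmem_model *p)
{
	p->queue_alive = false;	/* step 1: no new I/O can reach the mapping */
	free(p->virt);		/* step 2: only now release the mapping */
	p->virt = NULL;
}
```

Reversing the two steps in `model_teardown()` would leave a window in which `model_write()` passes the queue check and then dereferences freed memory, which is the shape of the page fault shown in the crash signature.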