summaryrefslogtreecommitdiff
path: root/drivers/nvdimm/pfn_devs.c
AgeCommit message (Collapse)Author
2021-09-21libnvdimm/labels: Add uuid helpersDan Williams
In preparation for CXL labels that move the uuid to a different offset in the label, add nsl_{ref,get,validate}_uuid(). These helpers use the proper uuid_t type. That type definition predated the libnvdimm subsystem, so now is as a good a time as any to convert all the uuid handling in the subsystem to uuid_t to match the helpers. Note that the uuid fields in the label data and superblocks is not replaced per Andy's expectation that uuid_t is a kernel internal type not to appear in external ABI interfaces. So, in those case {import,export}_uuid() is used to go between the 2 types. Also note that this rework uncovered some unnecessary copies for label comparisons, those are cleaned up with nsl_uuid_equal(). As for the whitespace changes, all new code is clang-format compliant. Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116429748.2460985.15659993454313919977.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-10-13mm/memremap_pages: support multiple ranges per invocationDan Williams
In support of device-dax growing the ability to front physically dis-contiguous ranges of memory, update devm_memremap_pages() to track multiple ranges with a single reference counter and devm instance. Convert all [devm_]memremap_pages() users to specify the number of ranges they are mapping in their 'struct dev_pagemap' instance. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.co Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13mm/memremap_pages: convert to 'struct range'Dan Williams
The 'struct resource' in 'struct dev_pagemap' is only used for holding resource span information. The other fields, 'name', 'flags', 'desc', 'parent', 'sibling', and 'child' are all unused wasted space. This is in preparation for introducing a multi-range extension of devm_memremap_pages(). The bulk of this change is unwinding all the places internal to libnvdimm that used 'struct resource' unnecessarily, and replacing instances of 'struct dev_pagemap'.res with 'struct dev_pagemap'.range. P2PDMA had a minor usage of the resource flags field, but only to report failures with "%pR". That is replaced with an open coded print of the range. [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()] Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen] Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-28libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()Dan Williams
Move libnvdimm sysfs attributes that currently use an open coded DEVICE_ATTR() to hide sensitive root-only information (physical memory layout) to the new DEVICE_ATTR_ADMIN_RO() helper. Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-03-17libnvdimm/namespace: Enforce memremap_compat_align()Dan Williams
The pmem driver on PowerPC crashes with the following signature when instantiating misaligned namespaces that map their capacity via memremap_pages(). BUG: Unable to handle kernel data access at 0xc001000406000000 Faulting instruction address: 0xc000000000090790 NIP [c000000000090790] arch_add_memory+0xc0/0x130 LR [c000000000090744] arch_add_memory+0x74/0x130 Call Trace: arch_add_memory+0x74/0x130 (unreliable) memremap_pages+0x74c/0xa30 devm_memremap_pages+0x3c/0xa0 pmem_attach_disk+0x188/0x770 nvdimm_bus_probe+0xd8/0x470 With the assumption that only memremap_pages() has alignment constraints, enforce memremap_compat_align() for pmem_should_map_pages(), nd_pfn, and nd_dax cases. This includes preventing the creation of namespaces where the base address is misaligned and cases there infoblock padding parameters are invalid. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Fixes: a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces to section alignment") Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-03-17libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock validDan Williams
The EOPNOTSUPP return code from the pmem driver indicates that the namespace has a configuration that may be valid, but the current kernel does not support it. Expand this to all of the nd_pfn_validate() error conditions after the infoblock has been verified as self consistent. This prevents exposing the namespace to I/O when the infoblock needs to be corrected, or the system needs to be put into a different configuration (like changing the page size on PowerPC). Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-02-20mm/memremap_pages: Introduce memremap_compat_align()Dan Williams
The "sub-section memory hotplug" facility allows memremap_pages() users like libnvdimm to compensate for hardware platforms like x86 that have a section size larger than their hardware memory mapping granularity. The compensation that sub-section support affords is being tolerant of physical memory resources shifting by units smaller (64MiB on x86) than the memory-hotplug section size (128 MiB). Where the platform physical-memory mapping granularity is limited by the number and capability of address-decode-registers in the memory controller. While the sub-section support allows memremap_pages() to operate on sub-section (2MiB) granularity, the Power architecture may still require 16MiB alignment on "!radix_enabled()" platforms. In order for libnvdimm to be able to detect and manage this per-arch limitation, introduce memremap_compat_align() as a common minimum alignment across all driver-facing memory-mapping interfaces, and let Power override it to 16MiB in the "!radix_enabled()" case. The assumption / requirement for 16MiB to be a viable memremap_compat_align() value is that Power does not have platforms where its equivalent of address-decode-registers never hardware remaps a persistent memory resource on smaller than 16MiB boundaries. Note that I tried my best to not add a new Kconfig symbol, but header include entanglements defeated the #ifndef memremap_compat_align design pattern and the need to export it defeats the __weak design pattern for arch overrides. Based on an initial patch by Aneesh. Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-11-19libnvdimm: Simplify root read-only definition for the 'resource' attributeDan Williams
Rather than update the permission in ->is_visible() set the permission directly at declaration time. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309905534.1582359.13927459228885931097.stgit@dwillia2-desk3.amr.corp.intel.com
2019-11-17libnvdimm: Move attribute groups to device typeDan Williams
Statically initialize the attribute groups for each libnvdimm device_type. This is a preparation step for removing unnecessary exports of attributes that can be included in the device_type by default. Also take the opportunity to mark 'struct device_type' instances const. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309900111.1582359.2445687530383470348.stgit@dwillia2-desk3.amr.corp.intel.com
2019-11-14libnvdimm/namespace: Differentiate between probe mapping and runtime mappingAneesh Kumar K.V
The nvdimm core currently maps the full namespace to an ioremap range while probing the namespace mode. This can result in probe failures on architectures that have limited ioremap space. For example, with a large btt namespace that consumes most of I/O remap range, depending on the sequence of namespace initialization, the user can find a pfn namespace initialization failure due to unavailable I/O remap space which nvdimm core uses for temporary mapping. nvdimm core can avoid this failure by only mapping the reserved info block area to check for pfn superblock type and map the full namespace resource only before using the namespace. Given that personalities like BTT can be layered on top of any namespace type create a generic form of devm_nsio_enable (devm_namespace_enable) and use it inside the per-personality attach routines. Now devm_namespace_enable() is always paired with disable unless the mapping is going to be used for long term runtime access. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com [djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers] Reported-by: kbuild test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-11-14libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probeAneesh Kumar K.V
nvdimm core use nd_pfn_validate when looking for devdax or fsdax namespace. In this case device resources are allocated against nd_namespace_io dev. In-order to allow remap of range in nd_pfn_clear_memmap_error(), move the device memmap area clearing while initializing pfn namespace. With this device resource are allocated against nd_pfn and we can use nd_pfn->dev for remapping. This also avoids calling nd_pfn_clear_mmap_errors twice. Once while probing the namespace and second while initializing a pfn namespace. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20191101032728.113001-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-29Merge tag 'libnvdimm-fixes-5.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm More libnvdimm updates from Dan Williams: - Complete the reworks to interoperate with powerpc dynamic huge page sizes - Fix a crash due to missed accounting for the powerpc 'struct page'-memmap mapping granularity - Fix badblock initialization for volatile (DRAM emulated) pmem ranges - Stop triggering request_key() notifications to userspace when NVDIMM-security is disabled / not present - Miscellaneous small fixups * tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/region: Enable MAP_SYNC for volatile regions libnvdimm: prevent nvdimm from requesting key when security is disabled libnvdimm/region: Initialize bad block for volatile namespaces libnvdimm/nfit_test: Fix acpi_handle redefinition libnvdimm/altmap: Track namespace boundaries in altmap libnvdimm: Fix endian conversion issues  libnvdimm/dax: Pick the right alignment default when creating dax devices powerpc/book3s64: Export has_transparent_hugepage() related functions.
2019-09-24libnvdimm/altmap: Track namespace boundaries in altmapAneesh Kumar K.V
With PFN_MODE_PMEM namespace, the memmap area is allocated from the device area. Some architectures map the memmap area with large page size. On architectures like ppc64, 16MB page for memap mapping can map 262144 pfns. This maps a namespace size of 16G. When populating memmap region with 16MB page from the device area, make sure the allocated space is not used to map resources outside this namespace. Such usage of device area will prevent a namespace destroy. Add resource end pnf in altmap and use that to check if the memmap area allocation can map pfn outside the namespace. On ppc64 in such case we fallback to allocation from memory. This fix kernel crash reported below: [ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0 [ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000 [ 133.464760] Faulting instruction address: 0xc00000000007580c [ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1] [ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ..... [ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0 [ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0 [ 133.464910] Call Trace: [ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable) [ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240 [ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590 [ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160 [ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0 [ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50 [ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400 [ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250 [ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110 [ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70 [ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0 [ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250 [ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70 [ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250 djbw: Aneesh notes that this crash can likely be triggered in any kernel that supports 'papr_scm', so flagging that commit for -stable consideration. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: <stable@vger.kernel.org> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Santosh Sivaraj <santosh@fossix.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-24libnvdimm/dax: Pick the right alignment default when creating dax devicesAneesh Kumar K.V
Allow arch to provide the supported alignments and use hugepage alignment only if we support hugepage. Right now we depend on compile time configs whereas this patch switch this to runtime discovery. Architectures like ppc64 can have THP enabled in code, but then can have hugepage size disabled by the hypervisor. This allows us to create dax devices with PAGE_SIZE alignment in this case. Existing dax namespace with alignment larger than PAGE_SIZE will fail to initialize in this specific case. We still allow fsdax namespace initialization. With respect to identifying whether to enable hugepage fault for a dax device, if THP is enabled during compile, we default to taking hugepage fault and in dax fault handler if we find the fault size > alignment we retry with PAGE_SIZE fault size. This also addresses the below failure scenario on ppc64 ndctl create-namespace --mode=devdax | grep align "align":16777216, "align":16777216 cat /sys/devices/ndbus0/region0/dax0.0/supported_alignments 65536 16777216 daxio.static-debug -z -o /dev/dax0.0 Bus error (core dumped) $ dmesg | tail lpar: Failed hash pte insert with error -4 hash-mmu: mm: Hashing failure ! EA=0x7fff17000000 access=0x8000000000000006 current=daxio hash-mmu: trap=0x300 vsid=0x22cb7a3 ssize=1 base psize=2 psize 10 pte=0xc000000501002b86 daxio[3860]: bus error (7) at 7fff17000000 nip 7fff973c007c lr 7fff973bff34 code 2 in libpmem.so.1.0.0[7fff973b0000+20000] daxio[3860]: code: 792945e4 7d494b78 e95f0098 7d494b78 f93f00a0 4800012c e93f0088 f93f0120 daxio[3860]: code: e93f00a0 f93f0128 e93f0120 e95f0128 <f9490000> e93f0088 39290008 f93f0110 The failure was due to guest kernel using wrong page size. The namespaces created with 16M alignment will appear as below on a config with 16M page size disabled. $ ndctl list -Ni [ { "dev":"namespace0.1", "mode":"fsdax", "map":"dev", "size":5351931904, "uuid":"fc6e9667-461a-4718-82b4-69b24570bddb", "align":16777216, "blockdev":"pmem0.1", "supported_alignments":[ 65536 ] }, { "dev":"namespace0.0", "mode":"fsdax", <==== devdax 16M alignment marked disabled. "map":"mem", "size":5368709120, "uuid":"a4bdf81a-f2ee-4bc6-91db-7b87eddd0484", "state":"disabled" } ] Cc: linux-mm@kvack.org Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-8-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-21Merge tag 'libnvdimm-for-5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Some reworks to better support nvdimms on powerpc and an nvdimm security interface update: - Rework the nvdimm core to accommodate architectures with different page sizes and ones that can change supported huge page sizes at boot time rather than a compile time constant. - Introduce a distinct 'frozen' attribute for the nvdimm security state since it is independent of the locked state. - Miscellaneous fixups" * tag 'libnvdimm-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: Use PAGE_SIZE instead of SZ_4K for align check libnvdimm/label: Remove the dpa align check libnvdimm/pfn_dev: Add page size and struct page size to pfn superblock libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page size change libnvdimm/pmem: Advance namespace seed for specific probe errors libnvdimm/region: Rewrite _probe_success() to _advance_seeds() libnvdimm/security: Consolidate 'security' operations libnvdimm/security: Tighten scope of nvdimm->busy vs security operations libnvdimm/security: Introduce a 'frozen' attribute libnvdimm, region: Use struct_size() in kzalloc() tools/testing/nvdimm: Fix fallthrough warning libnvdimm/of_pmem: Provide a unique name for bus provider
2019-09-05libnvdimm/pfn_dev: Add page size and struct page size to pfn superblockAneesh Kumar K.V
This is needed so that pmem probe don't wrongly initialize a namespace which doesn't have enough space reserved for holding struct pages with the current kernel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-5-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-09-05libnvdimm/pfn_dev: Add a build check to make sure we notice when struct page ↵Aneesh Kumar K.V
size change Namespaces created with PFN_MODE_PMEM mode stores struct page in the reserve block area. We need to make sure we account for the right struct page size while doing this. Instead of directly depending on sizeof(struct page) which can change based on different kernel config option, use the max struct page size (64) while calculating the reserve block area. This makes sure pmem device can be used across kernels built with different configs. If the above assumption of max struct page size change, we need to update the reserve block allocation space for new namespaces created. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-08-28libnvdimm/pfn: Fix namespace creation on misaligned addressesJeff Moyer
Yi reported[1] that after commit a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces to section alignment"), it was no longer possible to create a device dax namespace with a 1G alignment. The reason was that the pmem region was not itself 1G-aligned. The code happily skips past the first 512M, but fails to account for a now misaligned end offset (since space was allocated starting at that misaligned address, and extending for size GBs). Reintroduce end_trunc, so that the code correctly handles the misaligned end address. This results in the same behavior as before the introduction of the offending commit. [1] https://lists.01.org/pipermail/linux-nvdimm/2019-July/022813.html Fixes: a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces ...") Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/x49ftll8f39.fsf@segfault.boston.devel.redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-07-27Merge tag 'libnvdimm-fixes-5.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A collection of locking and async operations fixes for v5.3-rc2. These had been soaking in a branch targeting the merge window, but missed due to a regression hunt. This fixed up version has otherwise been in -next this past week with no reported issues. In order to gain confidence in the locking changes the pull also includes a debug / instrumentation patch to enable lockdep coverage for libnvdimm subsystem operations that depend on the device_lock for exclusion. As mentioned in the changelog it is a hack, but it works and documents the locking expectations of the sub-system in a way that others can use lockdep to verify. The driver core touches got an ack from Greg. Summary: - Fix duplicate device_unregister() calls (multiple threads competing to do unregister work when scheduling device removal from a sysfs attribute of the self-same device). - Fix badblocks registration order bug. Ensure region badblocks are initialized in advance of namespace registration. - Fix a deadlock between the bus lock and probe operations. - Export device-core infrastructure to coordinate async operations via the device ->dead state. - Add device-core infrastructure to validate device_lock() usage with lockdep" * tag 'libnvdimm-fixes-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: driver-core, libnvdimm: Let device subsystems add local lockdep coverage libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl() libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant libnvdimm/region: Register badblocks before namespaces libnvdimm/bus: Prevent duplicate device_unregister() calls drivers/base: Introduce kill_device()
2019-07-18libnvdimm/pfn: stop padding pmem namespaces to section alignmentDan Williams
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE memory, we no longer need to add padding at pfn/dax device creation time. The kernel will still honor padding established by older kernels. Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fieldsDan Williams
At namespace creation time there is the potential for the "expected to be zero" fields of a 'pfn' info-block to be filled with indeterminate data. While the kernel buffer is zeroed on allocation it is immediately overwritten by nd_pfn_validate() filling it with the current contents of the on-media info-block location. For fields like, 'flags' and the 'padding' it potentially means that future implementations can not rely on those fields being zero. In preparation to stop using the 'start_pad' and 'end_trunc' fields for section alignment, arrange for fields that are not explicitly initialized to be guaranteed zero. Bump the minor version to indicate it is safe to assume the 'padding' and 'flags' are zero. Otherwise, this corruption is expected to benign since all other critical fields are explicitly initialized. Note The cc: stable is about spreading this new policy to as many kernels as possible not fixing an issue in those kernels. It is not until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" where this improper initialization becomes a problem. So if someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" (which is not tagged for stable), make sure this pre-requisite is flagged. Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: <stable@vger.kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18driver-core, libnvdimm: Let device subsystems add local lockdep coverageDan Williams
For good reason, the standard device_lock() is marked lockdep_set_novalidate_class() because there is simply no sane way to describe the myriad ways the device_lock() ordered with other locks. However, that leaves subsystems that know their own local device_lock() ordering rules to find lock ordering mistakes manually. Instead, introduce an optional / additional lockdep-enabled lock that a subsystem can acquire in all the same paths that the device_lock() is acquired. A conversion of the NFIT driver and NVDIMM subsystem to a lockdep-validate device_lock() scheme is included. The debug_nvdimm_lock() implementation implements the correct lock-class and stacking order for the libnvdimm device topology hierarchy. Yes, this is a hack, but hopefully it is a useful hack for other subsystems device_lock() debug sessions. Quoting Greg: "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up using it as much as anything else, so user beware :) I don't object to it if it makes things easier for you to debug." Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
2019-07-02memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flagChristoph Hellwig
Add a flags field to struct dev_pagemap to replace the altmap_valid boolean to be a little more extensible. Also add a pgmap_altmap() helper to find the optional altmap and clean up the code using the altmap using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-06block: remove CONFIG_LBDAFChristoph Hellwig
Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit architectures. These types are required to support block device and/or file sizes larger than 2 TiB, and have generally defaulted to on for a long time. Enabling the option only increases the i386 tinyconfig size by 145 bytes, and many data structures already always use 64-bit values for their in-core and on-disk data structures anyway, so there should not be a large change in dynamic memory usage either. Dropping this option removes a somewhat weird non-default config that has cause various bugs or compiler warnings when actually used. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-22libnvdimm/pfn: Remove dax_label_reserveDan Williams
The reserve was for an abandoned effort to add label (partitioning support) to device-dax instances. Remove it. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init()Dan Williams
Similar to "libnvdimm: Fix altmap reservation size calculation" provide for a reservation of a full page worth of info block space at info-block establishment time. Typically there is already slack in the padding from honoring the default 2MB alignment, but provide for a reservation for corner case configurations that would otherwise fit. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm: Fix altmap reservation size calculationOliver O'Halloran
Libnvdimm reserves the first 8K of pfn and devicedax namespaces to store a superblock describing the namespace. This 8K reservation is contained within the altmap area which the kernel uses for the vmemmap backing for the pages within the namespace. The altmap allows for some pages at the start of the altmap area to be reserved and that mechanism is used to protect the superblock from being re-used as vmemmap backing. The number of PFNs to reserve is calculated using: PHYS_PFN(SZ_8K) Which is implemented as: #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT)) So on systems where PAGE_SIZE is greater than 8K the reservation size is truncated to zero and the superblock area is re-used as vmemmap backing. As a result all the namespace information stored in the superblock (i.e. if it's a PFN or DAX namespace) is lost and the namespace needs to be re-created to get access to the contents. This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure that at least one page is reserved. On systems with a 4K pages size this patch should have no effect. Cc: stable@vger.kernel.org Cc: Dan Williams <dan.j.williams@intel.com> Fixes: ac515c084be9 ("libnvdimm, pmem, pfn: move pfn setup to the core") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm, pfn: Fix over-trim in trim_pfn_device()Wei Yang
When trying to see whether current nd_region intersects with others, trim_pfn_device() has already calculated the *size* to be expanded to SECTION size. Do not double append 'adjust' to 'size' when calculating whether the end of a region collides with the next pmem region. Fixes: ae86cbfef381 "libnvdimm, pfn: Pad pfn namespaces relative to other regions" Cc: <stable@vger.kernel.org> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-05libnvdimm, pfn: Pad pfn namespaces relative to other regionsDan Williams
Commit cfe30b872058 "libnvdimm, pmem: adjust for section collisions with 'System RAM'" enabled Linux to workaround occasions where platform firmware arranges for "System RAM" and "Persistent Memory" to collide within a single section boundary. Unfortunately, as reported in this issue [1], platform firmware can inflict the same collision between persistent memory regions. The approach of interrogating iomem_resource does not work in this case because platform firmware may merge multiple regions into a single iomem_resource range. Instead provide a method to interrogate regions that share the same parent bus. This is a stop-gap until the core-MM can grow support for hotplug on sub-section boundaries. [1]: https://github.com/pmem/ndctl/issues/76 Fixes: cfe30b872058 ("libnvdimm, pmem: adjust for section collisions with...") Cc: <stable@vger.kernel.org> Reported-by: Patrick Geary <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-09-28libnvdimm, pfn: during init, clear errors in the metadata areaVishal Verma
If there are badblocks present in the 'struct page' area for pfn namespaces, until now, the only way to clear them has been to force the namespace into raw mode, clear the errors, and re-enable the fsdax mode. This is clunky, given that it should be easy enough for the pfn driver to do the same. Add a new helper that uses the most recently available badblocks list to check whether there are any badblocks that lie in the volatile struct page area. If so, before initializing the struct pages, send down targeted writes via nvdimm_write_bytes to write zeroes to the affected blocks, and thus clear errors. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPSDan Williams
In preparation for fixing dax-dma-vs-unmap issues, filesystems need to be able to rely on the fact that they will get wakeups on dev_pagemap page-idle events. Introduce MEMORY_DEVICE_FS_DAX and generic_dax_page_free() as common indicator / infrastructure for dax filesytems to require. With this change there are no users of the MEMORY_DEVICE_HOST designation, so remove it. The HMM sub-system extended dev_pagemap to arrange a callback when a dev_pagemap managed page is freed. Since a dev_pagemap page is free / idle when its reference count is 1 it requires an additional branch to check the page-type at put_page() time. Given put_page() is a hot-path we do not want to incur that check if HMM is not in use, so a static branch is used to avoid that overhead when not necessary. Now, the FS_DAX implementation wants to reuse this mechanism for receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific static-key into a generic mechanism that either HMM or FS_DAX code paths can enable. For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support, care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure. However, we still need to support FS_DAX in the FS_DAX_LIMITED case implemented by the s390/dcssblk driver. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Thomas Meyer <thomas@m3y3r.de> Reported-by: Dave Jiang <dave.jiang@intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-09Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-nextDan Williams
2018-03-13libnvdimm: remove redundant assignment to pointer 'dev'Colin Ian King
Pointer dev is being assigned a value that is never read, it is being re-assigned the same value later on, hence the initialization is redundant and can be removed. Cleans up clang warning: drivers/nvdimm/pfn_devs.c:307:17: warning: Value stored to 'dev' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-06libnvdimm: remove redundant __func__ in dev_dbgDan Williams
Dynamic debug can be instructed to add the function name to the debug output using the +f switch, so there is no need for the libnvdimm modules to do it again. If a user decides to add the +f switch for libnvdimm's dynamic debug this results in double prints of the function name. Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08memremap: change devm_memremap_pages interface to use struct dev_pagemapChristoph Hellwig
This new interface is similar to how struct device (and many others) work. The caller initializes a 'struct dev_pagemap' as required and calls 'devm_memremap_pages'. This allows the pagemap structure to be embedded in another structure and thus container_of can be used. In this way application specific members can be stored in a containing struct. This will be used by the P2P infrastructure and HMM could probably be cleaned up to use it as well (instead of having it's own, similar 'hmm_devmem_pages_create' function). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-19libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignmentDan Williams
The following namespace configuration attempt: # ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f libndctl: ndctl_dax_enable: dax0.1: failed to enable Error: namespace0.0: failed to enable failed to reconfigure namespace: No such device or address ...fails when the backing memory range is not physically aligned to 1G: # cat /proc/iomem | grep Persistent 210000000-30fffffff : Persistent Memory (legacy) In the above example the 4G persistent memory range starts and ends on a 256MB boundary. We handle this case correctly when needing to handle cases that violate section alignment (128MB) collisions against "System RAM", and we simply need to extend that padding/truncation for the 1GB alignment use case. Cc: <stable@vger.kernel.org> Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...") Reported-and-tested-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-19libnvdimm, pfn: fix start_pad handling for aligned namespacesDan Williams
The alignment checks at pfn driver startup fail to properly account for the 'start_pad' in the case where the namespace is misaligned relative to its internal alignment. This is typically triggered in 1G aligned namespace, but could theoretically trigger with small namespace alignments. When this triggers the kernel reports messages of the form: dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000 Cc: <stable@vger.kernel.org> Fixes: 1ee6667cd8d1 ("libnvdimm, pfn, dax: fix initialization vs autodetect...") Reported-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-09-28libnvdimm, pfn: make 'resource' attribute only readable by rootDan Williams
For the same reason that /proc/iomem returns 0's for non-root readers and acpi tables are root-only, make the 'resource' attribute for pfn devices only readable by root. Otherwise we disclose physical address information. Fixes: f6ed58c70d14 ("libnvdimm, pfn: 'resource'-address and 'size'...") Cc: <stable@vger.kernel.org> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-15libnvdimm, pfn, dax: limit namespace alignments to the supported setDan Williams
Now that we properly advertise the supported pte, pmd, and pud sizes, restrict the supported alignments that can be set on a namespace. This assumes that userspace was not previously relying on the ability to set odd alignments. At least ndctl only ever supported setting the namespace alignment to 4K, 2M, or 1G. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-11libnvdimm, pfn, dax: show supported dax/pfn region alignments in sysfsOliver O'Halloran
The alignment of a DAX and PFN regions dictates the page sizes that can be used to map the region. Even if the hardware page sizes are known the actual range of supported page sizes that can be used with DAX depends on the kernel configuration. As a result it's best that the kernel advertises the alignments that should be used with these region types. This patch adds the 'supported_alignments' region attribute to expose this information to userspace. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [djbw: integrate with nd_size_select_show() rename and other fixups] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-07-25libnvdimm: Stop using HPAGE_SIZEOliver O'Halloran
Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more in common with THP than hugetlbfs we should proably be using HPAGE_PMD_SIZE, but this is undefined when THP is disabled so lets just give it a new name. The other usage of HPAGE_SIZE in libnvdimm is when determining how large the altmap should be. For the reasons mentioned above it doesn't really make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to use in generic code and it happens to match the vmemmap allocation block on x86 and Power. It's still a hack, but it's a slightly nicer hack. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-07-03Merge branch 'for-4.13/dax' into libnvdimm-for-nextDan Williams
2017-06-27libnvdimm, nfit: enable support for volatile rangesDan Williams
Allow volatile nfit ranges to participate in all the same infrastructure provided for persistent memory regions. A resulting resulting namespace device will still be called "pmem", but the parent region type will be "nd_volatile". This is in preparation for disabling the dax ->flush() operation in the pmem driver when it is hosted on a volatile range. Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15libnvdimm, label: add address abstraction identifiersDan Williams
Starting with v1.2 labels, 'address abstractions' can be hinted via an address abstraction id that implies an info-block format. The standard address abstraction in the specification is the v2 format of the Block-Translation-Table (BTT). Support for that is saved for a later patch, for now we add support for the Linux supported address abstractions BTT (v1), PFN, and DAX. The new 'holder_class' attribute for namespace devices is added for tooling to specify the 'abstraction_guid' to store in the namespace label. For v1.1 labels this field is undefined and any setting of 'holder_class' away from the default 'none' value will only have effect until the driver is unloaded. Setting 'holder_class' requires that whatever device tries to claim the namespace must be of the specified class. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-10libnvdimm: add an atomic vs process context flag to rw_bytesVishal Verma
nsio_rw_bytes can clear media errors, but this cannot be done while we are in an atomic context due to locking within ACPI. From the BTT, ->rw_bytes may be called either from atomic or process context depending on whether the calls happen during initialization or during IO. During init, we want to ensure error clearing happens, and the flag marking process context allows nsio_rw_bytes to do that. When called during IO, we're in atomic context, and error clearing can be skipped. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-04libnvdimm, pfn: fix 'npfns' vs section alignmentDan Williams
Fix failures to create namespaces due to the vmem_altmap not advertising enough free space to store the memmap. WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0 [..] Call Trace: dump_stack+0x63/0x83 __warn+0xcb/0xf0 warn_slowpath_null+0x1d/0x20 arch_add_memory+0xde/0xf0 devm_memremap_pages+0x244/0x440 pmem_attach_disk+0x37e/0x490 [nd_pmem] nd_pmem_probe+0x7e/0xa0 [nd_pmem] nvdimm_bus_probe+0x71/0x120 [libnvdimm] driver_probe_device+0x2bb/0x460 bind_store+0x114/0x160 drv_attr_store+0x25/0x30 In commit 658922e57b84 "libnvdimm, pfn: fix memmap reservation sizing" we arranged for the capacity to be allocated, but failed to also update the 'npfns' parameter. This leads to cases where there is enough capacity reserved to hold all the allocated sections, but vmemmap_populate_hugepages() still encounters -ENOMEM from altmap_alloc_block_buf(). This fix is a stop-gap until we can teach the core memory hotplug implementation to permit sub-section hotplug. Cc: <stable@vger.kernel.org> Fixes: 658922e57b84 ("libnvdimm, pfn: fix memmap reservation sizing") Reported-by: Anisha Allada <anisha.allada@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-01libnvdimm: fix nvdimm_bus_lock() vs device_lock() orderingDan Williams
A debug patch to turn the standard device_lock() into something that lockdep can analyze yielded the following: ====================================================== [ INFO: possible circular locking dependency detected ] 4.11.0-rc4+ #106 Tainted: G O ------------------------------------------------------- lt-libndctl/1898 is trying to acquire lock: (&dev->nvdimm_mutex/3){+.+.+.}, at: [<ffffffffc023c948>] nd_attach_ndns+0x178/0x1b0 [libnvdimm] but task is already holding lock: (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffc022e0b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}: lock_acquire+0xf6/0x1f0 __mutex_lock+0x88/0x980 mutex_lock_nested+0x1b/0x20 nvdimm_bus_lock+0x21/0x30 [libnvdimm] nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm] nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm] nd_pmem_probe+0x14/0x180 [nd_pmem] nvdimm_bus_probe+0xa9/0x260 [libnvdimm] -> #0 (&dev->nvdimm_mutex/3){+.+.+.}: __lock_acquire+0x1107/0x1280 lock_acquire+0xf6/0x1f0 __mutex_lock+0x88/0x980 mutex_lock_nested+0x1b/0x20 nd_attach_ndns+0x178/0x1b0 [libnvdimm] nd_namespace_store+0x308/0x3c0 [libnvdimm] namespace_store+0x87/0x220 [libnvdimm] In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'. Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect nd_{attach,detach}_ndns() operations. Cc: <stable@vger.kernel.org> Fixes: 8c2f7e8658df ("libnvdimm: infrastructure for btt devices") Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-02-04libnvdimm, pfn: fix memmap reservation size versus 4K alignmentDan Williams
When vmemmap_populate() allocates space for the memmap it does so in 2MB sized chunks. The libnvdimm-pfn driver incorrectly accounts for this when the alignment of the device is set to 4K. When this happens we trigger memory allocation failures in altmap_alloc_block_buf() and trigger warnings of the form: WARNING: CPU: 0 PID: 3376 at arch/x86/mm/init_64.c:656 arch_add_memory+0xe4/0xf0 [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_null+0x1d/0x20 arch_add_memory+0xe4/0xf0 devm_memremap_pages+0x29b/0x4e0 Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-10libnvdimm, pfn: fix align attributeDan Williams
Fix the format specifier so that the attribute can be parsed correctly. Currently it returns decimal 1000 for a 4096-byte alignment. Cc: <stable@vger.kernel.org> Reported-by: Dave Jiang <dave.jiang@intel.com> Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE") Signed-off-by: Dan Williams <dan.j.williams@intel.com>