git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-10-09	x86/mm: Break out user address space handling	Dave Hansen
	The last patch broke out kernel address space handing into its own helper. Now, do the same for user address space handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
2018-10-09	x86/mm: Break out kernel address space handling	Dave Hansen
	The page fault handler (__do_page_fault()) basically has two sections: one for handling faults in the kernel portion of the address space and another for faults in the user portion of the address space. But, these two parts don't stick out that well. Let's make that more clear from code separation and naming. Pull kernel fault handling into its own helper, and reflect that naming by renaming spurious_fault() -> spurious_kernel_fault(). Also, rewrite the vmalloc() handling comment a bit. It was a bit stale and also glossed over the reserved bit handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com
2018-10-09	x86/mm: Clarify hardware vs. software "error_code"	Dave Hansen
	We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance). But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different. Give it two names for its two jobs. The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com
2018-10-09	x86/mm/tlb: Make lazy TLB mode lazier	Rik van Riel
	Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed. Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated. The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen. CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because that allows TLB flush IPIs to be sent at page table freeing time, and because the cache line bouncing on the mm_cpumask(mm) was responsible for about half the CPU use in switch_mm_irqs_off(). We can change native_flush_tlb_others() without touching other (paravirt) implementations of flush_tlb_others() because we'll be flushing less. The existing implementations flush more and are therefore still correct. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-8-riel@surriel.com
2018-10-09	x86/mm/tlb: Add freed_tables element to flush_tlb_info	Rik van Riel
	Pass the information on to native_flush_tlb_others. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-7-riel@surriel.com
2018-10-09	x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range	Rik van Riel
	Add an argument to flush_tlb_mm_range to indicate whether page tables are about to be freed after this TLB flush. This allows for an optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-6-riel@surriel.com
2018-10-09	smp,cpumask: introduce on_each_cpu_cond_mask	Rik van Riel
	Introduce a variant of on_each_cpu_cond that iterates only over the CPUs in a cpumask, in order to avoid making callbacks for every single CPU in the system when we only need to test a subset. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-5-riel@surriel.com
2018-10-09	smp: use __cpumask_set_cpu in on_each_cpu_cond	Rik van Riel
	The code in on_each_cpu_cond sets CPUs in a locally allocated bitmask, which should never be used by other CPUs simultaneously. There is no need to use locked memory accesses to set the bits in this bitmap. Switch to __cpumask_set_cpu. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-4-riel@surriel.com
2018-10-09	x86/mm/tlb: Restructure switch_mm_irqs_off()	Rik van Riel
	Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. (cherry picked from commit 61d0beb5796ab11f7f3bf38cb2eccc6579aaa70b) Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-4-riel@surriel.com
2018-10-09	x86/mm/tlb: Always use lazy TLB mode	Rik van Riel
	On most workloads, the number of context switches far exceeds the number of TLB flushes sent. Optimizing the context switches, by always using lazy TLB mode, speeds up those workloads. This patch results in about a 1% reduction in CPU use on a two socket Broadwell system running a memcache like workload. Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> (cherry picked from commit 95b0e6357d3e4e05349668940d7ff8f3b7e7e11e) Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-7-riel@surriel.com
2018-10-09	x86/mm: Page size aware flush_tlb_mm_range()	Peter Zijlstra
	Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop. Cc: Nick Piggin <npiggin@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-10-09	Merge branch 'tlb/asm-generic' of ↵	Peter Zijlstra
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into x86/mm Pull in the generic mmu_gather changes from the ARM64 tree such that we can put x86 specific things on top as well.
2018-10-09	lightnvm: pblk: guarantee that backpointer is respected on writer stall	Javier González
	pblk's write buffer must guarantee that it respects the device's constrains for reads (i.e., mw_cunits). This is done by maintaining a backpointer that updates the L2P table as entries wrap up, making them point to the media instead of pointing to the write buffer. This mechanism can race in case that the write thread stalls, as the write pointer will protect the last written entry, thus disregarding the read constrains. This patch adds an extra check on wrap up, making sure that the threshold is respected at all times, preventing new entries to overwrite committed data, also in case of write thread stall. Reported-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: consider max hw sectors supported for max_write_pgs	Zhoujie Wu
	When do GC, the number of read/write sectors are determined by max_write_pgs(see gc_rq preparation in pblk_gc_line_prepare_ws). Due to max_write_pgs doesn't consider max hw sectors supported by nvme controller(128K), which leads to GC tries to read 64 * 4K in one command, and see below error caused by pblk_bio_map_addr in function pblk_submit_read_gc. [ 2923.005376] pblk: could not add page to bio [ 2923.005377] pblk: could not allocate GC bio (18446744073709551604) Signed-off-by: Zhoujie Wu <zjwu@marvell.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix error handling of pblk_lines_init()	Wei Yongjun
	In the too many bad blocks error handling case, we should release all the allocated resources, otherwise it will cause memory leak. Fixes: 2deeefc02dff ("lightnvm: pblk: fail gracefully on line alloc. failure") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: do no update csecs and sos on 1.2	Javier González
	1.2 devices exposes their data and metadata size through the separate identify command. Make sure that the NVMe LBA format does not override these values. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: guarantee mw_cunits on read buffer	Javier González
	OCSSD 2.0 defines the amount of data that the host must buffer per chunk to guarantee reads through the geometry field mw_cunits. This value is the base that pblk uses to determine the size of its read buffer. Currently, this size is set to be the closes power-of-2 to mw_cunits times the number of parallel units available to the pblk instance for each open line (currently one). When an entry (4KB) is put in the buffer, the L2P table points to it. As the buffer wraps up, the L2P is updated to point to addresses on the device, thus guaranteeing mw_cunits at a chunk level. However, given that pblk cannot write to the device under ws_min (normally ws_opt), there might be a window in which the buffer starts wrapping up and updating L2P entries before the mw_cunits value in a chunk has been surpassed. In order not to violate the mw_cunits constrain in this case, account for ws_opt on the read buffer creation. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: move ring buffer alloc/free rb init	Javier González
	pblk's read/write buffer currently takes a buffer and its size and uses it to create the metadata around it to use it as a ring buffer. This puts the responsibility of allocating/freeing ring buffer memory on the ring buffer user. Instead, move it inside of the ring buffer helpers (pblk-rb.c). This simplifies creation/destruction routines. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: encapsulate rb pointer operations	Javier González
	pblk's read/write buffer is always a power-of-2, thus wrapping up the buffer can be done with a bit mask. Since this is an implementation detail internal to the write buffer, make a helper that hides pointer increment + wrap, and allows to transparently relax this assumption in the future. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: remove unused function	Javier González
	Removed unused function in pblk-rb.c Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix race on sysfs line state	Javier González
	pblk exposes a sysfs interface that represents its internal state. Part of this state is the map bitmap for the current open line, which should be protected by the line lock to avoid a race when freeing the line metadata. Currently, it is not. This patch makes sure that the line state is consistent and NULL bitmap pointers are not dereferenced. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: add SPDX license tag	Javier González
	Add GLP-2.0 SPDX license tag to all pblk files Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: recover open lines on 2.0 devices	Javier González
	In the OCSSD 2.0 spec, each chunk reports its write pointer. This means that pblk does not need to scan open lines to find the write pointer, but instead, it can retrieve it directly (and verify it). This patch uses the write pointer on open lines to (i) recover the line up until the last written lba and (ii) reconstruct the map bitmap and rest of line metadata so that the line can be used for new data. Since the 1.2 path in lightnvm core has been re-implemented to populate the chunk structure and thus recover the write pointer on initialization, this patch removes 1.2 specific recovery, as the 2.0 path can be reused. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: take write semaphore on metadata	Javier González
	pblk guarantees write ordering at a chunk level through a per open chunk semaphore. At this point, since we only have an open I/O stream for both user and GC data, the semaphore is per parallel unit. For the metadata I/O that is synchronous, the semaphore is not needed as ordering is guaranteed. However, if the metadata scheme changes or multiple streams are open, this guarantee might not be preserved. This patch makes sure that all writes go through the semaphore, even for synchronous I/O. This is consistent with pblk's write I/O model. It also simplifies maintenance since changes in the metadata scheme could cause ordering issues. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: refactor metadata paths	Javier González
	pblk maintains two different metadata paths for smeta and emeta, which store metadata at the start of the line and at the end of the line, respectively. Until now, these path has been common for writing and retrieving metadata, however, as these paths diverge, the common code becomes less clear and unnecessary complicated. In preparation for further changes to the metadata write path, this patch separates the write and read paths for smeta and emeta and removes the synchronous emeta path as it not used anymore (emeta is scheduled asynchronously to prevent jittering due to internal I/Os). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: encapsulate rqd dma allocations	Javier González
	dma allocations for ppa_list and meta_list in rqd are replicated in several places across the pblk codebase. Make helpers to encapsulate creation and deletion to simplify the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: use internal allocation for chunk log page	Javier González
	The lightnvm subsystem provides helpers to retrieve chunk metadata, where the target needs to provide a buffer to store the metadata. An implicit assumption is that this buffer is contiguous and can be used to retrieve the data from the device. If the device exposes too many chunks, then kmalloc might fail, thus failing instance creation. This patch removes this assumption by implementing an internal buffer in the lightnvm subsystem to retrieve chunk metadata. Targets can then use virtual memory allocations. Since this is a target API change, adapt pblk accordingly. Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix two sleep-in-atomic-context bugs	Jia-Ju Bai
	The driver may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] nvm_dev_dma_alloc(GFP_KERNEL) drivers/lightnvm/pblk-core.c, 754: nvm_dev_dma_alloc in pblk_line_submit_smeta_io drivers/lightnvm/pblk-core.c, 1048: pblk_line_submit_smeta_io in pblk_line_init_bb drivers/lightnvm/pblk-core.c, 1434: pblk_line_init_bb in pblk_line_replace_data drivers/lightnvm/pblk-recovery.c, 980: pblk_line_replace_data in pblk_recov_l2p drivers/lightnvm/pblk-recovery.c, 976: spin_lock in pblk_recov_l2p [FUNC] bio_map_kern(GFP_KERNEL) drivers/lightnvm/pblk-core.c, 762: bio_map_kern in pblk_line_submit_smeta_io drivers/lightnvm/pblk-core.c, 1048: pblk_line_submit_smeta_io in pblk_line_init_bb drivers/lightnvm/pblk-core.c, 1434: pblk_line_init_bb in pblk_line_replace_data drivers/lightnvm/pblk-recovery.c, 980: pblk_line_replace_data in pblk_recov_l2p drivers/lightnvm/pblk-recovery.c, 976: spin_lock in pblk_recov_l2p To fix these bugs, the call to pblk_line_replace_data() is moved out of the spinlock protection. These bugs are found by my static analysis tool DSAC. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix mapping issue on failed writes	Hans Holmberg
	On 1.2-devices, the mapping-out of remaning sectors in the failed-write's block can result in an infinite loop, stalling the write pipeline, fix this. Fixes: 6a3abf5beef6 ("lightnvm: pblk: rework write error recovery path") Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: stop recreating global caches	Hans Holmberg
	Pblk should not create a set of global caches every time a pblk instance is created. The global caches should be made available only when there is one or more pblk instances. This patch bundles the global caches together with a kref keeping track of whether the caches should be available or not. Also, turn the global pblk lock into a mutex that explicitly protects the caches (as this was the only purpose of the lock). Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: calculate line pad distance in helper	Javier González
	If a line is padded, calculate the pad distance directly on the helper being used for this purpose. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: move ppa transformations to core	Javier González
	Continuing the effort of moving 1.2 and 2.0 specific code to core, move 64_to_32 and 32_to_64 ppa helpers from pblk to core. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: add tracing for chunk resets	Hans Holmberg
	Trace state of chunk resets. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: add trace events for pblk state changes	Hans Holmberg
	Add trace events for tracking pblk state changes. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: add trace events for line state changes	Hans Holmberg
	Add trace events for logging for line state changes. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: add trace events for chunk states	Hans Holmberg
	Introduce trace points for tracking chunk states in pblk - this is useful for inspection of the entire state of the drive, and real handy for both fw and pblk debugging. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: remove debug from pblk_[down/up]_page	Matias Bjørling
	Remove the debug only iteration within __pblk_down_page, which then allows us to reduce the number of arguments down to pblk and the parallel unit from the functions that calls it. Simplifying the callers logic considerably. Also, rename the functions pblk_[down/up]_page to pblk_[down/up]_chunk, to communicate that it manages the write pointer of the chunk. Note that it also protects the parallel unit such that at most one chunk is active per parallel unit. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix write amplificiation calculation	Hans Holmberg
	When the user data counter exceeds 32 bits, the write amplification calculation does not provide the right value. Fix this by using div64_u64 in stead of div64. Fixes: 76758390f83e ("lightnvm: pblk: export write amplification counters to sysfs") Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix up prints in pblk_read_check_rand	Hans Holmberg
	The prefix when printing ppas in pblk_read_check_rand should be "rnd" not "seq", so fix this so we can differentiate between lba missmatches in random and sequential reads. Also change the print order so we align with pblk_read_check_seq, printing read lba first. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: remove unused parameters in pblk_up_rq	Hans Holmberg
	The parameters nr_ppas and ppa_list are not used, so remove them. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: allocate line map bitmaps using a mempool	Hans Holmberg
	Line map bitmap allocations are fairly large and can fail. Allocation failures are fatal to pblk, stopping the write pipeline. To avoid this, allocate the bitmaps using a mempool instead. Mempool allocations never fail if called from a process context, and pblk should only allocate map bitmaps in process context, but keep the failure handling for robustness sake. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: introduce nvm_rq_to_ppa_list	Hans Holmberg
	There is a number of places in the lightnvm subsystem where the user iterates over the ppa list. Before iterating, the user must know if it is a single or multiple LBAs due to vector commands using either the nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which leads to open-coding the if/else statement. Instead of having multiple if/else's, move it into a function that can be called by its users. A nice side effect of this cleanup is that this patch fixes up a bunch of cases where we don't consider the single-ppa case in pblk. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: guarantee emeta on line close	Javier González
	If a line is recovered from open chunks, the memory structures for emeta have not necessarily been properly set on line initialization. When closing a line, make sure that emeta is consistent so that the line can be recovered on the fast path on next reboot. Also, remove a couple of empty lines at the end of the function. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: remove unused variable.	Javier González
	Removed unused struct ppa_addr variable. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix comment typo	Javier González
	Fix comment typo Decrese -> Decrease Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: improve line helpers	Javier González
	The current helper to obtain a line from a ppa returns the line id, which requires its users to explicitly retrieve the pointer to the line with the id. Make 2 different helpers: one returning the line id and one returning the line directly. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: add helpers for chunk addresses	Javier González
	Implement helpers to go from ppas to a chunk within a line and an address within a chunk. These helpers will be used on the patches adding trace support in pblk, which will be sent in this window. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: refactor put line fn on read completion	Matias Bjørling
	The read completion path uses the put_line variable to decide whether the reference on a line should be released. The function name used for that is pblk_read_put_rqd_kref, which could lead one to believe that it is the rqd that is releasing the reference, while it is the line reference that is put. Rename and also split the function in two to account for either rqd or single ppa callers and move it to core, such that it later can be used in the write path as well. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Reviewed-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: remove size and out of bounds read check	Matias Bjørling
	The I/O size and capacity checks are already done by the block layer. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09	lightnvm: pblk: fix incorrect min_write_pgs	Matias Bjørling
	The calculation of pblk->min_write_pgs should only use the optimal write size attribute provided by the drive, it does not correlate to the memory page size of the system, which can be smaller or larger than the LBA size reported. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>