linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-05-06	mtd: maps: Allow MTD_PHYSMAP with MTD_RAM	Chris Packham
	When the physmap_of_core.c code was merged into physmap-core.c the ability to use MTD_PHYSMAP_OF with only MTD_RAM selected was lost. Restore this by adding MTD_RAM to the dependencies of MTD_PHYSMAP. Fixes: commit 642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c") Cc: <stable@vger.kernel.org> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-06	mtd: maps: physmap: Store gpio_values correctly	Chris Packham
	When the gpio-addr-flash.c driver was merged with physmap-core.c the code to store the current gpio_values was lost. This meant that once a gpio was asserted it was never de-asserted. Fix this by storing the current offset in gpio_values like the old driver used to. Fixes: commit ba32ce95cbd9 ("mtd: maps: Merge gpio-addr-flash.c into physmap-core.c") Cc: <stable@vger.kernel.org> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-06	mtd: parser_imagetag: add of_match_table support	Jonas Gorski
	Allow matching the imagetag parser for fixed partitions defined in the device tree. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-06	mtd: bcm63xxpart: move imagetag parsing to its own parser	Jonas Gorski
	Move the bcm963xx Image Tag parsing into its own partition parser. This Allows reusing the parser with different full flash parsers. While moving it, rename it to bcm963* to better reflect it isn't chip, but reference implementation specific. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-06	mtd: bcm63xxpart: add of_match_table support	Jonas Gorski
	Add of_match_table support to allow using bcm63xxpart as a full flash layout parser from device tree. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-06	mtd: physmap_of_gemini: remove extranous __xipram annotation	Arnd Bergmann
	Marking a local variable as __xipram causes a warning because of the noinline attribute: drivers/mtd/maps/physmap-gemini.c:89:11: error: '__noinline__' attribute only applies to functions [-Werror,-Wignored-attributes] map_word __xipram ret; ^ include/linux/mtd/xip.h:34:18: note: expanded from macro '__xipram' #define __xipram noinline __attribute__ ((__section__ (".xiptext"))) I can't see any reason for the anotation anyway, so just remove it here. Fixes: 9d3b5086f6d4 ("mtd: physmap_of_gemini: Handle pin control") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-06	Merge tag 'spi-nor/for-5.2' of ↵	Richard Weinberger
	git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux into mtd/next SPI NOR core changes: - Print all JEDEC ID bytes on error - Fix comment of spi_nor_find_best_erase_type() - Add region locking flags for s25fl512s SPI NOR controller drivers changes: - intel-spi: * Avoid crossing 4K address boundary on read/write * Add support for Intel Comet Lake SPI serial flash
2019-05-06	Merge branch 'core-rcu-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "This cycles's RCU changes include: - a couple of straggling RCU flavor consolidation updates - SRCU updates - RCU CPU stall-warning updates - torture-test updates - an LKMM commit adding support for synchronize_srcu_expedited() - documentation updates - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) net/ipv4/netfilter: Update comment from call_rcu_bh() to call_rcu() tools/memory-model: Add support for synchronize_srcu_expedited() doc/kprobes: Update obsolete RCU update functions torture: Suppress false-positive CONFIG_INITRAMFS_SOURCE complaint locktorture: NULL cxt.lwsa and cxt.lrsa to allow bad-arg detection rcuperf: Fix cleanup path for invalid perf_type strings rcutorture: Fix cleanup path for invalid torture_type strings rcutorture: Fix expected forward progress duration in OOM notifier rcutorture: Remove ->ext_irq_conflict field rcutorture: Make rcutorture_extend_mask() comment match the code tools/.../rcutorture: Convert to SPDX license identifier torture: Don't try to offline the last CPU rcu: Fix nohz status in stall warning rcu: Move forward-progress checkers into tree_stall.h rcu: Move irq-disabled stall-warning checking to tree_stall.h rcu: Organize functions in tree_stall.h rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h rcu: Inline RCU stall-warning info helper functions rcu: Move rcu_print_task_exp_stall() to tree_exp.h rcu: Inline RCU task stall-warning helper functions ...
2019-05-06	Merge branch 'core-objtool-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "This is a series from Peter Zijlstra that adds x86 build-time uaccess validation of SMAP to objtool, which will detect and warn about the following uaccess API usage bugs and weirdnesses: - call to %s() with UACCESS enabled - return with UACCESS enabled - return with UACCESS disabled from a UACCESS-safe function - recursive UACCESS enable - redundant UACCESS disable - UACCESS-safe disables UACCESS As it turns out not leaking uaccess permissions outside the intended uaccess functionality is hard when the interfaces are complex and when such bugs are mostly dormant. As a bonus we now also check the DF flag. We had at least one high-profile bug in that area in the early days of Linux, and the checking is fairly simple. The checks performed and warnings emitted are: - call to %s() with DF set - return with DF set - return with modified stack frame - recursive STD - redundant CLD It's all x86-only for now, but later on this can also be used for PAN on ARM and objtool is fairly cross-platform in principle. While all warnings emitted by this new checking facility that got reported to us were fixed, there might be GCC version dependent warnings that were not reported yet - which we'll address, should they trigger. The warnings are non-fatal build warnings" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation sched/x86_64: Don't save flags on context switch objtool: Add Direction Flag validation objtool: Add UACCESS validation objtool: Fix sibling call detection objtool: Rewrite alt->skip_orig objtool: Add --backtrace support objtool: Rewrite add_ignores() objtool: Handle function aliases objtool: Set insn->func for alternatives x86/uaccess, kcov: Disable stack protector x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP x86/uaccess, ubsan: Fix UBSAN vs. SMAP x86/uaccess, kasan: Fix KASAN vs SMAP x86/smap: Ditch __stringify() x86/uaccess: Introduce user_access_{save,restore}() x86/uaccess, signal: Fix AC=1 bloat x86/uaccess: Always inline user_access_begin() x86/uaccess, xen: Suppress SMAP warnings ...
2019-05-06	tty: rocket: fix incorrect forward declaration of 'rp_init()'	Linus Torvalds
	Make the forward declaration actually match the real function definition, something that previous versions of gcc had just ignored. This is another patch to fix new warnings from gcc-9 before I start the merge window pulls. I don't want to miss legitimate new warnings just because my system update brought a new compiler with new warnings. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-06	RDMA/efa: Add common command handlers	Gal Pressman
	Add the EFA common commands implementation. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/efa: Implement functions that submit and complete admin commands	Gal Pressman
	Add admin commands submissions/completions implementation. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/efa: Add the com service API definitions	Gal Pressman
	Header file for the various commands that can be sent through admin queue. This includes queue create/modify/destroy, setting up and remove protection domains, address handlers, and memory registration, etc. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/efa: Add the efa_com.h file	Gal Pressman
	A helper header file for EFA admin queue, admin queue completion, asynchronous notification queue, and various hardware configuration data structures and functions. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/efa: Add the efa.h header file	Gal Pressman
	Add EFA driver generic header file defining driver's device independent internal data structures and definitions. Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/efa: Add EFA device definitions	Gal Pressman
	EFA PCIe device implements a single Admin Queue (AQ) and Admin Completion Queue (ACQ) pair to initialize and communicate configuration with the device. Through this pair, we run set/get commands for querying and configuring the device, create/modify/destroy queues, and IB specific commands like Address Handler (AH), Memory Registration (MR) and Protection Domains (PD). In addition to admin (AQ/ACQ), we have data path queues that get classified as Queue Pairs (QP) and Completion Queues (CQ). Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA: Add EFA related definitions	Gal Pressman
	Add EFA driver ID to the IOCTL interface uapi. This patch also adds unspecified node/transport type that will be used by EFA (usnic is left unchanged as it's already part of our ABI). Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	lightnvm: pblk: use nvm_rq_to_ppa_list()	Igor Konopko
	This patch replaces few remaining usages of rqd->ppa_list[] with existing nvm_rq_to_ppa_list() helpers. This is needed for theoretical devices with ws_min/ws_opt equal to 1. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: simplify partial read path	Igor Konopko
	This patch changes the approach to handling partial read path. In old approach merging of data from round buffer and drive was fully made by drive. This had some disadvantages - code was complex and relies on bio internals, so it was hard to maintain and was strongly dependent on bio changes. In new approach most of the handling is done mostly by block layer functions such as bio_split(), bio_chain() and generic_make request() and generally is less complex and easier to maintain. Below some more details of the new approach. When read bio arrives, it is cloned for pblk internal purposes. All the L2P mapping, which includes copying data from round buffer to bio and thus bio_advance() calls is done on the cloned bio, so the original bio is untouched. If we found that we have partial read case, we still have original bio untouched, so we can split it and continue to process only first part of it in current context, when the rest will be called as separate bio request which is passed to generic_make_request() for further processing. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Heiner Litz <hlitz@ucsc.edu> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: do not remove instance under global lock	Igor Konopko
	Currently all the target instances are removed under global nvm_lock. This was needed to ensure that nvm_dev struct will not be freed by hot unplug event during target removal. However, current implementation has some drawbacks, since the same lock is used when new nvme subsystem is registered, so we can have a situation, that due to long process of target removal on drive A, registration (and listing in OS) of the drive B will take a lot of time, since it will wait for that lock. Now when we have kref which ensures that nvm_dev will not be freed in the meantime, we can easily get rid of this lock for a time when we are removing nvm targets. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: track inflight target creations	Igor Konopko
	When creation process is still in progress, target is not yet on targets list. This causes a chance for removing whole lightnvm subsystem by calling nvm_unregister() in the meantime and finally by causing kernel panic inside target init function. This patch changes the behaviour by adding kref variable which tracks all the users of nvm_dev structure. When nvm_dev is allocated, kref value is set to 1. Then before every target creation the value is increased and decreased after target removal. The extra reference is decreased when nvm subsystem is unregistered. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: recover only written metadata	Igor Konopko
	This patch ensures that smeta was fully written before even trying to read it based on chunk table state and write pointer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: IO path reorganization	Igor Konopko
	This patch is made in order to prepare read path for new approach to partial read handling, which is simpler in compare with previous one. The most important change is to move the handling of completed and failed bio from the pblk_make_rq() to particular read and write functions. This is needed, since after partial read path changes, sometimes completed/failed bio will be different from original one, so we cannot do this any longer in pblk_make_rq(). Other changes are small read path refactor in order to reduce the size of the following patch with partial read changes. Generally the goal of this patch is not to change the functionality, but just to prepare the code for the following changes. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: GC error handling	Igor Konopko
	Currently when there is an IO error (or similar) on GC read path, pblk still move the line, which was currently under GC process to free state. Such a behaviour can lead to silent data mismatch issue. With this patch, the line which was under GC process on which some IO errors occurred, will be putted back to closed state (instead of free state as it was without this patch) and the L2P mapping for such a failed sectors will not be updated. Then in case of any user IOs to such a failed sectors, pblk would be able to return at least real IO error instead of stale data as it is right now. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: remove internal IO timeout	Igor Konopko
	Currently during pblk padding, there is internal IO timeout introduced, which is smaller than default NVMe timeout. This can lead to various use-after-free issues. Since in case of any IO timeouts NVMe and block layer will handle timeout by themselves and report it back to use, there is no need to keep this internal timeout in pblk. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: wait for inflight IOs in recovery	Igor Konopko
	This patch changes the behaviour of recovery padding in order to support a case, when some IOs were already submitted to the drive and some next one are not submitted due to error returned. Currently in case of errors we simply exit the pad function without waiting for inflight IOs, which leads to panic on inflight IOs completion. After the changes we always wait for all the inflight IOs before exiting the function. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: propagate errors when reading meta	Igor Konopko
	Read errors are not correctly propagated. Errors are cleared before returning control to the io submitter. Change the behaviour such that all read errors exept high ecc read warning status is returned appropriately. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix update line wp in OOB recovery	Igor Konopko
	In case of OOB recovery, we can hit the scenario when all the data in line were written and some part of emeta was written too. In such a case pblk_update_line_wp() function will call pblk_alloc_page() function which will case to set left_msecs to value below zero (since this field does not track emeta region) and thus will lead to multiple kernel warnings. This patch fixes that issue. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: kick writer on write recovery path	Igor Konopko
	In case of write recovery path, there is a chance that writer thread is not active, kick immediately instead of waiting for timer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix lock order in pblk_rb_tear_down_check	Igor Konopko
	In pblk_rb_tear_down_check() the spinlock functions are not called in proper order. Fixes: a4bd217 ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: prevent race condition on pblk remove	Marcin Dziegielewski
	When we trigger nvm target remove during device hot unplug, there is a probability to hit a general protection fault. This is caused by use of nvm_dev thay may be freed from another (hot unplug) thread (in the nvm_unregister function). Introduce lock in nvme_ioctl_dev_remove function to prevent this situation. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: set propper line as data_line after gc	Marcin Dziegielewski
	In current implementation of l2p recovery, when we are after gc and we have open line, we are not setting current data line properly (we set last line from the device instead of last line ordered by seq_nr) and in consequence, kernel panic and data corruption. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix bio leak when bio is split	Chansol Kim
	For large size io where blk_queue_split needs to be called inside pblk_rw_io, results in bio leak as bio_endio is not called on the newly allocated. One way to observe this is to mounting ext4 filesystem on the target and issuing 1MB io with dd, e.g., dd bs=1MB if=/dev/null of=/mount/myvolume. kmemleak reports: unreferenced object 0xffff88803d7d0100 (size 256): comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1.... 01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@.............. backtrace: [<000000001f5aa04f>] kmem_cache_alloc+0x204/0x3c0 [<0000000040945aab>] mempool_alloc_slab+0x1d/0x30 [<00000000b4959ab4>] mempool_alloc+0x83/0x220 [<00000000646bad9b>] bio_alloc_bioset+0x229/0x320 [<000000009264b251>] bio_clone_fast+0x26/0xc0 [<0000000008250252>] bio_split+0x41/0x110 [<00000000e365cad0>] blk_queue_split+0x349/0x930 [<00000000eb5426bc>] pblk_make_rq+0x1b5/0x1f0 [<00000000eea09cec>] generic_make_request+0x2f9/0x690 [<00000000ae6acede>] submit_bio+0x12e/0x1f0 [<00000000f9b8b82a>] ext4_io_submit+0x64/0x80 [<000000009e4f817d>] ext4_bio_write_page+0x32e/0x890 [<00000000cbd0d106>] mpage_submit_page+0x65/0xc0 [<000000000eec7359>] mpage_map_and_submit_buffers+0x171/0x330 [<000000009a7afcb6>] ext4_writepages+0xd5e/0x1650 [<000000004476b096>] do_writepages+0x39/0xc0 In case there is a need for a split, blk_queue_split returns the newly allocated bio to the caller by changing the value of pointer passed as a reference, while the original is passed to generic_make_requests. Although pblk_rw_io's local variable bio* has changed and passed to pblk_submit_read and pblk_write_to_cache, work is done on this new bio, and pblk_rw_io returns NVM_IO_DONE, pblk_make_rq calls bio_endio on the old bio because it passed bio pointer by value to pblk_rw_io. pblk_rw_io is unfolded into pblk_make_rq so that there is no copying of bio* and bio_endio is called on the correct bio*. Signed-off-by: Chansol Kim <chansol.kim@samsung.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: Inherit mdts from the parent nvme device	Igor Konopko
	Current lightnvm and pblk implementation does not care about NVMe max data transfer size, which can be smaller than 64*K=256K. There are existing NVMe controllers which NVMe max data transfer size is lower that 256K (for example 128K, which happens for existing NVMe controllers which are NVMe spec compliant). Such a controllers are not able to handle command which contains 64 PPAs, since the the size of DMAed buffer will be above the capabilities of such a controller. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: set proper read status in bio	Igor Konopko
	Currently in case of read errors, bi_status is not set properly which leads to returning inproper data to layers above. This patch fix that by setting proper status in case of read errors. Also remove unnecessary warn_once(), which does not make sense in that place, since user bio is not used for interation with drive and thus bi_status will not be set here. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: cleanly fail when there is not enough memory	Igor Konopko
	L2P table can be huge in many cases, since it typically requires 1GB of DRAM for 1TB of drive. When there is not enough memory available, OOM killer turns on and kills random processes, which can be very annoying for users. This patch changes the flag for L2P table allocation on order to handle this situation in more user friendly way. GFP_KERNEL and __GPF_HIGHMEM are default flags used in parameterless vmalloc() calls, so they are also keeped in that patch. Additionally __GFP_NOWARN flag is added in order to hide very long dmesg warn in case of the allocation failures. The most important flag introduced in that patch is __GFP_RETRY_MAYFAIL, which would cause allocator to try use free memory and if not available to drop caches, but not to run OOM killer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: ensure that erase is chunk aligned	Igor Konopko
	The sector bits in the erase command may be uninitialized are uninitialized, causing the erase LBA to be unaligned to the chunk size. This is unexpected situation, since erase shall always be chunk aligned based on OCSSD the 2.0 specification. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix race during put line	Igor Konopko
	In the pblk_put_line_back function, a race condition with __pblk_map_invalidate can make a line not part of any lists. Fix gc_list by resetting it to null fixes the above issue. Fixes: a4bd217 ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: gracefully handle GC vmalloc fail	Igor Konopko
	Currently when we fail on rq data allocation in gc, it skips moving active data and moves line straigt to its free state. Losing user data in the process. Move the data allocation to an earlier phase of GC, where we can still fail gracefully by moving line back to the closed state. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: remove unused smeta_ssec field	Igor Konopko
	smeta_ssec field in pblk_line is never used after it was replaced by the function pblk_line_smeta_start(). Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: reduce L2P memory footprint	Igor Konopko
	Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default). Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: rollback on error during gc read	Igor Konopko
	A line is left unsigned to the blocks lists in case pblk_gc_line returns an error. This moves the line back to be appropriate list, which can then be picked up by the garbage collector. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: line reference fix in GC	Igor Konopko
	Fixes the GC error case when moving a line back to closed state while releasing additional references. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	RDMA/umem: Remove hugetlb flag	Shiraz Saleem
	The drivers i40iw and bnxt_re no longer dependent on the hugetlb flag. So remove this flag from ib_umem structure. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/bnxt_re: Use core helpers to get aligned DMA address	Shiraz Saleem
	Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported bnxt_re page size. Remove checking the umem->hugtetlb flag as it is no longer required. The new DMA block iterator will return the 2M aligned address if the MR is backed by 2M huge pages. Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/i40iw: Use core helpers to get aligned DMA address within a supported ↵	Shiraz Saleem
	page size Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported i40iw page size. Remove code in i40iw to determine when MR is backed by 2M huge pages which involves checking the umem->hugetlb flag and VMA inspection. The new DMA iterator will return the 2M aligned address if the MR is backed by 2M pages. Fixes: f26c7c83395b ("i40iw: Add 2MB page support") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks	Shiraz Saleem
	This helper iterates over a DMA-mapped SGL and returns contiguous memory blocks aligned to a HW supported page size. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/umem: Add API to find best driver supported page size in an MR	Shiraz Saleem
	This helper iterates through the SG list to find the best page size to use from a bitmap of HW supported page sizes. Drivers that support multiple page sizes, but not mixed sizes in an MR can use this API. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	IB/hfi1: Fix WQ_MEM_RECLAIM warning	Mike Marciniszyn
	The work_item cancels that occur when a QP is destroyed can elicit the following trace: workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100 Call Trace: __flush_work.isra.29+0x8c/0x1a0 ? __switch_to_asm+0x40/0x70 __cancel_work_timer+0x103/0x190 ? schedule+0x32/0x80 iowait_cancel_work+0x15/0x30 [hfi1] rvt_reset_qp+0x1f8/0x3e0 [rdmavt] rvt_destroy_qp+0x65/0x1f0 [rdmavt] ? _cond_resched+0x15/0x30 ib_destroy_qp+0xe9/0x230 [ib_core] ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib] process_one_work+0x171/0x370 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ? max_active_store+0x80/0x80 ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM. The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become entangled with memory reclaim, so this flag is appropriate. Fixes: 0a226edd203f ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/mlx5: Remove MAYEXEC flag	Leon Romanovsky
	MAYEXEC flag was mistakenly added in the commit cited in the fixes line. Fixes: 4eb6ab13b991 ("RDMA: Remove rdma_user_mmap_page") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>