linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-05-06	lightnvm: pblk: recover only written metadata	Igor Konopko
	This patch ensures that smeta was fully written before even trying to read it based on chunk table state and write pointer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: IO path reorganization	Igor Konopko
	This patch is made in order to prepare read path for new approach to partial read handling, which is simpler in compare with previous one. The most important change is to move the handling of completed and failed bio from the pblk_make_rq() to particular read and write functions. This is needed, since after partial read path changes, sometimes completed/failed bio will be different from original one, so we cannot do this any longer in pblk_make_rq(). Other changes are small read path refactor in order to reduce the size of the following patch with partial read changes. Generally the goal of this patch is not to change the functionality, but just to prepare the code for the following changes. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: GC error handling	Igor Konopko
	Currently when there is an IO error (or similar) on GC read path, pblk still move the line, which was currently under GC process to free state. Such a behaviour can lead to silent data mismatch issue. With this patch, the line which was under GC process on which some IO errors occurred, will be putted back to closed state (instead of free state as it was without this patch) and the L2P mapping for such a failed sectors will not be updated. Then in case of any user IOs to such a failed sectors, pblk would be able to return at least real IO error instead of stale data as it is right now. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: remove internal IO timeout	Igor Konopko
	Currently during pblk padding, there is internal IO timeout introduced, which is smaller than default NVMe timeout. This can lead to various use-after-free issues. Since in case of any IO timeouts NVMe and block layer will handle timeout by themselves and report it back to use, there is no need to keep this internal timeout in pblk. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: wait for inflight IOs in recovery	Igor Konopko
	This patch changes the behaviour of recovery padding in order to support a case, when some IOs were already submitted to the drive and some next one are not submitted due to error returned. Currently in case of errors we simply exit the pad function without waiting for inflight IOs, which leads to panic on inflight IOs completion. After the changes we always wait for all the inflight IOs before exiting the function. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: propagate errors when reading meta	Igor Konopko
	Read errors are not correctly propagated. Errors are cleared before returning control to the io submitter. Change the behaviour such that all read errors exept high ecc read warning status is returned appropriately. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix update line wp in OOB recovery	Igor Konopko
	In case of OOB recovery, we can hit the scenario when all the data in line were written and some part of emeta was written too. In such a case pblk_update_line_wp() function will call pblk_alloc_page() function which will case to set left_msecs to value below zero (since this field does not track emeta region) and thus will lead to multiple kernel warnings. This patch fixes that issue. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: kick writer on write recovery path	Igor Konopko
	In case of write recovery path, there is a chance that writer thread is not active, kick immediately instead of waiting for timer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix lock order in pblk_rb_tear_down_check	Igor Konopko
	In pblk_rb_tear_down_check() the spinlock functions are not called in proper order. Fixes: a4bd217 ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: prevent race condition on pblk remove	Marcin Dziegielewski
	When we trigger nvm target remove during device hot unplug, there is a probability to hit a general protection fault. This is caused by use of nvm_dev thay may be freed from another (hot unplug) thread (in the nvm_unregister function). Introduce lock in nvme_ioctl_dev_remove function to prevent this situation. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: set propper line as data_line after gc	Marcin Dziegielewski
	In current implementation of l2p recovery, when we are after gc and we have open line, we are not setting current data line properly (we set last line from the device instead of last line ordered by seq_nr) and in consequence, kernel panic and data corruption. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix bio leak when bio is split	Chansol Kim
	For large size io where blk_queue_split needs to be called inside pblk_rw_io, results in bio leak as bio_endio is not called on the newly allocated. One way to observe this is to mounting ext4 filesystem on the target and issuing 1MB io with dd, e.g., dd bs=1MB if=/dev/null of=/mount/myvolume. kmemleak reports: unreferenced object 0xffff88803d7d0100 (size 256): comm "kworker/u16:1", pid 68, jiffies 4294899333 (age 284.120s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 60 e8 31 81 88 ff ff .........`.1.... 01 40 00 00 06 06 00 00 00 00 00 00 05 00 00 00 .@.............. backtrace: [<000000001f5aa04f>] kmem_cache_alloc+0x204/0x3c0 [<0000000040945aab>] mempool_alloc_slab+0x1d/0x30 [<00000000b4959ab4>] mempool_alloc+0x83/0x220 [<00000000646bad9b>] bio_alloc_bioset+0x229/0x320 [<000000009264b251>] bio_clone_fast+0x26/0xc0 [<0000000008250252>] bio_split+0x41/0x110 [<00000000e365cad0>] blk_queue_split+0x349/0x930 [<00000000eb5426bc>] pblk_make_rq+0x1b5/0x1f0 [<00000000eea09cec>] generic_make_request+0x2f9/0x690 [<00000000ae6acede>] submit_bio+0x12e/0x1f0 [<00000000f9b8b82a>] ext4_io_submit+0x64/0x80 [<000000009e4f817d>] ext4_bio_write_page+0x32e/0x890 [<00000000cbd0d106>] mpage_submit_page+0x65/0xc0 [<000000000eec7359>] mpage_map_and_submit_buffers+0x171/0x330 [<000000009a7afcb6>] ext4_writepages+0xd5e/0x1650 [<000000004476b096>] do_writepages+0x39/0xc0 In case there is a need for a split, blk_queue_split returns the newly allocated bio to the caller by changing the value of pointer passed as a reference, while the original is passed to generic_make_requests. Although pblk_rw_io's local variable bio* has changed and passed to pblk_submit_read and pblk_write_to_cache, work is done on this new bio, and pblk_rw_io returns NVM_IO_DONE, pblk_make_rq calls bio_endio on the old bio because it passed bio pointer by value to pblk_rw_io. pblk_rw_io is unfolded into pblk_make_rq so that there is no copying of bio* and bio_endio is called on the correct bio*. Signed-off-by: Chansol Kim <chansol.kim@samsung.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: Inherit mdts from the parent nvme device	Igor Konopko
	Current lightnvm and pblk implementation does not care about NVMe max data transfer size, which can be smaller than 64*K=256K. There are existing NVMe controllers which NVMe max data transfer size is lower that 256K (for example 128K, which happens for existing NVMe controllers which are NVMe spec compliant). Such a controllers are not able to handle command which contains 64 PPAs, since the the size of DMAed buffer will be above the capabilities of such a controller. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: set proper read status in bio	Igor Konopko
	Currently in case of read errors, bi_status is not set properly which leads to returning inproper data to layers above. This patch fix that by setting proper status in case of read errors. Also remove unnecessary warn_once(), which does not make sense in that place, since user bio is not used for interation with drive and thus bi_status will not be set here. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: cleanly fail when there is not enough memory	Igor Konopko
	L2P table can be huge in many cases, since it typically requires 1GB of DRAM for 1TB of drive. When there is not enough memory available, OOM killer turns on and kills random processes, which can be very annoying for users. This patch changes the flag for L2P table allocation on order to handle this situation in more user friendly way. GFP_KERNEL and __GPF_HIGHMEM are default flags used in parameterless vmalloc() calls, so they are also keeped in that patch. Additionally __GFP_NOWARN flag is added in order to hide very long dmesg warn in case of the allocation failures. The most important flag introduced in that patch is __GFP_RETRY_MAYFAIL, which would cause allocator to try use free memory and if not available to drop caches, but not to run OOM killer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: ensure that erase is chunk aligned	Igor Konopko
	The sector bits in the erase command may be uninitialized are uninitialized, causing the erase LBA to be unaligned to the chunk size. This is unexpected situation, since erase shall always be chunk aligned based on OCSSD the 2.0 specification. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: fix race during put line	Igor Konopko
	In the pblk_put_line_back function, a race condition with __pblk_map_invalidate can make a line not part of any lists. Fix gc_list by resetting it to null fixes the above issue. Fixes: a4bd217 ("lightnvm: physical block device (pblk) target") Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: gracefully handle GC vmalloc fail	Igor Konopko
	Currently when we fail on rq data allocation in gc, it skips moving active data and moves line straigt to its free state. Losing user data in the process. Move the data allocation to an earlier phase of GC, where we can still fail gracefully by moving line back to the closed state. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: remove unused smeta_ssec field	Igor Konopko
	smeta_ssec field in pblk_line is never used after it was replaced by the function pblk_line_smeta_start(). Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: reduce L2P memory footprint	Igor Konopko
	Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default). Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: rollback on error during gc read	Igor Konopko
	A line is left unsigned to the blocks lists in case pblk_gc_line returns an error. This moves the line back to be appropriate list, which can then be picked up by the garbage collector. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	lightnvm: pblk: line reference fix in GC	Igor Konopko
	Fixes the GC error case when moving a line back to closed state while releasing additional references. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06	RDMA/umem: Remove hugetlb flag	Shiraz Saleem
	The drivers i40iw and bnxt_re no longer dependent on the hugetlb flag. So remove this flag from ib_umem structure. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/bnxt_re: Use core helpers to get aligned DMA address	Shiraz Saleem
	Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported bnxt_re page size. Remove checking the umem->hugtetlb flag as it is no longer required. The new DMA block iterator will return the 2M aligned address if the MR is backed by 2M huge pages. Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/i40iw: Use core helpers to get aligned DMA address within a supported ↵	Shiraz Saleem
	page size Call the core helpers to retrieve the HW aligned address to use for the MR, within a supported i40iw page size. Remove code in i40iw to determine when MR is backed by 2M huge pages which involves checking the umem->hugetlb flag and VMA inspection. The new DMA iterator will return the 2M aligned address if the MR is backed by 2M pages. Fixes: f26c7c83395b ("i40iw: Add 2MB page support") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks	Shiraz Saleem
	This helper iterates over a DMA-mapped SGL and returns contiguous memory blocks aligned to a HW supported page size. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/umem: Add API to find best driver supported page size in an MR	Shiraz Saleem
	This helper iterates through the SG list to find the best page size to use from a bitmap of HW supported page sizes. Drivers that support multiple page sizes, but not mixed sizes in an MR can use this API. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	IB/hfi1: Fix WQ_MEM_RECLAIM warning	Mike Marciniszyn
	The work_item cancels that occur when a QP is destroyed can elicit the following trace: workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100 Call Trace: __flush_work.isra.29+0x8c/0x1a0 ? __switch_to_asm+0x40/0x70 __cancel_work_timer+0x103/0x190 ? schedule+0x32/0x80 iowait_cancel_work+0x15/0x30 [hfi1] rvt_reset_qp+0x1f8/0x3e0 [rdmavt] rvt_destroy_qp+0x65/0x1f0 [rdmavt] ? _cond_resched+0x15/0x30 ib_destroy_qp+0xe9/0x230 [ib_core] ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib] process_one_work+0x171/0x370 worker_thread+0x49/0x3f0 kthread+0xf8/0x130 ? max_active_store+0x80/0x80 ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM. The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become entangled with memory reclaim, so this flag is appropriate. Fixes: 0a226edd203f ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines") Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	RDMA/mlx5: Remove MAYEXEC flag	Leon Romanovsky
	MAYEXEC flag was mistakenly added in the commit cited in the fixes line. Fixes: 4eb6ab13b991 ("RDMA: Remove rdma_user_mmap_page") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	IB/mlx5: Device resource control for privileged DEVX user	Ariel Levkovich
	For DEVX users who have SYS_RAWIO capability, we set the internal device resources capability when creating the UCTX. This will allow the device to restrict the allocation of internal device resources such as SW ICM memory to privileged DEVX users only. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	IB/mlx5: Add steering SW ICM device memory type	Ariel Levkovich
	This patch adds support for allocating, deallocating and registering a new device memory type, STEERING_SW_ICM. This memory can be allocated and used by a privileged user for direct rule insertion and management of the device's steering tables. The type is provided by the user via the dedicated attribute in the alloc_dm ioctl command. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	IB/mlx5: Warn on allocated MEMIC buffers during cleanup	Ariel Levkovich
	Adding a warning on allocated MEMIC buffers that weren't freed prior to driver tear down. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	IB/mlx5: Support device memory type attribute	Ariel Levkovich
	This patch intoruduces a new mlx5_ib driver attribute to the DM allocation method - the DM type. In order to allow addition of new types in downstream patches this patch also refactors the allocation, deallocation and registration handlers to consider the requested type and perform the necessary actions according to it. Since not all future device memory types will be such that are mapped to user memory, the mandatory page index output attribute is modified to be optional. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	kernel: cgroup: fix misuse of %x	Fuqian Huang
	Pointers should be printed with %p or %px rather than cast to unsigned long type and printed with %lx. Change %lx to %p to print the pointers. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-05-06	cgroup: get rid of cgroup_freezer_frozen_exit()	Roman Gushchin
	A task should never enter the exit path with the task->frozen bit set. Any frozen task must enter the signal handling loop and the only way to escape is through cgroup_leave_frozen(true), which unconditionally drops the task->frozen bit. So it means that cgroyp_freezer_frozen_exit() has zero chances to be called and has to be removed. Let's put a WARN_ON_ONCE() instead of the cgroup_freezer_frozen_exit() call to catch any potential leak of the task's frozen bit. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-05-06	cgroup: prevent spurious transition into non-frozen state	Roman Gushchin
	If freezing of a cgroup races with waking of a task from the frozen state (like waiting in vfork() or in do_signal_stop()), a spurious transition of the cgroup state can happen. The task enters cgroup_leave_frozen(true), the cgroup->nr_frozen_tasks counter decrements, and the cgroup is switched to the unfrozen state. To prevent it, let's reserve cgroup_leave_frozen(true) for terminating processes and use cgroup_leave_frozen(false) otherwise. To avoid busy-looping in the signal handling loop waiting for JOBCTL_TRAP_FREEZE set from the cgroup freezing path, let's do it explicitly in cgroup_leave_frozen(), if the task is going to stay frozen. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-05-06	cgroup: Remove unused cgrp variable	Shaokun Zhang
	The 'cgrp' is set but not used in commit <76f969e8948d8> ("cgroup: cgroup v2 freezer"). Remove it to avoid [-Wunused-but-set-variable] warning. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-05-06	RDMA/rdmavt: Catch use-after-free access of AH structures	Leon Romanovsky
	Prior to commit d345691471b4 ("RDMA: Handle AH allocations by IB/core"), AH destroy path is rdmavt returned -EBUSY warning to application and caused to potential leakage of kernel memory of AH structure. After that commit, the AH structure is always freed but such early return in driver code can potentially cause to use-after-free error. Add warning to catch such situation to help driver developers to fix AH release path. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06	platform/x86: intel_pmc_core: Allow to dump debug registers on S0ix failure	Rajat Jain
	Add a module parameter which when enabled, will check on resume, if the last S0ix attempt was successful. If not, the driver would warn and provide helpful debug information (which gets latched during the failed suspend attempt) to debug the S0ix failure. This information is very useful to debug S0ix failures. Specially since the latched debug information will be lost (over-written) if the system attempts to go into runtime (or imminent) S0ix again after that failed suspend attempt. Signed-off-by: Rajat Jain <rajatja@google.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-06	platform/x86: intel_pmc_core: Convert to a platform_driver	Rajat Jain
	Convert the intel_pmc_core driver to a platform driver. There is no functional change to the driver, or to the way the devices are instantiated. Signed-off-by: Rajat Jain <rajatja@google.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-06	platform/x86: mlx-platform: Add mlx-wdt platform driver activation	Vadim Pasternak
	Add mlx-wdt platform driver activation. Watchdog driver uses the same regmap infrastructure as others Mellanox platform drivers. Specific registers description for watchdog platform data configuration are added to mlx-platform. There are the registers for watchdog timer manipulation, and action setting on watchdog timer expiration. The watchdog action function could be configured to perform one of the following: system reset, setting PWM to full speed or counter increment. Two types of watchdog devices are supported main and auxiliary. These devices are co-exist and each of them could be configured to handle the specific action. Signed-off-by: Michael Shych <michealsh@mellanox.com> Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-06	platform/x86: mlx-platform: Add support for tachometer speed register	Vadim Pasternak
	Add support for tachometer speed register for the next generation systems MQMB7xx, MSN37xx, MSN34xx, MSN38xx. All these systems support tachometer speed capability register. This register is to be provided mlxreg-fan driver. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-06	platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc	Liming Sun
	This commit adds the TmFifo platform driver for Mellanox BlueField Soc. TmFifo is a shared FIFO which enables external host machine to exchange data with the SoC via USB or PCIe. The driver is based on virtio framework and has console and network access enabled. Reviewed-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Liming Sun <lsun@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-06	platform/x86: thinkpad_acpi: fix spelling mistake "capabilites" -> ↵	Colin Ian King
	"capabilities" There is a spelling mistake in a module parameter description. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-06	platform/x86: intel_punit_ipc: Revert "Fix resource ioremap warning"	Andy Shevchenko
	Since we have a proper fix for intel_pmc_ipc driver for resource management, get rid of unneeded commit in the intel_punit_ipc driver. This reverts commit 6cc8cbbc8868033f279b63e98b26b75eaa0006ab. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-05-06	platform/x86: intel_pmc_ipc: Don't map non-used optional resources	Andy Shevchenko
	The intel_pmc_ipc driver has a placeholder for all possible resources that may have been provided by ACPI. Since there are few optional ones, the driver still uses them and binds to wrong ranges in resource tree: # grep intel_punit_ipc /proc/iomem 00000000-00000000 : intel_punit_ipc 00000000-00000000 : intel_punit_ipc 00000000-00000000 : intel_punit_ipc 00000000-00000000 : intel_punit_ipc This leads to issues with resource management during inserting and removing modules, such as intel_pmc_ipc itself, which can't be inserted anymore after first removal. Count the actual resources provided and supply only them to the child device. This is a real fix of the commit 8cc7fb4a6523 ("intel_pmc_ipc: update acpi resource structure for Punit") that also fixes a symptoms described in the commit 6cc8cbbc8868 ("platform/x86: intel_punit_ipc: Fix resource ioremap warning") that is going to be reverted afterwards. Reported-by: Junxiao Chang <junxiao.chang@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Qipeng Zha <qipeng.zha@intel.com> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-05-06	platform/x86: intel_pmc_ipc: Apply same width for offset definitions	Andy Shevchenko
	Apply same width for offset definitions to make code more consistent. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-05-06	platform/x86: intel_pmc_ipc: Use BIT() macro	Andy Shevchenko
	Use BIT() and BIT_MASK() macros for definitions. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-05-06	vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files	Kirill Smelkov
	This amends commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") in how position is passed into .read()/.write() handler for stream-like files: Rasmus noticed that we currently pass 0 as position and ignore any position change if that is done by a file implementation. This papers over bugs if ppos is used in files that declare themselves as being stream-like as such bugs will go unnoticed. Even if a file implementation is correctly converted into using stream_open, its read/write later could be changed to use ppos and even though that won't be working correctly, that bug might go unnoticed without someone doing wrong behaviour analysis. It is thus better to pass ppos=NULL into read/write for stream-like files as that don't give any chance for ppos usage bugs because it will oops if ppos is ever used inside .read() or .write(). Note 1: rw_verify_area, new_sync_{read,write} needs to be updated because they are called by vfs_read/vfs_write & friends before file_operations .read/.write . Note 2: if file backend uses new-style .read_iter/.write_iter, position is still passed into there as non-pointer kiocb.ki_pos . Currently stream_open.cocci (semantic patch added by 10dce8af3422) ignores files whose file_operations has *_iter methods. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
2019-05-06	ASoC: stm32: spdifrx: change trace level on iec control	Olivier Moysan
	Change trace level to debug to avoid spurious messages. Return quietly when accessing iec958 control, while no S/PDIF signal is available. Signed-off-by: Olivier Moysan <olivier.moysan@st.com> Signed-off-by: Mark Brown <broonie@kernel.org>