linux.git - Linus' kernel tree

Age	Commit message	Author
2014-10-09	IB/iser: Use single CQ for RX and TX	Sagi Grimberg
	This will solve a possible condition where we might miss TX completion (flush error) during session teardown. Since we are using a single CQ, we don't need to actively drain the TX CQ, instead just wait for flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors(). This patch might introduce a minor performance regression on its own, but the next patches will enhance performance using a single CQ for RX and TX. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Use internal polling budget to avoid possible live-lock	Sagi Grimberg
	We need a way to guarantee that we don't stay in soft-IRQ context for too long. We might starve other pending CQ tasklets or, worse, lock against an application trying to issue I/O on the running CPU. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
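The budget idea described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct fake_cq`, `fake_poll_one()`, `poll_with_budget()` and the value of `POLL_BUDGET` are hypothetical names and numbers, not the iSER driver's code.

```c
#include <stddef.h>

/* Hypothetical completion queue: just a count of pending completions. */
struct fake_cq {
	int pending;
};

/* Hypothetical budget; the value is illustrative only. */
enum { POLL_BUDGET = 4 };

/* Consume one completion if any is pending. Returns 1 if one was consumed. */
int fake_poll_one(struct fake_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/*
 * Handle at most POLL_BUDGET completions per invocation and return how
 * many were handled. The caller is expected to reschedule itself if
 * work remains, so other pending work gets a chance to run.
 */
int poll_with_budget(struct fake_cq *cq)
{
	int done = 0;

	while (done < POLL_BUDGET && fake_poll_one(cq))
		done++;
	return done;
}
```

Bounding the loop trades a little latency for fairness: a queue that refills as fast as it is drained can no longer pin the CPU indefinitely.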
2014-10-09	IB/iser: Centralize iser completion contexts	Sagi Grimberg
	Introduce iser_comp which centralizes all iser completion related items and is referenced by iser_device and each ib_conn. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release	Ariel Nahum
	In case iscsid was violently killed (SIGKILL) during its error recovery stage, we may never get a connection teardown sequence for some of the old connections. No harm done, but when we try to unload the module we will need to clean up all these connections. So we may actually end up here, which means it's not a BUG_ON(); just give a relaxed warning that this happened and continue with normal unload. BUG_ON() will cause a segfault on module_exit and we don't want that. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Signal iSCSI layer that transport is broken in error completions	Sagi Grimberg
	Previously we notified the iSCSI layer about the connection failure only when we had consumed all of our flush errors. This was racy as there was no guarantee that iscsi_conn wasn't terminated by then (which ends up in an invalid memory access). In case we got a non-FLUSH error completion, we are guaranteed that iscsi_conn is still alive. We should notify the iSCSI layer with iscsi_conn_failure to initiate error handling. While we are at it, add kernel-doc style documentation. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Protect tasks cleanup in case IB device was already released	Sagi Grimberg
	Bail out in case a task cleanup (iscsi_iser_cleanup_task) is called after the IB device was removed (DEVICE_REMOVAL CM event). We also call iscsi_conn_stop with a lock taken to prevent DEVICE_REMOVAL and tasks cleanup from racing. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Unbind at conn_stop stage	Ariel Nahum
	Previously we didn't need to unbind the iser_conn and iscsi_conn since we always relied on the iscsi daemon to tear down the connection and never let it finish before we cleaned up all that is needed in iser. This is not the case anymore (for the DEVICE_REMOVAL event). So avoid any possible chance that we dereference iscsi_conn after it was freed. We also call iser_conn_terminate (safe to call multiple times) just for the corner case of the iscsi daemon stopping an old connection before invoking endpoint removal (might happen if it was violently killed). Notice we are unbinding under a lock, which is required. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Don't bound release_work completions timeouts	Sagi Grimberg
	We no longer rely on iscsi connection teardown sequence, so no need to give a grace period and continue cleanup if it expired. Have iser_conn_release wait for full completion before freeing iser_conn.
	ib_completion:
	Guaranteed to come when:
	- Got DISCONNECTED/ADDR_CHANGE event or
	- iSCSI called ep_disconnect/conn_stop
	Guaranteed to finish when:
	- Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
	- All Flush errors are consumed
	- IB related resources are destroyed
	stop_completion:
	Guaranteed to come when:
	- iSCSI calls conn_stop
	Guaranteed to finish when:
	- All inflight tasks were cleaned up
	Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon	Sagi Grimberg
	iscsi daemon is in user-space, thus we can't rely on it to be invoked at connection teardown (if not running or does not receive CPU time). This patch addresses the issue by re-structuring iSER connection teardown logic and CM events handling. The CM events will dictate the RDMA resources destruction (ib_conn) and iser_conn is kept around as long as iscsi_conn is left around allowing iscsi/iser callbacks to continue after RDMA transport was destroyed.
	This patch introduces a separation in logic when handling CM events:
	- DISCONNECTED_HANDLER, ADDR_CHANGED
	  These events indicate the start of teardown process. Actions:
	  1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
	  2. Notify iSCSI of connection failure
	  3. Change state to TERMINATING
	  4. Poll for all flush errors to be consumed
	- TIMEWAIT_EXIT, DEVICE_REMOVAL
	  These events indicate the final stage of termination process and we can free RDMA related resources. Actions:
	  1. Call disconnected handler (we are not guaranteed that DISCONNECTED event was invoked in the past)
	  2. Cleanup RDMA related resources
	  3. For DEVICE_REMOVAL return non-zero rc from cma_handler to implicitly destroy the cm_id (Can't rely on user-space, make sure we have forward progress)
	We replace flush_completion (indicate all flushes were consumed) with ib_completion (rdma resources were cleaned up).
	The iser_conn_release_work will wait for teardown completions:
	- conn_stop was completed (tasks were cleaned-up) - stop_completion
	- RDMA resources were destroyed - ib_completion
	And then will continue to free iser connection representation (iser_conn).
	Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Extend iser_free_ib_conn_res()	Sagi Grimberg
	Put all connection IB related resources release in this routine. One exception is the cm_id which cannot be destroyed as the routine is protected by the state mutex. Also move its position to avoid forward declaration. While at it fix qp NULL assignment. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Remove unused variables and dead code	Roi Dayan
	Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Re-introduce ib_conn	Sagi Grimberg
	Structure that describes the RDMA-related connection objects. Static member of iser_conn. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	IB/iser: Rename ib_conn -> iser_conn	Sagi Grimberg
	Two reasons why we choose to do this:
	1. No point today calling struct iser_conn by another name ib_conn
	2. In the next patches we will restructure iser control plane representation
	   - struct iser_conn: connection logical representation
	   - struct ib_conn: connection RDMA layout representation
	This patch does not change any functionality.
	Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu	Ingo Molnar
	Pull additional commits for locktorture, from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-09	fix misuses of f_count() in ppp and netlink	Al Viro
	we used to check for "nobody else could start doing anything with that opened file" by checking that refcount was 2 or less - one for descriptor table and one we'd acquired in fget() on the way to wherever we are. That was race-prone (somebody else might have had a reference to descriptor table and do fget() just as we'd been checking) and it had become flat-out incorrect back when we switched to fget_light() on those codepaths - unlike fget(), it doesn't grab an extra reference unless the descriptor table is shared. The same change allowed a race-free check, though - we are safe exactly when refcount is less than 2. It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH, netlink hadn't grown that check until 3.9 and ppp used to live in drivers/net, not drivers/net/ppp until 3.1. The bug existed well before that, though, and the same fix used to apply in old location of file. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	ncpfs: use list_for_each_entry() for d_subdirs walk	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	vfs: move getname() from callers to do_mount()	Seunghun Lee
	It would make more sense to pass char __user * instead of char * in callers of do_mount() and do getname() inside do_mount(). Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	gfs2_atomic_open(): skip lookups on hashed dentry	Al Viro
	hashed dentry can be passed to ->atomic_open() only if a) it has just passed revalidation and b) it's negative Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	[infiniband] remove pointless assignments	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	gadgetfs: saner API for gadgetfs_create_file()	Al Viro
	return dentry, not inode. dev->inode is never used by anything, don't bother with storing it. Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	f_fs: saner API for ffs_sb_create_file()	Al Viro
	make it return dentry instead of inode Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	jfs: don't hash direct inode	Al Viro
	hlist_add_fake(inode->i_hash), same as for the rest of special ones... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	[s390] remove pointless assignment of ->f_op in vmlogrdr ->open()	Al Viro
	The only way we can get to that function is from misc_open(), after the latter has set file->f_op to exactly the same value we are (re)assigning there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	ecryptfs: ->f_op is never NULL	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	android: ->f_op is never NULL	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	nouveau: __iomem misannotations	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	missing annotation in fs/file.c	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	fs: namespace: suppress 'may be used uninitialized' warnings	Tim Gardner
	The gcc version 4.9.1 compiler complains even though it isn't possible for these variables to not get initialized before they are used.
	fs/namespace.c: In function ‘SyS_mount’:
	fs/namespace.c:2720:8: warning: ‘kernel_dev’ may be used uninitialized in this function [-Wmaybe-uninitialized]
	  ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
	        ^
	fs/namespace.c:2699:8: note: ‘kernel_dev’ was declared here
	  char *kernel_dev;
	        ^
	fs/namespace.c:2720:8: warning: ‘kernel_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
	  ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
	        ^
	fs/namespace.c:2697:8: note: ‘kernel_type’ was declared here
	  char *kernel_type;
	        ^
	Fix the warnings by simplifying copy_mount_string() as suggested by Al Viro.
	Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	saner perf_atoll()	Al Viro
	That loop in there is both anti-idiomatic and completely pointless. strtoll() is there for a purpose; use it and compare what's left with acceptable suffixes. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
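A minimal sketch of the approach named above (let strtoll() parse the number, then compare what is left against a set of suffixes). The function name `parse_size()`, the accepted suffix set, the rejection of negative values and the -1 error return are assumptions for illustration, not the actual perf_atoll() code.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal number with an optional one-letter size
 * suffix (b, k, m, g, t; case-insensitive). Returns the scaled value,
 * or -1 on any parse error or overflow.
 */
long long parse_size(const char *str)
{
	char *end;
	long long value, factor;

	errno = 0;
	value = strtoll(str, &end, 10);
	if (end == str || errno == ERANGE || value < 0)
		return -1;

	if (*end == '\0')
		return value;		/* plain number, no suffix */
	if (end[1] != '\0')
		return -1;		/* a suffix must be a single character */

	switch (tolower((unsigned char)*end)) {
	case 'b': factor = 1; break;
	case 'k': factor = 1LL << 10; break;
	case 'm': factor = 1LL << 20; break;
	case 'g': factor = 1LL << 30; break;
	case 't': factor = 1LL << 40; break;
	default:  return -1;		/* unknown suffix */
	}

	if (value > LLONG_MAX / factor)
		return -1;		/* scaling would overflow */
	return value * factor;
}
```

Leaving digit handling to strtoll() means the only hand-written logic is the suffix comparison, which is the part specific to the caller.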
2014-10-09	switch /dev/kmsg to ->write_iter()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	switch logger to ->write_iter()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	switch hci_vhci to ->write_iter()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	switch /dev/zero and /dev/full to ->read_iter()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	dma-buf: don't open-code atomic_long_read()	Al Viro
	... not to mention that even atomic_long_read() is too low-level here - there's file_count(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	rsxx debugfs inanity	Al Viro
	check with the author of that horror... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	carma-fpga: switch to simple_read_from_buffer()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	carma-fpga: switch to fixed_size_llseek()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	cachefiles_write_page(): switch to __kernel_write()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	vme: don't open-code fixed_size_llseek()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	ashmem: use vfs_llseek()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	9p: switch to %p[dD]	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	cifs: switch to use of %p[dD]	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	fs: make cont_expand_zero interruptible	Mikulas Patocka
	This patch makes it possible to kill a process looping in cont_expand_zero. A process may spend a lot of time in this function, so it is desirable to be able to kill it. It happened to me that I wanted to copy a piece of data from the disk to a file. By mistake, I used the "seek" parameter to dd instead of "skip". Due to the "seek" parameter, dd attempted to extend the file and became stuck doing so - the only possibility was to reset the machine or wait many hours until the filesystem runs out of space and cont_expand_zero fails. We need this patch to be able to terminate the process. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
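The general pattern (check for a pending kill request on every iteration of a long loop and return early) can be sketched in user-space C. This is an analogue under stated assumptions, not the kernel change itself: `kill_requested` and `expand_zero()` are hypothetical names, and the flag stands in for whatever signal check the kernel code performs.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* In real use a signal handler would set this; here it is a plain flag. */
volatile sig_atomic_t kill_requested;

/*
 * Process `chunks` units of work, checking for a kill request before
 * each one. Returns 0 on completion or -EINTR if interrupted; *done
 * reports how many chunks were finished either way.
 */
long expand_zero(size_t chunks, size_t *done)
{
	for (*done = 0; *done < chunks; (*done)++) {
		if (kill_requested)
			return -EINTR;
		/* ... zero one chunk here ... */
	}
	return 0;
}
```

Checking once per chunk keeps the cost negligible while bounding how long a kill request can go unnoticed to the time needed for a single chunk.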
2014-10-09	Add copy_to_iter(), copy_from_iter() and iov_iter_zero()	Matthew Wilcox
	For DAX, we want to be able to copy between iovecs and kernel addresses that don't necessarily have a struct page. This is a fairly simple rearrangement for bvec iters to kmap the pages outside and pass them in, but for user iovecs it gets more complicated because we might try various different ways to kmap the memory. Duplicating the existing logic works out best in this case. We need to be able to write zeroes to an iovec for reads from unwritten ranges in a file. This is performed by the new iov_iter_zero() function, again patterned after the existing code that handles iovec iterators. [AV: and export the buggers...] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	fs: Fix theoretical division by 0 in super_cache_scan().	Tetsuo Handa
	total_objects could be 0 and is used as a denominator. While total_objects is a "long", total_objects == 0 is unlikely to happen for 3.12 and later kernels because 32-bit architectures would not be able to hold (1 << 32) objects. However, total_objects == 0 may happen for kernels between 3.1 and 3.11 because total_objects in prune_super() was an "int" and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable <stable@kernel.org> # 3.1+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
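A small sketch of guarding a proportional split against a zero denominator. `proportional_share()` is a hypothetical helper written for this illustration; the log does not show the patch itself, so this should not be read as the actual change to super_cache_scan().

```c
/*
 * Return nr_to_scan * part / total without overflowing the intermediate
 * product, treating a zero total as 1 so the division is always defined.
 * All arguments are assumed to be non-negative.
 */
long proportional_share(long nr_to_scan, long part, long total)
{
	long quot, rem;

	if (total == 0)
		total = 1;	/* guard: never divide by zero */

	quot = nr_to_scan / total;
	rem = nr_to_scan % total;
	return quot * part + rem * part / total;
}
```

For example, `proportional_share(100, 30, 120)` gives 25, and a call with `total == 0` returns a defined value instead of trapping.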
2014-10-09	dcache: Fix no spaces at the start of a line in dcache.c	Daeseok Youn
	Fixed coding style in dcache.c Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	[jffs2] kill wbuf_queued/wbuf_dwork_lock	Al Viro
	schedule_delayed_work() happening when the work is already pending is a cheap no-op. Don't bother with ->wbuf_queued logic - it's both broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris) and pointless. It's cheaper to let schedule_delayed_work() handle that case. Reported-by: Jeff Harris <jefftharris@gmail.com> Tested-by: Jeff Harris <jefftharris@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
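Why a separate "queued" flag is redundant can be shown with a toy model in which the scheduling call itself tracks the pending state. Everything here (`struct toy_work`, `toy_schedule()`, `toy_run()`) is a hypothetical user-space stand-in built on C11 atomics, not the kernel workqueue implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy work item: the pending bit lives inside the item itself. */
struct toy_work {
	atomic_flag pending;
};

void toy_work_init(struct toy_work *w)
{
	atomic_flag_clear(&w->pending);
}

/*
 * Request that the work run. Returns true if this call queued it and
 * false if it was already pending, in which case nothing happens.
 */
bool toy_schedule(struct toy_work *w)
{
	return !atomic_flag_test_and_set(&w->pending);
}

/* Called by the (imaginary) worker just before running the handler. */
void toy_run(struct toy_work *w)
{
	atomic_flag_clear(&w->pending);
	/* ... the handler body would run here ... */
}
```

Because the pending state has a single owner, a caller-side shadow copy of it can only drift out of sync, which is the failure mode the commit message describes.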
2014-10-09	vfs: fix typo in s_op->alloc_inode() documentation	Kirill Smelkov
	The function which calls s_op->alloc_inode() is not inode_alloc(), but instead alloc_inode(), which lives in fs/inode.c. The typo was there from the beginning, from 5ea626aa (VFS: update documentation, 2005) - there was no standalone inode_alloc() for the whole kernel history. Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	constify file_inode()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09	handle suicide on late failure exits in execve() in search_binary_handler()	Al Viro
	... rather than doing that in the guts of ->load_binary(). [updated to fix the bug spotted by Shentino - for SIGSEGV we really need something stronger than send_sig_info(); again, better do that in one place] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>