linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-07-28	KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE	Paul Mackerras
	It turns out that if the guest does a H_CEDE while the CPU is in a transactional state, and the H_CEDE does a nap, and the nap loses the architected state of the CPU (which is is allowed to do), then we lose the checkpointed state of the virtual CPU. In addition, the transactional-memory state recorded in the MSR gets reset back to non-transactional, and when we try to return to the guest, we take a TM bad thing type of program interrupt because we are trying to transition from non-transactional to transactional with a hrfid instruction, which is not permitted. The result of the program interrupt occurring at that point is that the host CPU will hang in an infinite loop with interrupts disabled. Thus this is a denial of service vulnerability in the host which can be triggered by any guest (and depending on the guest kernel, it can potentially triggered by unprivileged userspace in the guest). This vulnerability has been assigned the ID CVE-2016-5412. To fix this, we save the TM state before napping and restore it on exit from the nap, when handling a H_CEDE in real mode. The case where H_CEDE exits to host virtual mode is already OK (as are other hcalls which exit to host virtual mode) because the exit path saves the TM state. Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-28	KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures	Paul Mackerras
	This moves the transactional memory state save and restore sequences out of the guest entry/exit paths into separate procedures. This is so that these sequences can be used in going into and out of nap in a subsequent patch. The only code changes here are (a) saving and restore LR on the stack, since these new procedures get called with a bl instruction, (b) explicitly saving r1 into the PACA instead of assuming that HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary and redundant setting of MSR[TM] that should have been removed by commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory support", 2013-09-24) but wasn't. Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-27	sparc: serial: sunhv: fix a double lock bug	Dan Carpenter
	We accidentally take the "port->lock" twice in a row. This old code was supposed to be deleted. Fixes: e58e241c1788 ('sparc: serial: Clean up the locking for -rt') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-27	sparc32: off by ones in BUG_ON()	Dan Carpenter
	Smatch complains that these tests are off by one, which is true but not life threatening. arch/sparc/kernel/irq_32.c:169 irq_link() error: buffer overflow 'irq_map' 384 <= 384 Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-28	crypto: marvell - Update cache with input sg only when it is unmapped	Romain Perier
	So far, the cache of the ahash requests was updated from the 'complete' operation. This complete operation is called from mv_cesa_tdma_process before the cleanup operation, which means that the content of req->src can be read and copied when it is still mapped. This commit fixes the issue by moving this cache update from mv_cesa_ahash_complete to mv_cesa_ahash_req_cleanup, so the copy is done once the sglist is unmapped. Fixes: 1bf6682cb31d ("crypto: marvell - Add a complete operation for..") Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-28	crypto: marvell - Don't chain at DMA level when backlog is disabled	Romain Perier
	The flag CRYPTO_TFM_REQ_MAY_BACKLOG is optional and can be set from the user to put requests into the backlog queue when the main cryptographic queue is full. Before calling mv_cesa_tdma_chain we must check the value of the return status to be sure that the current request has been correctly queued or added to the backlog. Fixes: 85030c5168f1 ("crypto: marvell - Add support for chaining...") Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-28	crypto: marvell - Fix memory leaks in TDMA chain for cipher requests	Romain Perier
	So far in mv_cesa_ablkcipher_dma_req_init, if an error is thrown while the tdma chain is built there is a memory leak. This issue exists because the chain is assigned later at the end of the function, so the cleanup function is called with the wrong version of the chain. Fixes: db509a45339f ("crypto: marvell/cesa - add TDMA support") Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-28	Merge branch 'topic/dmaengine_cleanups' into for-linus	Vinod Koul

2016-07-28	mailbox: Add Broadcom PDC mailbox driver	Rob Rice
	The Broadcom PDC mailbox driver is a mailbox controller that manages data transfers to and from one or more offload engines. Signed-off-by: Rob Rice <rob.rice@broadcom.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-07-28	dt-bindings: add bindings documentation for PDC driver.	Rob Rice
	Add the device tree binding documentation for the PDC hardware in Broadcom iProc SoCs. Signed-off-by: Rob Rice <rob.rice@broadcom.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Reviewed-by: Anup Patel <anup.patel@broadcom.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2016-07-27	CIFS: Fix a possible invalid memory access in smb2_query_symlink()	Pavel Shilovsky
	During following a symbolic link we received err_buf from SMB2_open(). While the validity of SMB2 error response is checked previously in smb2_check_message() a symbolic link payload is not checked at all. Fix it by adding such checks. Cc: Dan Carpenter <dan.carpenter@oracle.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-07-27	fs/cifs: make share unaccessible at root level mountable	Aurelien Aptel
	if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but not any of the path components above: - store the /sub/dir/foo prefix in the cifs super_block info - in the superblock, set root dentry to the subpath dentry (instead of the share root) - set a flag in the superblock to remember it - use prefixpath when building path from a dentry fixes bso#8950 Signed-off-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-07-27	random: use for_each_online_node() to iterate over NUMA nodes	Theodore Ts'o
	This fixes a crash on s390 with fake NUMA enabled. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-27	Add braces to avoid "ambiguous ‘else’" compiler warnings	Linus Torvalds
	Some of our "for_each_xyz()" macro constructs make gcc unhappy about lack of braces around if-statements inside or outside the loop, because the loop construct itself has a "if-then-else" statement inside of it. The resulting warnings look something like this: drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’: drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses] if (ctx != dev_priv->kernel_context) ^ even if the code itself is fine. Since the warning is fairly easy to avoid by adding a braces around the if-statement near the for_each_xyz() construct, do so, rather than disabling the otherwise potentially useful warning. (The if-then-else statements used in the "for_each_xyz()" constructs are designed to be inherently safe even with no braces, but in this case it's quite understandable that gcc isn't really able to tell that). This finally leaves the standard "allmodconfig" build with just a handful of remaining warnings, so new and valid warnings hopefully will stand out. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28	powerpc/mm: Parenthesise IS_ENABLED() in if condition	Stephen Rothwell
	Currently IS_ENABLED() produces an expression surrounded by parentheses, which allows this code to compile, generating eg: else if (1 \|\| 0) hpte_init_native(); However a change to the macro in the kbuild tree will break this in future by removing the parentheses. Fixes: 7353644fa9df ("powerpc/mm: Fix build break when PPC_NATIVE=n") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-27	Disable "frame-address" warning	Linus Torvalds
	Newer versions of gcc warn about the use of __builtin_return_address() with a non-zero argument when "-Wall" is specified: kernel/trace/trace_irqsoff.c: In function ‘stop_critical_timings’: kernel/trace/trace_irqsoff.c:433:86: warning: calling ‘__builtin_return_address’ with a nonzero argument is unsafe [-Wframe-address] stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1); [ .. repeats a few times for other similar cases .. ] It is true that a non-zero argument is somewhat dangerous, and we do not actually have very many uses of that in the kernel - but the ftrace code does use it, and as Stephen Rostedt says: "We are well aware of the danger of using __builtin_return_address() of > 0. In fact that's part of the reason for having the "thunk" code in x86 (See arch/x86/entry/thunk_{64,32}.S). [..] it adds extra frames when tracking irqs off sections, to prevent __builtin_return_address() from accessing bad areas. In fact the thunk_32.S states: 'Trampoline to trace irqs off. (otherwise CALLER_ADDR1 might crash)'." For now, __builtin_return_address() with a non-zero argument is the best we can do, and the warning is not helpful and can end up making people miss other warnings for real problems. So disable the frame-address warning on compilers that need it. Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28	ceph: Correctly return NXIO errors from ceph_llseek	Phil Turnbull
	ceph_llseek does not correctly return NXIO errors because the 'out' path always returns 'offset'. Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: Mark the file cache as unreclaimable	Nikolay Borisov
	Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so that it can satisfy its internal needs. Inspecting the code shows that most of the caches are indeed reclaimable since they are directly related to the generic inode/dentry shrinkers. However, one of the cache used to satisfy struct file is not reclaimable since its entries are freed only when the last reference to the file is dropped. If a heavily loaded node opens a lot of files it can introduce non-trivial discrepancies between memory shown as reclaimable and what is actually reclaimed when drop_caches is used. Fix this by removing the reclaimable flag for the file's cache. Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: optimize cap flush waiting	Yan, Zheng
	Add a 'wake' flag to ceph_cap_flush struct, which indicates if there is someone waiting for it to finish. When getting flush ack message, we check the 'wake' flag in corresponding ceph_cap_flush struct to decide if we should wake up waiters. One corner case is that the acked cap flush has 'wake' flags is set, but it is not the first one on the flushing list. We do not wake up waiters in this case, set 'wake' flags of preceding ceph_cap_flush struct instead Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: cleanup ceph_flush_snaps()	Yan, Zheng
	This patch devide __ceph_flush_snaps() into two stags. In the first stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them to cap flush lists. __ceph_flush_snaps() keeps holding the i_ceph_lock in this stagge. So inode's auth cap can not change. In the second stage, __ceph_flush_snaps() send flushsnap cap messages. i_ceph_lock is unlocked before sending each cap message. If auth cap changes in the middle, __ceph_flush_snaps() just stops. This is OK because kick_flushing_inode_caps() will re-send flushsnap cap messages to inode's new auth MDS. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: kick cap flushes before sending other cap message	Yan, Zheng
	If ceph_check_caps() wants to send cap message to a recovering MDS, make sure it kicks cap flushes first. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: introduce an inode flag to indicates if snapflush is needed	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: avoid sending duplicated cap flush message	Yan, Zheng
	make ceph_kick_flushing_caps() ignore inodes whose cap flushes have already been re-sent by ceph_early_kick_flushing_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: unify cap flush and snapcap flush	Yan, Zheng
	This patch includes following changes - Assign flush tid to snapcap flush - Remove session's s_cap_snaps_flushing list. Add inode to session's s_cap_flushing list instead. Inode is removed from the list when there is no pending snapcap flush or cap flush. - make __kick_flushing_caps() re-send both snapcap flushes and cap flushes. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: use list instead of rbtree to track cap flushes	Yan, Zheng
	We don't have requirement of searching cap flush by TID. In most cases, we just need to know TID of the oldest cap flush. List is ideal for this usage. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: update types of some local varibles	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: include 'follows' of pending snapflush in cap reconnect message	Yan, Zheng
	This helps the recovering MDS to reconstruct the internal states that tracking pending snapflush. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: update cap reconnect message to version 3	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: mount non-default filesystem by name	Yan, Zheng
	To mount non-default filesytem, user currently needs to provide mds namespace ID. This is inconvenience. This patch makes user be able to mount filesystem by name. If user wants to mount non-default filesystem. Client first subscribes to fsmap.user. Subscribe to mdsmap.<ID> after getting ID of filesystem. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	libceph: fsmap.user subscription support	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28	ceph: handle LOOKUP_RCU in ceph_d_revalidate	Jeff Layton
	We can now handle the snapshot cases under RCU, as well as the non-snapshot case when we don't need to queue up a lease renewal allow LOOKUP_RCU walks to proceed under those conditions. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: allow dentry_lease_is_valid to work under RCU walk	Jeff Layton
	Under rcuwalk, we need to take extra care when dereferencing d_parent. We want to do that once and pass a pointer to dentry_lease_is_valid. Also, we must ensure that that function can handle the case where we're racing with d_release. Check whether "di" is NULL under the d_lock, and just return 0 if so. Finally, we still need to kick off a renewal job if the lease is getting close to expiration. If that's the case, then just drop out of rcuwalk mode since that could block. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: clear d_fsinfo pointer under d_lock	Jeff Layton
	To check for a valid dentry lease, we need to get at the ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that is on its way to destruction. Since we need to take the d_lock in dentry_lease_is_valid already, we can just ensure that we clear the d_fsinfo pointer out under the same lock before destroying it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: remove ceph_mdsc_lease_release	Jeff Layton
	Nothing calls it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: don't use ->d_time	Miklos Szeredi
	Pretty simple: just use ceph_dentry_info.time instead (which was already there, unused). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-07-28	ceph: fix spelling mistake: "resgister" -> "register"	Colin Ian King
	trivial fix to spelling mistake in pr_err message Signed-off-by: Colin Ian King <colin.king@canonical.com>
2016-07-28	ceph: fix NULL dereference in ceph_queue_cap_snap()	Yan, Zheng
	old_snapc->seq is used in dout(...) Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: wait unsafe sync writes for evicting inode	Yan, Zheng
	Otherwise ceph_sync_write_unsafe() may access/modify freed inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: fix use-after-free bug in ceph_direct_read_write()	Yan, Zheng
	ceph_aio_complete() can free the ceph_aio_request struct before the code exits the while loop. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: reduce i_nr_by_mode array size	Yan, Zheng
	Track usage count for individual fmode bit. This can reduce the array size by half. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: set user pages dirty after direct IO read	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	ceph: rados pool namespace support	Yan, Zheng
	This patch adds codes that decode pool namespace information in cap message and request reply. Pool namespace is saved in i_layout, it will be passed to libceph when doing read/write. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	libceph: make sure redirect does not change namespace	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28	libceph: rados pool namespace support	Yan, Zheng
	Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28	libceph: introduce reference counted string	Yan, Zheng
	The data structure is for storing namesapce string. It allows namespace string to be shared between cephfs inodes with same layout. This data structure can also be referenced by OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	libceph: define new ceph_file_layout structure	Yan, Zheng
	Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28	libceph: add start en/decoding block helpers	Ilya Dryomov
	Add ceph_start_encoding() and ceph_start_decoding(), the equivalent of ENCODE_START and DECODE_START in the userspace ceph code. This is based on a patch from Mike Christie <michaelc@cs.wisc.edu>. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28	libceph: add an ONSTACK initializer for oids	Ilya Dryomov
	An on-stack oid in ceph_ioctl_get_dataloc() is not initialized, resulting in a WARN and a NULL pointer dereference later on. We will have more of these on-stack in the future, so fix it with a convenience macro. Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28	libceph: fix some missing includes	Ilya Dryomov
	- decode.h needs slab.h for kmalloc() - osd_client.h needs msgpool.h for struct ceph_msgpool - msgpool.h doesn't need messenger.h Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-27	sparc: Don't leak context bits into thread->fault_address	David S. Miller
	On pre-Niagara systems, we fetch the fault address on data TLB exceptions from the TLB_TAG_ACCESS register. But this register also contains the context ID assosciated with the fault in the low 13 bits of the register value. This propagates into current_thread_info()->fault_address and can cause trouble later on. So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases where it matters. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>