linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-01-16	xprtrdma: Introduce rpcrdma_mw_unmap_and_put	Chuck Lever
	Clean up: Code review suggested that a common bit of code can be placed into a helper function, and this gives us fewer places to stick an "I DMA unmapped something" trace point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Remove usage of "mw"	Chuck Lever
	Clean up: struct rpcrdma_mw was named after Memory Windows, but xprtrdma no longer supports a Memory Window registration mode. Rename rpcrdma_mw and its fields to reduce confusion and make the code more sensible to read. Renaming "mw" was suggested by Tom Talpey, the author of the original xprtrdma implementation. It's a good idea, but I haven't done this until now because it's a huge diffstat for no benefit other than code readability. However, I'm about to introduce static trace points that expose a few of xprtrdma's internal data structures. They should make sense in the trace report, and it's reasonable to treat trace points as a kernel API contract which might be difficult to change later. While I'm churning things up, two additional changes: - rename variables unhelpfully called "r" to "mr", to improve code clarity, and - rename the MR-related helper functions using the form "rpcrdma_mr_<verb>", to be consistent with other areas of the code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Replace all usage of "frmr" with "frwr"	Chuck Lever
	Clean up: Over time, the industry has adopted the term "frwr" instead of "frmr". The term "frwr" is now more widely recognized. For the past couple of years I've attempted to add new code using "frwr" , but there still remains plenty of older code that still uses "frmr". Replace all usage of "frmr" to avoid confusion. While we're churning code, rename variables unhelpfully called "f" to "frwr", to improve code clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Don't clear RPC_BC_PA_IN_USE on pre-allocated rpc_rqst's	Chuck Lever
	No need for the overhead of atomically setting and clearing this bit flag for every use of a pre-allocated backchannel rpc_rqst. These are a distinct pool of rpc_rqsts that are used only for callback operations, so it is safe to simply leave the bit set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Split xprt_rdma_send_request	Chuck Lever
	Clean up. @rqst is set up differently for backchannel Replies. For example, rqst->rq_task and task->tk_client are both NULL. So it is easier to understand and maintain this code path if it is separated. Also, we can get rid of the confusing rl_connect_cookie hack in rpcrdma_bc_receive_call. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: buf_free not called for CB replies	Chuck Lever
	Since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private field in rpc_rqst"), the rpc_rqst's for RPC-over-RDMA backchannel operations leave rq_buffer set to NULL. xprt_release does not invoke ->op->buf_free when rq_buffer is NULL. The RPCRDMA_REQ_F_BACKCHANNEL check in xprt_rdma_free is therefore redundant because xprt_rdma_free is not invoked for backchannel requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Move unmap-safe logic to rpcrdma_marshal_req	Chuck Lever
	Clean up. This logic is related to marshaling the request, and I'd like to keep everything that touches req->rl_registered close together, for CPU cache efficiency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Support IPv6 in xprt_rdma_set_port	Chuck Lever
	Clean up a harmless oversight. xprtrdma's ->set_port method has never properly supported IPv6. This issue has never been a problem because NFS/RDMA mounts have always required "port=20049", thus so far, rpcbind is not invoked for these mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Remove another sockaddr_storage field (cdata::addr)	Chuck Lever
	Save more space in struct rpcrdma_xprt by removing the redundant "addr" field from struct rpcrdma_create_data_internal. Wherever we have rpcrdma_xprt, we also have the rpc_xprt, which has a sockaddr_storage field with the same content. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Initialize the xprt address string array earlier	Chuck Lever
	This makes the address strings available for debugging messages in earlier stages of transport set up. The first benefit is to get rid of the single-use rep_remote_addr field, saving 128+ bytes in struct rpcrdma_ep. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Remove unused padding variables	Chuck Lever
	Clean up. Remove fields that should have been removed by commit b3221d6a53c4 ("xprtrdma: Remove logic that constructs RDMA_MSGP type calls"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Remove ri_reminv_expected	Chuck Lever
	Clean up. Commit b5f0afbea4f2 ("xprtrdma: Per-connection pad optimization") should have removed this. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Per-mode handling for Remote Invalidation	Chuck Lever
	Refactoring change: Remote Invalidation is particular to the memory registration mode that is use. Use a callout instead of a generic function to handle Remote Invalidation. This gets rid of the 8-byte flags field in struct rpcrdma_mw, of which only a single bit flag has been allocated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_request	Chuck Lever
	The rpcrdma_req is not shared yet, and its associated Send hasn't been posted, thus RMW should be safe. There's no need for the expense of a lock cycle here. Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Fix backchannel allocation of extra rpcrdma_reps	Chuck Lever
	The backchannel code uses rpcrdma_recv_buffer_put to add new reps to the free rep list. This also decrements rb_recv_count, which spoofs the receive overrun logic in rpcrdma_buffer_get_rep. Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)") replaced the original open-coded list_add with a call to rpcrdma_recv_buffer_put(), but then a year later, commit 05c974669ece ("xprtrdma: Fix receive buffer accounting") added rep accounting to rpcrdma_recv_buffer_put. It was an oversight to let the backchannel continue to use this function. The fix this, let's combine the "add to free list" logic with rpcrdma_create_rep. Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in rpcrdma_buffer_create and then allocate additional rpcrdma_reps in rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel set-up is sufficient. Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	xprtrdma: Fix buffer leak after transport set up failure	Chuck Lever
	This leak has been around forever, and is exceptionally rare. EINVAL causes mount to fail with "an incorrect mount option was specified" although it's not likely that one of the mount options is incorrect. Instead, return ENODEV in this case, as this appears to be an issue with system or device configuration rather than a specific mount option. Some obsolete comments are also removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16	cgroup, docs: document the root cgroup behavior of cpu and io controllers	Maciej S. Szmigiero
	Currently, cgroups v2 documentation contains only a generic remark that "How resource consumption in the root cgroup is governed is up to each controller", which isn't really telling users much, who need to dig in the code and / or commit messages to learn the exact behavior. In cgroups v1 at least the blkio controller had its operation with respect to competition between child threads and child cgroups documented in blkio-controller.txt, with references to cfq-iosched.txt. Also, cgroups v2 documentation describes v1 behavior of both cpu and blkio controllers in an "Issues with v1" section. Let's document this behavior also for cgroups v2 to make life easier for users. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-01-16	blkcg: simplify statistic accumulation code	Arnd Bergmann
	Some older compilers (gcc-4.4 through 4.6 in particular) struggle with the way that blkg_rwstat_read() returns a structure, leading to excessive stack usage and rather inefficient code: block/blk-cgroup.c: In function 'blkg_destroy': block/blk-cgroup.c:354:1: error: the frame size of 1296 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] block/cfq-iosched.c: In function 'cfqg_stats_add_aux': block/cfq-iosched.c:753:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] block/bfq-cgroup.c: In function 'bfqg_stats_add_aux': block/bfq-cgroup.c:299:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] I also notice that there is no point in using atomic accesses for the local variables, so storing the temporaries in simple 'u64' variables not only avoids the stack usage on older compilers but also improves the object code on modern versions. Fixes: e6269c445467 ("blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it") Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-16	MAINTAINERS: Add NuBus subsystem entry	Finn Thain
	This is legacy code but it might as well have an official maintainer. Cc: linux-m68k@lists.linux-m68k.org Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	m68k/mac: Fix race conditions in OSS interrupt dispatch	Finn Thain
	The interrupt dispatch algorithm used in the OSS driver seems to be subject to race conditions: an IRQ flag could be lost if asserted between the MOV instructions from and to the interrupt flag register. But testing shows that the write to the flag register has no effect, so rewrite the algorithm without the theoretical race condition. There is a second theoretical race condition here. When oss_irq() is called with say, IPL == 2 it will invoke the SCSI interrupt handler. The SCSI IRQ is then cleared by the mac_scsi driver. If SCSI and NuBus IRQs are now asserted together, oss_irq() will be invoked with IPL == 3 and the mac_scsi interrupt handler can be re-entered. This re-entrance issue is not limited to SCSI and could affect NuBus and ADB drivers too. Fix it by splitting up oss_irq() into separate handlers for each IPL. No-one seems to know how OSS irq flags can be cleared, if at all, so add a comment to this effect (actually reinstate one I previously removed). Testing showed that a slot IRQ with no handler can remain asserted (in this case a Radius video card) without causing problems for other IRQs. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Add support for the driver model	Finn Thain
	This patch brings basic support for the Linux Driver Model to the NuBus subsystem. For flexibility, the matching of boards with drivers is left up to the drivers. This is also the approach taken by NetBSD. A board may have many functions, and drivers may have to consider many functional resources and board resources in order to match a device. This implementation does not bind drivers to resources (nor does it bind many drivers to the same board). Apple's NuBus declaration ROM design is flexible enough to allow that, but I don't see a need to support it as we don't use the "slot zero" resources (in the main logic board ROM). Eliminate the global nubus_boards linked list by rewriting the procfs board iterator around bus_for_each_dev(). Hence the nubus device refcount can be used to determine the lifespan of board objects. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Add expansion_type values for various Mac models	Finn Thain
	Add an expansion slot attribute to allow drivers to properly handle cards like Comm Slot cards and PDS cards without declaration ROMs. This clarifies the logic for the Centris 610 model which has no Comm Slot but has an optional on-board SONIC device. Cc: "David S. Miller" <davem@davemloft.net> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Adopt standard linked list implementation	Finn Thain
	This increases code re-use and improves readability. Cc: "David S. Miller" <davem@davemloft.net> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Rename struct nubus_dev	Finn Thain
	It is misleading to call a functional resource a "device". In adopting the Linux Driver Model, the struct device will be embedded in struct nubus_board. That will compound the terminlogy problem because drivers will bind with boards, not with functional resources. Avoid this by renaming struct nubus_dev as struct nubus_rsrc. "Functional resource" is the vendor's terminology so this helps avoid confusion. Cc: "David S. Miller" <davem@davemloft.net> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Rework /proc/bus/nubus/s/ implementation	Finn Thain
	The /proc/bus/nubus/s/ directory tree for any slot s is missing a lot of information. The struct file_operations methods have long been left unimplemented (hence the familiar compile-time warning, "Need to set some I/O handlers here"). Slot resources have a complex structure which varies depending on board function. The logic for interpreting these ROM data structures is found in nubus.c. Let's not duplicate that logic in proc.c. Create the /proc/bus/nubus/s/ inodes while scanning slot s. During descent through slot resource subdirectories, call the new nubus_proc_add_foo() functions to create the procfs inodes. Also add a new function, nubus_seq_write_rsrc_mem(), to write the contents of a particular slot resource to a given seq_file. This is used by the procfs file_operations methods, to finally give userspace access to slot ROM information, such as the available video modes. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Generalize block resource handling	Finn Thain
	Scrap the specialized code to unpack video mode name resources and driver resources. It isn't useful. Instead, add a re-usable function to handle lists of block resources of any kind, and descend into the video mode table resource directory. Rename callers as nubus_get_foo(), consistent with their purpose and with related functions in the same file. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Clean up whitespace	Finn Thain
	Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Remove redundant code	Finn Thain
	Eliminate unused values from struct nubus_dev to save wasted memory (a Radius PrecisionColor 24X card has about 95 functional resources and up to six such cards may be fitted). Also remove redundant static variable initialization, an unreachable !MACH_IS_MAC conditional, the unused nubus_find_device() function, the bogus get_nubus_list() prototype and the pointless card_present temporary variable. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Call proc_mkdir() not more than once per slot directory	Finn Thain
	This patch fixes the following WARNING. proc_dir_entry 'nubus/a' already registered Modules linked in: CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.13.0-00036-gd57552077387 #1 Stack from 01c1bd9c: 01c1bd9c 003c2c8b 01c1bdc0 0001b0fe 00000000 00322f4a 01c43a20 01c43b0c 01c8c420 01c1bde8 0001b1b8 003a4ac3 00000148 000faa26 00000009 00000000 01c1bde0 003a4b6c 01c1bdfc 01c1be20 000faa26 003a4ac3 00000148 003a4b6c 01c43a71 01c8c471 01c10000 00326430 0043d00c 00000005 01c71a00 0020bce0 00322964 01c1be38 000fac04 01c43a20 01c8c420 01c1bee0 01c8c420 01c1be50 000fac4c 01c1bee0 00000000 01c43a20 00000000 01c1bee8 0020bd26 01c1bee0 Call Trace: [<0001b0fe>] __warn+0xae/0xde [<00322f4a>] memcmp+0x0/0x5c [<0001b1b8>] warn_slowpath_fmt+0x2e/0x36 [<000faa26>] proc_register+0xbe/0xd8 [<000faa26>] proc_register+0xbe/0xd8 [<00326430>] sprintf+0x0/0x20 [<0020bce0>] nubus_proc_attach_device+0x0/0x1b8 [<00322964>] strcpy+0x0/0x22 [<000fac04>] proc_mkdir_data+0x64/0x96 [<000fac4c>] proc_mkdir+0x16/0x1c [<0020bd26>] nubus_proc_attach_device+0x46/0x1b8 [<0020bce0>] nubus_proc_attach_device+0x0/0x1b8 [<00322964>] strcpy+0x0/0x22 [<00001ba6>] kernel_pg_dir+0xba6/0x1000 [<004339a2>] proc_bus_nubus_add_devices+0x1a/0x2e [<000faa40>] proc_create_data+0x0/0xf2 [<0003297c>] parse_args+0x0/0x2d4 [<00433a08>] nubus_proc_init+0x52/0x5a [<00433944>] nubus_init+0x0/0x44 [<00433982>] nubus_init+0x3e/0x44 [<000020dc>] do_one_initcall+0x38/0x196 [<000020a4>] do_one_initcall+0x0/0x196 [<0003297c>] parse_args+0x0/0x2d4 [<00322964>] strcpy+0x0/0x22 [<00040004>] __up_read+0xe/0x40 [<004231d4>] repair_env_string+0x0/0x7a [<0042312e>] kernel_init_freeable+0xee/0x194 [<00423146>] kernel_init_freeable+0x106/0x194 [<00433944>] nubus_init+0x0/0x44 [<000a6000>] kfree+0x0/0x156 [<0032768c>] kernel_init+0x0/0xda [<00327698>] kernel_init+0xc/0xda [<0032768c>] kernel_init+0x0/0xda [<00002a90>] ret_from_kernel_thread+0xc/0x14 ---[ end trace 14a6d619908ea253 ]--- ------------[ cut here ]------------ This gets repeated with each additional functional reasource. The problem here is the call to proc_mkdir() when the directory already exists. Each nubus_board gets a directory, such as /proc/bus/nubus/s/ where s is the hex slot number. Therefore, store the 'procdir' pointer in struct nubus_board instead of struct nubus_dev. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Validate slot resource IDs	Finn Thain
	While we are here, include the slot number in the related error messages. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Fix log spam	Finn Thain
	Testing shows that a single Radius PrecisionColor 24X display board, which has 95 functional resources, produces over a thousand lines of log messages. Suppress these messages with pr_debug(). Remove some redundant messages relating to nubus_get_subdir() calls. Fix the format block debug messages as the sequence of entries is backwards (my bad). Move the "scanning slots" message to its proper location. Fixes: 71ae40e4cf33 ("nubus: Clean up printk calls") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Use static functions where possible	Finn Thain
	This fixes a couple of warnings from 'make W=1': drivers/nubus/nubus.c:790: warning: no previous prototype for 'nubus_probe_slot' drivers/nubus/nubus.c:824: warning: no previous prototype for 'nubus_scan_bus' Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Fix up header split	Finn Thain
	Due to the '#ifdef __KERNEL__' being located in the wrong place, some definitions from the kernel API were placed in the UAPI header during the scripted header split. Fix this. Also, remove the duplicate comment which is only relevant to the UAPI header. Fixes: 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	nubus: Avoid array underflow and overflow	Finn Thain
	Check array indices. Avoid sprintf. Use buffers of sufficient size. Use appropriate types for array length parameters. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	m68k/defconfig: Update defconfigs for v4.15-rc1	Geert Uytterhoeven
	Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-01-16	Merge branch 'topic/iio' of ↵	Mark Brown
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into asoc-st-dfsdm
2018-01-16	NFS: commit direct writes even if they fail partially	J. Bruce Fields
	If some of the WRITE calls making up an O_DIRECT write syscall fail, we neglect to commit, even if some of the WRITEs succeed. We also depend on the commit code to free the reference count on the nfs_page taken in the "if (request_commit)" case at the end of nfs_direct_write_completion(). The problem was originally noticed because ENOSPC's encountered partway through a write would result in a closed file being sillyrenamed when it should have been unlinked. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-16	nfs: remove unused label in nfs_encode_fh()	Arnd Bergmann
	The only reference to the label got removed, so we now get a harmless compiler warning: fs/nfs/export.c: In function 'nfs_encode_fh': fs/nfs/export.c:58:1: error: label 'out' defined but not used [-Werror=unused-label] Fixes: aaa150089465 ("nfs: remove dead code from nfs_encode_fh()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-16	KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA	Dongjiu Geng
	ARMv8.2 adds a new bit HCR_EL2.TEA which routes synchronous external aborts to EL2, and adds a trap control bit HCR_EL2.TERR which traps all Non-secure EL1&0 error record accesses to EL2. This patch enables the two bits for the guest OS, guaranteeing that KVM takes external aborts and traps attempts to access the physical error registers. ERRIDR_EL1 advertises the number of error records, we return zero meaning we can treat all the other registers as RAZ/WI too. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> [removed specific emulation, use trap_raz_wi() directly for everything, rephrased parts of the commit message] Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	KVM: arm64: Handle RAS SErrors from EL2 on guest exit	James Morse
	We expect to have firmware-first handling of RAS SErrors, with errors notified via an APEI method. For systems without firmware-first, add some minimal handling to KVM. There are two ways KVM can take an SError due to a guest, either may be a RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO, or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit. The current SError from EL2 code unmasks SError and tries to fence any pending SError into a single instruction window. It then leaves SError unmasked. With the v8.2 RAS Extensions we may take an SError for a 'corrected' error, but KVM is only able to handle SError from EL2 if they occur during this single instruction window... The RAS Extensions give us a new instruction to synchronise and consume SErrors. The RAS Extensions document (ARM DDI0587), '2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising SError interrupts generated by 'instructions, translation table walks, hardware updates to the translation tables, and instruction fetches on the same PE'. This makes ESB equivalent to KVMs existing 'dsb, mrs-daifclr, isb' sequence. Use the alternatives to synchronise and consume any SError using ESB instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT in the exit_code so that we can restart the vcpu if it turns out this SError has no impact on the vcpu. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	KVM: arm64: Handle RAS SErrors from EL1 on guest exit	James Morse
	We expect to have firmware-first handling of RAS SErrors, with errors notified via an APEI method. For systems without firmware-first, add some minimal handling to KVM. There are two ways KVM can take an SError due to a guest, either may be a RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO, or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit. For SError that interrupt a guest and are routed to EL2 the existing behaviour is to inject an impdef SError into the guest. Add code to handle RAS SError based on the ESR. For uncontained and uncategorized errors arm64_is_fatal_ras_serror() will panic(), these errors compromise the host too. All other error types are contained: For the fatal errors the vCPU can't make progress, so we inject a virtual SError. We ignore contained errors where we can make progress as if we're lucky, we may not hit them again. If only some of the CPUs support RAS the guest will see the cpufeature sanitised version of the id registers, but we may still take RAS SError on this CPU. Move the SError handling out of handle_exit() into a new handler that runs before we can be preempted. This allows us to use this_cpu_has_cap(), via arm64_is_ras_serror(). Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	KVM: arm64: Save ESR_EL2 on guest SError	James Morse
	When we exit a guest due to an SError the vcpu fault info isn't updated with the ESR. Today this is only done for traps. The v8.2 RAS Extensions define ISS values for SError. Update the vcpu's fault_info with the ESR on SError so that handle_exit() can determine if this was a RAS SError and decode its severity. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	KVM: arm64: Save/Restore guest DISR_EL1	James Morse
	If we deliver a virtual SError to the guest, the guest may defer it with an ESB instruction. The guest reads the deferred value via DISR_EL1, but the guests view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO is set. Add the KVM code to save/restore VDISR_EL2, and make it accessible to userspace as DISR_EL1. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.	James Morse
	Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature generated an SError with an implementation defined ESR_EL1.ISS, because we had no mechanism to specify the ESR value. On Juno this generates an all-zero ESR, the most significant bit 'ISV' is clear indicating the remainder of the ISS field is invalid. With the RAS Extensions we have a mechanism to specify this value, and the most significant bit has a new meaning: 'IDS - Implementation Defined Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized' instead of 'no valid ISS'. Add KVM support for the VSESR_EL2 register to specify an ESR value when HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to specify an implementation-defined value. We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set, KVM save/restores this bit during __{,de}activate_traps() and hardware clears the bit once the guest has consumed the virtual-SError. Future patches may add an API (or KVM CAP) to pend a virtual SError with a specified ESR. Cc: Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	KVM: arm/arm64: mask/unmask daif around VHE guests	James Morse
	Non-VHE systems take an exception to EL2 in order to world-switch into the guest. When returning from the guest KVM implicitly restores the DAIF flags when it returns to the kernel at EL1. With VHE none of this exception-level jumping happens, so KVMs world-switch code is exposed to the host kernel's DAIF values, and KVM spills the guest-exit DAIF values back into the host kernel. On entry to a guest we have Debug and SError exceptions unmasked, KVM has switched VBAR but isn't prepared to handle these. On guest exit Debug exceptions are left disabled once we return to the host and will stay this way until we enter user space. Add a helper to mask/unmask DAIF around VHE guests. The unmask can only happen after the hosts VBAR value has been synchronised by the isb in __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as setting KVMs VBAR value, but is kept here for symmetry. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	arm64: kernel: Prepare for a DISR user	James Morse
	KVM would like to consume any pending SError (or RAS error) after guest exit. Today it has to unmask SError and use dsb+isb to synchronise the CPU. With the RAS extensions we can use ESB to synchronise any pending SError. Add the necessary macros to allow DISR to be read and converted to an ESR. We clear the DISR register when we enable the RAS cpufeature, and the kernel has not executed any ESB instructions. Any value we find in DISR must have belonged to firmware. Executing an ESB instruction is the only way to update DISR, so we can expect firmware to have handled any deferred SError. By the same logic we clear DISR in the idle path. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	arm64: Unconditionally enable IESB on exception entry/return for firmware-first	James Morse
	ARM v8.2 has a feature to add implicit error synchronization barriers whenever the CPU enters or returns from an exception level. Add this to the features we always enable. CPUs that don't support this feature will treat the bit as RES0. This feature causes RAS errors that are not yet visible to software to become pending SErrors. We expect to have firmware-first RAS support so synchronised RAS errors will be take immediately to EL3. Any system without firmware-first handling of errors will take the SError either immediatly after exception return, or when we unmask SError after entry.S's work. Adding IESB to the ELx flags causes it to be enabled by KVM and kexec too. Platform level RAS support may require additional firmware support. Cc: Christoffer Dall <christoffer.dall@linaro.org> Suggested-by: Will Deacon <will.deacon@arm.com> Link: https://www.spinics.net/lists/kvm-arm/msg28192.html Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	arm64: kernel: Survive corrected RAS errors notified by SError	James Morse
	Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS extensions use SError to notify software about RAS errors, these can be contained by the Error Syncronization Barrier. An ACPI system with firmware-first may use SError as its 'SEI' notification. Future patches may add code to 'claim' this SError as a notification. Other systems can distinguish these RAS errors from the SError ESR and use the AET bits and additional data from RAS-Error registers to handle the error. Future patches may add this kernel-first handling. Without support for either of these we will panic(), even if we received a corrected error. Add code to decode the severity of RAS errors. We can safely ignore contained errors where the CPU can continue to make progress. For all other errors we continue to panic(). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	arm64: cpufeature: Detect CPU RAS Extentions	Xie XiuQi
	ARM's v8.2 Extentions add support for Reliability, Availability and Serviceability (RAS). On CPUs with these extensions system software can use additional barriers to isolate errors and determine if faults are pending. Add cpufeature detection. Platform level RAS support may require additional firmware support. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> [Rebased added config option, reworded commit message] Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-16	arm64: sysreg: Move to use definitions for all the SCTLR bits	James Morse
	__cpu_setup() configures SCTLR_EL1 using some hard coded hex masks, and el2_setup() duplicates some this when setting RES1 bits. Lets make this the same as KVM's hyp_init, which uses named bits. First, we add definitions for all the SCTLR_EL{1,2} bits, the RES{1,0} bits, and those we want to set or clear. Add a build_bug checks to ensures all bits are either set or clear. This means we don't need to preserve endian-ness configuration generated elsewhere. Finally, move the head.S and proc.S users of these hard-coded masks over to the macro versions. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>