Age | Commit message (Collapse) | Author |
|
Clean up: Code review suggested that a common bit of code can be
placed into a helper function, and this gives us fewer places to
stick an "I DMA unmapped something" trace point.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: struct rpcrdma_mw was named after Memory Windows, but
xprtrdma no longer supports a Memory Window registration mode.
Rename rpcrdma_mw and its fields to reduce confusion and make
the code more sensible to read.
Renaming "mw" was suggested by Tom Talpey, the author of the
original xprtrdma implementation. It's a good idea, but I haven't
done this until now because it's a huge diffstat for no benefit
other than code readability.
However, I'm about to introduce static trace points that expose
a few of xprtrdma's internal data structures. They should make sense
in the trace report, and it's reasonable to treat trace points as a
kernel API contract which might be difficult to change later.
While I'm churning things up, two additional changes:
- rename variables unhelpfully called "r" to "mr", to improve code
clarity, and
- rename the MR-related helper functions using the form
"rpcrdma_mr_<verb>", to be consistent with other areas of the
code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: Over time, the industry has adopted the term "frwr"
instead of "frmr". The term "frwr" is now more widely recognized.
For the past couple of years I've attempted to add new code using
"frwr" , but there still remains plenty of older code that still
uses "frmr". Replace all usage of "frmr" to avoid confusion.
While we're churning code, rename variables unhelpfully called "f"
to "frwr", to improve code clarity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
No need for the overhead of atomically setting and clearing this bit
flag for every use of a pre-allocated backchannel rpc_rqst. These
are a distinct pool of rpc_rqsts that are used only for callback
operations, so it is safe to simply leave the bit set.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up. @rqst is set up differently for backchannel Replies. For
example, rqst->rq_task and task->tk_client are both NULL. So it is
easier to understand and maintain this code path if it is separated.
Also, we can get rid of the confusing rl_connect_cookie hack in
rpcrdma_bc_receive_call.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private
field in rpc_rqst"), the rpc_rqst's for RPC-over-RDMA backchannel
operations leave rq_buffer set to NULL.
xprt_release does not invoke ->op->buf_free when rq_buffer is NULL.
The RPCRDMA_REQ_F_BACKCHANNEL check in xprt_rdma_free is therefore
redundant because xprt_rdma_free is not invoked for backchannel
requests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up. This logic is related to marshaling the request, and I'd
like to keep everything that touches req->rl_registered close
together, for CPU cache efficiency.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up a harmless oversight. xprtrdma's ->set_port method has
never properly supported IPv6.
This issue has never been a problem because NFS/RDMA mounts have
always required "port=20049", thus so far, rpcbind is not invoked
for these mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Save more space in struct rpcrdma_xprt by removing the redundant
"addr" field from struct rpcrdma_create_data_internal. Wherever
we have rpcrdma_xprt, we also have the rpc_xprt, which has a
sockaddr_storage field with the same content.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This makes the address strings available for debugging messages in
earlier stages of transport set up.
The first benefit is to get rid of the single-use rep_remote_addr
field, saving 128+ bytes in struct rpcrdma_ep.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up. Remove fields that should have been removed by
commit b3221d6a53c4 ("xprtrdma: Remove logic that constructs
RDMA_MSGP type calls").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up.
Commit b5f0afbea4f2 ("xprtrdma: Per-connection pad optimization")
should have removed this.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Refactoring change: Remote Invalidation is particular to the memory
registration mode that is use. Use a callout instead of a generic
function to handle Remote Invalidation.
This gets rid of the 8-byte flags field in struct rpcrdma_mw, of
which only a single bit flag has been allocated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The rpcrdma_req is not shared yet, and its associated Send hasn't
been posted, thus RMW should be safe. There's no need for the
expense of a lock cycle here.
Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The backchannel code uses rpcrdma_recv_buffer_put to add new reps
to the free rep list. This also decrements rb_recv_count, which
spoofs the receive overrun logic in rpcrdma_buffer_get_rep.
Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of
spin_lock_irqsave(rb_lock)") replaced the original open-coded
list_add with a call to rpcrdma_recv_buffer_put(), but then a year
later, commit 05c974669ece ("xprtrdma: Fix receive buffer
accounting") added rep accounting to rpcrdma_recv_buffer_put.
It was an oversight to let the backchannel continue to use this
function.
The fix this, let's combine the "add to free list" logic with
rpcrdma_create_rep.
Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in
rpcrdma_buffer_create and then allocate additional rpcrdma_reps in
rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel
set-up is sufficient.
Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
This leak has been around forever, and is exceptionally rare.
EINVAL causes mount to fail with "an incorrect mount option was
specified" although it's not likely that one of the mount
options is incorrect. Instead, return ENODEV in this case, as this
appears to be an issue with system or device configuration rather
than a specific mount option.
Some obsolete comments are also removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Currently, cgroups v2 documentation contains only a generic remark that
"How resource consumption in the root cgroup is governed is up to each
controller", which isn't really telling users much, who need to dig in the
code and / or commit messages to learn the exact behavior.
In cgroups v1 at least the blkio controller had its operation with respect
to competition between child threads and child cgroups documented in
blkio-controller.txt, with references to cfq-iosched.txt.
Also, cgroups v2 documentation describes v1 behavior of both cpu and
blkio controllers in an "Issues with v1" section.
Let's document this behavior also for cgroups v2 to make life easier for
users.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Some older compilers (gcc-4.4 through 4.6 in particular) struggle
with the way that blkg_rwstat_read() returns a structure, leading
to excessive stack usage and rather inefficient code:
block/blk-cgroup.c: In function 'blkg_destroy':
block/blk-cgroup.c:354:1: error: the frame size of 1296 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
block/cfq-iosched.c: In function 'cfqg_stats_add_aux':
block/cfq-iosched.c:753:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
block/bfq-cgroup.c: In function 'bfqg_stats_add_aux':
block/bfq-cgroup.c:299:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
I also notice that there is no point in using atomic accesses
for the local variables, so storing the temporaries in simple 'u64'
variables not only avoids the stack usage on older compilers but
also improves the object code on modern versions.
Fixes: e6269c445467 ("blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is legacy code but it might as well have an official maintainer.
Cc: linux-m68k@lists.linux-m68k.org
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
The interrupt dispatch algorithm used in the OSS driver seems to be
subject to race conditions: an IRQ flag could be lost if asserted between
the MOV instructions from and to the interrupt flag register. But testing
shows that the write to the flag register has no effect, so rewrite the
algorithm without the theoretical race condition.
There is a second theoretical race condition here. When oss_irq() is
called with say, IPL == 2 it will invoke the SCSI interrupt handler.
The SCSI IRQ is then cleared by the mac_scsi driver. If SCSI and NuBus
IRQs are now asserted together, oss_irq() will be invoked with IPL == 3
and the mac_scsi interrupt handler can be re-entered. This re-entrance
issue is not limited to SCSI and could affect NuBus and ADB drivers too.
Fix it by splitting up oss_irq() into separate handlers for each IPL.
No-one seems to know how OSS irq flags can be cleared, if at all, so add
a comment to this effect (actually reinstate one I previously removed).
Testing showed that a slot IRQ with no handler can remain asserted (in
this case a Radius video card) without causing problems for other IRQs.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
This patch brings basic support for the Linux Driver Model to the
NuBus subsystem.
For flexibility, the matching of boards with drivers is left up to the
drivers. This is also the approach taken by NetBSD. A board may have
many functions, and drivers may have to consider many functional
resources and board resources in order to match a device.
This implementation does not bind drivers to resources (nor does it bind
many drivers to the same board). Apple's NuBus declaration ROM design
is flexible enough to allow that, but I don't see a need to support it
as we don't use the "slot zero" resources (in the main logic board ROM).
Eliminate the global nubus_boards linked list by rewriting the procfs
board iterator around bus_for_each_dev(). Hence the nubus device refcount
can be used to determine the lifespan of board objects.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Add an expansion slot attribute to allow drivers to properly handle
cards like Comm Slot cards and PDS cards without declaration ROMs.
This clarifies the logic for the Centris 610 model which has no
Comm Slot but has an optional on-board SONIC device.
Cc: "David S. Miller" <davem@davemloft.net>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
This increases code re-use and improves readability.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
It is misleading to call a functional resource a "device". In adopting
the Linux Driver Model, the struct device will be embedded in struct
nubus_board. That will compound the terminlogy problem because drivers
will bind with boards, not with functional resources. Avoid this by
renaming struct nubus_dev as struct nubus_rsrc. "Functional resource"
is the vendor's terminology so this helps avoid confusion.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
The /proc/bus/nubus/s/ directory tree for any slot s is missing a lot
of information. The struct file_operations methods have long been left
unimplemented (hence the familiar compile-time warning, "Need to set
some I/O handlers here").
Slot resources have a complex structure which varies depending on board
function. The logic for interpreting these ROM data structures is found
in nubus.c. Let's not duplicate that logic in proc.c.
Create the /proc/bus/nubus/s/ inodes while scanning slot s. During
descent through slot resource subdirectories, call the new
nubus_proc_add_foo() functions to create the procfs inodes.
Also add a new function, nubus_seq_write_rsrc_mem(), to write the
contents of a particular slot resource to a given seq_file. This is
used by the procfs file_operations methods, to finally give userspace
access to slot ROM information, such as the available video modes.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Scrap the specialized code to unpack video mode name resources and
driver resources. It isn't useful.
Instead, add a re-usable function to handle lists of block resources of
any kind, and descend into the video mode table resource directory.
Rename callers as nubus_get_foo(), consistent with their purpose and
with related functions in the same file.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Eliminate unused values from struct nubus_dev to save wasted memory
(a Radius PrecisionColor 24X card has about 95 functional resources
and up to six such cards may be fitted). Also remove redundant static
variable initialization, an unreachable !MACH_IS_MAC conditional,
the unused nubus_find_device() function, the bogus get_nubus_list()
prototype and the pointless card_present temporary variable.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
This patch fixes the following WARNING.
proc_dir_entry 'nubus/a' already registered
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.13.0-00036-gd57552077387 #1
Stack from 01c1bd9c:
01c1bd9c 003c2c8b 01c1bdc0 0001b0fe 00000000 00322f4a 01c43a20 01c43b0c
01c8c420 01c1bde8 0001b1b8 003a4ac3 00000148 000faa26 00000009 00000000
01c1bde0 003a4b6c 01c1bdfc 01c1be20 000faa26 003a4ac3 00000148 003a4b6c
01c43a71 01c8c471 01c10000 00326430 0043d00c 00000005 01c71a00 0020bce0
00322964 01c1be38 000fac04 01c43a20 01c8c420 01c1bee0 01c8c420 01c1be50
000fac4c 01c1bee0 00000000 01c43a20 00000000 01c1bee8 0020bd26 01c1bee0
Call Trace: [<0001b0fe>] __warn+0xae/0xde
[<00322f4a>] memcmp+0x0/0x5c
[<0001b1b8>] warn_slowpath_fmt+0x2e/0x36
[<000faa26>] proc_register+0xbe/0xd8
[<000faa26>] proc_register+0xbe/0xd8
[<00326430>] sprintf+0x0/0x20
[<0020bce0>] nubus_proc_attach_device+0x0/0x1b8
[<00322964>] strcpy+0x0/0x22
[<000fac04>] proc_mkdir_data+0x64/0x96
[<000fac4c>] proc_mkdir+0x16/0x1c
[<0020bd26>] nubus_proc_attach_device+0x46/0x1b8
[<0020bce0>] nubus_proc_attach_device+0x0/0x1b8
[<00322964>] strcpy+0x0/0x22
[<00001ba6>] kernel_pg_dir+0xba6/0x1000
[<004339a2>] proc_bus_nubus_add_devices+0x1a/0x2e
[<000faa40>] proc_create_data+0x0/0xf2
[<0003297c>] parse_args+0x0/0x2d4
[<00433a08>] nubus_proc_init+0x52/0x5a
[<00433944>] nubus_init+0x0/0x44
[<00433982>] nubus_init+0x3e/0x44
[<000020dc>] do_one_initcall+0x38/0x196
[<000020a4>] do_one_initcall+0x0/0x196
[<0003297c>] parse_args+0x0/0x2d4
[<00322964>] strcpy+0x0/0x22
[<00040004>] __up_read+0xe/0x40
[<004231d4>] repair_env_string+0x0/0x7a
[<0042312e>] kernel_init_freeable+0xee/0x194
[<00423146>] kernel_init_freeable+0x106/0x194
[<00433944>] nubus_init+0x0/0x44
[<000a6000>] kfree+0x0/0x156
[<0032768c>] kernel_init+0x0/0xda
[<00327698>] kernel_init+0xc/0xda
[<0032768c>] kernel_init+0x0/0xda
[<00002a90>] ret_from_kernel_thread+0xc/0x14
---[ end trace 14a6d619908ea253 ]---
------------[ cut here ]------------
This gets repeated with each additional functional reasource.
The problem here is the call to proc_mkdir() when the directory already
exists. Each nubus_board gets a directory, such as /proc/bus/nubus/s/
where s is the hex slot number. Therefore, store the 'procdir' pointer
in struct nubus_board instead of struct nubus_dev.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
While we are here, include the slot number in the related error messages.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Testing shows that a single Radius PrecisionColor 24X display board,
which has 95 functional resources, produces over a thousand lines of
log messages. Suppress these messages with pr_debug().
Remove some redundant messages relating to nubus_get_subdir() calls.
Fix the format block debug messages as the sequence of entries is
backwards (my bad).
Move the "scanning slots" message to its proper location.
Fixes: 71ae40e4cf33 ("nubus: Clean up printk calls")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
This fixes a couple of warnings from 'make W=1':
drivers/nubus/nubus.c:790: warning: no previous prototype for 'nubus_probe_slot'
drivers/nubus/nubus.c:824: warning: no previous prototype for 'nubus_scan_bus'
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Due to the '#ifdef __KERNEL__' being located in the wrong place, some
definitions from the kernel API were placed in the UAPI header during
the scripted header split. Fix this. Also, remove the duplicate comment
which is only relevant to the UAPI header.
Fixes: 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Check array indices. Avoid sprintf. Use buffers of sufficient size.
Use appropriate types for array length parameters.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into asoc-st-dfsdm
|
|
If some of the WRITE calls making up an O_DIRECT write syscall fail,
we neglect to commit, even if some of the WRITEs succeed.
We also depend on the commit code to free the reference count on the
nfs_page taken in the "if (request_commit)" case at the end of
nfs_direct_write_completion(). The problem was originally noticed
because ENOSPC's encountered partway through a write would result in a
closed file being sillyrenamed when it should have been unlinked.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The only reference to the label got removed, so we now get
a harmless compiler warning:
fs/nfs/export.c: In function 'nfs_encode_fh':
fs/nfs/export.c:58:1: error: label 'out' defined but not used [-Werror=unused-label]
Fixes: aaa150089465 ("nfs: remove dead code from nfs_encode_fh()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
ARMv8.2 adds a new bit HCR_EL2.TEA which routes synchronous external
aborts to EL2, and adds a trap control bit HCR_EL2.TERR which traps
all Non-secure EL1&0 error record accesses to EL2.
This patch enables the two bits for the guest OS, guaranteeing that
KVM takes external aborts and traps attempts to access the physical
error registers.
ERRIDR_EL1 advertises the number of error records, we return
zero meaning we can treat all the other registers as RAZ/WI too.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
[removed specific emulation, use trap_raz_wi() directly for everything,
rephrased parts of the commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
The current SError from EL2 code unmasks SError and tries to fence any
pending SError into a single instruction window. It then leaves SError
unmasked.
With the v8.2 RAS Extensions we may take an SError for a 'corrected'
error, but KVM is only able to handle SError from EL2 if they occur
during this single instruction window...
The RAS Extensions give us a new instruction to synchronise and
consume SErrors. The RAS Extensions document (ARM DDI0587),
'2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
SError interrupts generated by 'instructions, translation table walks,
hardware updates to the translation tables, and instruction fetches on
the same PE'. This makes ESB equivalent to KVMs existing
'dsb, mrs-daifclr, isb' sequence.
Use the alternatives to synchronise and consume any SError using ESB
instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
in the exit_code so that we can restart the vcpu if it turns out this
SError has no impact on the vcpu.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We expect to have firmware-first handling of RAS SErrors, with errors
notified via an APEI method. For systems without firmware-first, add
some minimal handling to KVM.
There are two ways KVM can take an SError due to a guest, either may be a
RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
For SError that interrupt a guest and are routed to EL2 the existing
behaviour is to inject an impdef SError into the guest.
Add code to handle RAS SError based on the ESR. For uncontained and
uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
errors compromise the host too. All other error types are contained:
For the fatal errors the vCPU can't make progress, so we inject a virtual
SError. We ignore contained errors where we can make progress as if
we're lucky, we may not hit them again.
If only some of the CPUs support RAS the guest will see the cpufeature
sanitised version of the id registers, but we may still take RAS SError
on this CPU. Move the SError handling out of handle_exit() into a new
handler that runs before we can be preempted. This allows us to use
this_cpu_has_cap(), via arm64_is_ras_serror().
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When we exit a guest due to an SError the vcpu fault info isn't updated
with the ESR. Today this is only done for traps.
The v8.2 RAS Extensions define ISS values for SError. Update the vcpu's
fault_info with the ESR on SError so that handle_exit() can determine
if this was a RAS SError and decode its severity.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
If we deliver a virtual SError to the guest, the guest may defer it
with an ESB instruction. The guest reads the deferred value via DISR_EL1,
but the guests view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO
is set.
Add the KVM code to save/restore VDISR_EL2, and make it accessible to
userspace as DISR_EL1.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature
generated an SError with an implementation defined ESR_EL1.ISS, because we
had no mechanism to specify the ESR value.
On Juno this generates an all-zero ESR, the most significant bit 'ISV'
is clear indicating the remainder of the ISS field is invalid.
With the RAS Extensions we have a mechanism to specify this value, and the
most significant bit has a new meaning: 'IDS - Implementation Defined
Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized'
instead of 'no valid ISS'.
Add KVM support for the VSESR_EL2 register to specify an ESR value when
HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to
specify an implementation-defined value.
We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set, KVM
save/restores this bit during __{,de}activate_traps() and hardware clears the
bit once the guest has consumed the virtual-SError.
Future patches may add an API (or KVM CAP) to pend a virtual SError with
a specified ESR.
Cc: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Non-VHE systems take an exception to EL2 in order to world-switch into the
guest. When returning from the guest KVM implicitly restores the DAIF
flags when it returns to the kernel at EL1.
With VHE none of this exception-level jumping happens, so KVMs
world-switch code is exposed to the host kernel's DAIF values, and KVM
spills the guest-exit DAIF values back into the host kernel.
On entry to a guest we have Debug and SError exceptions unmasked, KVM
has switched VBAR but isn't prepared to handle these. On guest exit
Debug exceptions are left disabled once we return to the host and will
stay this way until we enter user space.
Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
happen after the hosts VBAR value has been synchronised by the isb in
__vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
setting KVMs VBAR value, but is kept here for symmetry.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
KVM would like to consume any pending SError (or RAS error) after guest
exit. Today it has to unmask SError and use dsb+isb to synchronise the
CPU. With the RAS extensions we can use ESB to synchronise any pending
SError.
Add the necessary macros to allow DISR to be read and converted to an
ESR.
We clear the DISR register when we enable the RAS cpufeature, and the
kernel has not executed any ESB instructions. Any value we find in DISR
must have belonged to firmware. Executing an ESB instruction is the
only way to update DISR, so we can expect firmware to have handled
any deferred SError. By the same logic we clear DISR in the idle path.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
ARM v8.2 has a feature to add implicit error synchronization barriers
whenever the CPU enters or returns from an exception level. Add this to the
features we always enable. CPUs that don't support this feature will treat
the bit as RES0.
This feature causes RAS errors that are not yet visible to software to
become pending SErrors. We expect to have firmware-first RAS support
so synchronised RAS errors will be take immediately to EL3.
Any system without firmware-first handling of errors will take the SError
either immediatly after exception return, or when we unmask SError after
entry.S's work.
Adding IESB to the ELx flags causes it to be enabled by KVM and kexec
too.
Platform level RAS support may require additional firmware support.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Suggested-by: Will Deacon <will.deacon@arm.com>
Link: https://www.spinics.net/lists/kvm-arm/msg28192.html
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
extensions use SError to notify software about RAS errors, these can be
contained by the Error Syncronization Barrier.
An ACPI system with firmware-first may use SError as its 'SEI'
notification. Future patches may add code to 'claim' this SError as a
notification.
Other systems can distinguish these RAS errors from the SError ESR and
use the AET bits and additional data from RAS-Error registers to handle
the error. Future patches may add this kernel-first handling.
Without support for either of these we will panic(), even if we received
a corrected error. Add code to decode the severity of RAS errors. We can
safely ignore contained errors where the CPU can continue to make
progress. For all other errors we continue to panic().
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
ARM's v8.2 Extentions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending. Add cpufeature detection.
Platform level RAS support may require additional firmware support.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[Rebased added config option, reworded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
__cpu_setup() configures SCTLR_EL1 using some hard coded hex masks,
and el2_setup() duplicates some this when setting RES1 bits.
Lets make this the same as KVM's hyp_init, which uses named bits.
First, we add definitions for all the SCTLR_EL{1,2} bits, the RES{1,0}
bits, and those we want to set or clear.
Add a build_bug checks to ensures all bits are either set or clear.
This means we don't need to preserve endian-ness configuration
generated elsewhere.
Finally, move the head.S and proc.S users of these hard-coded masks
over to the macro versions.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|