|
clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix
split 'struct sk_filter' into
struct sk_filter {
atomic_t refcnt;
struct rcu_head rcu;
struct bpf_prog *prog;
};
and
struct bpf_prog {
u32 jited:1,
len:31;
struct sock_fprog_kern *orig_prog;
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct bpf_insn *filter);
union {
struct sock_filter insns[0];
struct bpf_insn insnsi[0];
struct work_struct work;
};
};
so that 'struct bpf_prog' can be used independently of sockets, which
cleans up the 'unattached' bpf use cases
split SK_RUN_FILTER macro into:
SK_RUN_FILTER to be used with 'struct sk_filter *' and
BPF_PROG_RUN to be used with 'struct bpf_prog *'
__sk_filter_release(struct sk_filter *) gains a
__bpf_prog_release(struct bpf_prog *) helper function
also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:
sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter
API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet
API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
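To make the split concrete, a minimal sketch of the 'unattached' flow after
the rename (error handling trimmed; 'fprog' stands for a caller-supplied
struct sock_fprog_kern and 'skb' for the packet context):

	struct bpf_prog *prog;
	int err, ret;

	err = bpf_prog_create(&prog, &fprog);	/* was sk_unattached_filter_create() */
	if (err)
		return err;

	ret = BPF_PROG_RUN(prog, skb);		/* run directly on a bpf_prog */

	bpf_prog_destroy(prog);			/* was sk_unattached_filter_destroy() */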
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We handle FSCR feature bits (well, TAR only really today) lazily when the guest
starts using them. So when a guest activates the bit and later uses that feature
we enable it for real in hardware.
However, when the guest stops using that bit we don't stop setting it in
hardware. That means we can potentially lose a trap that the guest expects to
happen because it thinks a feature is not active.
This patch adds support to drop TAR when the guest turns it off in FSCR. While
at it, it also restricts FSCR access to 64bit systems - 32bit ones don't have it.
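A minimal sketch of the idea (not the literal patch; kvmppc_giveup_fac() is an
assumed helper name):

	void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr)
	{
		if ((vcpu->arch.fscr & FSCR_TAR) && !(fscr & FSCR_TAR)) {
			/* guest dropped TAR - stop setting it in hardware too */
			kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
		}
		vcpu->arch.fscr = fscr;
	}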
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
T2080PCIe-RDB is a Freescale Reference Design Board that hosts the T2080 SoC.
The board feature overview:
Processor:
- T2080 SoC integrating four 64-bit dual-thread e6500 cores up to 1.8GHz
DDR Memory:
- Single memory controller capable of supporting DDR3 and DDR3-LP devices
- 72bit 4GB DDR3-LP SODIMM in slot
Ethernet interfaces:
- Two 1Gbps RGMII ports on-board
- Two 10Gbps SFP+ ports on-board
- Two 10Gbps Base-T ports on-board
Accelerator:
- DPAA components consist of FMan, BMan, QMan, PME, DCE and SEC
IFC/Local Bus
- NOR: 128MB 16-bit NOR flash
- NAND: 1GB 8-bit NAND flash
- CPLD: for system controlling with programmable header on-board
eSPI:
- 64MB N25Q512 SPI flash
USB:
- Two USB2.0 ports with internal PHY (both Type-A)
PCIe:
- One PCIe x4 goldfinger (supports SR-IOV)
- One PCIe x4 slot
- One PCIe x2 end-point device (C293 crypto co-processor)
SATA:
- Two SATA 2.0 ports on-board
SDHC:
- Supports a MicroSD/TF card on-board
I2C:
- Four I2C controllers.
UART:
- Dual 4-pin UART serial ports
Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
Now that we have properly split load/store instruction emulation and generic
instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko
on book3s_64.
This reduces the attack surface and amount of code loaded on HV KVM kernels.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
These are not specific to e500hv but are applicable to bookehv
(as per a comment from Scott Wood on my patch
"kvm: ppc: bookehv: Added wrapper macros for shadow registers")
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
In commit ae91d60ba88ef0bdb1b5e9b2363bd52fc45d2af7, a bug was fixed that
involved converting !x & y to !(x & y). The code below shows the same
pattern, and thus should perhaps be fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@ expression E1,E2; @@
(
!E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
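In plain C, the pattern in question looks like this (illustrative flag names,
not the code the patch touches):

	/* before: '!' binds tighter than '&', so this tests (!flags) & FLAG_BUSY */
	if (!flags & FLAG_BUSY)
		return;

	/* after: tests that FLAG_BUSY is not set in flags */
	if (!(flags & FLAG_BUSY))
		return;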
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
mpic_msgrs has type struct mpic_msgr **, not struct mpic_msgr *, so the
elements of the array should have pointer type, not structure type.
The advantage of kcalloc is that it prevents integer overflows that could
result from multiplying the number of elements by the element size, and it
is also a bit nicer to read.
The Coccinelle semantic patch that makes the first change is as follows:
// <smpl>
@disable sizeof_type_expr@
type T;
T **x;
@@
x =
<+...sizeof(
- T
+ *x
)...+>
// </smpl>
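An illustrative before/after for the allocation itself (surrounding code
assumed):

	/* before: structure size in sizeof, open-coded multiplication */
	mpic_msgrs = kmalloc(sizeof(struct mpic_msgr) * count, GFP_KERNEL);

	/* after: element size derived from the pointer, overflow-checked */
	mpic_msgrs = kcalloc(count, sizeof(*mpic_msgrs), GFP_KERNEL);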
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
The CoreNet Coherency Fabric is part of the memory subsystem on
some Freescale QorIQ chips. It can report coherency violations (e.g.
due to misusing memory that is mapped noncoherent) as well as
transactions that do not hit any local access window, or which hit a
local access window with an invalid target ID.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Bharat Bhushan <bharat.bhushan@freescale.com>
|
|
Erratum A-008139 can cause duplicate TLB entries if an indirect
entry is overwritten using tlbwe while the other thread is using it to
do a lookup. Work around this by using tlbilx to invalidate prior
to overwriting.
To avoid the need to save another register to hold MAS1 during the
workaround code, TID clearing has been moved from tlb_miss_kernel_e6500
until after the SMT section.
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
The general idea is that each core will release all of its
threads into the secondary thread startup code, which will
eventually wait in the secondary core holding area, for the
appropriate bit in the PACA to be set. The kick_cpu function
pointer will set that bit in the PACA, and thus "release"
the core/thread to boot. We also need to do a few things that
U-Boot normally does for CPUs (like enable branch prediction).
Signed-off-by: Andy Fleming <afleming@freescale.com>
[scottwood@freescale.com: various changes, including only enabling
threads if Linux wants to kick them]
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
This ensures that all MSR definitions are consistently unsigned long,
and that MSR_CM does not become 0xffffffff80000000 (this is usually
harmless because MSR is 32-bit on booke and is mainly noticeable when
debugging, but still I'd rather avoid it).
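The underlying C issue in miniature (illustrative macro names):

	/* (1 << 31) is a negative int; converting it to a 64-bit unsigned
	 * long sign-extends, yielding 0xffffffff80000000 */
	#define MSR_CM_BAD	(1 << 31)

	/* an unsigned long constant stays 0x80000000 as intended */
	#define MSR_CM_GOOD	(1UL << 31)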
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 more small powerpc fixes that should still go into .16.
One is a recent regression (MMCR2 business), the other is a trivial
endian fix without which FW updates won't work on LE in IBM machines,
and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
LOT more friendly especially when the whole thing is about retrieving
error logs ..."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix endianness of flash_block_list in rtas_flash
powerpc/powernv: Change BUG_ON to WARN_ON in elog code
powerpc/perf: Fix MMCR2 handling for EBB
|
|
DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We're going to implement guest code interpretation in KVM for some rare
corner cases. This code needs to be able to inject data and instruction
faults into the guest when it encounters them.
Expose generic APIs to do this in a reasonably subarch agnostic fashion.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Today the instruction emulator can get called via 2 separate code paths. It
can either be called by MMIO emulation detection code or by privileged
instruction traps.
This is bad, as both code paths prepare the environment differently. For MMIO
emulation we already know the virtual address we faulted on, so instructions
there don't have to actually fetch that information.
Split out the two separate use cases into separate files.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as
well access the magic page. Special case it out so that we can properly access
it.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We have a nice and handy helper to read from guest physical address space,
so we should make use of it in kvmppc_ld as we already do for its counterpart
in kvmppc_st.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We have a proper define for invalid HVA numbers. Use it instead of the
ppc-specific kvmppc_bad_hva().
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We have enough common infrastructure now to resolve GVA->GPA mappings at
runtime. With this we can move our book3s specific helpers to load / store
in guest virtual address space to common code as well.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We have a nice API to find the translated GPAs of a GVA including protection
flags. So far we only use it on Book3S, but there's no reason the same shouldn't
be used on BookE as well.
Implement a kvmppc_xlate() version for BookE and clean it up to make it more
readable in general.
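For reference, a sketch of the shared translation interface this series
converges on (treat the exact enum and parameter names as assumptions):

	enum xlate_instdata { XLATE_INST, XLATE_DATA };
	enum xlate_readwrite { XLATE_READ, XLATE_WRITE };

	int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr,
			 enum xlate_instdata xlid, enum xlate_readwrite xlrw,
			 struct kvmppc_pte *pte);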
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When calculating the lower bits of the AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove a stale comment.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
With Book3S KVM we can create both PR and HV VMs in parallel on the same
machine. That gives us new challenges on the CAPs we return - both have
different capabilities.
When we get asked about CAPs on the kvm fd, there's nothing we can do. We
can try to be smart and assume we're running HV if HV is available, PR
otherwise. However with the newly added VM CHECK_EXTENSION we can now ask
for capabilities directly on a VM which knows whether it's PR or HV.
With this patch I can successfully expose KVM PVINFO data to user space
in the PR case, fixing magic page mapping for PAPR guests.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In preparation to make the check_extension function available to VM scope
we add a struct kvm * argument to the function header and rename the function
accordingly. It will still be called from the /dev/kvm fd, but with a NULL
argument for struct kvm *.
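The shape of the change, roughly (a sketch based on the description above):

	/* before: no way to know which VM the query is about */
	int kvm_dev_ioctl_check_extension(long ext);

	/* after: VM-scoped; the /dev/kvm path passes NULL */
	int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);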
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "it has a way to store and load the contents of the L2,
or the L2 plus the MRU way of the L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.
The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
- 32kb for L2
- 128kb for L2+L3
- Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)
We should abort any ongoing logging before initiating one.
To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower; about every 8, 16, 24 or
32 cycles)
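A sketch of how the save/restore might be driven, based purely on the
description above (the macro and SPR names are assumptions, not necessarily
what the kernel uses):

	/* abort any logging in flight, then log L2 contents to the buffer */
	logmpp(mpp_addr | PPC_LOGMPP_LOG_ABORT);
	logmpp(mpp_addr | PPC_LOGMPP_LOG_L2);

	/* when the vcore is switched back in, kick off the prefetch */
	mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);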
The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.
The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have almost no cache hits,
so we just generate needless memory operations.
This implementation just does L2 store/load. In my benchmarks this proves
to be useful.
Benchmark 1:
- 16 core POWER8
- 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
- No split core/SMT
- two guests running sysbench memory test.
sysbench --test=memory --num-threads=8 run
- one guest running apache bench (of default HTML page)
ab -n 490000 -c 400 http://localhost/
This benchmark aims to measure the performance of a real world application
(apache) where other guests are cache hot with their own workloads. The sysbench
memory benchmark does pointer sized writes to a (small) memory buffer in a loop.
In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.
Benchmark 2:
- Same VM config as benchmark 1
- all three guests running sysbench memory benchmark
This benchmark aims to see if there is a positive or negative effect on this
cache heavy benchmark. Due to the nature of the benchmark (stores), we may
not see a difference in performance, but rather hopefully an improvement in
consistency of performance (when a vcore is switched in, it doesn't have to
wait many times for cachelines to be pulled in).
The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.
Benchmark 3:
- same 3 guests and CPU configuration as benchmark 1 and 2.
- two idle guests
- 1 guest running STREAM benchmark
This scenario also saw a performance improvement with this patch. On the Copy
and Scale workloads from STREAM, I got a 5-6% improvement. For
Add and Triad, it was around 10% (or more).
Benchmark 4:
- same 3 guests as previous benchmarks
- two guests running sysbench --memory, distinctly different cache heavy
workload
- one guest running STREAM benchmark.
Similar improvements to benchmark 3.
Benchmark 5:
- 1 guest, 8 VCPUs, Ubuntu 14.04
- Host configured with split core (SMT8, subcores-per-core=4)
- STREAM benchmark
In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.
Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
No code changes; just split it out into a function so that the addition
of micro partition prefetch buffer allocation (in a subsequent patch) looks
neater and doesn't require excessive indentation.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
At present, kvmppc_ld calls kvmppc_xlate, and if kvmppc_xlate returns
any error indication, it returns -ENOENT, which is taken to mean an
HPTE not found error. However, the error could have been no segment
found (no SLB entry) or a permission error. Similarly,
kvmppc_pte_to_hva currently does permission checking, but any error
from it is taken by kvmppc_ld to mean that the access is an emulated
MMIO access. Also, kvmppc_ld does no execute permission checking.
This fixes these problems by (a) returning any error from kvmppc_xlate
directly, (b) moving the permission check from kvmppc_pte_to_hva
into kvmppc_ld, and (c) adding an execute permission check to kvmppc_ld.
This is similar to what was done for kvmppc_st() by commit 82ff911317c3
("KVM: PPC: Deflect page write faults properly in kvmppc_st").
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This does for PR KVM what c9438092cae4 ("KVM: PPC: Book3S HV: Take SRCU
read lock around kvm_read_guest() call") did for HV KVM, that is,
eliminate a "suspicious rcu_dereference_check() usage!" warning by
taking the SRCU lock around the call to kvmppc_rtas_hcall().
It also fixes a return of RESUME_HOST to return EMULATE_FAIL instead,
since kvmppc_h_pr() is supposed to return EMULATE_* values.
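The locking part of the fix follows the same pattern as the HV one, roughly:

	int srcu_idx;

	srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
	rc = kvmppc_rtas_hcall(vcpu);
	srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);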
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Unfortunately, the LPCR got defined as a 32-bit register in the
one_reg interface. This is unfortunate because KVM allows userspace
to control the DPFD (default prefetch depth) field, which is in the
upper 32 bits. The result is that DPFD always gets set to 0, which
reduces performance in the guest.
We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
since that would break existing userspace binaries. Instead we define
a new KVM_REG_PPC_LPCR_64 id which is 64-bit. Userspace can still use
the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
the bottom 32 bits that userspace can modify (ILE, TC and AIL).
If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
as well.
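A sketch of the resulting one_reg set path (assuming a preserve-top-32-bits
flag on kvmppc_set_lpcr; names are assumptions):

	case KVM_REG_PPC_LPCR:
		/* legacy 32-bit id: only ILE, TC and AIL may change */
		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), true);
		break;
	case KVM_REG_PPC_LPCR_64:
		/* new 64-bit id: DPFD in the upper half is reachable */
		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), false);
		break;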
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The 440 target hasn't been functioning properly for a few releases, and
before that I was the only one fixing a very serious bug, which indicates to
me that nobody else was using it either.
Furthermore, KVM on 440 is slow to the point of being unusable.
We don't have to carry along completely unused code. Remove 440 and give
us one less thing to worry about.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Scott Wood pointed out that we are no longer using SPRG1 for the vcpu pointer,
but SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu), so this comment
is no longer valid.
Note: SPRN_SPRG3R is not supported (I do not see any need as of now),
and if we want to support it in the future then we have to shift to using
SPRG1 for the VCPU pointer.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We now support SPRG9 for the guest, so also add a one_reg interface for it.
Note: The changes are in bookehv code only, as we do not have SPRG9 on booke-pr.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
SPRN_SPRG is used by the debug interrupt handler, so this is required for
debug support.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
On book3e, KVM uses the dedicated load external pid (lwepx) instruction to read
the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
and LRAT), generated by loading a guest address, need to be handled by KVM.
These exceptions are generated in a substituted guest translation context
(EPLC[EGS] = 1) from host context (MSR[GS] = 0).
Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
doing minimal checks on the fast path to avoid host performance degradation.
lwepx exceptions originate from host state (MSR[GS] = 0), which implies
additional checks in the DO_KVM macro (besides the current MSR[GS] = 1) by looking
at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context
Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obviously
too intrusive for the host.
Read the guest's last instruction from kvmppc_load_last_inst() by searching for
the physical address and kmapping it. This addresses the TODO for TLB eviction
and execute-but-not-read entries, and allows us to get rid of lwepx until we are
able to handle failures.
A simple stress benchmark shows a 1% sys performance degradation compared with
the previous approach (lwepx without failure handling):
time for i in `seq 1 10000`; do /bin/echo > /dev/null; done
real 0m 8.85s
user 0m 4.34s
sys 0m 4.48s
vs
real 0m 8.84s
user 0m 4.36s
sys 0m 4.44s
A solution that uses lwepx and handles its exceptions in KVM would be to
temporarily hijack the interrupt vector from the host. This imposes additional
synchronization for cores like the FSL e6500 that share host IVOR registers
between hardware threads. This optimized solution can be developed later on
top of this patch.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
On book3e, the guest's last instruction is read on the exit path using the
dedicated load external pid (lwepx) instruction. This load operation may fail
due to TLB eviction and execute-but-not-read entries.
This patch lays down the path for an alternative solution for reading the
guest's last instruction, by allowing kvmppc_get_last_inst() to fail.
Architecture-specific implementations of kvmppc_load_last_inst() may read the
last guest instruction and instruct the emulation layer to re-execute the
guest in case of failure.
Make the kvmppc_get_last_inst() definition common between architectures.
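The common definition plausibly ends up looking like this (a sketch; exact
names are assumptions):

	enum instruction_type {
		INST_GENERIC,
		INST_SC,	/* system call */
	};

	/* Returns EMULATE_DONE with *inst filled in on success, or
	 * EMULATE_AGAIN if the fetch failed and the guest must re-execute. */
	int kvmppc_get_last_inst(struct kvm_vcpu *vcpu,
				 enum instruction_type type, u32 *inst);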
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
In the context of replacing kvmppc_ld() function calls with a version of
kvmppc_get_last_inst() which is allowed to fail, Alex Graf suggested this:
"If we get EMULATE_AGAIN, we just have to make sure we go back into the guest.
No need to inject an ISI into the guest - it'll do that all by itself.
With an error-returning kvmppc_get_last_inst we can just completely
get rid of kvmppc_read_inst() and only use kvmppc_get_last_inst() instead."
As an intermediate step, get rid of kvmppc_read_inst() and only use kvmppc_ld()
instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Add missing defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Commit 1d628af7 "add load inst fixup" made an attempt to handle
failures generated by reading the guest's current instruction. The fixup
code that was added works by chance, hiding the real issue.
The load external pid (lwepx) instruction, used by KVM to read guest
instructions, is executed in a substituted guest translation context
(EPLC[EGS] = 1). In consequence, lwepx's TLB error and data storage
interrupts need to be handled by KVM, even though these interrupts
are generated from host context (MSR[GS] = 0), where lwepx is executed.
Currently, KVM hooks only interrupts generated from guest context
(MSR[GS] = 1), doing minimal checks on the fast path to avoid host
performance degradation. As a result, the host kernel handles lwepx
faults by searching for the faulting guest data address (loaded in DEAR) in
its own Logical Partition ID (LPID) 0 context. In case a host translation
is found, execution returns to the lwepx instruction instead of the
fixup, the host ending up in an infinite loop.
Revert the commit "add load inst fixup". The lwepx issue will be addressed
in a subsequent patch without needing fixup code.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
kvmppc_set_epr() is already defined in asm/kvm_ppc.h, so rename and
move the get_epr helper function to the same file.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
[agraf: remove duplicate return]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Use kvmppc_set_sprg[0-7]() and kvmppc_get_sprg[0-7]() helper
functions
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Use kvmppc_set_dar() and kvmppc_get_dar() helper functions
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
There are shadow registers like GSPRG[0-3], GSRR0, GSRR1, etc. on
BOOKE-HV, and these shadow registers are guest accessible.
So these shadow registers need to be updated on BOOKE-HV.
This patch adds new macros for the get/set helpers of shadow registers.
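On BOOKE-HV the shadow register lives in an SPR rather than in vcpu->arch, so
the get/set macros plausibly take this shape (a sketch; macro names assumed):

	#define SPRNG_WRAPPER_GET(reg, bookehv_spr)			\
	static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)	\
	{								\
		return mfspr(bookehv_spr);				\
	}

	#define SPRNG_WRAPPER_SET(reg, bookehv_spr)			\
	static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu,	\
					    ulong val)			\
	{								\
		mtspr(bookehv_spr, val);				\
	}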
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The magic page is defined as a 4k page of per-vCPU data that is shared
between the guest and the host to accelerate accesses to privileged
registers.
However, when the host is using 64k page size granularity, we weren't quite
as strict about that rule anymore. Instead, we partially treated all of the
upper 64k as the magic page and mapped only the uppermost 4k with the actual
magic contents.
This works well enough for Linux, which doesn't use any memory in kernel
space in the upper 64k, but Mac OS X got upset. So this patch makes the magic
page actually stay in a 4k range even on 64k page size hosts.
This fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE
hosts for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Today we handle split real mode by mapping both instruction and data faults
into a special virtual address space that only exists during the split mode
phase.
This is good enough to catch 32bit Linux guests that use split real mode for
copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
instruction pointer and can map the user space process freely below there.
However, that approach fails when we're running KVM inside of KVM. Here the 1st
level last_inst reader may well be in the same virtual page as a 2nd level
interrupt handler.
It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
kernel copy_from/to_user implementation can easily overlap with user space
addresses.
The architecturally correct way to fix this would be to implement an instruction
interpreter in KVM that kicks in whenever we go into split real mode. This
interpreter, however, would not receive a great amount of testing and would be
a lot of bloat for a reasonably isolated corner case.
So I went back to the drawing board and tried to come up with a way to make
split real mode work with a single flat address space. And then I realized that
we could get away with the same trick that makes it work for Linux:
Whenever we see an instruction address during split real mode that may collide,
we just move it higher up the virtual address space to a place that hopefully
does not collide (keep your fingers crossed!).
That approach does work surprisingly well. I am able to successfully run
Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
broken split real mode :).
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When a page lookup failed because we're not allowed to write to the page, we
should not overwrite that value with another lookup on the second PTEG which
will return "page not found". Instead, we should just tell the caller that we
had a permission problem.
This fixes Mac OS X guests looping endlessly in page lookup code for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When we have a page that we're not allowed to write to, xlate() will already
tell us -EPERM on lookup of that page. With the code as is, we change it into
a "page missing" error, which a guest may get confused about. Instead, just
tell the caller about the -EPERM directly.
This fixes Mac OS X guests when run with DCBZ32 emulation.
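In terms of code flow, the change amounts to something like this
(illustrative):

	r = kvmppc_xlate(vcpu, eaddr, XLATE_DATA, XLATE_WRITE, &pte);
	if (r == -EPERM)
		return r;	/* report the permission fault, not "page missing" */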
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
get out of the ld immediate range for dereferences inside that struct.
Move the array to the end of our kvm_arch struct. This fixes compilation
issues with NR_CPUS=2048 for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
For the FSL e6500 core, the kernel uses the power management SPR register
(PWRMGTCR0) to enable idle power down for cores and devices by setting up the
idle count period at boot time. With the host already controlling the power
management configuration, the guest can simply benefit from it, so emulate
guest requests as a general store.
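In the e500 mtspr emulation this can be as simple as recording the value
(a sketch; the vcpu field name is an assumption):

	case SPRN_PWRMGTCR0:
		/* host already configured power management; just store */
		vcpu->arch.pwrmgtcr0 = spr_val;
		break;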
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Now that we've fixed all the issues that HV KVM code had on little endian
hosts, we can enable it in the kernel configuration for users to play with.
Signed-off-by: Alexander Graf <agraf@suse.de>
|