Age | Commit message (Collapse) | Author |
|
strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
and there is no functional difference when the caller expects truncation
(when not checking the return value). strscpy is relatively better as it
also avoids scanning the whole source string.
This silences the related checkpatch warnings from:
5dbdb2d87c29 ("checkpatch: prefer strscpy to strlcpy")
Acked-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20210131172838.146706-6-memxor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
and there is no functional difference when the caller expects truncation
(when not checking the return value). strscpy is relatively better as it
also avoids scanning the whole source string.
This silences the related checkpatch warnings from:
5dbdb2d87c29 ("checkpatch: prefer strscpy to strlcpy")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20210131172838.146706-5-memxor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
and there is no functional difference when the caller expects truncation
(when not checking the return value). strscpy is relatively better as it
also avoids scanning the whole source string.
This silences the related checkpatch warnings from:
5dbdb2d87c29 ("checkpatch: prefer strscpy to strlcpy")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20210131172838.146706-4-memxor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
and there is no functional difference when the caller expects truncation
(when not checking the return value). strscpy is relatively better as it
also avoids scanning the whole source string.
This silences the related checkpatch warnings from:
5dbdb2d87c29 ("checkpatch: prefer strscpy to strlcpy")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20210131172838.146706-3-memxor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
strlcpy is marked as deprecated in Documentation/process/deprecated.rst,
and there is no functional difference when the caller expects truncation
(when not checking the return value). strscpy is relatively better as it
also avoids scanning the whole source string.
This silences the related checkpatch warnings from:
5dbdb2d87c29 ("checkpatch: prefer strscpy to strlcpy")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20210131172838.146706-2-memxor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The Edimax EW-7811UN V2 uses an RTL8188EU chipset and works with this
driver.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210204085217.9743-1-martin@kaiser.cx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A netdev xmit function should return NETDEV_TX_OK or NETDEV_TX_BUSY.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Link: https://lore.kernel.org/r/20210131183920.8514-1-martin@kaiser.cx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently the pointer 'reporter' is not being initialized and is
being read in a netdev_warn message. The pointer is not used
and is redundant, fix this by removing it and replacing the reference
to it with priv->reporter instead.
Fixes: 1053c27804df ("staging: qlge: coredump via devlink health reporter")
Reviewed-by: Coiby Xu <coiby.xu@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Addresses-Coverity: ("Uninitialized pointer read")
Link: https://lore.kernel.org/r/20210203133834.22388-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch replaces the safe list iteration function with the
non-safe one, as no list element is being deleted.
Signed-off-by: Christian Gromm <christian.gromm@microchip.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/1612265890-18246-3-git-send-email-christian.gromm@microchip.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch checks the function parameter 'bytes' before doing the
subtraction to prevent memory corruption.
Signed-off-by: Christian Gromm <christian.gromm@microchip.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/1612282865-21846-1-git-send-email-christian.gromm@microchip.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
PTR_TO_BTF_ID registers contain either kernel pointer or NULL.
Emit the NULL check explicitly by JIT instead of going into
do_user_addr_fault() on NULL deference.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210202053837.95909-1-alexei.starovoitov@gmail.com
|
|
Pull NVMe fixes from Christoph.
* 'nvme-5.11' of git://git.infradead.org/nvme:
nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
update the email address for Keith Bush
nvme-pci: ignore the subsysem NQN on Phison E16
nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
|
|
Saving one lock/unlock for io-wq is not super important, but adds some
ugliness in the code. More important, atomic decs not turning it to zero
for some archs won't give the right ordering/barriers so the
io_steal_work() may pretty easily get subtly and completely broken.
Return back 2-step io-wq work exchange and clean it up.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Extract a helper io_fixed_file_slot() returning a place in our fixed
files table, so we don't hand-code it three times in the code.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_import_iovec() doesn't return IO size anymore, only error code. Make
it more apparent by returning int instead of ssize and clean up
leftovers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make decision making of whether we need to retry read/write similar for
O_NONBLOCK and RWF_NOWAIT. Set REQ_F_NOWAIT when either is specified and
use it for all relevant checks. Also fix resubmitting NOWAIT requests
via io_rw_reissue().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We already have implicit do-while for read-retries but with goto in the
end. Convert it to an actual do-while, it highlights it so making a
bit more understandable and is cleaner in general.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_read() has not the simpliest control flow with a lot of jumps and
it's hard to read. One of those is a out_free: label, which frees iovec.
However, from the middle of io_read() iovec is NULL'ed and so
kfree(iovec) is no-op, it leaves us with two place where we can inline
it and further clean up the code.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have invariant in io_read() of how much we're trying to read spilled
into an iter and io_size variable. The last one controls decision making
about whether to do read-retries. However, io_size is modified only
after the first read attempt, so if we happen to go for a third retry in
a single call to io_read(), we will get io_size greater than in the
iterator, so may lead to various side effects up to live-locking.
Modify io_size each time.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now we give out ownership of iovec into io_setup_async_rw(), so it
either sets request's context right or frees the iovec on error itself.
Makes our life a bit easier at call sites.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
First, instead of checking iov_iter_count(iter) for 0 to find out that
all needed bytes were read, just compare returned code against io_size.
It's more reliable and arguably cleaner.
Also, place the half-read case into an else branch and delete an extra
label.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
!io_file_supports_async() case of io_read() is hard to read, it jumps
somewhere in the middle of the function just to do async setup and fail
on a similar check. Call io_setup_async_rw() directly for this case,
it's much easier to follow.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It's easy to make a mistake in io_cqring_wait() because for all
break/continue clauses we need to watch for prepare/finish_wait to be
used correctly. Extract all those into a new helper
io_cqring_wait_schedule(), and transforming the loop into simple series
of func calls: prepare(); check_and_schedule(); finish();
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
schedule_timeout() with timeout=MAX_SCHEDULE_TIMEOUT is guaranteed to
work just as schedule(), so instead of hand-coding it based on arguments
always use the timeout version and simplify code.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Files and task cancellations go over same steps trying to cancel
requests in io-wq, poll, etc. Deduplicate it with a helper.
note: new io_uring_try_cancel_requests() is former
__io_uring_cancel_task_requests() with files passed as an agrument and
flushing overflowed requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The STEC S1220 PCIe SSD cards are EOL since 2014 and not supported by
the vendor anymore. As the skd driver for this SSD is starting to cause
problems with improvements to the block layer, stop supporting it in
newer kernel versions.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Abaci Robot reported following panic:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:put_files_struct+0x1b/0x120
Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff 41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c
RSP: 0000:ffffc90002147d48 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000
RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000
RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500
R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800
R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0
FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__io_clean_op+0x10c/0x2a0
io_dismantle_req+0x3c7/0x600
__io_free_req+0x34/0x280
io_put_req+0x63/0xb0
io_worker_handle_work+0x60e/0x830
? io_wqe_worker+0x135/0x520
io_wqe_worker+0x158/0x520
? __kthread_parkme+0x96/0xc0
? io_worker_handle_work+0x830/0x830
kthread+0x134/0x180
? kthread_create_worker_on_cpu+0x90/0x90
ret_from_fork+0x1f/0x30
Modules linked in:
CR2: 0000000000000000
---[ end trace c358ca86af95b1e7 ]---
I guess case below can trigger above panic: there're two threads which
operates different io_uring ctxs and share same sqthread identity, and
later one thread exits, io_uring_cancel_task_requests() will clear
task->io_uring->identity->files to be NULL in sqpoll mode, then another
ctx that uses same identity will panic.
Indeed we don't need to clear task->io_uring->identity->files here,
io_grab_identity() should handle identity->files changes well, if
task->io_uring->identity->files is not equal to current->files,
io_cow_identity() should handle this changes well.
Cc: stable@vger.kernel.org # 5.5+
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
What platform_device_add_properties() does is it allocates
dynamically a software node that will contain the device
properties supplied to it, and then couples that node with
the device. If the properties are constant, the node can be
constant as well.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20210204141711.53775-5-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
What platform_device_add_properties() does is it allocates
dynamically a software node that will contain the device
properties supplied to it, and then couples that node with
the device. Since that node is always created, it might as
well be constant.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20210204141711.53775-4-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The function dwc2_pci_quirks() does nothing. Removing.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Minas Harutyunyan <hminas@synopsys.com>
Link: https://lore.kernel.org/r/20210204141711.53775-3-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.12/drivers
Pull MD fix from Song.
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid5: cast chunk_sectors to sector_t value
|
|
for-5.12/drivers
Pull floppy fix from Denis:
"Floppy patch for 5.12
- O_NDELAY/O_NONBLOCK fix for floppy from Jiri Kosina.
libblkid is using O_NONBLOCK when probing devices.
This leads to pollution of kernel log with error
messages from floppy driver. Also the driver fails
a mount prior to being opened without O_NONBLOCK
at least once. The patch fixes the issues."
Signed-off-by: Denis Efremov <efremov@linux.com>
* tag 'floppy-for-5.12' of https://github.com/evdenis/linux-floppy:
floppy: reintroduce O_NDELAY fix
|
|
Add a helper to generate the mask of reserved GPA bits _without_ any
adjustments for repurposed bits, and use it to replace a variety of
open coded variants in the MTRR and APIC_BASE flows.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a helper to generate the mask of reserved PA bits in the host.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use reserved_gpa_bits, which accounts for exceptions to the maxphyaddr
rule, e.g. SEV's C-bit, for the page {table,directory,etc...} entry (PxE)
reserved bits checks. For SEV, the C-bit is ignored by hardware when
walking pages tables, e.g. the APM states:
Note that while the guest may choose to set the C-bit explicitly on
instruction pages and page table addresses, the value of this bit is a
don't-care in such situations as hardware always performs these as
private accesses.
Such behavior is expected to hold true for other features that repurpose
GPA bits, e.g. KVM could theoretically emulate SME or MKTME, which both
allow non-zero repurposed bits in the page tables. Conceptually, KVM
should apply reserved GPA checks universally, and any features that do
not adhere to the basic rule should be explicitly handled, i.e. if a GPA
bit is repurposed but not allowed in page tables for whatever reason.
Refactor __reset_rsvds_bits_mask() to take the pre-generated reserved
bits mask, and opportunistically clean up its code, e.g. to align lines
and comments.
Practically speaking, this is change is a likely a glorified nop given
the current KVM code base. SEV's C-bit is the only repurposed GPA bit,
and KVM doesn't support shadowing encrypted page tables (which is
theoretically possible via SEV debug APIs).
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA
legality checks. AMD's APM states:
If the C-bit is an address bit, this bit is masked from the guest
physical address when it is translated through the nested page tables.
Thus, any access that can conceivably be run through NPT should ignore
the C-bit when checking for validity.
For features that KVM emulates in software, e.g. MTRRs, there is no
clear direction in the APM for how the C-bit should be handled. For
such cases, follow the SME behavior inasmuch as possible, since SEV is
is essentially a VM-specific variant of SME. For SME, the APM states:
In this case the upper physical address bits are treated as reserved
when the feature is enabled except where otherwise indicated.
Collecting the various relavant SME snippets in the APM and cross-
referencing the omissions with Linux kernel code, this leaves MTTRs and
APIC_BASE as the only flows that KVM emulates that should _not_ ignore
the C-bit.
Note, this means the reserved bit checks in the page tables are
technically broken. This will be remedied in a future patch.
Although the page table checks are technically broken, in practice, it's
all but guaranteed to be irrelevant. NPT is required for SEV, i.e.
shadowing page tables isn't needed in the common case. Theoretically,
the checks could be in play for nested NPT, but it's extremely unlikely
that anyone is running nested VMs on SEV, as doing so would require L1
to expose sensitive data to L0, e.g. the entire VMCB. And if anyone is
running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1
would need to put its NPT in shared memory, in which case the C-bit will
never be set. Or, L1 could use shadow paging, but again, if L0 needs to
read page tables, e.g. to load PDPTRs, the memory can't be encrypted if
L1 has any expectation of L0 doing the right thing.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Replace an open coded check for an invalid CR3 with its equivalent
helper.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Replace a variety of open coded GPA checks with the recently introduced
common helpers.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a helper to genericize checking for a legal GPA that also must
conform to an arbitrary alignment, and use it in the existing
page_address_valid(). Future patches will replace open coded variants
in VMX and SVM.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a helper to check for a legal GPA, and use it to consolidate code
in existing, related helpers. Future patches will extend usage to
VMX and SVM code, properly handle exceptions to the maxphyaddr rule, and
add more helpers.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't clear the SME C-bit when reading a guest PDPTR, as the GPA (CR3) is
in the guest domain.
Barring a bizarre paravirtual use case, this is likely a benign bug. SME
is not emulated by KVM, loading SEV guest PDPTRs is doomed as KVM can't
use the correct key to read guest memory, and setting guest MAXPHYADDR
higher than the host, i.e. overlapping the C-bit, would cause faults in
the guest.
Note, for SEV guests, stripping the C-bit is technically aligned with CPU
behavior, but for KVM it's the greater of two evils. Because KVM doesn't
have access to the guest's encryption key, ignoring the C-bit would at
best result in KVM reading garbage. By keeping the C-bit, KVM will
fail its read (unless userspace creates a memslot with the C-bit set).
The guest will still undoubtedly die, as KVM will use '0' for the PDPTR
value, but that's preferable to interpreting encrypted data as a PDPTR.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
reset. The reserved bits check needs to be done even if userspace never
configures the guest's CPUID model.
Cc: stable@vger.kernel.org
Fixes: 0107973a80ad ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210204000117.3303214-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
Instead of adding a plethora of new KVM_CAP_XEN_FOO capabilities, just
add bits to the return value of KVM_CAP_XEN_HVM.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
It turns out that we can't handle event channels *entirely* in userspace
by delivering them as ExtINT, because KVM is a bit picky about when it
accepts ExtINT interrupts from a legacy PIC. The in-kernel local APIC
has to have LVT0 configured in APIC_MODE_EXTINT and unmasked, which
isn't necessarily the case for Xen guests especially on secondary CPUs.
To cope with this, add kvm_xen_get_interrupt() which checks the
evtchn_pending_upcall field in the Xen vcpu_info, and delivers the Xen
upcall vector (configured by KVM_XEN_ATTR_TYPE_UPCALL_VECTOR) if it's
set regardless of LAPIC LVT0 configuration. This gives us the minimum
support we need for completely userspace-based implementation of event
channels.
This does mean that vcpu_enter_guest() needs to check for the
evtchn_pending_upcall flag being set, because it can't rely on someone
having set KVM_REQ_EVENT unless we were to add some way for userspace to
do so manually.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
Allow the Xen emulated guest the ability to register secondary
vcpu time information. On Xen guests this is used in order to be
mapped to userspace and hence allow vdso gettimeofday to work.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
Parameterise kvm_setup_pvclock_page() a little bit so that it can be
invoked for different gfn_to_hva_cache structures, and with different
offsets. Then we can invoke it for the normal KVM pvclock and also for
the Xen one in the vcpu_info.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
The vcpu info supersedes the per vcpu area of the shared info page and
the guest vcpus will use this instead.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
This will be used for per-vCPU setup such as runstate and vcpu_info.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|
|
Wallclock on Xen is written in the shared_info page.
To that purpose, export kvm_write_wall_clock() and pass on the GPA of
its location to populate the shared_info wall clock data.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
|