Age | Commit message (Collapse) | Author |
|
Hyper-V assumes page size to be 4K. While this assumption holds true on
x86 architecture, it might not be true for ARM64 architecture. Hence
define hyper-v specific function to allocate a zeroed page which can
have a different implementation on ARM64 architecture to handle the
conflict between hyper-v's assumed page size and actual guest page size.
Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
When the NMI lands on an ESPFIX_SS, we are on the entry stack and must
swizzle, otherwise we'll run do_nmi() on the entry stack, which is
BAD.
Also, similar to the normal exception path, we need to correct the
ESPFIX magic before leaving the entry stack, otherwise pt_regs will
present a non-flat stack pointer.
Tested by running sigreturn_32 concurrent with perf-record.
Fixes: e5862d0515ad ("x86/entry/32: Leave the kernel via trampoline stack")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@kernel.org
|
|
Right now, we do some fancy parts of the exception entry path while SS
might have a nonzero base: we fill in regs->ss and regs->sp, and we
consider switching to the kernel stack. This results in regs->ss and
regs->sp referring to a non-flat stack and it may result in
overflowing the entry stack. The former issue means that we can try to
call iret_exc on a non-flat stack, which doesn't work.
Tested with selftests/x86/sigreturn_32.
Fixes: 45d7b255747c ("x86/entry/32: Enter the kernel via trampoline stack")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
|
|
This will allow us to get percpu access working before FIXUP_FRAME,
which will allow us to unwind ESPFIX earlier.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
|
|
When re-building the IRET frame we use %eax as an destination %esp,
make sure to then also match the segment for when there is a nonzero
SS base (ESPFIX).
[peterz: Changelog and minor edits]
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
|
|
As reported by Lai, the commit 3c88c692c287 ("x86/stackframe/32:
Provide consistent pt_regs") wrecked the IRET EXTABLE entry by making
.Lirq_return not point at IRET.
Fix this by placing IRET_FRAME in RESTORE_REGS, to mirror how
FIXUP_FRAME is part of SAVE_ALL.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Reported-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@kernel.org
|
|
The entry stack in the cpu entry area is protected against overflow by the
readonly GDT on 64-bit, but on 32-bit the GDT needs to be writeable and
therefore does not trigger a fault on stack overflow.
Add a guard page.
Fixes: c482feefe1ae ("x86/entry/64: Make cpu_entry_area.tss read-only")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
|
|
Commit 945fd17ab6ba ("x86/cpu_entry_area: Sync cpu_entry_area to
initial_page_table") introduced the sync for the initial page table for
32bit.
sync_initial_page_table() uses clone_pgd_range() which does the update for
the kernel page table. If PTI is enabled it also updates the user space
page table counterpart, which is assumed to be in the next page after the
target PGD.
At this point in time 32-bit did not have PTI support, so the user space
page table update was not taking place.
The support for PTI on 32-bit which was introduced later on, did not take
that into account and missed to add the user space counter part for the
initial page table.
As a consequence sync_initial_page_table() overwrites any data which is
located in the page behing initial_page_table causing random failures,
e.g. by corrupting doublefault_tss and wreckaging the doublefault handler
on 32bit.
Fix it by adding a "user" page table right after initial_page_table.
Fixes: 7757d607c6b3 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Cc: stable@kernel.org
|
|
The double fault TSS was missing GS setup, which is needed for stack
canaries to work.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
|
|
Considering the previous changes, this is a more proper name.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-5-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Drop the rbt_memtype_*() call rbt_ prefix, as we no longer use
an rbtree directly.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-4-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Get rid of the passing the rb_root down the helper calls; there
is only one: &memtype_rbroot.
No change in functionality.
[ mingo: Fixed the changelog which described a different version of the patch. ]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-3-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With some considerations, the custom pat_rbtree implementation can be
simplified to use most of the generic interval_tree machinery:
- The tree inorder traversal can slightly differ when there are key
('start') collisions in the tree due to one going left and another right.
This, however, only affects the output of debugfs' pat_memtype_list file.
- Generic interval trees are now fully closed [a, b], for which we need
to adjust the last endpoint (ie: end - 1).
- Erasing logic must remain untouched as well.
- In order for the types to remain u64, the 'memtype_interval' calls are
introduced, as opposed to simply using struct interval_tree.
In addition, the PAT tree might potentially also benefit by the fast overlap
detection for the insertion case when looking up the first overlapping node
in the tree.
No change in behavior is intended.
Finally, I've tested this on various servers, via sanity warnings, running
side by side with the current version and so far see no differences in the
returned pointer node when doing memtype_rb_lowest_match() lookups.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-2-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Using a mask to represent bus DMA constraints has a set of limitations.
The biggest one being it can only hold a power of two (minus one). The
DMA mapping code is already aware of this and treats dev->bus_dma_mask
as a limit. This quirk is already used by some architectures although
still rare.
With the introduction of the Raspberry Pi 4 we've found a new contender
for the use of bus DMA limits, as its PCIe bus can only address the
lower 3GB of memory (of a total of 4GB). This is impossible to represent
with a mask. To make things worse the device-tree code rounds non power
of two bus DMA limits to the next power of two, which is unacceptable in
this case.
In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
over the tree and treat it as such. Note that dev->bus_dma_limit should
contain the higher accessible DMA address.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The AMD FCH USB XHCI Controller advertises support for generating PME#
while in D0. When in D0, it does signal PME# for USB 3.0 connect events,
but not for USB 2.0 or USB 1.1 connect events, which means the controller
doesn't wake correctly for those events.
00:10.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller [1022:7914] (rev 20) (prog-if 30 [XHCI])
Subsystem: Dell FCH USB XHCI Controller [1028:087e]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Clear PCI_PM_CAP_PME_D0 in dev->pme_support to indicate the device will not
assert PME# from D0 so we don't rely on it.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203673
Link: https://lore.kernel.org/r/20190902145252.32111-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
|
|
Update arch/x86/pci/Makefile replacing the deprecated EXTRA_CFLAGS with the
ccflags-y matching recommendation per section 3.7 of
Documentation/kbuild/makefiles.txt.
Link: https://lore.kernel.org/r/20190819060532.17093-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add SPDX GPL-2.0 to numachip.c, which referred to the kernel default
"COPYING" file, which specifies GPL version 2.
Remove the boilerplate language referring to the GPL and "COPYING", relying
on the assertion in b24413180f56 ("License cleanup: add SPDX GPL-2.0
license identifier to files with no license") that the SPDX identifier may
be used instead of the full boilerplate text.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20190828135322.10370-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Preparatory work for shattering mmu.c into multiple files. Besides making it easier
to follow, this will also make it possible to write unit tests for various parts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
apic-access-page
According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
either: "Virtualize APIC accesses" VM-execution control was changed
from 0 to 1, OR the value of apic_access_page was changed.
In the nested case, the burden falls on L1, unless L0 enables EPT in
vmcs02 but L1 enables neither EPT nor VPID in vmcs12. For this reason
prepare_vmcs02() and load_vmcs12_host_state() have special code to
request a TLB flush in case L1 does not use EPT but it uses
"virtualize APIC accesses".
This special case however is not necessary. On a nested vmentry the
physical TLB will already be flushed except if all the following apply:
* L0 uses VPID
* L1 uses VPID
* L0 can guarantee TLB entries populated while running L1 are tagged
differently than TLB entries populated while running L2.
If the first condition is false, the processor will flush the TLB
on vmentry to L2. If the second or third condition are false,
prepare_vmcs02() will request KVM_REQ_TLB_FLUSH. However, even
if both are true, no extra TLB flush is needed to handle the APIC
access page:
* if L1 doesn't use VPID, the second condition doesn't hold and the
TLB will be flushed anyway.
* if L1 uses VPID, it has to flush the TLB itself with INVVPID and
section 28.3.3.3 doesn't apply to L0.
* even INVEPT is not needed because, if L0 uses EPT, it uses different
EPTP when running L2 than L1 (because guest_mode is part of mmu-role).
In this case SDM section 28.3.3.4 doesn't apply.
Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
one could note that L0 won't flush TLB only in cases where SDM sections
28.3.3.3 and 28.3.3.4 don't apply. In particular, if L0 uses different
VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
28.3.3.3 doesn't apply.
Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().
Side-note: This patch can be viewed as removing parts of commit
fb6c81984313 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
that is not relevant anymore since commit
1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
then L0 will use same EPTP for both L0 and L1. Which indeed required
L0 to execute INVEPT before entering L2 guest. This assumption is
not true anymore since when guest_mode was added to mmu-role.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask:
arch/x86/kvm/x86.c:7911:7: warning: variable called set but not
used [-Wunused-but-set-variable]
It is not used since commit 7ee30bc132c6 ("KVM: x86: deliver KVM
IOAPIC scan request to target vCPUs")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
vmcs->apic_access_page is simply a token that the hypervisor puts into
the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers
APIC-access VMExit or APIC virtualization logic whenever a CPU running
in VMX non-root mode read/write from/to this PFN.
As every write either triggers an APIC-access VMExit or write is
performed on vmcs->virtual_apic_page, the PFN pointed to by
vmcs->apic_access_page should never actually be touched by CPU.
Therefore, there is no need to mark vmcs02->apic_access_page as dirty
after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation.
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Conflicts:
arch/x86/kvm/vmx/vmx.c
|
|
If X86_FEATURE_RTM is disabled, the guest should not be able to access
MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all
transactions from the guest to abort.
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The current guest mitigation of TAA is both too heavy and not really
sufficient. It is too heavy because it will cause some affected CPUs
(those that have MDS_NO but lack TAA_NO) to fall back to VERW and
get the corresponding slowdown. It is not really sufficient because
it will cause the MDS_NO bit to disappear upon microcode update, so
that VMs started before the microcode update will not be runnable
anymore afterwards, even with tsx=on.
Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
the guest and let it run without the VERW mitigation. Even though
MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
it on every vmentry, we can use the shared MSR functionality because
the host kernel need not protect itself from TSX-based side-channels.
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Because KVM always emulates CPUID, the CPUID clear bit
(bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually"
by the hypervisor when performing said emulation.
Right now neither kvm-intel.ko nor kvm-amd.ko implement
MSR_IA32_TSX_CTRL but this will change in the next patch.
Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
"Shared MSRs" are guest MSRs that are written to the host MSRs but
keep their value until the next return to userspace. They support
a mask, so that some bits keep the host value, but this mask is
only used to skip an unnecessary MSR write and the value written
to the MSR is always the guest MSR.
Fix this and, while at it, do not update smsr->values[slot].curr if
for whatever reason the wrmsr fails. This should only happen due to
reserved bits, so the value written to smsr->values[slot].curr
will not match when the user-return notifier and the host value will
always be restored. However, it is untidy and in rare cases this
can actually avoid spurious WRMSRs on return to userspace.
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
!RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
hidden (it actually was), yet the value says that TSX is not vulnerable
to microarchitectural data sampling. Fix both.
Cc: stable@vger.kernel.org
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-20
The following pull-request contains BPF updates for your *net-next* tree.
We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).
There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca74886c433:
<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
<<<<<<< HEAD
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5
The main changes are:
1) Addition of BPF trampoline which works as a bridge between kernel functions,
BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
BPF programs for tracing with practically zero overhead to call into BPF (as
opposed to k[ret]probes) and ii) attachment of the former to networking related
programs to see input/output of networking programs (covering xdpdump use case),
from Alexei Starovoitov.
2) BPF array map mmap support and use in libbpf for global data maps; also a big
batch of libbpf improvements, among others, support for reading bitfields in a
relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.
3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.
4) Add BPF audit support and emit messages upon successful prog load and unload in
order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.
5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
(XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.
6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
call named bpf_get_link_xdp_info() for retrieving the full set of prog
IDs attached to XDP, from Toke Høiland-Jørgensen.
7) Add BTF support for array of int, array of struct and multidimensional arrays
and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.
8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.
9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
xdping to be run as standalone, from Jiri Benc.
10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.
11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.
12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit
111e7b15cf10 ("x86/ioperm: Extend IOPL config to control ioperm() as well")
replaced X86_IOPL_EMULATION with X86_IOPL_IOPERM. However it appears
that there was at least one spot missed as tss_update_io_bitmap() still
had a reference to it contained in the code.
The result of this is that it exposed a NULL pointer dereference as seen
below with a linux-next next-20191120 kernel:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 1542 Comm: ovs-vswitchd Tainted: G W 5.4.0-rc8-next-20191120 #125
RIP: 0010:tss_update_io_bitmap+0x4e/0x180
Code: 10 31 c0 65 48 03 1d 69 54 5d 6d 65 48 8b 04 25 40 8c 01 00 48 8b 10 \
f7 c2 00 00 40 00 0f 84 8c 00 00 00 4c 8b a0 c0 22 00 00 <49> 8b 04 \
24 48 39 43 68 74 2e 8b 53 70 41 39 54 24 0c 48 8d 7b 78
RSP: 0018:ffffb8888a0ebf08 EFLAGS: 00010006
RAX: ffff8a429811a680 RBX: ffff8a4c3f946000 RCX: 0000000000000011
RDX: 0000000000400080 RSI: 0000000000400080 RDI: 0000000000000000
RBP: ffffb8888a0ebf30 R08: 00007ffffb5d7ce0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f68a9635c40(0000) GS:ffff8a4c3f940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000103572a001 CR4: 00000000001606e0
Call Trace:
? syscall_slow_exit_work+0x39/0xdb
do_syscall_64+0x1a5/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f68a7aff797
Fixes: 111e7b15cf10 ("x86/ioperm: Extend IOPL config to control ioperm() as well")
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191120222426.3060.18462.stgit@localhost.localdomain
|
|
The valid memory address check in dma_capable only makes sense when mapping
normal memory, not when using dma_map_resource to map a device resource.
Add a new boolean argument to dma_capable to exclude that check for the
dma_map_resource case.
Fixes: b12d66278dd6 ("dma-direct: check for overflows on 32 bit DMA addresses")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
|
|
Since commit 1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role"),
guest_mode was added to mmu-role and therefore if L0 use EPT, it will
always run L1 and L2 with different EPTP. i.e. EPTP01!=EPTP02.
Because TLB entries are tagged with EP4TA, KVM can assume
TLB entries populated while running L2 are tagged differently
than TLB entries populated while running L1.
Therefore, update nested_has_guest_tlb_tag() to consider if
L0 use EPT instead of if L1 use EPT.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The function is only used in kvm.ko module.
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When L1 guest uses 5-level paging, it fails vm-entry to L2 due to
invalid host-state. It needs to add CR4_LA57 bit to nested CR4_FIXED1
MSR.
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Not zeroing the bitmap used for identifying the destination vCPUs for an
IOAPIC scan request in fixed delivery mode could lead to waking up unwanted
vCPUs. This patch zeroes the vCPU bitmap before passing it to
kvm_bitmap_or_dest_vcpus(), which is responsible for setting the bitmap
with the bits corresponding to the destination vCPUs.
Fixes: 7ee30bc132c6("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This can be had with two instead of six insns, by just checking the high
CS.RPL bit.
Also adjust the comment - there would be no #GP in the mentioned cases, as
there's no segment limit violation or alike. Instead there'd be #PF, but
that one reports the target EIP of said branch, not the address of the
branch insn itself.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/a5986837-01eb-7bf8-bf42-4d3084d6a1f5@suse.com
|
|
Now that SS:ESP always get saved by SAVE_ALL, this also needs to be
accounted for in xen_iret_crit_fixup(). Otherwise the old_ax value gets
interpreted as EFLAGS, and hence VM86 mode appears to be active all the
time, leading to random "vm86_32: no user_vm86: BAD" log messages alongside
processes randomly crashing.
Since following the previous model (sitting after SAVE_ALL) would further
complicate the code _and_ retain the dependency of xen_iret_crit_fixup() on
frame manipulations done by entry_32.S, switch things around and do the
adjustment ahead of SAVE_ALL.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Stable Team <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/32d8713d-25a7-84ab-b74b-aa3e88abce6b@suse.com
|
|
Once again RPL checks have been introduced which don't account for a 32-bit
kernel living in ring 1 when running in a PV Xen domain. The case in
FIXUP_FRAME has been preventing boot.
Adjust BUG_IF_WRONG_CR3 as well to guard against future uses of the macro
on a code path reachable when running in PV mode under Xen; I have to admit
that I stopped at a certain point trying to figure out whether there are
present ones.
Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stable Team <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/0fad341f-b7f5-f859-d55d-f0084ee7087e@suse.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
x86/insn:
Adrian Hunter:
- Add some more Intel instructions to the opcode map:
cldemote, encls, enclu, enclv, enqcmd, enqcmds, movdir64b,
movdiri, pconfig, tpause, umonitor, umwait, wbnoinvd.
- The instruction decoding can be tested using the perf tools'
"x86 instruction decoder - new instructions" test as folllows:
$ perf test -v "new " 2>&1 | grep -i cldemote
Decoded ok: 0f 1c 00 cldemote (%eax)
Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8)
Decoded ok: 0f 1c 00 cldemote (%rax)
Decoded ok: 41 0f 1c 00 cldemote (%r8)
Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8)
Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8)
$ perf test -v "new " 2>&1 | grep -i tpause
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 41 0f ae f0 tpause %r8d
callchains:
Adrian Hunter:
- Fix segfault in thread__resolve_callchain_sample().
perf probe:
- Line fixes to show only lines where probes can be used with 'perf probe -L',
and when reporting them via 'perf probe -l'.
- Support multiprobe events.
perf scripts python:
Adrian Hunter:
- Fix use of TRUE with SQLite < 3.23 in exported-sql-viewer.py.
perf maps:
- Trim 'struct map' by removing the rb_node member for sorting
by map name, as that is only needed for processing kernel maps,
and only when classifying symbols by section at load time.
Sort them by name using qsort() and do lookups using bsearch()
when map_groups__find_by_name() is used.
perf parse:
Ian Rogers:
- Report initial event parsing error, providing a less cryptic message
to state that a PMU wasn't found in the system.
perf vendor events:
James Clark:
- Fix commas so that PMU event files for arm64, power8 and power nine
become valid JSON.
libtraceevent:
Konstantin Khlebnikov:
- Fix parsing of event %o and %X argument types.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add to the opcode map the following instructions:
cldemote
tpause
umonitor
umwait
movdiri
movdir64b
enqcmd
enqcmds
encls
enclu
enclv
pconfig
wbnoinvd
For information about the instructions, refer Intel SDM May 2019
(325462-070US) and Intel Architecture Instruction Set Extensions
May 2019 (319433-037).
The instruction decoding can be tested using the perf tools'
"x86 instruction decoder - new instructions" test as folllows:
$ perf test -v "new " 2>&1 | grep -i cldemote
Decoded ok: 0f 1c 00 cldemote (%eax)
Decoded ok: 0f 1c 05 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%eax,%ecx,8)
Decoded ok: 0f 1c 00 cldemote (%rax)
Decoded ok: 41 0f 1c 00 cldemote (%r8)
Decoded ok: 0f 1c 04 25 78 56 34 12 cldemote 0x12345678
Decoded ok: 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%rax,%rcx,8)
Decoded ok: 41 0f 1c 84 c8 78 56 34 12 cldemote 0x12345678(%r8,%rcx,8)
$ perf test -v "new " 2>&1 | grep -i tpause
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 0f ae f3 tpause %ebx
Decoded ok: 66 41 0f ae f0 tpause %r8d
$ perf test -v "new " 2>&1 | grep -i umonitor
Decoded ok: 67 f3 0f ae f0 umonitor %ax
Decoded ok: f3 0f ae f0 umonitor %eax
Decoded ok: 67 f3 0f ae f0 umonitor %eax
Decoded ok: f3 0f ae f0 umonitor %rax
Decoded ok: 67 f3 41 0f ae f0 umonitor %r8d
$ perf test -v "new " 2>&1 | grep -i umwait
Decoded ok: f2 0f ae f0 umwait %eax
Decoded ok: f2 0f ae f0 umwait %eax
Decoded ok: f2 41 0f ae f0 umwait %r8d
$ perf test -v "new " 2>&1 | grep -i movdiri
Decoded ok: 0f 38 f9 03 movdiri %eax,(%ebx)
Decoded ok: 0f 38 f9 88 78 56 34 12 movdiri %ecx,0x12345678(%eax)
Decoded ok: 48 0f 38 f9 03 movdiri %rax,(%rbx)
Decoded ok: 48 0f 38 f9 88 78 56 34 12 movdiri %rcx,0x12345678(%rax)
$ perf test -v "new " 2>&1 | grep -i movdir64b
Decoded ok: 66 0f 38 f8 18 movdir64b (%eax),%ebx
Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx
Decoded ok: 67 66 0f 38 f8 1c movdir64b (%si),%bx
Decoded ok: 67 66 0f 38 f8 8c 34 12 movdir64b 0x1234(%si),%cx
Decoded ok: 66 0f 38 f8 18 movdir64b (%rax),%rbx
Decoded ok: 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%rax),%rcx
Decoded ok: 67 66 0f 38 f8 18 movdir64b (%eax),%ebx
Decoded ok: 67 66 0f 38 f8 88 78 56 34 12 movdir64b 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i enqcmd
Decoded ok: f2 0f 38 f8 18 enqcmd (%eax),%ebx
Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx
Decoded ok: 67 f2 0f 38 f8 1c enqcmd (%si),%bx
Decoded ok: 67 f2 0f 38 f8 8c 34 12 enqcmd 0x1234(%si),%cx
Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx
Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx
Decoded ok: f2 0f 38 f8 18 enqcmd (%rax),%rbx
Decoded ok: f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%rax),%rcx
Decoded ok: 67 f2 0f 38 f8 18 enqcmd (%eax),%ebx
Decoded ok: 67 f2 0f 38 f8 88 78 56 34 12 enqcmd 0x12345678(%eax),%ecx
Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx
Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i enqcmds
Decoded ok: f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
Decoded ok: 67 f3 0f 38 f8 1c enqcmds (%si),%bx
Decoded ok: 67 f3 0f 38 f8 8c 34 12 enqcmds 0x1234(%si),%cx
Decoded ok: f3 0f 38 f8 18 enqcmds (%rax),%rbx
Decoded ok: f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%rax),%rcx
Decoded ok: 67 f3 0f 38 f8 18 enqcmds (%eax),%ebx
Decoded ok: 67 f3 0f 38 f8 88 78 56 34 12 enqcmds 0x12345678(%eax),%ecx
$ perf test -v "new " 2>&1 | grep -i encls
Decoded ok: 0f 01 cf encls
Decoded ok: 0f 01 cf encls
$ perf test -v "new " 2>&1 | grep -i enclu
Decoded ok: 0f 01 d7 enclu
Decoded ok: 0f 01 d7 enclu
$ perf test -v "new " 2>&1 | grep -i enclv
Decoded ok: 0f 01 c0 enclv
Decoded ok: 0f 01 c0 enclv
$ perf test -v "new " 2>&1 | grep -i pconfig
Decoded ok: 0f 01 c5 pconfig
Decoded ok: 0f 01 c5 pconfig
$ perf test -v "new " 2>&1 | grep -i wbnoinvd
Decoded ok: f3 0f 09 wbnoinvd
Decoded ok: f3 0f 09 wbnoinvd
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20191115135447.6519-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The removed calgary IOMMU driver was the only user of this header file.
Reported-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
|
|
BIOSen -> BIOSes; paing -> paging. Append to 640 its proper unit "Kb".
encomapssing -> encompassing.
[ bp: Merge into a single patch, fix one more typo, massage. ]
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191118070012.27850-1-caoj.fnst@cn.fujitsu.com
|
|
Lots of overlapping changes and parallel additions, stuff
like that.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This implementation is the fastest available x86_64 implementation, and
unlike Sandy2x, it doesn't requie use of the floating point registers at
all. Instead it makes use of BMI2 and ADX, available on recent
microarchitectures. The implementation was written by Armando
Faz-Hernández with contributions (upstream) from Samuel Neves and me,
in addition to further changes in the kernel implementation from us.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
[ardb: - move to arch/x86/crypto
- wire into lib/crypto framework
- implement crypto API KPP hooks ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
These implementations from Samuel Neves support AVX and AVX-512VL.
Originally this used AVX-512F, but Skylake thermal throttling made
AVX-512VL more attractive and possible to do with negligable difference.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Co-developed-by: Samuel Neves <sneves@dei.uc.pt>
[ardb: move to arch/x86/crypto, wire into lib/crypto framework]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In order to use 128-bit integer arithmetic in C code, the architecture
needs to have declared support for it by setting ARCH_SUPPORTS_INT128,
and it requires a version of the toolchain that supports this at build
time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test
whether __SIZEOF_INT128__ is defined, since this is only the case for
compilers that can support 128-bit integers.
Let's fold this additional test into the Kconfig declaration of
ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles,
e.g., to decide whether a certain object needs to be included in the
first place.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Implement the arch init/update/final Poly1305 library routines in the
accelerated SIMD driver for x86 so they are accessible to users of
the Poly1305 library interface as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Remove the dependency on the generic Poly1305 driver. Instead, depend
on the generic library so that we only reuse code without pulling in
the generic skcipher implementation as well.
While at it, remove the logic that prefers the non-SIMD path for short
inputs - this is no longer necessary after recent FPU handling changes
on x86.
Since this removes the last remaining user of the routines exported
by the generic shash driver, unexport them and make them static.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|