Age | Commit message (Collapse) | Author |
|
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest. Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI. This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).
When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI. Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed. But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0. Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.
For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing. The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter. Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.
Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate. There are no known errata affecting timer value of '0'.
[1] I/O SMIs also have guarantees on when they arrive, but I have
no idea if/how those are emulated in KVM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths. For example, the preemption timer can be used to force an
immediate VMExit. Cache the status of the timer to avoid redundant
VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
VMEnters/VMExits.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest. This architectural
definition is in place to ensure that a previously expired timer
is correctly recognized by the CPU as it is possible for the timer
to reach zero and not trigger a VMexit due to a higher priority
VMExit being signalled instead, e.g. a pending #DB that morphs into
a VMExit.
Whether by design or coincidence, commit f4124500c2c1 ("KVM: nVMX:
Fully emulate preemption timer") special cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit. Unlike '0', a
timer value of '1' has no has no architectural guarantees regarding
when it is delivered.
Modify the timer emulation to trigger immediate VMExit if and only
if the timer value is '0', and document precisely why '0' is special.
Do this even if calibration of the virtual TSC failed, i.e. VMExit
will occur immediately regardless of the frequency of the timer.
Making only '0' a special case gives KVM leeway to be more aggressive
in ensuring the VMExit is injected prior to executing instructions in
the nested guest, and also eliminates any ambiguity as to why '1' is
a special case, e.g. why wasn't the threshold for a "short timeout"
set to 10, 100, 1000, etc...
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page()
This patch is to fix the commit.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
I run into the following error
testing/selftests/kvm/dirty_log_test.c:285: undefined reference to `pthread_create'
testing/selftests/kvm/dirty_log_test.c:297: undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
my gcc version is gcc version 4.8.4
"-pthread" would work everywhere
Signed-off-by: Lei Yang <Lei.Yang@windriver.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Here is the code path which shows kvm_mmu_setup() is invoked after
kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path,
this means the root_hpa and prev_roots are guaranteed to be invalid. And
it is not necessary to reset it again.
kvm_vm_ioctl_create_vcpu()
kvm_arch_vcpu_create()
vmx_create_vcpu()
kvm_vcpu_init()
kvm_arch_vcpu_init()
kvm_mmu_create()
kvm_arch_vcpu_setup()
kvm_mmu_setup()
kvm_init_mmu()
This patch set reset_roots to false in kmv_mmu_setup().
Fixes: 50c28f21d045dde8c52548f8482d456b3f0956f5
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and
CR4.PAE = 1.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014
The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.
When APIC access is virtualized we correctly manipulate with VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.
Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.
The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC modes all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
libc_compat.h is used by libbpf so make sure it's licensed under
LGPL or BSD license. The license change should be OK, I'm the only
author of the file.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If boolean CONFIG_BPF_SYSCALL is enabled, kernel/bpf/syscall.c will
call flow_dissector functions from net/core/flow_dissector.c.
This causes this build failure if CONFIG_NET is disabled:
kernel/bpf/syscall.o: In function `__x64_sys_bpf':
syscall.c:(.text+0x3278): undefined reference to
`skb_flow_dissector_bpf_prog_attach'
syscall.c:(.text+0x3310): undefined reference to
`skb_flow_dissector_bpf_prog_detach'
kernel/bpf/syscall.o:(.rodata+0x3f0): undefined reference to
`flow_dissector_prog_ops'
kernel/bpf/verifier.o:(.rodata+0x250): undefined reference to
`flow_dissector_verifier_ops'
Analogous to other optional BPF program types in syscall.c, add stubs
if the relevant functions are not compiled and move the BPF_PROG_TYPE
definition in the #ifdef CONFIG_NET block.
Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Use different loop variables for the inner and outer loop. This avoids
that an infinite loop occurs if there are more RDMA channels than
target->req_ring_size.
Fixes: d92c0da71a35 ("IB/srp: Add multichannel support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Guenter writes:
"Various bug fixes for nct6775 driver"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
James writes:
"SCSI fixes on 20180919
A couple of small but important fixes, one affecting big endian and
the other fixing a BUG_ON in scatterlist processing.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>"
|
|
When a driver domain (e.g. dom0) is running out of maptrack entries it
can't map any more foreign domain pages. Instead of silently stalling
the affected domUs issue a rate limited warning in this case in order
to make it easier to detect that situation.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
Otherwise we may leak kernel stack for events that sample user
registers.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
|
|
Borislav is effectivly maintaining parts of X86 already, make it official.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
|
|
Sync syscall to DAX file needs to flush processor cache, but it
currently does not flush to existing DAX files. This is because
'ext2_da_aops' is set to address_space_operations of existing DAX
files, instead of 'ext2_dax_aops', since S_DAX flag is set after
ext2_set_aops() in the open path.
Similar to ext4, change ext2_iget() to initialize i_flags before
ext2_set_aops().
Fixes: fb094c90748f ("ext2, dax: introduce ext2_dax_aops")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix the build on !_GNU_SOURCE libc systems such as Alpine Linux/musl
libc due to usage of strerror_r glibc variant on libbpf (Arnaldo Carvalho de Melo)
- Fix out-of-tree asciidoctor man page generation (Ben Hutchings)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The first argument to WARN_ONCE() is a condition.
Fixes: 5800dc5c19f3 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
|
|
Remove mt76_fw dependency from mt76x0u_upload_firmware routine since
it does not define firmware layout properly. Moreover use mt76_poll_msec
utility routine to check if the fw is properly running
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Remove unused usb header file and move mt76x0 firmware definition
in usb.c
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Remove mcu.c source file since it contains just 'one-line' function
that is shared between PCI and USB code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Firmware loading is usb specific, move it to usb.c file.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Using the NAPI or netdev frag cache along with other drivers can lead to
32 KiB pages being held for a long time, despite only being used for
very few page fragments.
This can happen if the driver grabs one or two fragments for rx ring
refill, while other drivers use (and free up) the remaining fragments.
The 32 KiB higher-order page can only be freed once all users have freed
their fragments.
Depending on the traffic patterns, this can waste a lot of memory and
look a lot like a memory leak.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move mt76x2_phy_tssi_compensate routine in mt76x2-common module
since it is shared between mt76x2 and mt76x2u drivers
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move mcu_calibrate routine in mt76x02-lib module since it is
shared between USB and PCI code. Moreover remove duplicated
code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move shared mt76x2 {pcie/usb} mcu shared code in a common file
and remove duplicated code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move shared mt76x2/mt76x0 mcu shared code in a common file
and remove duplicated code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move mt76x2_fw_header definition in mt76x02_mcu.h and remove
duplicated code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Use mcu common helpers instead of mt76x2 specific routines for
mcu_alloc_msg()/mcu_send_msg(). This is a preliminary patch to
unify mt76e and mt76u mcu code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Use mt76_dev data structure instead of mt76x2_dev one in mt76x2_mcu_msg_send
and mt76x2_mcu_get_response routines. Moreover add wait_resp parameter to
mt76x2_mcu_msg_send signature. This is a preliminary patch in order to unify
mcu_msg_alloc()/mcu_msg_send() between pcie and usb code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move __iomem regs pointer in mt76_mmio data structure
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Introduce mt76_mmio data structure in mt76_dev and
move mt76x2_mcu in mt76_mmio. This is a preliminary
patch to unify mcu code between mt76x02{e,u} drivers
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Remove unused usb buffer in mt76x2_mcu data structure
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Use mt76_dev data structure instead of mt76x2_dev one in
mt76x2_tx_queue_mcu routine. This is a preliminary patch
to share mcu code between mt76x2e and mt76x0e
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Remove mt76_mcu_msg_alloc return value check since it is already
evaluated in __mt76x02u_mcu_send_msg
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
We don't need to use custom burst write regs via MCU, we can use
generic mt76_wr_copy() for the same purpose.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
mt76x0_burst_read_regs is not used, but keep it for eventual use. Since
we have this function now in the driver git history, we can remove it and
eventually revert this commit it the function will be needed.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Add static qualifier to mt76x02_remove_dma_hdr routine and
do not export the symbol since it is only used in mt76x02_util.c
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Introduce mt76x02_dma.h header file to contain mt76x02 dma
related definitions
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move mt76u_skb_dma_info routine in mt76x02-usb module and rename it in
mt76x02u_skb_dma_info. Moreover move mt76x02u_set_txinfo in
mt76x02_usb_core.c. This is a preliminary patch to move MT_TXD_INFO,
MT_MCU_MSG and MT_RX_FCE_INFO defs in mt76x02-lib module since other
chipsets (e.g. mt7603) use different dma definitions
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Introduce mt76x02_usb_mcu.c in order to contain mt76x02u mcu related
code. Add mt76x02-usb module as a container for mt76x02 usb code.
This is a preliminary patch to move MT_TXD_INFO, MT_MCU_MSG and
MT_RX_FCE_INFO defs in mt76x02-lib module since other chipsets (e.g.
mt7603) use different dma definitions
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Use mcu common helpers instead of usb specific routines.
Add static qualifier to the following functions:
- mt76u_mcu_msg_alloc
- __mt76u_mcu_send_msg
- mt76u_mcu_send_msg
- mt76u_mcu_wr_rp
- mt76u_mcu_rd_rp
- mt76u_wr_rp
- mt76u_rd_rp
This is a preliminary patch to move mt76x02 usb mcu code in
mt76x02-usb module
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Add callbacks for reading and writing reg pairs tables.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Add USB implementation for read and write reg pair routines.
The actual implementation can use mcu related routines according to
MCU state
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Introduce mt76_mcu_ops data structure to contain mcu related function
pointers. This is a preliminary patch to move mt76x02 usb mcu code in
mt76x02-usb module
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Move mt76x0 and mt76x2 mcu shared definition in mt76x02_mcu.h
and remove duplicated code
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
According to vendor sdk, vco calibration has to be executed
for each channel configuration whereas mcu calibration has to be
performed during channel scanning. This patch fixes the mt76x0
monitor mode issue since in that configuration vco calibration
was never executed
Fixes: 10de7a8b4ab9 ("mt76x0: phy files")
Tested-by: Sid Hayn <sidhayn@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Remove some USB specific code form mt76x0_alloc_device.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|