Age | Commit message (Collapse) | Author |
|
When this function fails, it needs to undo the DMA mappings it's
done so far. Otherwise these are leaked.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up. rpcrdma_prepare_hdr_sge() sets num_sge to one, then
rpcrdma_prepare_msg_sges() sets num_sge again to the count of SGEs
it added, plus one for the header SGE just mapped in
rpcrdma_prepare_hdr_sge(). This is confusing, and nails in an
assumption about when these functions are called.
Instead, maintain a running count that both functions can update
with just the number of SGEs they have added to the SGE array.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
We need to decode and save the incoming rdma_credits field _after_
we know that the direction of the message is "forward direction
Reply". Otherwise, the credits value in reverse direction Calls is
also used to update the forward direction credits.
It is safe to decode the rdma_credits field in rpcrdma_reply_handler
now that rpcrdma_reply_handler is single-threaded. Receives complete
in the same order as they were sent on the NFS server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
I noticed that the soft IRQ thread looked pretty busy under heavy
I/O workloads. perf suggested one area that was expensive was the
queue_work() call in rpcrdma_wc_receive. That gave me some ideas.
Instead of scheduling a separate worker to process RPC Replies,
promote the Receive completion handler to IB_POLL_WORKQUEUE, and
invoke rpcrdma_reply_handler directly.
Note that the poll workqueue is single-threaded. In order to keep
memory invalidation from serializing all RPC Replies, handle any
necessary invalidation tasks in a separate multi-threaded workqueue.
This provides a two-tier scheme, similar to OS I/O interrupt
handlers: A fast interrupt handler that schedules the slow handler
and re-enables the interrupt, and a slower handler that is invoked
for any needed heavy lifting.
Benefits include:
- One less context switch for RPCs that don't register memory
- Receive completion handling is moved out of soft IRQ context to
make room for other users of soft IRQ
- The same CPU core now DMA syncs and XDR decodes the Receive buffer
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: I'd like to be able to invoke the tail of
rpcrdma_reply_handler in two different places. Split the tail out
into its own helper function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Clean up: Make it easier to pass the decoded XID, vers, credits, and
proc fields around by moving these variables into struct rpcrdma_rep.
Note: the credits field will be handled in a subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
A reply with an unrecognized value in the version field means the
transport header is potentially garbled and therefore all the fields
are untrustworthy.
Fixes: 59aa1f9a3cce3 ("xprtrdma: Properly handle RDMA_ERROR ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and dax updates from Dan Williams:
"Save for a few late fixes, all of these commits have shipped in -next
releases since before the merge window opened, and 0day has given a
build success notification.
The ext4 touches came from Jan, and the xfs touches have Darrick's
reviewed-by. An xfstest for the MAP_SYNC feature has been through
a few round of reviews and is on track to be merged.
- Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable
'userspace flush' of persistent memory updates via filesystem-dax
mappings. It arranges for any filesystem metadata updates that may
be required to satisfy a write fault to also be flushed ("on disk")
before the kernel returns to userspace from the fault handler.
Effectively every write-fault that dirties metadata completes an
fsync() before returning from the fault handler. The new
MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag
is validated as supported by the filesystem's ->mmap() file
operation.
- Add support for the standard ACPI 6.2 label access methods that
replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods.
This enables interoperability with environments that only implement
the standardized methods.
- Add support for the ACPI 6.2 NVDIMM media error injection methods.
- Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for
latch last shutdown status, firmware update, SMART error injection,
and SMART alarm threshold control.
- Cleanup physical address information disclosures to be root-only.
- Fix revalidation of the DIMM "locked label area" status to support
dynamic unlock of the label area.
- Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA
(system-physical-address) command and error injection commands.
Acknowledgements that came after the commits were pushed to -next:
- 957ac8c421ad ("dax: fix PMD faults on zero-length files"):
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
- a39e596baa07 ("xfs: support for synchronous DAX faults") and
7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>"
* tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
acpi, nfit: add 'Enable Latch System Shutdown Status' command support
dax: fix general protection fault in dax_alloc_inode
dax: fix PMD faults on zero-length files
dax: stop requiring a live device for dax_flush()
brd: remove dax support
dax: quiet bdev_dax_supported()
fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
tools/testing/nvdimm: unit test clear-error commands
acpi, nfit: validate commands against the device type
tools/testing/nvdimm: stricter bounds checking for error injection commands
xfs: support for synchronous DAX faults
xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
ext4: Support for synchronous DAX faults
ext4: Simplify error handling in ext4_dax_huge_fault()
dax: Implement dax_finish_sync_fault()
dax, iomap: Add support for synchronous faults
mm: Define MAP_SYNC and VM_SYNC flags
dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
dax: Allow dax_iomap_fault() to return pfn
dax: Fix comment describing dax_iomap_fault()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull more device mapper updates from Mike Snitzer:
"Given your expected travel I figured I'd get these fixes to you sooner
rather than later.
- a DM multipath stable@ fix to silence an annoying error message
that isn't _really_ an error
- a DM core @stable fix for discard support that was enabled for an
entire DM device despite only having partial support for discards
due to a mix of discard capabilities across the underlying devices.
- a couple other DM core discard fixes.
- a DM bufio @stable fix that resolves a 32-bit overflow"
* tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm bufio: fix integer overflow when limiting maximum cache size
dm: clear all discard attributes in queue_limits when discards are disabled
dm: do not set 'discards_supported' in targets that do not need it
dm: discard support requires all targets in a table support discards
dm mpath: remove annoying message of 'blk_get_request() returned -11'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma maintainership update from Doug Ledford:
"Add Jason Gunthorpe as co-maintainer of the RDMA stack.
The change is simply to add Jason Gunthorpe to the MAINTAINERS file
for the RDMA stack (and update him in the .mailmap). Jason and I have
talked offline, and Jason will be filing a ticket with the k.o
helpdesk to get an account on k.o, and then we will likely move the
rdma tree to an area where we can both access it and use a shared repo
+ individual topic branches + merged up for-next branch as the staging
basis for each release.
Timing here is nice because in the US we are headed into a holiday
period whereas Jason will be around to keep the patch flow progressing
(I guess Canadians do their equivalent to Thanksgiving in October, so
he doesn't have an excuse to ignore email for the next week ;-))"
[ .. and Konstantin already got Jason his kernel.org account between the
pull request and this actual pull - Linus ]
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA: Add Jason Gunthorpe as a co-maintainer
|
|
The intel-iommu DMA ops fail to correctly handle scatterlists where
sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed
appropriately based on the page-aligned portion of the offset, but the
mapping is set up relative to sg->page, which means it fails to actually
cover the whole buffer (and in the worst case doesn't cover it at all):
(sg->dma_address + sg->dma_len) ----+
sg->dma_address ---------+ |
iov_pfn------+ | |
| | |
v v v
iova: a b c d e f
|--------|--------|--------|--------|--------|
<...calculated....>
[_____mapped______]
pfn: 0 1 2 3 4 5
|--------|--------|--------|--------|--------|
^ ^ ^
| | |
sg->page ----+ | |
sg->offset --------------+ |
(sg->offset + sg->length) ----------+
As a result, the caller ends up overrunning the mapping into whatever
lies beyond, which usually goes badly:
[ 429.645492] DMAR: DRHD: handling fault status reg 2
[ 429.650847] DMAR: [DMA Write] Request device [02:00.4] fault addr f2682000 ...
Whilst this is a fairly rare occurrence, it can happen from the result
of intermediate scatterlist processing such as scatterwalk_ffwd() in the
crypto layer. Whilst that particular site could be fixed up, it still
seems worthwhile to bring intel-iommu in line with other DMA API
implementations in handling this robustly.
To that end, fix the intel_map_sg() path to line up the mapping
correctly (in units of MM pages rather than VT-d pages to match the
aligned_nrpages() calculation) regardless of the offset, and use
sg_phys() consistently for clarity.
Reported-by: Harsh Jain <Harsh@chelsio.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed by: Ashok Raj <ashok.raj@intel.com>
Tested by: Jacob Pan <jacob.jun.pan@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
We cap the upper bound of "fbnum" but we also need to check for
negatives or make the type unsigned.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
|
|
The pointer fbi is dereferenced with par = fbi->par before there is a
null check on fbi, hence there is a potential null pointer dereference
on a null par. Fix this by moving the dereference after the null
pointer check.
Detected by CoverityScan, CID#1461301 ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
|
|
A new field was introduced in 74d46992e0d9, bi_partno, instead of using
bdev->bd_contains and encoding the partition information in the bi_bdev
field. __bio_clone_fast was changed to copy the disk information, but
not the partition information. At minimum, this regressed bcache and
caused data corruption.
Signed-off-by: Michael Lyle <mlyle@lyle.org>
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reported-by: Pavel Goran <via-bcache@pvgoran.name>
Reported-by: Campbell Steven <casteven@gmail.com>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A system booted with a small number of cores enabled per package
panics because the estimate of __max_logical_packages is too low.
This occurs when the total number of active cores across all packages is
less than the maximum core count for a single package. e.g.:
On a 4 package system with 20 cores/package where only 4 cores are
enabled on each package, the value of __max_logical_packages is
calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
Calculate __max_logical_packages after the cpu enumeration has completed.
Use the boot cpu's data to extrapolate the number of packages.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: He Chen <he.chen@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Piotr Luc <piotr.luc@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
|
|
Analyzing large early boot allocations unveiled the logical package id
storage as a prominent memory waste. Since commit 1f12e32f4cd5
("x86/topology: Create logical package id") every 64-bit system allocates a
128k array to convert logical package ids.
This happens because the array is sized for MAX_LOCAL_APIC which is always
32k on 64bit systems, and it needs 4 bytes for each entry.
This is fairly wasteful, especially for the common case of having only one
socket, which uses exactly 4 byte out of 128K.
There is no user of the package id map which is performance critical, so
the lookup is not required to be O(1). Store the logical processor id in
cpu_data and use a loop based lookup.
To keep the mapping stable accross cpu hotplug operations, add a flag to
cpu_data which is set when the CPU is brought up the first time. When the
flag is set, then cpu_data is not reinitialized by copying boot_cpu_data on
subsequent bringups.
[ tglx: Rename the flag to 'initialized', use proper pointers instead of
repeated cpu_data(x) evaluation and massage changelog. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: He Chen <he.chen@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Piotr Luc <piotr.luc@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171114124257.22013-3-prarit@redhat.com
|
|
The SNB-EP uncore driver is the only user of topology_phys_to_logical_pkg
in a performance critical path.
Change it query the logical pkg ID only once at initialization time and
then cache it in box structure. This allows to change the logical package
management without affecting the performance critical path.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: He Chen <he.chen@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Piotr Luc <piotr.luc@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20171114124257.22013-2-prarit@redhat.com
|
|
The new function mp_register_ioapic_irq() is a subset of the code in
mp_override_legacy_irq().
Replace the code duplication by invoking mp_register_ioapic_irq() from
mp_override_legacy_irq().
Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: kkamagui@gmail.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/1510848825-21965-3-git-send-email-vikas.cha.sajjan@hpe.com
|
|
Platforms which support only IOAPIC mode, pass the SCI information above
the legacy space (0-15) via the FADT mechanism and not via MADT.
In such cases mp_override_legacy_irq() which is invoked from
acpi_sci_ioapic_setup() to register SCI interrupts fails for interrupts
greater equal 16, since it is meant to handle only the legacy space and
emits error "Invalid bus_irq %u for legacy override".
Add a new function to handle SCI interrupts >= 16 and invoke it
conditionally in acpi_sci_ioapic_setup().
The code duplication due to this new function will be cleaned up in a
separate patch.
Co-developed-by: Sunil V L <sunil.vl@hpe.com>
Signed-off-by: Vikas C Sajjan <vikas.cha.sajjan@hpe.com>
Signed-off-by: Sunil V L <sunil.vl@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Abdul Lateef Attar <abdul-lateef.attar@hpe.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: kkamagui@gmail.com
Cc: linux-acpi@vger.kernel.org
Link: https://lkml.kernel.org/r/1510848825-21965-2-git-send-email-vikas.cha.sajjan@hpe.com
|
|
When crosvm is used to boot a kernel as a VM, the SMP MP-table is found
at physical address 0x0. This causes mpf_base to be set to 0 and a
subsequent "if (!mpf_base)" check in default_get_smp_config() results in
the MP-table not being parsed. Further into the boot this results in an
oops when attempting a read_apic_id().
Add a boolean variable that is set to true when the MP-table is found.
Use this variable for testing if the MP-table was found so that even a
value of 0 for mpf_base will result in continued parsing of the MP-table.
Fixes: 5997efb96756 ("x86/boot: Use memremap() to map the MPF and MPC data")
Reported-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: regression@leemhuis.info
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171106201753.23059.86674.stgit@tlendack-t1.amdoffice.net
|
|
Fix duplicates for sparc and parisc. This was due these following commits.
1. commit 4c97a0c8fee3 ("arch: define CPU_BIG_ENDIAN for all fixed big
endian archs")
2. commit 97d9f969161d ("arch/sparc: Define config parameter
CPU_BIG_ENDIAN")
3. commit 74ad3d28af21 ("parisc: Define CONFIG_CPU_BIG_ENDIAN")
Remove duplicates.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
While working on a qemu and SeaBIOS-port to parisc, those PDC structures are
useful to have accessible from userspace.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
parisc is big-endian only but sparse assumes the same endianness as the
building machine.
This is problematic for code which expect __BYTE_ORDER__ being correctly
predefined by the compiler which sparse can then pre-process differently
from what gcc would.
Fix this by letting sparse know about the architecture endianness.
To: James Bottomley <jejb@parisc-linux.org>
To: Helge Deller <deller@gmx.de>
CC: linux-parisc@vger.kernel.org
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Add topology support, including multi-core scheduler support on
PA8800/PA8900 CPUs and enhanced output in /proc/cpuinfo, e.g.
lscpu now reports on a single-socket, dual-core machine:
Architecture: parisc64
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
CPU family: PA-RISC 2.0
Model name: PA8800 (Mako)
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
As noted by Christoph Biedl, passing a pointer size of 4 in the new CAS
implementation causes a kernel crash. The attached patch corrects the
off by one error in the argument validity check.
In reviewing the code, I noticed that we only perform word operations
with the pointer size argument. The subi instruction intentionally uses
a word condition on 64-bit kernels. Nullification was used instead of a
cmpib instruction as the branch should never be taken. The shlw
pseudo-operation generates a depw,z instruction and it clears the target
before doing a shift left word deposit. Thus, we don't need to clip the
upper 32 bits of this argument on 64-bit kernels.
Tested with a gcc testsuite run with a 64-bit kernel. The gcc atomic
code in libgcc is the only direct user of the new CAS implementation
that I am aware of.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
* pm-core:
PM / runtime: Drop children check from __pm_runtime_set_status()
|
|
These bits were not defined until now in common code, but they are
now that the kernel supports UMIP too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The function for CPUID 80000001 ECX is set to 0xc0000001. Set it to
0x80000001.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Fixes: d6321d493319 ("KVM: x86: generalize guest_cpuid_has_ helpers")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
reinjected to L2
vmx_check_nested_events() should return -EBUSY only in case there is a
pending L1 event which requires a VMExit from L2 to L1 but such a
VMExit is currently blocked. Such VMExits are blocked either
because nested_run_pending=1 or an event was reinjected to L2.
vmx_check_nested_events() should return 0 in case there are no
pending L1 events which requires a VMExit from L2 to L1 or if
a VMExit from L2 to L1 was done internally.
However, upstream commit which introduced blocking in case an event was
reinjected to L2 (commit acc9ab601327 ("KVM: nVMX: Fix pending events
injection")) contains a bug: It returns -EBUSY even if there are no
pending L1 events which requires VMExit from L2 to L1.
This commit fix this issue.
Fixes: acc9ab601327 ("KVM: nVMX: Fix pending events injection")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
According to 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are
read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb
("ioapic: keep RO bits for IOAPIC entry").
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
IOAPICs without an EOI register. They simulate the EOI message manually
by changing the trigger mode to edge and then back to level, with the
entry being masked during this.
QEMU implements this feature in commit ed1263c363c9
("ioapic: clear remote irr bit for edge-triggered interrupts")
As a side effect, this commit removes an incorrect behavior where Remote
IRR was cleared when the redirection table entry was rewritten. This is not
consistent with the manual and also opens an opportunity for a strange
behavior when a redirection table entry is modified from an interrupt
handler that handles the same entry: The modification will clear the
Remote IRR bit even though the interrupt handler is still running.
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Remote IRR for level-triggered interrupts was previously checked in
ioapic_set_irq, but since we now have a check in ioapic_service we
can remove the redundant check from ioapic_set_irq.
This commit doesn't change semantics.
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Avoid firing a level-triggered interrupt that has the Remote IRR bit set,
because that means that some CPU is already processing it. The Remote
IRR bit will be cleared after an EOI and the interrupt will refire
if the irq line is still asserted.
This behavior is aligned with QEMU's IOAPIC implementation that was
introduced by commit f99b86b94987
("x86: ioapic: ignore level irq during processing") in QEMU.
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
KVM uses ioapic_handled_vectors to track vectors that need to notify the
IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
interrupt with old configuration is pending or running and
ioapic_handled_vectors only remembers the newest configuration;
thus EOI from the old interrupt is not delievered to the IOAPIC.
A previous commit db2bdcbbbd32
("KVM: x86: fix edge EOI and IOAPIC reconfig race")
addressed this issue by adding pending edge-triggered interrupts to
ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
The commit explicitly ignored level-triggered interrupts,
but this race applies to them as well:
1) IOAPIC sends a level triggered interrupt vector to VCPU0
2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
to route the vector to VCPU1. The reconfiguration rewrites only the
upper 32 bits of the IOREDTBLn register. (Causes KVM to update
ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
3) VCPU0 sends EOI for the vector, but it's not delievered to the
IOAPIC because the ioapic_handled_vectors doesn't include the vector.
4) New interrupts are not delievered to VCPU1 because remote_irr bit
is set forever.
Therefore, the correct behavior is to add all pending and running
interrupts to ioapic_handled_vectors.
This commit introduces a slight performance hit similar to
commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
for the rare case that the vector is reused by a non-IOAPIC source on
VCPU0. We prefer to keep solution simple and not handle this case just
as the original commit does.
Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Sometimes, a processor might execute an instruction while another
processor is updating the page tables for that instruction's code page,
but before the TLB shootdown completes. The interesting case happens
if the page is in the TLB.
In general, the processor will succeed in executing the instruction and
nothing bad happens. However, what if the instruction is an MMIO access?
If *that* happens, KVM invokes the emulator, and the emulator gets the
updated page tables. If the update side had marked the code page as non
present, the page table walk then will fail and so will x86_decode_insn.
Unfortunately, even though kvm_fetch_guest_virt is correctly returning
X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as
a fatal error if the instruction cannot simply be reexecuted (as is the
case for MMIO). And this in fact happened sometimes when rebooting
Windows 2012r2 guests. Just checking ctxt->have_exception and injecting
the exception if true is enough to fix the case.
Thanks to Eduardo Habkost for helping in the debugging of this issue.
Reported-by: Yanan Fu <yfu@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Some guests use these unhandled MSRs very frequently.
This cause dmesg to be populated with lots of aggregated messages on
usage of ignored MSRs. As ignore_msrs=true means that the user is
well-aware his guest use ignored MSRs, allow to also disable the
prints on their usage.
An example of such guest is ESXi which tends to access a lot to MSR
0x34 (MSR_SMI_COUNT) very frequently.
In addition, we have observed this to cause unnecessary delays to
guest execution. Such an example is ESXi which experience networking
delays in it's guests (L2 guests) because of these prints (even when
prints are rate-limited). This can easily be reproduced by pinging
from one L2 guest to another. Once in a while, a peak in ping RTT
will be observed. Removing these unhandled MSR prints solves the
issue.
Because these prints can help diagnose issues with guests,
this commit only suppress them by a module parameter instead of
removing them from code entirely.
Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Changed suppress_ignore_msrs_prints to report_ignored_msrs - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Commit 9d643f63128b ("KVM: x86: avoid large stack allocations in
em_fxrstor") optimize the stack size, but introduced a guest memory access
which might sleep while in atomic.
Fix it by introducing, again, a second fxregs_state. Try to avoid
large stacks by using noinline. Add some helpful comments.
Reported by syzbot:
in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
2 locks held by syzkaller879109/2909:
#0: (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>] vcpu_load+0x1c/0x70
arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
#1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
arch/x86/kvm/x86.c:6983 [inline]
#1: (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
arch/x86/kvm/x86.c:7061 [inline]
#1: (&kvm->srcu){....}, at: [<ffffffff810dd162>]
kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted 4.13.0-rc4-next-20170811
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
__might_sleep+0x95/0x190 kernel/sched/core.c:5967
__might_fault+0xab/0x1d0 mm/memory.c:4383
__copy_from_user include/linux/uaccess.h:71 [inline]
__kvm_read_guest_page+0x58/0xa0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
kvm_vcpu_read_guest_page+0x44/0x60
arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
vcpu_run arch/x86/kvm/x86.c:7061 [inline]
kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x437fc9
RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000
R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000
Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in em_fxrstor")
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure
properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1)
null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1.
In L1:
BUG: unable to handle kernel paging request at ffffffffc015bf8f
IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
Oops: 0003 [#1] PREEMPT SMP
CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
Call Trace:
WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002
In L0:
-----------[ cut here ]------------
WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25
RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
Call Trace:
paging64_page_fault+0x500/0xde0 [kvm]
? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
? __asan_storeN+0x12/0x20
? paging64_gva_to_gpa+0xb0/0x120 [kvm]
? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
? lock_acquire+0x2c0/0x2c0
? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
? vmx_get_segment+0x2a6/0x310 [kvm_intel]
? sched_clock+0x1f/0x30
? check_chain_key+0x137/0x1e0
? __lock_acquire+0x83c/0x2420
? kvm_multiple_exception+0xf2/0x220 [kvm]
? debug_check_no_locks_freed+0x240/0x240
? debug_smp_processor_id+0x17/0x20
? __lock_is_held+0x9e/0x100
kvm_mmu_page_fault+0x90/0x180 [kvm]
kvm_handle_page_fault+0x15c/0x310 [kvm]
? __lock_is_held+0x9e/0x100
handle_exception+0x3c7/0x4d0 [kvm_intel]
vmx_handle_exit+0x103/0x1010 [kvm_intel]
? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
The commit avoids to load host state of vmcs12 as vmcs01's guest state
since vmcs12 is not modified (except for the VM-instruction error field)
if the checking of vmcs control area fails. However, the mmu context is
switched to nested mmu in prepare_vmcs02() and it will not be reloaded
since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
fails. This patch fixes it by reloading mmu context when nested
VMLAUNCH/VMRESUME fails.
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
According to the SDM, if the "load IA32_BNDCFGS" VM-entry controls is 1, the
following checks are performed on the field for the IA32_BNDCFGS MSR:
- Bits reserved in the IA32_BNDCFGS MSR must be 0.
- The linear address in bits 63:12 must be canonical.
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Pedro reported:
During tests that we conducted on KVM, we noticed that executing a "PUSH %ES"
instruction under KVM produces different results on both memory and the SP
register depending on whether EPT support is enabled. With EPT the SP is
reduced by 4 bytes (and the written value is 0-padded) but without EPT support
it is only reduced by 2 bytes. The difference can be observed when the CS.DB
field is 1 (32-bit) but not when it's 0 (16-bit).
The internal segment descriptor cache exist even in real/vm8096 mode. The CS.D
also should be respected instead of just default operand/address-size/66H
prefix/67H prefix during instruction decoding. This patch fixes it by also
adjusting operand/address-size according to CS.D.
Reported-by: Pedro Fonseca <pfonseca@cs.washington.edu>
Tested-by: Pedro Fonseca <pfonseca@cs.washington.edu>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
In case of instruction-decode failure or emulation failure,
x86_emulate_instruction() will call reexecute_instruction() which will
attempt to use the cr2 value passed to x86_emulate_instruction().
However, when x86_emulate_instruction() is called from
emulate_instruction(), cr2 is not passed (passed as 0) and therefore
it doesn't make sense to execute reexecute_instruction() logic at all.
Fixes: 51d8b66199e9 ("KVM: cleanup emulate_instruction")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
On this case, handle_emulation_failure() fills kvm_run with
internal-error information which it expects to be delivered
to user-mode for further processing.
However, the code reports a wrong return-value which makes KVM to never
return to user-mode on this scenario.
Fixes: 6d77dbfc88e3 ("KVM: inject #UD if instruction emulation fails and exit to
userspace")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
Instruction emulation after trapping a #UD exception can result in an
MMIO access, for example when emulating a MOVBE on a processor that
doesn't support the instruction. In this case, the #UD vmexit handler
must exit to user mode, but there wasn't any code to do so. Add it for
both VMX and SVM.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
When running L2, #UD should be intercepted by L1 or just forwarded
directly to L2. It should not reach L0 x86 emulator.
Therefore, set intercept for #UD only based on L1 exception-bitmap.
Also add WARN_ON_ONCE() on L0 #UD intercept handlers to make sure
it is never reached while running L2.
This improves commit ae1f57670703 ("KVM: nVMX: Do not emulate #UD while
in guest mode") by removing an unnecessary exit from L2 to L0 on #UD
when L1 doesn't intercept it.
In addition, SVM L0 #UD intercept handler doesn't handle correctly the
case it is raised from L2. In this case, it should forward the #UD to
guest instead of x86 emulator. As done in VMX #UD intercept handler.
This commit fixes this issue as-well.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
When guest passes KVM it's pvclock-page GPA via WRMSR to
MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM don't initialize
pvclock-page to some start-values. It just requests a clock-update which
will happen before entering to guest.
The clock-update logic will call kvm_setup_pvclock_page() to update the
pvclock-page with info. However, kvm_setup_pvclock_page() *wrongly*
assumes that the version-field is initialized to an even number. This is
wrong because at first-time write, field could be any-value.
Fix simply makes sure that if first-time version-field is odd, increment
it once more to make it even and only then start standard logic.
This follows same logic as done in other pvclock shared-pages (See
kvm_write_wall_clock() and record_steal_time()).
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
To simplify testing of these rarely used code paths, add a module parameter
that turns it on. One eventinj.flat test (NMI after iret) fails when
loading kvm_intel with vnmi=0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
This is more or less a revert of commit 2c82878b0cb3 ("KVM: VMX: require
virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
only had virtual NMIs in some SKUs.
The revert is not trivial because in the meanwhile there have been several
fixes to nested NMI injection. Therefore, the entire vNMI state is moved
to struct loaded_vmcs.
Another change compared to before the patch is a simplification here:
if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
!(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
get_vmcs12(vcpu))))) {
The final condition here is always true (because nested_cpu_has_virtual_nmis
is always false) and is removed.
Fixes: 2c82878b0cb38fd516fd612c67852a6bbf282003
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
For many years some users of assigned devices have reported worse
performance on AMD processors with NPT than on AMD without NPT,
Intel or bare metal.
The reason turned out to be that SVM is discarding the guest PAT
setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-,
PA3=UC). The guest might be using a different setting, and
especially might want write combining but isn't getting it
(instead getting slow UC or UC- accesses).
Thanks a lot to geoff@hostfission.com for noticing the relation
to the g_pat setting. The patch has been tested also by a bunch
of people on VFIO users forums.
Fixes: 709ddebf81cb40e3c36c6109a7892e8b93a09464
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
GICv4 Support for KVM/ARM for v4.15
|
|
We got a regression report about the HD-audio HDMI chmap, where some
surround channels are reported as UNKNOWN. The git bisection pointed
the culprit at the commit 9b3dc8aa3fb1 ("ALSA: hda - Register chmap
obj as priv data instead of codec"). The story behind scene is like
this:
- While moving the code out of the legacy HDA to the HDA common place,
the patch modifies the code to obtain the chmap array indirectly in
a byte array, and it expands it to kctl value array.
- At the latter operation, the size of the array is wrongly passed by
sizeof() to the pointer.
- It can be 4 on 32bit arch, thus too short for 6+ channels.
(And that's the reason why it didn't hit other persons; it's 8 on
64bit arch, thus it's usually enough.)
The code was further changed meanwhile, but the problem persisted.
Let's fix it by correctly evaluating the array size.
Fixes: 9b3dc8aa3fb1 ("ALSA: hda - Register chmap obj as priv data instead of codec")
Reported-by: VDR User <user.vdr@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|