Age | Commit message (Collapse) | Author |
|
* pci/resource:
PCI: Remove res_to_dev_res() debug message
|
|
* pci/msi:
PCI/MSI: Update MSI/MSI-X bits in PCIEBUS-HOWTO
PCI/MSI: Document pci_alloc_irq_vectors(), deprecate pci_enable_msi()
PCI/MSI: Return -ENOSPC if pci_enable_msi_range() can't get enough vectors
PCI/portdrv: Use pci_irq_alloc_vectors()
PCI/MSI: Check that we have a legacy interrupt line before using it
PCI/MSI: Remove pci_msi_domain_{alloc,free}_irqs()
PCI/MSI: Remove unused pci_msi_create_default_irq_domain()
PCI/MSI: Return failure when msix_setup_entries() fails
PCI/MSI: Remove pci_enable_msi_{exact,range}()
amd-xgbe: Update PCI support to use new IRQ functions
[media] cobalt: use pci_irq_allocate_vectors()
PCI/MSI: Fix msi_capability_init() kernel-doc warnings
|
|
* pci/hotplug:
PCI: acpiphp_ibm: Make ibm_apci_table_attr __ro_after_init
PCI: rpadlpar: Remove unnecessary return statement
|
|
* pci/enumeration:
PCI: Remove duplicate check for positive return value from probe() functions
PCI: Enable PCIe Extended Tags if supported
PCI: Avoid possible deadlock on pci_lock and p->pi_lock
PCI/ACPI: Fix bus range comparison in pci_mcfg_lookup()
PCI: Apply _HPX settings only to relevant devices
|
|
* pci/dpc:
PCI/DPC: Wait for Root Port busy to clear
PCI/DPC: Decode extended reasons
|
|
* pci/aspm:
PCI/ASPM: Add comment about L1 substate latency
PCI/ASPM: Configure L1 substate settings
PCI/ASPM: Calculate and save the L1.2 timing parameters
PCI/ASPM: Read and set up L1 substate capabilities
PCI/ASPM: Add support for L1 substates
PCI/ASPM: Add L1 substate capability structure register definitions
|
|
* pci/aer:
PCI/AER: Remove unused .link_reset() callback
|
|
Update the MSI/MSI-X bits in PCIEBUS-HOWTO. Stop talking about low-level
details that mention deprecated APIs and concentrate on what service
drivers should do and why.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Yuval Mintz says:
====================
qed*: Add support for PTP
This patch series adds required changes for qed/qede drivers for
supporting the IEEE Precision Time Protocol (PTP).
Changes from previous versions:
v7: Fixed Kbuild robot warnings.
v6: Corrected broken loop iteration in previous version.
Reduced approximation error of adjfreq.
v5: Removed two divisions from the adjust-frequency loop.
Resulting logic would use 8 divisions [instead of 24].
v4: Remove the loop iteration for value '0' in the qed_ptp_hw_adjfreq()
implementation.
v3: Use div_s64 for 64-bit divisions as do_div gives error for signed
types.
Incorporated review comments from Richard Cochran.
- Clear timestamp resgisters as soon as timestamp is read.
- Use shift operation in the place of 'divide by 16'.
v2: Use do_div for 64-bit divisions.
====================
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the driver support for,
- Registering the ptp clock functionality with the OS.
- Timestamping the Rx/Tx PTP packets.
- Ethtool callbacks related to PTP.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The patch adds the required qed interfaces for configuring/reading
the PTP clock on the adapter.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Count buffer group drops or truncates as rx drops rather than
rx errors in netdev stats.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Arjun V <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 91572088e3fd ("net: use core MTU range checking in core net
infra") changed the openvswitch internal device to use the core net
infra for controlling the MTU range, but failed to actually set the
max_mtu as described in the commit message, which now defaults to
ETH_DATA_LEN.
This patch fixes this by setting max_mtu to ETH_MAX_MTU after
ether_setup() call.
Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When setting a neigh related sysctl parameter, we always send a
NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
executing
sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000
a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.
This is caused by commit 2a4501ae18b5 ("neigh: Send a
notification when DELAY_PROBE_TIME changes"). According to the
commit's description, it was intended to generate such an event
when setting the "delay_first_probe_time" sysctl parameter.
In order to fix this, only generate this event when actually
setting the "delay_first_probe_time" sysctl parameter. This fix
should not have any unintended side-effects, because all but one
registered netevent callbacks check for other netevent event
types (the registered callbacks were obtained by grepping for
"register_netevent_notifier"). The only callback that uses the
NETEVENT_DELAY_PROBE_TIME_UPDATE event is
mlxsw_sp_router_netevent_event() (in
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
of this event, it only accesses the DELAY_PROBE_TIME of the
passed neigh_parms.
Fixes: 2a4501ae18b5 ("neigh: Send a notification when DELAY_PROBE_TIME changes")
Signed-off-by: Marcus Huewe <suse-tux@gmx.de>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next
Merge in cros-ec changes applied through MFD branch to resolve
conflicts.
|
|
This fixes broken build for !NET_CLS:
net/built-in.o: In function `fq_codel_destroy':
/home/sab/linux/net-next/net/sched/sch_fq_codel.c:468: undefined reference to `tcf_destroy_chain'
Fixes: cf1facda2f61 ("sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The xilinx_emaclite uses __raw_writel and __raw_readl for register
accesses. Those functions do not imply any kind of memory barriers and
they may be reordered.
The driver does not seem to take that into account, though, and the
driver does not satisfy the ordering requirements of the hardware.
For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read()
which try to set MDIO address before initiating the transaction.
I'm seeing system freezes with the driver with GCC 5.4 and current
Linux kernels on Zynq-7000 SoC immediately when trying to use the
interface.
In commit 123c1407af87 ("net: emaclite: Do not use microblaze and ppc
IO functions") the driver was switched from non-generic
in_be32/out_be32 (memory barriers, big endian) to
__raw_readl/__raw_writel (no memory barriers, native endian), so
apparently the device follows system endianness and the driver was
originally written with the assumption of memory barriers.
Rather than try to hunt for each case of missing barrier, just switch
the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending
on endianness instead.
Tested on little-endian Zynq-7000 ARM SoC FPGA.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes: 123c1407af87 ("net: emaclite: Do not use microblaze and ppc IO
functions")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
xilinx_emaclite looks at the received data to try to determine the
Ethernet packet length but does not properly clamp it if
proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
overflow and a panic via skb_panic() as the length exceeds the allocated
skb size.
Fix those cases.
Also add an additional unconditional check with WARN_ON() at the end.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is needed to force a rebuild of bpf.o when one of its dependencies
(e.g. uapi/linux/bpf.h) is updated.
Add a phony target.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove a useless ifdef __NR_bpf as requested by Wang Nan.
Inline one-line static functions as it was in the bpf_sys.h file.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/r/828ab1ff-4dcf-53ff-c97b-074adb895006@huawei.com
Acked-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using a reader-writer lock in fast path is silly, when we can
instead use RCU or a seqlock.
For mlx4 hwstamp clock, a seqlock is the way to go, removing
two atomic operations and false sharing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This interrupt handler is broken in several ways:
- It loops forever when the op code is not decodeable
- It never returns IRQ_HANDLED because the only way to exit the loop
returns IRQ_NONE unconditionally.
The whole concept of this is broken. Creating devices in an interrupt
handler is beyond any point of sanity.
Make it at least behave halfways sane so accidental users do not have to
deal with a hard to debug lockup.
Fixes: e809c22b8fb028 ("goldfish: add the goldfish virtual bus")
Reported-by: Gabriel C <nix.or.die@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The goldfish platform code registers the platform device unconditionally
which causes havoc in several ways if the goldfish_pdev_bus driver is
enabled:
- Access to the hardcoded physical memory region, which is either not
available or contains stuff which is completely unrelated.
- Prevents that the interrupt of the serial port can be requested
- In case of a spurious interrupt it goes into a infinite loop in the
interrupt handler of the pdev_bus driver (which needs to be fixed
seperately).
Add a 'goldfish' command line option to make the registration opt-in when
the platform is compiled in.
I'm seriously grumpy about this engineering trainwreck, which has seven
SOBs from Intel developers for 50 lines of code. And none of them figured
out that this is broken. Impressive fail!
Fixes: ddd70cf93d78 ("goldfish: platform device for x86")
Reported-by: Gabriel C <nix.or.die@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Move all declarations and definitions in keyspan.h to keyspan.c, which
is the only place were they are used.
This specifically moves the driver device-id tables and usb-serial
driver definitions to the source file where they are expected to be
found.
While at it, fix up some multi-line comments and minor white-space
issues (spaces instead of tabs and superfluous white space).
Note that the information in the comment header of the removed header
file is also present in the source file.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
|
|
Move the driver device-id tables and usb-serial driver definitions to
the source file where they are expected to be found.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
|
|
Document pci_alloc_irq_vectors() instead of the deprecated pci_enable_msi()
and pci_enable_msix() APIs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
When a new disk shows up, sysfs queue directory is created before elevator
is registered. This allows a user to attempt a scheduler switch even though
the initial registration hasn't completed yet.
In one scenario, blk_register_queue() calls elv_register_queue() and
right before cfq_registered_queue() is called, another process executes
elevator_switch() and replaces q->elevator with deadline scheduler. When
cfq_registered_queue() executes it interprets e->elevator_data as struct
cfq_data even though it is actually struct deadline_data.
Grab q->sysfs_lock in blk_register_queue() to synchronize with sysfs
callers.
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In addition to making PME non-modular, d7def2040077 ("PCI/PME: Make
explicitly non-modular") removed the pcie_pme_driver .remove() method,
pcie_pme_remove().
pcie_pme_remove() freed the PME IRQ that was requested in pci_pme_probe().
The fact that we don't free the IRQ after d7def2040077 causes the following
crash when removing a PCIe port device via /sys:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:370!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 1 PID: 14509 Comm: sh Tainted: G W 4.8.0-rc1-yh-00012-gd29438d
RIP: 0010:[<ffffffff9758bbf5>] free_msi_irqs+0x65/0x190
...
Call Trace:
[<ffffffff9758cda4>] pci_disable_msi+0x34/0x40
[<ffffffff97583817>] cleanup_service_irqs+0x27/0x30
[<ffffffff97583e9a>] pcie_port_device_remove+0x2a/0x40
[<ffffffff97584250>] pcie_portdrv_remove+0x40/0x50
[<ffffffff97576d7b>] pci_device_remove+0x4b/0xc0
[<ffffffff9785ebe6>] __device_release_driver+0xb6/0x150
[<ffffffff9785eca5>] device_release_driver+0x25/0x40
[<ffffffff975702e4>] pci_stop_bus_device+0x74/0xa0
[<ffffffff975704ea>] pci_stop_and_remove_bus_device_locked+0x1a/0x30
[<ffffffff97578810>] remove_store+0x50/0x70
[<ffffffff9785a378>] dev_attr_store+0x18/0x30
[<ffffffff97260b64>] sysfs_kf_write+0x44/0x60
[<ffffffff9725feae>] kernfs_fop_write+0x10e/0x190
[<ffffffff971e13f8>] __vfs_write+0x28/0x110
[<ffffffff970b0fa4>] ? percpu_down_read+0x44/0x80
[<ffffffff971e53a7>] ? __sb_start_write+0xa7/0xe0
[<ffffffff971e53a7>] ? __sb_start_write+0xa7/0xe0
[<ffffffff971e1f04>] vfs_write+0xc4/0x180
[<ffffffff971e3089>] SyS_write+0x49/0xa0
[<ffffffff97001a46>] do_syscall_64+0xa6/0x1b0
[<ffffffff9819201e>] entry_SYSCALL64_slow_path+0x25/0x25
...
RIP [<ffffffff9758bbf5>] free_msi_irqs+0x65/0x190
RSP <ffff89ad3085bc48>
---[ end trace f4505e1dac5b95d3 ]---
Segmentation fault
Restore pcie_pme_remove().
[bhelgaas: changelog]
Fixes: d7def2040077 ("PCI/PME: Make explicitly non-modular")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v4.9+
|
|
The create target ioctl takes a lun begin and lun end parameter, which
defines the range of luns to initialize a target with. If the user does
not set the parameters, it default to only using lun 0. Instead,
defaults to use all luns in the OCSSD, as it is the usual behaviour
users want.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If one specifies the end lun id to be the absolute number of luns,
without taking zero indexing into account, the lightnvm core will pass
the off-by-one end lun id to target creation, which then panics during
nvm_ioctl_dev_create.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The switch to the new DSA binding used "marvell,mv88e6095" for the
compatible property which doesn't exist, use "marvell,mv88e6085"
instead.
Fixes: 455b82f03f52 ("ARM: dts: armada-385-linksys: Utilize new DSA binding")
Signed-off-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
|
|
The OF graph API leaves too much of the graph walking to clients when
in many cases the driver doesn't care about accessing the port or
endpoint nodes. The drivers typically just want the device connected via
a particular graph connection. of_graph_get_remote_node provides this
functionality.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
|
|
The destination address in a listening rdma_id does not have an address
family. Since address family in both sides of a connection must be the
same in rdma_bind_addr() we set the address family of the destination to
the address family of the source.
This patch serves the logic in cma_port_is_unique() which requires to
know if destination address that is associated with a rdma_id is any address
(cma_zero_addr() and cma_loopback_addr()).
This can happen when port reuse is checked for a port number
that is being listened to.
Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add new entry to the RDMA-CM configfs that allows users
to select default TOS for RDMA-CM QPs.
This is useful for users that want to control the TOS for legacy
applications without changing their code.
Application that sets the TOS explicitly using the rdma_set_option
API will continue to work as expected, meaning overriding the configfs
value.
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch avoids unnecessary type casting from void to net_device.
CC: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When sending packet to destination that was not resolved yet
via path query, the driver keeps the skb and tries to re-send it
again when the path is resolved.
But when re-sending via dev_queue_xmit the kernel doesn't call
to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb
and to put the destination address inside them.
In that way the dev_start_xmit will have the correct destination,
and the driver won't take the destination from the skb->data, while
nothing exists there, which causes to packet be be dropped.
The test flow is:
1. Run the SM on remote node,
2. Restart the driver.
4. Ping some destination,
3. Observe that first ICMP request will be dropped.
Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The compatible should be 98dx4251 not 98dx4521.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
|
|
When the "ib_virt" cap is set, configuration of port capabilities need
to be done through mlx5_core_modify_hca_vport_context.
Since modify_hca_vport_context accepts mask and value, there is no need
to read the port capabilities and calculate the new cap values so we
avoid the mutex when ib_virt is set.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The hashtable and guarding spinlock are global data structures,
we can inititalize them statically.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170124212116.4568-1-david@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As pointed out by clang, we were not providing a prototype for a
function before using it:
util/parse-events.y:699:6: error: conflicting types for 'parse_events_error'
void parse_events_error(YYLTYPE *loc, void *data,
^
/tmp/build/perf/util/parse-events-bison.c:2224:7: note: previous implicit declaration is here
yyerror (&yylloc, _data, scanner, YY_("syntax error"));
^
/tmp/build/perf/util/parse-events-bison.c:65:25: note: expanded from macro 'yyerror'
#define yyerror parse_events_error
1 error generated.
One line fix it.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170215130605.GC4020@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
ath.git patches for 4.11. Major changes:
ath10k
* when trying older firmware versions don't confuse user with error messages
ath9k
* fix crash in AP mode (regression)
* fix relayfs crash (regression)
* fix initialisation with AR9340 and AR9550
|
|
Nested_vmx_run is split into two parts: the part that handles the
VMLAUNCH/VMRESUME instruction, and the part that modifies the vcpu state
to transition from VMX root mode to VMX non-root mode. The latter will
be used when restoring the checkpointed state of a vCPU that was in VMX
operation when a snapshot was taken.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The checks performed on the contents of the vmcs12 are extracted from
nested_vmx_run so that they can be used to validate a vmcs12 that has
been restored from a checkpoint.
Signed-off-by: Jim Mattson <jmattson@google.com>
[Change prepare_vmcs02 and nested_vmx_load_cr3's last argument to u32,
to match check_vmentry_postreqs. Update comments for singlestep
handling. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Perform the checks on vmcs12 state early, but defer the gpa->hpa lookups
until after prepare_vmcs02. Later, when we restore the checkpointed
state of a vCPU in guest mode, we will not be able to do the gpa->hpa
lookups when the restore is done.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Handle_vmptrld is split into two parts: the part that handles the
VMPTRLD instruction, and the part that establishes the current VMCS
pointer. The latter will be used when restoring the checkpointed state
of a vCPU that had a valid VMCS pointer when a snapshot was taken.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Handle_vmon is split into two parts: the part that handles the VMXON
instruction, and the part that modifies the vcpu state to transition
from legacy mode to VMX operation. The latter will be used when
restoring the checkpointed state of a vCPU that was in VMX operation
when a snapshot was taken.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Split prepare_vmcs12 into two parts: the part that stores the current L2
guest state and the part that sets up the exit information fields. The
former will be used when checkpointing the vCPU's VMX state.
Modify prepare_vmcs02 so that it can construct a vmcs02 midway through
L2 execution, using the checkpointed L2 guest state saved into the
cached vmcs12 above.
Signed-off-by: Jim Mattson <jmattson@google.com>
[Rebasing: add from_vmentry argument to prepare_vmcs02 instead of using
vmx->nested.nested_run_pending, because it is no longer 1 at the
point prepare_vmcs02 is called. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
is blocked", 2015-09-18) the posted interrupt descriptor is checked
unconditionally for PIR.ON. Therefore we don't need KVM_REQ_EVENT to
trigger the scan and, if NMIs or SMIs are not involved, we can avoid
the complicated event injection path.
Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
there since APICv was introduced.
However, without the KVM_REQ_EVENT safety net KVM needs to be much
more careful about races between vmx_deliver_posted_interrupt and
vcpu_enter_guest. First, the IPI for posted interrupts may be issued
between setting vcpu->mode = IN_GUEST_MODE and disabling interrupts.
If that happens, kvm_trigger_posted_interrupt returns true, but
smp_kvm_posted_intr_ipi doesn't do anything about it. The guest is
entered with PIR.ON, but the posted interrupt IPI has not been sent
and the interrupt is only delivered to the guest on the next vmentry
(if any). To fix this, disable interrupts before setting vcpu->mode.
This ensures that the IPI is delayed until the guest enters non-root mode;
it is then trapped by the processor causing the interrupt to be injected.
Second, the IPI may be issued between kvm_x86_ops->sync_pir_to_irr(vcpu)
and vcpu->mode = IN_GUEST_MODE. In this case, kvm_vcpu_kick is called
but it (correctly) doesn't do anything because it sees vcpu->mode ==
OUTSIDE_GUEST_MODE. Again, the guest is entered with PIR.ON but no
posted interrupt IPI is pending; this time, the fix for this is to move
the RVI update after IN_GUEST_MODE.
Both issues were mostly masked by the liberal usage of KVM_REQ_EVENT,
though the second could actually happen with VT-d posted interrupts.
In both race scenarios KVM_REQ_EVENT would cancel guest entry, resulting
in another vmentry which would inject the interrupt.
This saves about 300 cycles on the self_ipi_* tests of vmexit.flat.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Calls to apic_find_highest_irr are scanning IRR twice, once
in vmx_sync_pir_from_irr and once in apic_search_irr. Change
sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr;
now that it does the computation, it can also do the RVI write.
In order to avoid complications in svm.c, make the callback optional.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|