Age | Commit message (Collapse) | Author |
|
for-5.5/drivers-post
Pull NVMe changes from Keith:
"- The only new feature is the optional hwmon support for nvme (Guenter
and Akinobu)
- A universal work-around for controllers reading discard payloads
beyond the range boundary (Eduard)
- Chaitanya graciously agreed to share the target driver maintenance"
* 'nvme-5.5' of git://git.infradead.org/nvme:
nvme: hwmon: add quirk to avoid changing temperature threshold
nvme: hwmon: provide temperature min and max values for each sensor
nvmet: add another maintainer
nvme: Discard workaround for non-conformant devices
nvme: Add hardware monitoring support
|
|
Considering the previous changes, this is a more proper name.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-5-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Drop the rbt_memtype_*() call rbt_ prefix, as we no longer use
an rbtree directly.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-4-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Get rid of the passing the rb_root down the helper calls; there
is only one: &memtype_rbroot.
No change in functionality.
[ mingo: Fixed the changelog which described a different version of the patch. ]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-3-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With some considerations, the custom pat_rbtree implementation can be
simplified to use most of the generic interval_tree machinery:
- The tree inorder traversal can slightly differ when there are key
('start') collisions in the tree due to one going left and another right.
This, however, only affects the output of debugfs' pat_memtype_list file.
- Generic interval trees are now fully closed [a, b], for which we need
to adjust the last endpoint (ie: end - 1).
- Erasing logic must remain untouched as well.
- In order for the types to remain u64, the 'memtype_interval' calls are
introduced, as opposed to simply using struct interval_tree.
In addition, the PAT tree might potentially also benefit by the fast overlap
detection for the insertion case when looking up the first overlapping node
in the tree.
No change in behavior is intended.
Finally, I've tested this on various servers, via sanity warnings, running
side by side with the current version and so far see no differences in the
returned pointer node when doing memtype_rb_lowest_match() lookups.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20191121011601.20611-2-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing
the value of the temperature threshold feature for specific devices that
show undesirable behavior.
Guenter reported:
"On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the
Composite temperature sensor results in a temperature warning, and that
warning is sticky until I reset the controller.
It doesn't seem to matter which temperature I write; writing -273000 has
the same result."
The Intel NVMe has the latest firmware version installed, so this isn't
a problem that was ever fixed.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
According to the NVMe specification, the over temperature threshold and
under temperature threshold features shall be implemented for Composite
Temperature if a non-zero WCTEMP field value is reported in the Identify
Controller data structure. The features are also implemented for all
implemented temperature sensors (i.e., all Temperature Sensor fields that
report a non-zero value).
This provides the over temperature threshold and under temperature
threshold for each sensor as temperature min and max values of hwmon
sysfs attributes.
The WCTEMP is already provided as a temperature max value for Composite
Temperature, but this change isn't incompatible. Because the default
value of the over temperature threshold for Composite Temperature is
the WCTEMP.
Now the alarm attribute for Composite Temperature indicates one of the
temperature is outside of a temperature threshold. Because there is only
a single bit in Critical Warning field that indicates a temperature is
outside of a threshold.
Example output from the "sensors" command:
nvme-pci-0100
Adapter: PCI adapter
Composite: +33.9°C (low = -273.1°C, high = +69.8°C)
(crit = +79.8°C)
Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C)
This also adds helper macros for kelvin from/to milli Celsius conversion,
and replaces the repeated code in hwmon.c.
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Sagi and I have been pretty busy lately, and Chaitanya has been
helping a lot with target work and agreed to share the load.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Now the USB gadget subsystem can use the USB debugfs root directory,
so move it's directory from the root of the debugfs filesystem into
the root of usb
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/1574232183-5760-3-git-send-email-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now the USB gadget subsystem can use the USB debugfs root directory,
so move it's directory from the root of the debugfs filesystem into
the root of usb
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/1574232183-5760-2-git-send-email-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now the USB gadget subsystem can use the USB debugfs root directory,
so move musb's directory from the root of the debugfs filesystem into
the root of usb
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Link: https://lore.kernel.org/r/1574232183-5760-1-git-send-email-chunfeng.yun@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We really don't need this, as the slow path will do the right thing
anyway.
This reverts commit 6952a7f8446ee85ea9d10ab87b64797a031eaae3.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Using a mask to represent bus DMA constraints has a set of limitations.
The biggest one being it can only hold a power of two (minus one). The
DMA mapping code is already aware of this and treats dev->bus_dma_mask
as a limit. This quirk is already used by some architectures although
still rare.
With the introduction of the Raspberry Pi 4 we've found a new contender
for the use of bus DMA limits, as its PCIe bus can only address the
lower 3GB of memory (of a total of 4GB). This is impossible to represent
with a mask. To make things worse the device-tree code rounds non power
of two bus DMA limits to the next power of two, which is unacceptable in
this case.
In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
over the tree and treat it as such. Note that dev->bus_dma_limit should
contain the higher accessible DMA address.
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into dma-mapping-for-next
Pull in a stable branch from the arm64 tree that adds the zone_dma_bits
variable to avoid creating hard to resolve conflicts with that addition.
|
|
git://people.freedesktop.org/~gabbayo/linux into char-misc-next
Oded writes:
This tag contains the following changes for kernel 5.5:
- MMU code improvements that includes:
- Distinguish between "normal" unmapping and unmapping that is done as
part of the tear-down of a user process. This improves performance of
unmapping during reset of the device.
- Add future ASIC support in generic MMU code.
- Improve device reset code by adding more protection around accessing the
device during the reset process.
- Add new H/W queue type for future ASIC support
- Add more information to be retrieved by users through INFO IOCTL:
- clock rate
- board name
- reset counters
- Small bug fixes and minor improvements to code.
* tag 'misc-habanalabs-next-2019-11-21' of git://people.freedesktop.org/~gabbayo/linux: (31 commits)
habanalabs: add more protection of device during reset
habanalabs: flush EQ workers in hard reset
habanalabs: make the reset code more consistent
habanalabs: expose reset counters via existing INFO IOCTL
habanalabs: make code more concise
habanalabs: use defines for F/W files
habanalabs: remove prints on successful device initialization
habanalabs: remove unnecessary checks
habanalabs: invalidate MMU cache only once
habanalabs: skip VA block list update in reset flow
habanalabs: optimize MMU unmap
habanalabs: prevent read/write from/to the device during hard reset
habanalabs: split MMU properties to PCI/DRAM
habanalabs: re-factor MMU masks and documentation
habanalabs: type specific MMU cache invalidation
habanalabs: re-factor memory module code
habanalabs: export uapi defines to user-space
habanalabs: don't print error when queues are full
habanalabs: increase max jobs number to 512
habanalabs: set ETR as non-secured
...
|
|
Requests that triggers flushing volatile writeback cache to disk (barriers)
have significant effect to overall performance.
Block layer has sophisticated engine for combining several flush requests
into one. But there is no statistics for actual flushes executed by disk.
Requests which trigger flushes usually are barriers - zero-size writes.
This patch adds two iostat counters into /sys/class/block/$dev/stat and
/proc/diskstats - count of completed flush requests and their total time.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The earlier patch introduced a typo, change LOWPART back to
LOPART.
Fixes: 176ed98c8a76 ("y2038: vdso: powerpc: avoid timespec references")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add pin mux configuration for the OTG VBUS pin of the JZ4770.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20191119155211.102527-2-paul@crapouillou.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This makes the driver support the 'output-low' and 'output-high'
devicetree properties in gpio-hog sub-nodes.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Link: https://lore.kernel.org/r/20191119155211.102527-1-paul@crapouillou.net
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/1574306382-32516-1-git-send-email-krzk@kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Version 1.1v6 of pin list has some changes in pin names for Intel Lewisburg.
Update the driver accordingly.
Note, it reveals the bug in the driver that misses two pins in GPP_L and
has rather two extra ones. That's why the ordering of some groups is changed.
Fixes: e480b745386e ("pinctrl: intel: Add Intel Lewisburg GPIO support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20191120133739.54332-1-andriy.shevchenko@linux.intel.com
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
For the repositories we keep on git.kernel.org replace my email
to be on the same domain for sake of consistency.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20191118135258.37574-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
As explained in the following commit a9a1a4833613 ("pinctrl:
armada-37xx: Fix gpio interrupt setup") the armada_37xx_irq_set_type()
function can be called before the initialization of the mask field.
That means that we can't use this field in this function and need to
workaround it using hwirq.
Fixes: 30ac0d3b0702 ("pinctrl: armada-37xx: Add edge both type gpio irq support")
Cc: stable@vger.kernel.org
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Link: https://lore.kernel.org/r/20191115155752.2562-1-gregory.clement@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Add kernel AUX area sampling definitions, which brings perf_event.h into
line with the kernel version.
New sample type PERF_SAMPLE_AUX requests a sample of the AUX area
buffer. New perf_event_attr member 'aux_sample_size' specifies the
desired size of the sample.
Also add support for parsing samples containing AUX area data i.e.
PERF_SAMPLE_AUX.
Committer notes:
I squashed the first two patches in this series to avoid breaking
automatic bisection, i.e. after applying only the original first patch
in this series we would have:
# perf test -v parsing
26: Sample parsing :
--- start ---
test child forked, pid 17018
sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating
test child finished with -1
---- end ----
Sample parsing: FAILED!
#
With the two paches combined:
# perf test parsing
26: Sample parsing : Ok
#
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20191115124225.5247-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add dt bindings document for pinmux & GPIO controller driver of
Intel Lightning Mountain SoC.
Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Link: https://lore.kernel.org/r/b59afc497e41404fea06aa48d633cba183ee944d.1573797249.git.rahul.tanwar@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Intel Lightning Mountain SoC has a pinmux controller & GPIO controller IP which
controls pin multiplexing & configuration including GPIO functions selection &
GPIO attributes configuration.
This IP is not based on & does not have anything in common with Chassis
specification. The pinctrl drivers under pinctrl/intel/* are all based upon
Chassis spec compliant pinctrl IPs. So this driver doesn't fit & can not use
pinctrl framework under pinctrl/intel/* and it requires a separate new driver.
Add a new GPIO & pin control framework based driver for this IP.
Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Link: https://lore.kernel.org/r/33e649758b70490f01724a887c490d5008c7656d.1573797249.git.rahul.tanwar@linux.intel.com
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191121132856.29130-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191121132901.29186-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191121132905.29248-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191121132910.29310-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191121132914.29368-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Adjust indentation from seven spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20191121132842.28942-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Having static inline oneliner does not benefit too much when it is
only called from another oneliner function. Remove some of the
'onion'. This simplifies also the coming usage of the gpiolib
defines. We can do conversion from chip bits to gpiolib direction
defines as last step in the get_direction callback. Drivers can
use chip specific values in driver internal functions and do
conversion only once.
Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20191113071045.GA22110@localhost.localdomain
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
If DEBUG_FS=n, compile fails with the following error:
kernel/trace/trace.c: In function 'tracing_init_dentry':
kernel/trace/trace.c:8658:9: error: passing argument 3 of 'debugfs_create_automount' from incompatible pointer type [-Werror=incompatible-pointer-types]
8658 | trace_automount, NULL);
| ^~~~~~~~~~~~~~~
| |
| struct vfsmount * (*)(struct dentry *, void *)
In file included from kernel/trace/trace.c:24:
./include/linux/debugfs.h:206:25: note: expected 'struct vfsmount * (*)(void *)' but argument is of type 'struct vfsmount * (*)(struct dentry *, void *)'
206 | struct vfsmount *(*f)(void *),
| ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Link: https://lore.kernel.org/r/20191121102021787.MLMY.25002.ppp.dion.ne.jp@dmta0003.auone-net.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 8e12257dead7 ("of: property: Add device link support for iommus,
mboxes and io-channels") added device link support for IOMMU linkages
described using the "iommus" property. For PCI devices, this property
is not present and instead the "iommu-map" property is used on the host
bridge node to map the endpoint RequesterIDs to their corresponding
IOMMU instance.
Add support for "iommu-map" to the device link supplier bindings so that
probing of PCI devices can be deferred until after the IOMMU is
available.
Cc: Rob Herring <robh@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20191120190028.4722-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The of_is_ancestor_of() function was renamed from of_link_is_valid()
based on review feedback. The rename meant the semantics of the function
had to be inverted, but this was missed in the earlier patch.
So, fix the semantics of of_is_ancestor_of() and invert the conditional
expressions where it is used.
Fixes: a3e1d1a7f5fc ("of: property: Add functional dependency link from DT bindings")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20191120080230.16007-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 7a7dab237027939cb95dc07c4647b80bad5fbbde. We found
out that there is still a race with RuntimePM. This can lead to a hang
when accessing the eMMC in some situations. Revert this change until the
RPM issue is fixed.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Preparatory work for shattering mmu.c into multiple files. Besides making it easier
to follow, this will also make it possible to write unit tests for various parts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
apic-access-page
According to Intel SDM section 28.3.3.3/28.3.3.4 Guidelines for Use
of the INVVPID/INVEPT Instruction, the hypervisor needs to execute
INVVPID/INVEPT X in case CPU executes VMEntry with VPID/EPTP X and
either: "Virtualize APIC accesses" VM-execution control was changed
from 0 to 1, OR the value of apic_access_page was changed.
In the nested case, the burden falls on L1, unless L0 enables EPT in
vmcs02 but L1 enables neither EPT nor VPID in vmcs12. For this reason
prepare_vmcs02() and load_vmcs12_host_state() have special code to
request a TLB flush in case L1 does not use EPT but it uses
"virtualize APIC accesses".
This special case however is not necessary. On a nested vmentry the
physical TLB will already be flushed except if all the following apply:
* L0 uses VPID
* L1 uses VPID
* L0 can guarantee TLB entries populated while running L1 are tagged
differently than TLB entries populated while running L2.
If the first condition is false, the processor will flush the TLB
on vmentry to L2. If the second or third condition are false,
prepare_vmcs02() will request KVM_REQ_TLB_FLUSH. However, even
if both are true, no extra TLB flush is needed to handle the APIC
access page:
* if L1 doesn't use VPID, the second condition doesn't hold and the
TLB will be flushed anyway.
* if L1 uses VPID, it has to flush the TLB itself with INVVPID and
section 28.3.3.3 doesn't apply to L0.
* even INVEPT is not needed because, if L0 uses EPT, it uses different
EPTP when running L2 than L1 (because guest_mode is part of mmu-role).
In this case SDM section 28.3.3.4 doesn't apply.
Similarly, examining nested_vmx_vmexit()->load_vmcs12_host_state(),
one could note that L0 won't flush TLB only in cases where SDM sections
28.3.3.3 and 28.3.3.4 don't apply. In particular, if L0 uses different
VPIDs for L1 and L2 (i.e. vmx->vpid != vmx->nested.vpid02), section
28.3.3.3 doesn't apply.
Thus, remove this flush from prepare_vmcs02() and nested_vmx_vmexit().
Side-note: This patch can be viewed as removing parts of commit
fb6c81984313 ("kvm: vmx: Flush TLB when the APIC-access address changes”)
that is not relevant anymore since commit
1313cc2bd8f6 ("kvm: mmu: Add guest_mode to kvm_mmu_page_role”).
i.e. The first commit assumes that if L0 use EPT and L1 doesn’t use EPT,
then L0 will use same EPTP for both L0 and L1. Which indeed required
L0 to execute INVEPT before entering L2 guest. This assumption is
not true anymore since when guest_mode was added to mmu-role.
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
arch/x86/kvm/x86.c: In function kvm_make_scan_ioapic_request_mask:
arch/x86/kvm/x86.c:7911:7: warning: variable called set but not
used [-Wunused-but-set-variable]
It is not used since commit 7ee30bc132c6 ("KVM: x86: deliver KVM
IOAPIC scan request to target vCPUs")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
vmcs->apic_access_page is simply a token that the hypervisor puts into
the PFN of a 4KB EPTE (or PTE if using shadow-paging) that triggers
APIC-access VMExit or APIC virtualization logic whenever a CPU running
in VMX non-root mode read/write from/to this PFN.
As every write either triggers an APIC-access VMExit or write is
performed on vmcs->virtual_apic_page, the PFN pointed to by
vmcs->apic_access_page should never actually be touched by CPU.
Therefore, there is no need to mark vmcs02->apic_access_page as dirty
after unpin it on L2->L1 emulated VMExit or when L1 exit VMX operation.
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Conflicts:
arch/x86/kvm/vmx/vmx.c
|
|
Nick implements many features of nds32 such as perf, power management and
unaligned access handler. Let's add him as a maintainer.
Signed-off-by: Greentime Hu <green.hu@gmail.com>
|
|
Prevent accesses to the device (register read/write) from debugfs entries
during reset as that can cause the device to get stuck.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
During hard-reset, there can be multiple events received from the H/W. For
each event, the driver opens a worker thread to handle it. For some of the
events, the driver will read/write registers in the code that handles the
event.
In case of hard-reset, we must prevent reads/writes to the registers during
the reset operation because the device might get stuck if that happens.
Therefore, flush the EQ workers before resetting the device (in hard-reset
only). Additional events won't arrive as we synced and disabled the
interrupts.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
In the hl_device_reset we ask about the hard_reset argument when we want to
differentiate between soft and hard reset, except for three places where
we use "from_hard_reset_thread". Replace one of those locations with the
hard_reset argument as it is guaranteed that if we reached to that
line in the code during hard_reset, it is from a kernel thread.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
|
|
Expose both soft and hard reset counts via INFO IOCTL.
This will allow system management applications to easily check
if the device has undergone reset.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Instead of doing if inside if, just write them with && operator.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|
|
Make the code more concise and maintainable by using defines for the F/W
files.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
|