Age | Commit message (Collapse) | Author |
|
Writing an invalid schemata with no domain values (e.g., "(L3|MB):"),
results in a silent failure, i.e. the last_cmd_status returns OK,
Check for an empty value and set the result string with a proper error
message and return -EINVAL.
Before the fix:
# mkdir /sys/fs/resctrl/p1
# echo "L3:" > /sys/fs/resctrl/p1/schemata
(silent failure)
# cat /sys/fs/resctrl/info/last_cmd_status
ok
# echo "MB:" > /sys/fs/resctrl/p1/schemata
(silent failure)
# cat /sys/fs/resctrl/info/last_cmd_status
ok
After the fix:
# mkdir /sys/fs/resctrl/p1
# echo "L3:" > /sys/fs/resctrl/p1/schemata
-bash: echo: write error: Invalid argument
# cat /sys/fs/resctrl/info/last_cmd_status
Missing 'L3' value
# echo "MB:" > /sys/fs/resctrl/p1/schemata
-bash: echo: write error: Invalid argument
# cat /sys/fs/resctrl/info/last_cmd_status
Missing 'MB' value
[ Tony: This is an unintended side effect of the patch earlier to allow the
user to just write the value they want to change. While allowing
user to specify less than all of the values, it also allows an
empty value. ]
Fixes: c4026b7b95a4 ("x86/intel_rdt: Implement "update" mode when writing schemata file")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lkml.kernel.org/r/20171110191624.20280-1-tony.luck@intel.com
|
|
|
|
Add support for user space receive window (for the Fast thread-wakeup
coprocessor type)
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Define an interface to return a system-wide unique id for a given VAS
window.
The vas_win_id() will be used in a follow-on patch to generate an unique
handle for a user space receive window. Applications can use this handle
to pair send and receive windows for fast thread-wakeup.
The hardware refers to this system-wide unique id as a Partition Send
Window ID which is expected to be used during fault handling. Hence the
"pswid" in the function names.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Define an interface that the NX drivers can use to find the physical
paste address of a send window. This interface is expected to be used
with the mmap() operation of the NX driver's device. i.e the user space
process can use driver's mmap() operation to map the send window's paste
address into their address space and then use copy and paste instructions
to submit the CRBs to the NX engine.
Note that kernel drivers will use vas_paste_crb() directly and don't need
this interface.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
A CP_ABORT instruction is required in processes that have mapped a VAS
"paste address" with the intention of using COPY/PASTE instructions.
But since CP_ABORT is expensive, we want to restrict it to only
processes that use/intend to use COPY/PASTE.
Define an interface, set_thread_uses_vas(), that VAS can use to
indicate that the current process opened a send window. During context
switch, issue CP_ABORT only for processes that have the flag set.
Thanks for input from Nick Piggin, Michael Ellerman.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fix to not use new_thread after _switch() returns]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We need the SPRN_TIDR to be set for use with fast thread-wakeup (core-
to-core wakeup) and also with CAPI.
Each thread in a process needs to have a unique id within the process.
But for now, we assign globally unique thread ids to all threads in
the system.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
[mpe: Simplify tidr clearing on fork() and ctx switch code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Export the VAS Window context information to debugfs.
We need to hold a mutex when closing the window to prevent a race
with the debugfs read(). Rather than introduce a per-instance mutex,
we use the global vas_mutex for now, since it is not heavily contended.
The window->cop field is only relevant to a receive window so we were
not setting it for a send window (which is is paired to a receive window
anyway). But to simplify reporting in debugfs, set the 'cop' field for the
send window also.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Define a helper, chip_to_vas_id() to map a given chip id to corresponding
vas id.
Normally, callers of vas_rx_win_open() and vas_tx_win_open() want the VAS
window to be on the same chip where the calling thread is executing. These
callers can pass in -1 for the VAS id.
This interface will be useful if a thread running on one chip wants to open
a window on another chip (like the NX-842 driver does during start up).
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Create a cpu to vasid mapping so callers can specify -1 instead of
trying to find a VAS id.
Changelog[v2]
[Michael Ellerman] Use per-cpu variables to simplify code.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Normally, the NX driver waits for the CRBs to be processed before closing
the window. But it is better to ensure that the credits are returned before
the window gets reassigned later.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Save the configured max window credits for a window in the vas_window
structure. We will need this when polling for return of window credits.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
A VAS window is normally in "busy" state for only a short duration.
Reduce the time we wait for the window to go to "not-busy" state to
speed-up vas_win_close() a bit.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Use a helper to have the hardware unpin and mark a window closed.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Polling for window cast out is listed in the spec, but turns out that
it is not strictly necessary and slows down window close. Making it a
stub for now.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Clean up vas.h and the debug code around ifdef vas_debug.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
NX-842, the only user of VAS, sets the window credits to default values
but VAS should check the credits against the possible max values.
The VAS_WCREDS_MIN is not needed and can be dropped.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Initialize a few missing window context fields from the window attributes
specified by the caller. These fields are currently set to their default
values by the caller (NX-842), but would be good to apply them anyway.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
NACK'd by x86 maintainer.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Error injection is sloppy and very ad-hoc. BPF could fill this niche
perfectly with it's kprobe functionality. We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations. Accomplish this with the bpf_override_funciton
helper. This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function. This gives us a nice
clean way to implement systematic error injection for all of our code
paths.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: d004a5e7d4dd ("block: remove __bio_kmap_atomic")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This helper doesn't buy us much over calling kmap_atomic directly.
In fact in the only caller it does a bit of useless work as the
caller already has the bvec at hand, and said caller would even
buggy for a multi-segment bio due to the use of this helper.
So just remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Building 32-bit MIPS64r2 kernels produces warnings like the following
on certain toolchains (such as GNU assembler 2.24.90, but not GNU
assembler 2.28.51) since commit 22b8ba765a72 ("MIPS: Fix MIPS64 FP
save/restore on 32-bit kernels"), due to the exposure of fpu_save_16odd
from fpu_save_double and fpu_restore_16odd from fpu_restore_double:
arch/mips/kernel/r4k_fpu.S:47: Warning: float register should be even, was 1
...
arch/mips/kernel/r4k_fpu.S:59: Warning: float register should be even, was 1
...
This appears to be because .set mips64r2 does not change the FPU ABI to
64-bit when -march=mips64r2 (or e.g. -march=xlp) is provided on the
command line on that toolchain, from the default FPU ABI of 32-bit due
to the -mabi=32. This makes access to the odd FPU registers invalid.
Fix by explicitly changing the FPU ABI with .set fp=64 directives in
fpu_save_16odd and fpu_restore_16odd, and moving the undefine of fp up
in asmmacro.h so fp doesn't turn into $30.
Fixes: 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels")
Signed-off-by: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.0+: 22b8ba765a72: MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
Cc: <stable@vger.kernel.org> # 4.0+
Patchwork: https://patchwork.linux-mips.org/patch/17656/
|
|
Pull KVM fix from Radim Krčmář:
"Fix PPC HV host crash that can occur as a result of resizing the guest
hashed page table"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fixes from James Hogan:
"A final few MIPS fixes for 4.14:
- fix BMIPS NULL pointer dereference (4.7)
- fix AR7 early GPIO init allocation failure (3.19)
- fix dead serial output on certain AR7 platforms (2.6.35)"
* tag 'mips_fixes_4.14_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: AR7: Ensure that serial ports are properly set up
MIPS: AR7: Defer registration of GPIO
MIPS: BMIPS: Fix missing cbr address
|
|
This reverts commit 941f5f0f6ef5338814145cf2b813cf1f98873e2f.
Sadly, it turns out that we really can't just do the cross-CPU IPI to
all CPU's to get their proper frequencies, because it's much too
expensive on systems with lots of cores.
So we'll have to revert this for now, and revisit it using a smarter
model (probably doing one system-wide IPI at open time, and doing all
the frequency calculations in parallel).
Reported-by: WANG Chao <chao.wang@ucloud.cn>
Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rebooting into a new kernel with kexec fails (system dies) if tried on
a machine that has no-execute support. Reason for this is that the so
called datamover code gets executed with DAT on (MMU is active) and
the page that contains the datamover is marked as non-executable.
Therefore when branching into the datamover an unexpected program
check happens and afterwards the machine is dead.
This can be simply avoided by disabling DAT, which also disables any
no-execute checks, just before the datamover gets executed.
In fact the first thing done by the datamover is to disable DAT. The
code in the datamover that disables DAT can be removed as well.
Thanks to Michael Holzheu and Gerald Schaefer for tracking this down.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes: 57d7f939e7bd ("s390: add no-execute support")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan@danny.cz>
Fixes: d35339a42dd1 ("s390: add support for transactional memory")
Cc: <stable@vger.kernel.org> # v3.7+
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Make use of the "stack_depth" tracking feature introduced with
commit 8726679a0fa31 ("bpf: teach verifier to track stack depth") for the
s390 JIT, so that stack usage can be reduced.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Add remote-wakeup-connected for omap OHCI as that's needed by
ohci-platform driver.
Cc: devicetree@vger.kernel.org
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
"ti,am335x-usb-phy" is using the phy binding, but is missing #phy-cells
property. Fixes the following warning in TI dts files:
Warning (phys_property): Missing property '#phy-cells' in node ...
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: "Benoît Cousson" <bcousson@baylibre.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
"usb-nop-xceiv" is using the phy binding, but is missing #phy-cells
property. This is probably because the binding was the precursor to the phy
binding.
Fixes the following warning in OMAP dts files:
Warning (phys_property): Missing property '#phy-cells' in node ...
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: "Benoît Cousson" <bcousson@baylibre.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Enric Balletbo i Serra <eballetbo@gmail.com>
Cc: Javier Martinez Canillas <javier@dowhile0.org>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Arnd reported the following build bug bug:
In file included from arch/arm/mach-sa1100/simpad.c:20:0:
arch/arm/mach-sa1100/include/mach/SA-1100.h:1118:18: error: large
integer implicitly truncated to unsigned type [-Werror=overflow]
(0x00000001 << (Nb))
^
include/linux/gpio/machine.h:56:16: note: in definition of macro
'GPIO_LOOKUP_IDX'
.chip_hwnum = _chip_hwnum,
^~~~~~~~~~~
arch/arm/mach-sa1100/include/mach/SA-1100.h:1140:21: note: in
expansion of macro 'GPIO_GPIO'
^~~~~~~~~
arch/arm/mach-sa1100/simpad.c:331:27: note: in expansion of
macro 'GPIO_GPIO21'
GPIO_LOOKUP_IDX("gpio", GPIO_GPIO21, NULL, 0,
This is what happened:
commit b2e63555592f81331c8da3afaa607d8cf83e8138
"i2c: gpio: Convert to use descriptors"
commit 4d0ce62c0a02e41a65cfdcfe277f5be430edc371
"i2c: gpio: Augment all boardfiles to use open drain"
together uncovered an old bug in the Simpad board
file: as theGPIO_LOOKUP_IDX() encodes GPIO offsets
on gpiochips in an u16 (see <linux/gpio/machine.h>)
these GPIO "numbers" does not fit, since in
arch/arm/mach-sa1100/include/mach/SA-1100.h it is
defined as:
#define GPIO_GPIO(Nb) (0x00000001 << (Nb))
(...)
#define GPIO_GPIO21 GPIO_GPIO(21) /* GPIO [21] */
This is however provably wrong, since the i2c-gpio
driver uses proper GPIO numbers, albeit earlier from
the global number space, whereas this GPIO_GPIO21
is the local line offset in the GPIO register, which
is used in other code but certainly not in the
gpiolib GPIO driver in drivers/gpio/gpio-sa1100.c, which
has code like this:
static void sa1100_gpio_set(struct gpio_chip *chip,
unsigned offset, int value)
{
int reg = value ? R_GPSR : R_GPCR;
writel_relaxed(BIT(offset),
sa1100_gpio_chip(chip)->membase + reg);
}
So far everything however compiled fine as an unsigned
int was used to pass the GPIO numbers in
struct i2c_gpio_platform_data. We can trace the actual error
back to
commit dbd406f9d0a1d33a1303eb75cbe3f9435513d339
"ARM: 7025/1: simpad: add GPIO based device definitions."
This added the i2c_gpio with the wrong offsets.
This commit was before the SA1100 was converted to use
the gpiolib, but as can be seen from the contemporary
gpio.c in mach-sa1100, it was already using:
static int sa1100_gpio_get(struct gpio_chip *chip,
unsigned offset)
{
return GPLR & GPIO_GPIO(offset);
}
And GPIO_GPIO() is essentially the BIT() macro.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
MFENCE appears to be way slower than a locked instruction - let's use
LOCK ADD unconditionally, as we always did on old 32-bit.
Performance testing results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed ( +- 1.21% )
i.e. about ~60% faster.
Just poking at SP would be the most natural, but if we then read the
value from SP, we get a false dependency which will slow us down.
This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
And is easy to reproduce by sticking a barrier in a small non-inline
function.
So let's use a negative offset - which avoids this problem since we
build with the red zone disabled.
For userspace, use an address just below the redzone.
The one difference between LOCK ADD and MFENCE is that LOCK ADD does
not affect CLFLUSH, previous patches converted all uses of CLFLUSH to
call mb(), such that changes to smp_mb() won't affect it.
Update mb/rmb/wmb() on 32-bit to use the negative offset, too, for
consistency.
As a follow-up, it might be worth considering switching users
of CLFLUSH to another API (e.g. clflush_mb()?) - we will
then be able to convert mb() to smp_mb() again.
Also arguably, GCC should switch to use LOCK ADD for __sync_synchronize().
This might be worth pursuing separately.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: qemu-devel@nongnu.org
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1509118355-4890-1-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Take the DSCR value set by firmware as the dscr_default value,
rather than zero.
POWER9 recommends DSCR default to a non-zero value.
Signed-off-by: From: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make record_spr_defaults() __init]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
OPAL boot does not insert secondaries at 0x60 to wait at the secondary
hold spinloop. Instead they are started later, and inserted at
generic_secondary_smp_init(), which is after the secondary hold
spinloop.
Avoid waiting on this spinloop when booting with OPAL firmware. This
wait always times out that case.
This saves 100ms boot time on powernv, and 10s of seconds of real time
when booting on the simulator in SMP.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Unmaps that free page tables always flush the entire PID, which is
sub-optimal. Provide TLB range flushing with an additional PWC flush
that can be use for va range invalidations with PWC flush.
Time to munmap N pages of memory including last level page table
teardown (after mmap, touch), local invalidate:
N 1 2 4 8 16 32 64
vanilla 3.2us 3.3us 3.4us 3.6us 4.1us 5.2us 7.2us
patched 1.4us 1.5us 1.7us 1.9us 2.6us 3.7us 6.2us
Global invalidate:
N 1 2 4 8 16 32 64
vanilla 2.2us 2.3us 2.4us 2.6us 3.2us 4.1us 6.2us
patched 2.1us 2.5us 3.4us 5.2us 8.7us 15.7us 6.2us
Local invalidates get much better across the board. Global ones have
the same issue where multiple tlbies for va flush do get slower than
the single tlbie to invalidate the PID. None of this test captures
the TLB benefits of avoiding killing everything.
Global gets worse, but it is brought in to line with global invalidate
for munmap()s that do not free page tables.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The single page flush ceiling is the cut-off point at which we switch
from invalidating individual pages, to invalidating the entire process
address space in response to a range flush.
Introduce a local variant of this heuristic because local and global
tlbie have significantly different properties:
- Local tlbiel requires 128 instructions to invalidate a PID, global
tlbie only 1 instruction.
- Global tlbie instructions are expensive broadcast operations.
The local ceiling has been made much higher, 2x the number of
instructions required to invalidate the entire PID (i.e., 256 pages).
Time to mprotect N pages of memory (after mmap, touch), local invalidate:
N 32 34 64 128 256 512
vanilla 7.4us 9.0us 14.6us 26.4us 50.2us 98.3us
patched 7.4us 7.8us 13.8us 26.4us 51.9us 98.3us
The behaviour of both is identical at N=32 and N=512. Between there,
the vanilla kernel does a PID invalidate and the patched kernel does
a va range invalidate.
At N=128, these require the same number of tlbiel instructions, so
the patched version can be sen to be cheaper when < 128, and more
expensive when > 128. However this does not well capture the cost
of invalidated TLB.
The additional cost at 256 pages does not seem prohibitive. It may
be the case that increasing the limit further would continue to be
beneficial to avoid invalidating all of the process's TLB entries.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently for radix, flush_tlb_range flushes the entire PID, because
the Linux mm code does not tell us about page size here for THP vs
regular pages. This is quite sub-optimal for small mremap / mprotect
/ change_protection.
So implement va range flushes with two flush passes, one for each
page size (regular and THP). The second flush has an order of matnitude
fewer tlbie instructions than the first, so it is a relatively small
additional cost.
There is still room for improvement here with some changes to generic
APIs, particularly if there are mostly THP pages to be invalidated,
the small page flushes could be reduced.
Time to mprotect 1 page of memory (after mmap, touch):
vanilla 2.9us 1.8us
patched 1.2us 1.6us
Time to mprotect 30 pages of memory (after mmap, touch):
vanilla 8.2us 7.2us
patched 6.9us 17.9us
Time to mprotect 34 pages of memory (after mmap, touch):
vanilla 9.1us 8.0us
patched 9.0us 8.0us
34 pages is the point at which the invalidation switches from va
to entire PID, which tlbie can do in a single instruction. This is
why in the case of 30 pages, the new code runs slower for this test.
This is a deliberate tradeoff already present in the unmap and THP
promotion code, the idea is that the benefit from avoiding flushing
entire TLB for this PID on all threads in the system.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Move the barriers and range iteration down into the _tlbie* level,
which improves readability.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Short range flushes issue a sequences of tlbie(l) instructions for
individual effective addresses. These do not all require individual
barrier sequences, only one covering all tlbie(l) instructions.
Commit f7327e0ba3 ("powerpc/mm/radix: Remove unnecessary ptesync")
made a similar optimization for tlbiel for PID flushing.
For tlbie, the ISA says:
The tlbsync instruction provides an ordering function for the
effects of all tlbie instructions executed by the thread executing
the tlbsync instruction, with respect to the memory barrier
created by a subsequent ptesync instruction executed by the same
thread.
Time to munmap 30 pages of memory (after mmap, touch):
local global
vanilla 10.9us 22.3us
patched 3.4us 14.4us
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We have some dependencies & conflicts between patches in fixes and
things to go in next, both in the radix TLB flush code and the IMC PMU
driver. So merge fixes into next.
|
|
In case we are booted via the default boot entry by a generic loader
like grub or OVMF it is necessary to distinguish between a HVM guest
with a device model supporting legacy devices and a PVH guest without
device model.
PVH guests will always have x86_platform.legacy.no_vga set and
x86_platform.legacy.rtc cleared, while both won't be true for HVM
guests.
Test for both conditions in the guest_late_init hook and set xen_pvh
to true if they are met.
Move some of the early PVH initializations to the new hook in order
to avoid duplicated code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20171109132739.23465-6-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
structure
Add a new guest_late_init callback to the hypervisor_x86 structure. It
will replace the current kvm_guest_init() call which is changed to
make use of the new callback.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kvm@vger.kernel.org
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/20171109132739.23465-5-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add a test for ACPI_FADT_NO_VGA when scanning the FADT and set the new
flag x86_platform.legacy.no_vga accordingly.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: len.brown@intel.com
Cc: linux-pm@vger.kernel.org
Cc: pavel@ucw.cz
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/20171109132739.23465-4-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The x86_hyper pointer is only used for checking whether a virtual
device is supporting the hypervisor the system is running on.
Use an enum for that purpose instead and drop the x86_hyper pointer.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Xavier Deguillard <xdeguillard@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akataria@vmware.com
Cc: arnd@arndb.de
Cc: boris.ostrovsky@oracle.com
Cc: devel@linuxdriverproject.org
Cc: dmitry.torokhov@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: haiyangz@microsoft.com
Cc: kvm@vger.kernel.org
Cc: kys@microsoft.com
Cc: linux-graphics-maintainer@vmware.com
Cc: linux-input@vger.kernel.org
Cc: moltmann@vmware.com
Cc: pbonzini@redhat.com
Cc: pv-drivers@vmware.com
Cc: rkrcmar@redhat.com
Cc: sthemmin@microsoft.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20171109132739.23465-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
and 'struct x86_init'
Instead of x86_hyper being either NULL on bare metal or a pointer to a
struct hypervisor_x86 in case of the kernel running as a guest merge
the struct into x86_platform and x86_init.
This will remove the need for wrappers making it hard to find out what
is being called. With dummy functions added for all callbacks testing
for a NULL function pointer can be removed, too.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: devel@linuxdriverproject.org
Cc: haiyangz@microsoft.com
Cc: kvm@vger.kernel.org
Cc: kys@microsoft.com
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Cc: rusty@rustcorp.com.au
Cc: sthemmin@microsoft.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20171109132739.23465-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
These two functions are only called by tsc_init(), which is an __init
function during boot time, so mark them __init as well.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1510135792-17429-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In order to control the GICv4 view of virtual CPUs, we rely
on an irqdomain allocated for that purpose. Let's add a couple
of helpers to that effect.
At the same time, the vgic data structures gain new fields to
track all this... erm... wonderful stuff.
The way we hook into the vgic init is slightly convoluted. We
need the vgic to be initialized (in order to guarantee that
the number of vcpus is now fixed), and we must have a vITS
(otherwise this is all very pointless). So we end-up calling
the init from both vgic_init and vgic_its_create.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|