summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2019-04-19ARM: lpc32xx: stop overwriting TEST_CLK_SELAlexandre Belloni
While the UDA1380 is described in some lpc3250 device trees, there is currently no real user of that codec. Anyway, if the codec needs a clock, it should take it explicitly. lpc3250_machine_init is called for all the lpc32xx machines and some are using test1_clk (for example to strobe an HW watchdog). Overwriting TEST_CLK_SEL prevents booting those platforms. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Tested-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
2019-04-19x86/tools/relocs: Fix big section header tablesArtem Savkov
In case when the number of entries in the section header table is larger then or equal to SHN_LORESERVE the size of the table is held in the sh_size member of the initial entry in section header table instead of e_shnum. Same with the string table index which is located in sh_link instead of e_shstrndx. This case is easily reproducible with KCFLAGS="-ffunction-sections", bzImage build fails with "String table index out of bounds" error. Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Cc: Eric W . Biederman <ebiederm@xmission.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181129155615.2594-1-asavkov@redhat.com [ Simplify the die() lines. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19ARM: ixp4xx: Convert to SPARSE_IRQLinus Walleij
This localizes the <mach/irqs.h> header to the mach-ixp4xx directory, removes NR_IRQS and switches IXP4xx over to using SPARSE_IRQ. This is a prerequisite for DT support. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-04-19ARM: ixp4xx: Pass IRQ resource to beeperLinus Walleij
All IXP4xx devices except the beeper passes the IRQ as a resource, augment the NSLU2 beeper to do the same. This is a prerequisite for SPARSE_IRQ. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-04-19ARM: ixp4xx: Convert to MULTI_IRQ_HANDLERLinus Walleij
This rewrites the IXP4xx to use MULTI_IRQ_HANDLER and create an irqdomain for the irqchip in the platform. We convert the timer to request the interrupt like any other driver in the process. We bump all IRQs to 16+offset to avoid using IRQ 0 and set NR_IRQS to 512 (the default for most systems). This conveniently fits with the first 16 IRQs being pre-allocated when using SPARSE_IRQ. This is a prerequisite for SPARSE_IRQ and DT boot. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-04-19x86/topology: Make DEBUG_HOTPLUG_CPU0 pr_info() more descriptiveJuri Lelli
DEBUG_HOTPLUG_CPU0 debug feature offlines a CPU as early as possible allowing userspace to boot up without that CPU (so that it is possible to check for unwanted dependencies towards the offlined CPU). After doing so it emits a "CPU %u is now offline" pr_info, which is not enough descriptive of why the CPU was offlined (e.g., one might be running with a config that triggered some problem, not being aware that CONFIG_DEBUG_ HOTPLUG_CPU0 is set). Add a bit more of informative text to the pr_info, so that it is immediately obvious why a CPU has been offlined in early boot stages. Background: Got to scratch my head a bit while debugging a WARNING splat related to the offlining of CPU0. Without being aware yet of this debug option it wasn't immediately obvious why CPU0 was being offlined by the kernel. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/20181219151647.15073-1-juri.lelli@redhat.com [ Merge line-broken line. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/fault: Decode and print #PF oops in human readable formSean Christopherson
Linus pointed out that deciphering the raw #PF error code and printing a more human readable message are two different things, and also that printing the negative cases is mostly just noise[1]. For example, the USER bit doesn't mean the fault originated in user code and stating that an oops wasn't due to a protection keys violation isn't interesting since an oops on a keys violation is a one-in-a-million scenario. Remove the per-bit decoding of the error code and instead print: - the raw error code - why the fault occurred - the effective privilege level of the access - the type of access - whether the fault originated in user code or kernel code This provides the user with the information needed to triage 99.9% of oopses without polluting the log with useless information or conflating the error_code with the CPL. Sample output: BUG: kernel NULL pointer dereference, address = 0000000000000008 #PF: supervisor-privileged instruction fetch from kernel code #PF: error_code(0x0010) - not-present page BUG: unable to handle page fault for address = ffffbeef00000000 #PF: supervisor-privileged instruction fetch from kernel code #PF: error_code(0x0010) - not-present page BUG: unable to handle page fault for address = ffffc90000230000 #PF: supervisor-privileged write access from kernel code #PF: error_code(0x000b) - reserved bit violation [1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/fault: Reword initial BUG message for unhandled page faultsSean Christopherson
Reword the NULL pointer dereference case to simply state that a NULL pointer was dereferenced, i.e. drop "unable to handle" as that implies that there are instances where the kernel actual does handle NULL pointer dereferences, which is not true barring funky exception fixup. For the non-NULL case, replace "kernel paging request" with "page fault" as the kernel can technically oops on faults that originated in user code. Dropping "kernel" also allows future patches to provide detailed information on where the fault occurred, e.g. user vs. kernel, without conflicting with the initial BUG message. In both cases, replace "at address=" with wording more appropriate to the oops, as "at" may be interpreted as stating that the address is the RIP of the instruction that faulted. Last, and probably least, further qualify the NULL-pointer path by checking that the fault actually originated in kernel code. It's technically possible for userspace to map address 0, and not printing a super specific message is the least of our worries if the kernel does manage to oops on an actual NULL pointer dereference from userspace. Before: BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000 BUG: unable to handle kernel paging request at ffffbeef00000000 After: BUG: kernel NULL pointer dereference, address = 0000000000000008 BUG: unable to handle page fault for address = ffffbeef00000000 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/power: Optimize C3 entry on Centaur CPUsDavid Wang
For new Centaur CPUs the ucode will take care of the preservation of cache coherence between CPU cores in C-states regardless of how deep the C-states are. So, it is not necessary to flush the caches in software befor entering C3. This useless operation will cause performance drop for the cores which share some caches with the idling core. Signed-off-by: David Wang <davidwang@zhaoxin.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: brucechang@via-alliance.com Cc: cooperyan@zhaoxin.com Cc: len.brown@intel.com Cc: linux-pm@kernel.org Cc: qiyuanwang@zhaoxin.com Cc: rjw@rjwysocki.net Cc: timguo@zhaoxin.com Link: http://lkml.kernel.org/r/1545900110-2757-1-git-send-email-davidwang@zhaoxin.com [ Tidy up the comment. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/mce: Fix debugfs_simple_attr.cocci warningsYueHaibing
Use DEFINE_DEBUGFS_ATTRIBUTE() rather than DEFINE_SIMPLE_ATTRIBUTE() for debugfs files. Semantic patch information: Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() imposes some significant overhead as compared to DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe(). Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/1545981853-70877-1-git-send-email-yuehaibing@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log ↵Hans de Goede
priority The "ENERGY_PERF_BIAS: Set to 'normal', was 'performance'" message triggers on pretty much every Intel machine. The purpose of log messages with a warning level is to notify the user of something which potentially is a problem, or at least somewhat unexpected. This message clearly does not match those criteria, so lower its log priority from warning to info. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181230172715.17469-1-hdegoede@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/kvm: Make steal_time visibleAndi Kleen
This per cpu variable is accessed from assembler code, so it needs to be visible for LTO. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20190330004743.29541-8-andi@firstfloor.org
2019-04-19x86/hyperv: Make hv_vcpu_is_preempted() visibleAndi Kleen
This function is referrenced from assembler, so it needs to be marked visible for LTO. Fixes: 3a025de64bf8 ("x86/hyperv: Enable PV qspinlock for Hyper-V") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yi Sun <yi.y.sun@linux.intel.com> Cc: kys@microsoft.com Cc: haiyangz@microsoft.com Link: https://lkml.kernel.org/r/20190330004743.29541-6-andi@firstfloor.org
2019-04-19x86/timer: Don't inline __const_udelay()Andi Kleen
LTO will happily inline __const_udelay() everywhere it is used. Forcing it noinline saves ~44k text in a LTO build. 13999560 1740864 1499136 17239560 1070e08 vmlinux-with-udelay-inline 13954764 1736768 1499136 17190668 1064f0c vmlinux-wo-udelay-inline Even without LTO this function should never be inlined. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190330004743.29541-4-andi@firstfloor.org
2019-04-19x86/cpu/amd: Exclude 32bit only assembler from 64bit buildAndi Kleen
The "vide" inline assembler is only needed on 32bit kernels for old 32bit only CPUs. Guard it with an #ifdef so it's not included in 64bit builds. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190330004743.29541-2-andi@firstfloor.org
2019-04-19x86/asm: Mark all top level asm statements as .textAndi Kleen
With gcc toplevel assembler statements that do not mark themselves as .text may end up in other sections. This causes LTO boot crashes because various assembler statements ended up in the middle of the initcall section. It's also a latent problem without LTO, although it's currently not known to cause any real problems. According to the gcc team it's expected behavior. Always mark all the top level assembler statements as text so that they switch to the right section. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190330004743.29541-1-andi@firstfloor.org
2019-04-19x86/cpu/bugs: Use __initconst for 'const' init dataAndi Kleen
Some of the recently added const tables use __initdata which causes section attribute conflicts. Use __initconst instead. Fixes: fa1202ef2243 ("x86/speculation: Add command line control") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190330004743.29541-9-andi@firstfloor.org
2019-04-19x86/kprobes: Avoid kretprobe recursion bugMasami Hiramatsu
Avoid kretprobe recursion loop bg by setting a dummy kprobes to current_kprobe per-CPU variable. This bug has been introduced with the asm-coded trampoline code, since previously it used another kprobe for hooking the function return placeholder (which only has a nop) and trampoline handler was called from that kprobe. This revives the old lost kprobe again. With this fix, we don't see deadlock anymore. And you can see that all inner-called kretprobe are skipped. event_1 235 0 event_2 19375 19612 The 1st column is recorded count and the 2nd is missed count. Above shows (event_1 rec) + (event_2 rec) ~= (event_2 missed) (some difference are here because the counter is racy) Reported-by: Andrea Righi <righi.andrea@gmail.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: c9becf58d935 ("[PATCH] kretprobe: kretprobe-booster") Link: http://lkml.kernel.org/r/155094064889.6137.972160690963039.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/kprobes: Verify stack frame on kretprobeMasami Hiramatsu
Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19Make anon_inodes unconditionalDavid Howells
Make the anon_inodes facility unconditional so that it can be used by core VFS code and pidfd code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [christian@brauner.io: adapt commit message to mention pidfds] Signed-off-by: Christian Brauner <christian@brauner.io>
2019-04-19rseq: Clean up comments by reflecting removal of event counterMathieu Desnoyers
The "event counter" was removed from rseq before it was merged upstream. However, a few comments in the source code still refer to it. Adapt the comments to match reality. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Maurer <bmaurer@fb.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Lameter <cl@linux.com> Cc: Dave Watson <davejwatson@fb.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20190305194755.2602-2-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/defconfig: Remove archaic partition tables supportAhmed S. Darwish
On x86 systems, only MSDOS and GPT partition tables are typically encountered. Remove all the rest. Note, CONFIG_EFI_PARTITION is also removed since it defaults to `y'. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190306004425.GA30537@darwi-home-pc Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19x86/smpboot: Rename match_die() to match_pkg()Len Brown
Syntax only, no functional or semantic change. This routine matches packages, not die, so name it thus. Signed-off-by: Len Brown <len.brown@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/7ca18c4ae7816a1f9eda37414725df676e63589d.1551160674.git.len.brown@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18arm64: dts: qcom: msm8998: Fix blsp2_i2c5 addressMarc Gonzalez
blsp1_i2c1 is at 0x0c175000 blsp2_i2c5 is at 0x0c1ba000 (the label is correct) Fixes: 1e71d0c273d0a ("arm64: dts: qcom: msm8998: Enumerate i2c controllers") Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Reviewed-by: Jeffrey Hugo <jhugo@codeaurora.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-04-18arm64: dts: qcom: qcs404-evb: Change the compatible to distinguish platformsKhasim Syed Mohammed
The compatible flag should be different for each board to match with the dtb and to let the bootloader pick the appropriate dtb. Signed-off-by: Khasim Syed Mohammed <khasim.mohammed@linaro.org> Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-04-18arm64: dts: qcom: pmi8998: add gpio-rangesBrian Masney
This adds the gpio-ranges property so that the GPIO pins are initialized by the GPIO framework and not pinctrl. This fixes a circular dependency between these two frameworks so GPIO hogging can be used on this board. This was not tested on this particular hardware, however this same change was tested on qcom-pm8941 using a LG Nexus 5 (hammerhead) phone. Signed-off-by: Brian Masney <masneyb@onstation.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-04-18arm64: dts: qcom: pmi8994: add gpio-rangesBrian Masney
This adds the gpio-ranges property so that the GPIO pins are initialized by the GPIO framework and not pinctrl. This fixes a circular dependency between these two frameworks so GPIO hogging can be used on this board. This was not tested on this particular hardware, however this same change was tested on qcom-pm8941 using a LG Nexus 5 (hammerhead) phone. Signed-off-by: Brian Masney <masneyb@onstation.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-04-18arm64: dts: qcom: pm8998: add gpio-rangesBrian Masney
This adds the gpio-ranges property so that the GPIO pins are initialized by the GPIO framework and not pinctrl. This fixes a circular dependency between these two frameworks so GPIO hogging can be used on this board. This was not tested on this particular hardware, however this same change was tested on qcom-pm8941 using a LG Nexus 5 (hammerhead) phone. Signed-off-by: Brian Masney <masneyb@onstation.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-04-18arm64: dts: qcom: pm8005: add gpio-rangesBrian Masney
This adds the gpio-ranges property so that the GPIO pins are initialized by the GPIO framework and not pinctrl. This fixes a circular dependency between these two frameworks so GPIO hogging can be used on this board. This was not tested on this particular hardware, however this same change was tested on qcom-pm8941 using a LG Nexus 5 (hammerhead) phone. Signed-off-by: Brian Masney <masneyb@onstation.org> Signed-off-by: Andy Gross <agross@kernel.org>
2019-04-18x86/platform/olpc: Use a correct version when making up a battery nodeLubomir Rintel
The XO-1 and XO-1.5 batteries apparently differ in an ability to report ambient temperature. We need to use a different compatible string for the XO-1.5 battery. Previously olpc_dt_fixup() used the presence of the battery node's compatible property to decide whether the DT is up to date. Now we need to look for a particular value in the compatible string, to decide Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2019-04-18x86/platform/olpc: Trivial code move in DT fixupLubomir Rintel
This makes the following patch more concise. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2019-04-18x86/platform/olpc: Don't split string literals when fixing up the DTLubomir Rintel
It was pointed out in a review, and checkpatch.pl complains about this. Breaking it down into multiple ofw evaluations works just as well and reads better. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2019-04-18Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Avoid compiler uninitialised warning introduced by recent arm64 futex fix" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: futex: Restore oldval initialization to work around buggy compilers
2019-04-18arm64: futex: Restore oldval initialization to work around buggy compilersNathan Chancellor
Commit 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value") removed oldval's zero initialization in arch_futex_atomic_op_inuser because it is not necessary. Unfortunately, Android's arm64 GCC 4.9.4 [1] does not agree: ../kernel/futex.c: In function 'do_futex': ../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] return oldval == cmparg; ^ In file included from ../kernel/futex.c:73:0: ../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here int oldval, ret, tmp; ^ GCC fails to follow that when ret is non-zero, futex_atomic_op_inuser returns right away, avoiding the uninitialized use that it claims. Restoring the zero initialization works around this issue. [1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/ Cc: stable@vger.kernel.org Fixes: 045afc24124d ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-04-18KVM: lapic: Convert guest TSC to host time domain if necessarySean Christopherson
To minimize the latency of timer interrupts as observed by the guest, KVM adjusts the values it programs into the host timers to account for the host's overhead of programming and handling the timer event. In the event that the adjustments are too aggressive, i.e. the timer fires earlier than the guest expects, KVM busy waits immediately prior to entering the guest. Currently, KVM manually converts the delay from nanoseconds to clock cycles. But, the conversion is done in the guest's time domain, while the delay occurs in the host's time domain. This is perfectly ok when the guest and host are using the same TSC ratio, but if the guest is using a different ratio then the delay may not be accurate and could wait too little or too long. When the guest is not using the host's ratio, convert the delay from guest clock cycles to host nanoseconds and use ndelay() instead of __delay() to provide more accurate timing. Because converting to nanoseconds is relatively expensive, e.g. requires division and more multiplication ops, continue using __delay() directly when guest and host TSCs are running at the same ratio. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18KVM: lapic: Allow user to disable adaptive tuning of timer advancementSean Christopherson
The introduction of adaptive tuning of lapic timer advancement did not allow for the scenario where userspace would want to disable adaptive tuning but still employ timer advancement, e.g. for testing purposes or to handle a use case where adaptive tuning is unable to settle on a suitable time. This is epecially pertinent now that KVM places a hard threshold on the maximum advancment time. Rework the timer semantics to accept signed values, with a value of '-1' being interpreted as "use adaptive tuning with KVM's internal default", and any other value being used as an explicit advancement time, e.g. a time of '0' effectively disables advancement. Note, this does not completely restore the original behavior of lapic_timer_advance_ns. Prior to tracking the advancement per vCPU, which is necessary to support autotuning, userspace could adjust lapic_timer_advance_ns for *running* vCPU. With per-vCPU tracking, the module params are snapshotted at vCPU creation, i.e. applying a new advancement effectively requires restarting a VM. Dynamically updating a running vCPU is possible, e.g. a helper could be added to retrieve the desired delay, choosing between the global module param and the per-VCPU value depending on whether or not auto-tuning is (globally) enabled, but introduces a great deal of complexity. The wrapper itself is not complex, but understanding and documenting the effects of dynamically toggling auto-tuning and/or adjusting the timer advancement is nigh impossible since the behavior would be dependent on KVM's implementation as well as compiler optimizations. In other words, providing stable behavior would require extremely careful consideration now and in the future. Given that the expected use of a manually-tuned timer advancement is to "tune once, run many", use the vastly simpler approach of recognizing changes to the module params only when creating a new vCPU. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18KVM: lapic: Track lapic timer advance per vCPUSean Christopherson
Automatically adjusting the globally-shared timer advancement could corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting the advancement value. That could be partially fixed by using a local variable for the arithmetic, but it would still be susceptible to a race when setting timer_advance_adjust_done. And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the correct calibration for a given vCPU may not apply to all vCPUs. Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is effectively violated when finding a stable advancement takes an extended amount of timer. Opportunistically change the definition of lapic_timer_advance_ns to a u32 so that it matches the style of struct kvm_timer. Explicitly pass the param to kvm_create_lapic() so that it doesn't have to be exposed to lapic.c, thus reducing the probability of unintentionally using the global value instead of the per-vCPU value. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18KVM: lapic: Disable timer advancement if adaptive tuning goes haywireSean Christopherson
To minimize the latency of timer interrupts as observed by the guest, KVM adjusts the values it programs into the host timers to account for the host's overhead of programming and handling the timer event. Now that the timer advancement is automatically tuned during runtime, it's effectively unbounded by default, e.g. if KVM is running as L1 the advancement can measure in hundreds of milliseconds. Disable timer advancement if adaptive tuning yields an advancement of more than 5000ns, as large advancements can break reasonable assumptions of the guest, e.g. that a timer configured to fire after 1ms won't arrive on the next instruction. Although KVM busy waits to mitigate the case of a timer event arriving too early, complications can arise when shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test will fail when its "host" exits on interrupts as KVM may inject the INTR before the guest executes STI+HLT. Arguably the unit test is "broken" in the sense that delaying a timer interrupt by 1ms doesn't technically guarantee the interrupt will arrive after STI+HLT, but it's a reasonable assumption that KVM should support. Furthermore, an unbounded advancement also effectively unbounds the time spent busy waiting, e.g. if the guest programs a timer with a very large delay. 5000ns is a somewhat arbitrary threshold. When running on bare metal, which is the intended use case, timer advancement is expected to be in the general vicinity of 1000ns. 5000ns is high enough that false positives are unlikely, while not being so high as to negatively affect the host's performance/stability. Note, a future patch will enable userspace to disable KVM's adaptive tuning, which will allow priveleged userspace will to specifying an advancement value in excess of this arbitrary threshold in order to satisfy an abnormal use case. Cc: Liran Alon <liran.alon@oracle.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012Vitaly Kuznetsov
It was reported that with some special Multi Processor Group configuration, e.g: bcdedit.exe /set groupsize 1 bcdedit.exe /set maxgroup on bcdedit.exe /set groupaware on for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism is in use. Tracing kvm_hv_flush_tlb immediately reveals the issue: kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2 The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES, however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified. We don't flush anything and apparently it's not what Windows expects. TLFS doesn't say anything about such requests and newer Windows versions seem to be unaffected. This all feels like a WS2012 bug, which is, however, easy to workaround in KVM: let's flush everything when we see an empty flush request, over-flushing doesn't hurt. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too shortLiran Alon
If guest sets MSR_IA32_TSCDEADLINE to value such that in host time-domain it's shorter than lapic_timer_advance_ns, we can reach a case that we call hrtimer_start() with expiration time set at the past. Because lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it is not allowed to run in softirq and therefore will never expire. To avoid such a scenario, verify that deadline expiration time is set on host time-domain further than (now + lapic_timer_advance_ns). A future patch can also consider adding a min_timer_deadline_ns module parameter, similar to min_timer_period_us to avoid races that amount of ns it takes to run logic could still call hrtimer_start() with expiration timer set at the past. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-18Merge tag 'kvm-ppc-fixes-5.1-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD KVM/PPC fixes for 5.1 - Fix host hang in the HTM assist code for POWER9 - Take srcu read lock around memslot lookup
2019-04-18KVM: arm/arm64: Clean up vcpu finalization function parameter namingDave Martin
Currently, the internal vcpu finalization functions use a different name ("what") for the feature parameter than the name ("feature") used in the documentation. To avoid future confusion, this patch converts everything to use the name "feature" consistently. No functional change. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: Explain validity checks in set_sve_vls()Dave Martin
Correct virtualization of SVE relies for correctness on code in set_sve_vls() that verifies consistency between the set of vector lengths requested by userspace and the set of vector lengths available on the host. However, the purpose of this code is not obvious, and not likely to be apparent at all to people who do not have detailed knowledge of the SVE system-level architecture. This patch adds a suitable comment to explain what these checks are for. No functional change. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: Simplify KVM_REG_ARM64_SVE_VLS array sizingDave Martin
A complicated DIV_ROUND_UP() expression is currently written out explicitly in multiple places in order to specify the size of the bitmap exchanged with userspace to represent the value of the KVM_REG_ARM64_SVE_VLS pseudo-register. Userspace currently has no direct way to work this out either: for documentation purposes, the size is just quoted as 8 u64s. To make this more intuitive, this patch replaces these with a single define, which is also exported to userspace as KVM_ARM64_SVE_VLS_WORDS. Since the number of words in a bitmap is just the index of the last word used + 1, this patch expresses the bound that way instead. This should make it clearer what is being expressed. For userspace convenience, the minimum and maximum possible vector lengths relevant to the KVM ABI are exposed to UAPI as KVM_ARM64_SVE_VQ_MIN, KVM_ARM64_SVE_VQ_MAX. Since the only direct use for these at present is manipulation of KVM_REG_ARM64_SVE_VLS, no corresponding _VL_ macros are defined. They could be added later if a need arises. Since use of DIV_ROUND_UP() was the only reason for including <linux/kernel.h> in guest.c, this patch also removes that #include. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: WARN when avoiding divide-by-zero in sve_reg_to_region()Dave Martin
sve_reg_to_region() currently passes the result of vcpu_sve_state_size() to array_index_nospec(), effectively leading to a divide / modulo operation. Currently the code bails out and returns -EINVAL if vcpu_sve_state_size() turns out to be zero, in order to avoid going ahead and attempting to divide by zero. This is reasonable, but it should only happen if the kernel contains some other bug that allowed this code to be reached without the vcpu having been properly initialised. To make it clear that this is a defence against bugs rather than something that the user should be able to trigger, this patch marks the check with WARN_ON(). Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: Make register ioctl access errors more consistentDave Martin
Currently, the way error codes are generated when processing the SVE register access ioctls in a bit haphazard. This patch refactors the code so that the behaviour is more consistent: now, -EINVAL should be returned only for unrecognised register IDs or when some other runtime error occurs. -ENOENT is returned for register IDs that are recognised, but whose corresponding register (or slice) does not exist for the vcpu. To this end, in {get,set}_sve_reg() we now delegate the vcpu_has_sve() check down into {get,set}_sve_vls() and sve_reg_to_region(). The KVM_REG_ARM64_SVE_VLS special case is picked off first, then sve_reg_to_region() plays the role of exhaustively validating or rejecting the register ID and (where accepted) computing the applicable register region as before. sve_reg_to_region() is rearranged so that -ENOENT or -EPERM is not returned prematurely, before checking whether reg->id is in a recognised range. -EPERM is now only returned when an attempt is made to access an actually existing register slice on an unfinalized vcpu. Fixes: e1c9c98345b3 ("KVM: arm64/sve: Add SVE support to register access ioctl interface") Fixes: 9033bba4b535 ("KVM: arm64/sve: Add pseudo-register for the guest's vector lengths") Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: Miscellaneous tidyups in guest.cDave Martin
* Remove a few redundant blank lines that are stylistically inconsistent with code already in guest.c and are just taking up space. * Delete a couple of pointless empty default cases from switch statements whose behaviour is otherwise obvious anyway. * Fix some typos and consolidate some redundantly duplicated comments. * Respell the slice index check in sve_reg_to_region() as "> 0" to be more consistent with what is logically being checked here (i.e., "is the slice index too large"), even though we don't try to cope with multiple slices yet. No functional change. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: Clean up UAPI register ID definitionsDave Martin
Currently, the SVE register ID macros are not all defined in the same way, and advertise the fact that FFR maps onto the nonexistent predicate register P16. This is really just for kernel convenience, and may lead userspace into bad habits. Instead, this patch masks the ID macro arguments so that architecturally invalid register numbers will not be passed through any more, and uses a literal KVM_REG_ARM64_SVE_FFR_BASE macro to define KVM_REG_ARM64_SVE_FFR(), similarly to the way the _ZREG() and _PREG() macros are defined. Rather than plugging in magic numbers for the number of Z- and P- registers and the maximum possible number of register slices, this patch provides definitions for those too. Userspace is going to need them in any case, and it makes sense for them to come from <uapi/asm/kvm.h>. sve_reg_to_region() uses convenience constants that are defined in a different way, and also makes use of the fact that the FFR IDs are really contiguous with the P15 IDs, so this patch retains the existing convenience constants in guest.c, supplemented with a couple of sanity checks to check for consistency with the UAPI header. Fixes: e1c9c98345b3 ("KVM: arm64/sve: Add SVE support to register access ioctl interface") Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm64/sve: sys_regs: Demote redundant vcpu_has_sve() checks to WARNsDave Martin
Because of the logic in kvm_arm_sys_reg_{get,set}_reg() and sve_id_visibility(), we should never call {get,set}_id_aa64zfr0_el1() for a vcpu where !vcpu_has_sve(vcpu). To avoid the code giving the impression that it is valid for these functions to be called in this situation, and to help the compiler make the right optimisation decisions, this patch adds WARN_ON() for these cases. Given the way the logic is spread out, this seems preferable to dropping the checks altogether. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-04-18KVM: arm: Make vcpu finalization stubs into inline functionsDave Martin
The vcpu finalization stubs kvm_arm_vcpu_finalize() and kvm_arm_vcpu_is_finalized() are currently #defines for ARM, which limits the type-checking that the compiler can do at runtime. The only reason for them to be #defines was to avoid reliance on the definition of struct kvm_vcpu, which is not available here due to circular #include problems. However, because these are stubs containing no code, they don't need the definition of struct kvm_vcpu after all; only a declaration is needed (which is available already). So in the interests of cleanliness, this patch converts them to inline functions. No functional change. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>