summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2018-03-08x86/MCE: Cleanup and complete struct mce fields definitionsBorislav Petkov
The struct is part of the uapi, document that fact and all fields properly and fix formatting. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180306142143.19990-3-bp@alien8.de
2018-03-08Merge branch 'ras/urgent' into ras/coreThomas Gleixner
Pick up urgent fixes to apply further development changes.
2018-03-08x86/MCE: Serialize sysfs changesSeunghun Han
The check_interval file in /sys/devices/system/machinecheck/machinecheck<cpu number> directory is a global timer value for MCE polling. If it is changed by one CPU, mce_restart() broadcasts the event to other CPUs to delete and restart the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the mce_timer variable. If more than one CPU writes a specific value to the check_interval file concurrently, mce_timer is not protected from such concurrent accesses and all kinds of explosions happen. Since only root can write to those sysfs variables, the issue is not a big deal security-wise. However, concurrent writes to these configuration variables is void of reason so the proper thing to do is to serialize the access with a mutex. Boris: - Make store_int_with_restart() use device_store_ulong() to filter out negative intervals - Limit min interval to 1 second - Correct locking - Massage commit message Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
2018-03-08x86/MCE: Save microcode revision in machine check recordsTony Luck
Updating microcode used to be relatively rare. Now that it has become more common we should save the microcode version in a machine check record to make sure that those people looking at the error have this important information bundled with the rest of the logged information. [ Borislav: Simplify a bit. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180301233449.24311-1-tony.luck@intel.com
2018-03-08x86/pti: Fix a comment typoSeunghun Han
s/visinble/visible/ Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1520397135-132809-1-git-send-email-kkamagui@gmail.com
2018-03-08x86/jailhouse: Allow to use PCI_MMCONFIG without ACPIJan Kiszka
Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the latter can be built without having to enable ACPI as well. Primarily, its required to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI, instead of just the former. Saves some bytes in the Jailhouse non-root kernel. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: jailhouse-dev@googlegroups.com Cc: linux-pci@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/788bbd5325d1922235e9562c213057425fbc548c.1520408357.git.jan.kiszka@siemens.com
2018-03-08x86: Consolidate PCI_MMCONFIG configsJan Kiszka
Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), there exist two PCI_MMCONFIG entries, one from the original i386 and another from x86_64. Consolidate both entries into a single one. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: jailhouse-dev@googlegroups.com Cc: linux-pci@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/2a0ccd51ea6f7996e07162918228e23bdc1fbb03.1520408357.git.jan.kiszka@siemens.com
2018-03-08x86: Align x86_64 PCI_MMCONFIG with 32-bit variantJan Kiszka
Allow to enable PCI_MMCONFIG when only SFI is present and make this option default on. This will help consolidating both into one Kconfig statement. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: jailhouse-dev@googlegroups.com Cc: linux-pci@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/a2faf78c54f340f5549149e8b679c95950dae83d.1520408357.git.jan.kiszka@siemens.com
2018-03-08x86/jailhouse: Enable PCI mmconfig access in inmatesOtavio Pontes
Use the PCI mmconfig base address exported by jailhouse in boot parameters in order to access the memory mapped PCI configuration space. [Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus] Signed-off-by: Otavio Pontes <otavio.pontes@intel.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: jailhouse-dev@googlegroups.com Cc: linux-pci@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/2ee9e4401fa22377b3965893a558120f169be82b.1520408357.git.jan.kiszka@siemens.com
2018-03-08PCI: Scan all functions when running over JailhouseJan Kiszka
Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to have a function 0. Therefore, Linux scans for devices at function 0 (devfn 0/8/16/...) and only scans for other functions if function 0 has its Multi-Function Device bit set or ARI or SR-IOV indicate there are more functions. The Jailhouse hypervisor may pass individual functions of a multi-function device to a guest without passing function 0, which means a Linux guest won't find them. Change Linux PCI probing so it scans all function numbers when running as a guest over Jailhouse. This is technically prohibited by the spec, so it is possible that PCI devices without the Multi-Function Device bit set may have unexpected behavior in response to this probe. Originally-by: Benedikt Spranger <b.spranger@linutronix.de> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: jailhouse-dev@googlegroups.com Cc: Benedikt Spranger <b.spranger@linutronix.de> Cc: linux-pci@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Link: https://lkml.kernel.org/r/06e279b2a3e06cf6689ab3975f8ab592bba02362.1520408357.git.jan.kiszka@siemens.com
2018-03-08jailhouse: Provide detection for non-x86 systemsJan Kiszka
Implement jailhouse_paravirt() via device tree probing on architectures != x86. Will be used by the PCI core. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: jailhouse-dev@googlegroups.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-pci@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/dae9fe0c6e63141c28ca90492fa5712b4c33ffb5.1520408357.git.jan.kiszka@siemens.com
2018-03-08x86/cpuid: Switch to 'static const' specifierRalf Ramsauer
This is the only spot where the 'const static' specifier is used; everywhere else 'static const' is preferred, as static should be the first specifier. This is just a cosmetic fix that aligns this, no functional change. Signed-off-by: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gayatri Kammela <gayatri.kammela@intel.com> Link: https://lkml.kernel.org/r/20180307160734.6691-1-ralf.ramsauer@oth-regensburg.de
2018-03-08x86/dumpstack: Unify show_regs()Borislav Petkov
The 32-bit version uses KERN_EMERG and commit b0f4c4b32c8e ("bugs, x86: Fix printk levels for panic, softlockups and stack dumps") changed the 64-bit version to KERN_DEFAULT. The same justification in that commit that those messages do not belong in the terminal, holds true for 32-bit also, so make it so. Make code_bytes static, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180306094920.16917-4-bp@alien8.de
2018-03-08x86/fault: Do not print IP in show_fault_oops()Borislav Petkov
... because __show_regs() already does that. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Link: https://lkml.kernel.org/r/20180306094920.16917-3-bp@alien8.de
2018-03-08x86/MSR: Move native_* variants to msr.hBorislav Petkov
... where they belong. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/20180301151336.12948-1-bp@alien8.de
2018-03-08x86/microcode: Synchronize late microcode loadingAshok Raj
Original idea by Ashok, completely rewritten by Borislav. Before you read any further: the early loading method is still the preferred one and you should always do that. The following patch is improving the late loading mechanism for long running jobs and cloud use cases. Gather all cores and serialize the microcode update on them by doing it one-by-one to make the late update process as reliable as possible and avoid potential issues caused by the microcode update. [ Borislav: Rewrite completely. ] Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.de
2018-03-08x86/microcode: Request microcode on the BSPBorislav Petkov
... so that any newer version can land in the cache and can later be fished out by the application functions. Do that before grabbing the hotplug lock. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-7-bp@alien8.de
2018-03-08x86/microcode/intel: Look into the patch cache firstBorislav Petkov
The cache might contain a newer patch - look in there first. A follow-on change will make sure newest patches are loaded into the cache of microcode patches. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-6-bp@alien8.de
2018-03-08x86/microcode: Do not upload microcode if CPUs are offlineAshok Raj
Avoid loading microcode if any of the CPUs are offline, and issue a warning. Having different microcode revisions on the system at any time is outright dangerous. [ Borislav: Massage changelog. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-4-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-5-bp@alien8.de
2018-03-08x86/microcode/intel: Writeback and invalidate caches before updating microcodeAshok Raj
Updating microcode is less error prone when caches have been flushed and depending on what exactly the microcode is updating. For example, some of the issues around certain Broadwell parts can be addressed by doing a full cache flush. [ Borislav: Massage it and use native_wbinvd() in both cases. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-3-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-4-bp@alien8.de
2018-03-08x86/microcode/intel: Check microcode revision before updating sibling threadsAshok Raj
After updating microcode on one of the threads of a core, the other thread sibling automatically gets the update since the microcode resources on a hyperthreaded core are shared between the two threads. Check the microcode revision on the CPU before performing a microcode update and thus save us the WRMSR 0x79 because it is a particularly expensive operation. [ Borislav: Massage changelog and coding style. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-2-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-3-bp@alien8.de
2018-03-08x86/microcode: Get rid of struct apply_microcode_ctxBorislav Petkov
It is a useless remnant from earlier times. Use the ucode_state enum directly. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-2-bp@alien8.de
2018-03-08x86/spectre_v2: Don't check microcode versions when running under hypervisorsKonrad Rzeszutek Wilk
As: 1) It's known that hypervisors lie about the environment anyhow (host mismatch) 2) Even if the hypervisor (Xen, KVM, VMWare, etc) provided a valid "correct" value, it all gets to be very murky when migration happens (do you provide the "new" microcode of the machine?). And in reality the cloud vendors are the ones that should make sure that the microcode that is running is correct and we should just sing lalalala and trust them. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: kvm <kvm@vger.kernel.org> Cc: Krčmář <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@alien8.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180226213019.GE9497@char.us.oracle.com
2018-03-08x86/devicetree: Fix device IRQ settings in DTIvan Gorinov
IRQ parameters for the SoC devices connected directly to I/O APIC lines (without PCI IRQ routing) may be specified in the Device Tree. Called from DT IRQ parser, irq_create_fwspec_mapping() calls irq_domain_alloc_irqs() with a pointer to irq_fwspec structure as @arg. But x86-specific DT IRQ allocation code casts @arg to of_phandle_args structure pointer and crashes trying to read the IRQ parameters. The function was not converted when the mapping descriptor was changed to irq_fwspec in the generic irqdomain code. Fixes: 11e4438ee330 ("irqdomain: Introduce a firmware-specific IRQ specifier structure") Signed-off-by: Ivan Gorinov <ivan.gorinov@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: https://lkml.kernel.org/r/a234dee27ea60ce76141872da0d6bdb378b2a9ee.1520450752.git.ivan.gorinov@intel.com
2018-03-08x86/devicetree: Initialize device tree before using itIvan Gorinov
Commit 08d53aa58cb1 added CRC32 calculation in early_init_dt_verify() and checking in late initcall of_fdt_raw_init(), making early_init_dt_verify() mandatory. The required call to early_init_dt_verify() was not added to the x86-specific implementation, causing failure to create the sysfs entry in of_fdt_raw_init(). Fixes: 08d53aa58cb1 ("of/fdt: export fdt blob as /sys/firmware/fdt") Signed-off-by: Ivan Gorinov <ivan.gorinov@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Link: https://lkml.kernel.org/r/c8c7e941efc63b5d25ebf9b6350b0f3df38f6098.1520450752.git.ivan.gorinov@intel.com
2018-03-08x86/vsyscall/64: Drop "native" vsyscallsAndy Lutomirski
Since Linux v3.2, vsyscalls have been deprecated and slow. From v3.2 on, Linux had three vsyscall modes: "native", "emulate", and "none". "emulate" is the default. All known user programs work correctly in emulate mode, but vsyscalls turn into page faults and are emulated. This is very slow. In "native" mode, the vsyscall page is easily usable as an exploit gadget, but vsyscalls are a bit faster -- they turn into normal syscalls. (This is in contrast to vDSO functions, which can be much faster than syscalls.) In "none" mode, there are no vsyscalls. For all practical purposes, "native" was really just a chicken bit in case something went wrong with the emulation. It's been over six years, and nothing has gone wrong. Delete it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Kernel Hardening <kernel-hardening@lists.openwall.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/519fee5268faea09ae550776ce969fa6e88668b0.1520449896.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-03-08 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix various BPF helpers which adjust the skb and its GSO information with regards to SCTP GSO. The latter is a special case where gso_size is of value GSO_BY_FRAGS, so mangling that will end up corrupting the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s). 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined due to too old kernel headers in the system, from Jiri. 3) Increase the number of x64 JIT passes in order to allow larger images to converge instead of punting them to interpreter or having them rejected when the interpreter is not built into the kernel, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07bpf, x64: increase number of passesDaniel Borkmann
In Cilium some of the main programs we run today are hitting 9 passes on x64's JIT compiler, and we've had cases already where we surpassed the limit where the JIT then punts the program to the interpreter instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON or insertion failures due to the prog array owner being JITed but the program to insert not (both must have the same JITed/non-JITed property). One concrete case the program image shrunk from 12,767 bytes down to 10,288 bytes where the image converged after 16 steps. I've measured that this took 340us in the JIT until it converges on my i7-6600U. Thus, increase the original limit we had from day one where the JIT covered cBPF only back then before we run into the case (as similar with the complexity limit) where we trip over this and hit program rejections. Also add a cond_resched() into the compilation loop, the JIT process runs without any locks and may sleep anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-03-07Merge branch 'perf/urgent' into perf/core, to resolve conflictIngo Molnar
Conflicts: tools/perf/perf.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07x86/entry/64/compat: Save one instruction in entry_INT80_compat()Dominik Brodowski
As %rdi is never user except in the following push, there is no need to restore %rdi to the original value. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07x86/entry: Do not special-case clone(2) in compat entryDominik Brodowski
With the CPU renaming registers on its own, and all the overhead of the syscall entry/exit, it is doubtful whether the compiled output of mov %r8, %rax mov %rcx, %r8 mov %rax, %rcx jmpq sys_clone is measurably slower than the hand-crafted version of xchg %r8, %rcx So get rid of this special case. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07x86/syscalls: Use COMPAT_SYSCALL_DEFINEx() macros for x86-only compat syscallsDominik Brodowski
While at it, convert declarations of type "unsigned" to "unsigned int". Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07x86/syscalls: Use proper syscall definition for sys_ioperm()Dominik Brodowski
Using SYSCALL_DEFINEx() is recommended, so use it also here. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07x86/entry: Remove stale syscall prototypeDominik Brodowski
sys32_vm86_warning() is long gone. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-07x86/syscalls/32: Simplify $entry == $compat entriesDominik Brodowski
If the compat entry point is equivalent to the native entry point, it does not need to be specified explicitly. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@amacapital.net Cc: viro@zeniv.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-06Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull sigingo fix from Eric Biederman: "The kbuild test robot found that I accidentally moved si_pkey when I was cleaning up siginfo_t. A short followed by an int with the int having 8 byte alignment. Sheesh siginfo_t is a weird structure. I have now corrected it and added build time checks that with a little luck will catch any similar future mistakes. The build time checks were sufficient for me to verify the bug and to verify my fix. So they are at least useful this once." * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal/x86: Include the field offsets in the build time checks signal: Correct the offset of si_pkey in struct siginfo
2018-03-06Drivers: hv: vmbus: Implement Direct Mode for stimer0Michael Kelley
The 2016 version of Hyper-V offers the option to operate the guest VM per-vcpu stimer's in Direct Mode, which means the timer interupts on its own vector rather than queueing a VMbus message. Direct Mode reduces timer processing overhead in both the hypervisor and the guest, and avoids having timer interrupts pollute the VMbus interrupt stream for the synthetic NIC and storage. This patch enables Direct Mode by default on stimer0 when running on a version of Hyper-V that supports it. In prep for coming support of Hyper-V on ARM64, the arch independent portion of the code contains calls to routines that will be populated on ARM64 but are not needed and do nothing on x86. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-06KVM: nVMX: expose VMX capabilities for nested hypervisors to userspacePaolo Bonzini
Use the new MSR feature framework to tell userspace which VMX capabilities are available for nested hypervisors. Before, these were only accessible with the KVM_GET_MSR VCPU ioctl, after VCPUs had been created. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: nVMX: introduce struct nested_vmx_msrsPaolo Bonzini
Move the MSRs to a separate struct, so that we can introduce a global instance and return it from the /dev/kvm KVM_GET_MSRS ioctl. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: X86: Don't use PV TLB flush with dedicated physical CPUsWanpeng Li
vCPUs are very unlikely to get preempted when they are the only task running on a CPU. PV TLB flush is slower that the native flush in that case, so disable it. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: X86: Choose qspinlock when dedicated physical CPUs are availableWanpeng Li
Waiman Long mentioned that: > Generally speaking, unfair lock performs well for VMs with a small > number of vCPUs. Native qspinlock may perform better than pvqspinlock > if there is vCPU pinning and there is no vCPU over-commitment. This patch uses the KVM_HINTS_DEDICATED performance hint, which is provided by the hypervisor admin, to choose the qspinlock algorithm when a dedicated physical CPU is available. PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock PV_DEDICATED = 0, PV_UNHALT = 1: default is Hybrid PV queued/unfair lock PV_DEDICATED = 0, PV_UNHALT = 0: default is tas Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATEDWanpeng Li
This patch introduces kvm_para_has_hint() to query for hints about the configuration of the guests. The first hint KVM_HINTS_DEDICATED, is set if the guest has dedicated physical CPUs for each vCPU (i.e. pinning and no over-commitment). This allows optimizing spinlocks and tells the guest to avoid PV TLB flush. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: x86: KVM_CAP_SYNC_REGSKen Hofsass
This commit implements an enhanced x86 version of S390 KVM_CAP_SYNC_REGS functionality. KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers without having to call SET/GET_*REGS”. This reduces ioctl overhead which is particularly important when userspace is making synchronous guest state modifications (e.g. when emulating and/or intercepting instructions). Originally implemented upstream for the S390, the x86 differences follow: - userspace can select the register sets to be synchronized with kvm_run using bit-flags in the kvm_valid_registers and kvm_dirty_registers fields. - vcpu_events is available in addition to the regs and sregs register sets. Signed-off-by: Ken Hofsass <hofsass@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed wrapper around check for reserved kvm_valid_regs. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06kvm: x86: hyperv: guest->host event signaling via eventfdRoman Kagan
In Hyper-V, the fast guest->host notification mechanism is the SIGNAL_EVENT hypercall, with a single parameter of the connection ID to signal. Currently this hypercall incurs a user exit and requires the userspace to decode the parameters and trigger the notification of the potentially different I/O context. To avoid the costly user exit, process this hypercall and signal the corresponding eventfd in KVM, similar to ioeventfd. The association between the connection id and the eventfd is established via the newly introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an (srcu-protected) IDR. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> [asm/hyperv.h changes approved by KY Srinivasan. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06kvm: x86: factor out kvm.arch.hyperv (de)initRoman Kagan
Move kvm.arch.hyperv initialization and cleanup to separate functions. For now only a mutex is inited in the former, and the latter is empty; more stuff will go in there in a followup patch. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06signal/x86: Include the field offsets in the build time checksEric W. Biederman
Due to an oversight when refactoring siginfo_t si_pkey has been in the wrong position since 4.16-rc1. Add an explicit check of the offset of every user space field in siginfo_t and compat_siginfo_t to make a mistake like this hard to make in the future. I have run this code on 4.15 and 4.16-rc1 with the position of si_pkey fixed and all of the fields show up in the same location. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04Merge branch 'x86/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of fixes for x86: - Add missing instruction suffixes to assembly code so it can be compiled by newer GAS versions without warnings. - Switch refcount WARN exceptions to UD2 as we did in general - Make the reboot on Intel Edison platforms work - A small documentation update so text and sample command match" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation, x86, resctrl: Make text and sample command match x86/platform/intel-mid: Handle Intel Edison reboot correctly x86/asm: Add instruction suffixes to bitops x86/entry/64: Add instruction suffix x86/refcounts: Switch to UD2 for exceptions
2018-03-04Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/pti fixes from Thomas Gleixner: "Three fixes related to melted spectrum: - Sync the cpu_entry_area page table to initial_page_table on 32 bit. Otherwise suspend/resume fails because resume uses initial_page_table and triggers a triple fault when accessing the cpu entry area. - Zero the SPEC_CTL MRS on XEN before suspend to address a shortcoming in the hypervisor. - Fix another switch table detection issue in objtool" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table objtool: Fix another switch table detection issue x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
2018-03-04perf/x86/intel/uncore: Fix Skylake UPI event formatKan Liang
There is no event extension (bit 21) for SKX UPI, so use 'event' instead of 'event_ext'. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520004150-4855-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>