path: root/arch
Age | Commit message | Author
2016-12-11 | Merge branch 'linus' into locking/core, to pick up fixes | Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-11 | perf/x86: Fix exclusion of BTS and LBR for Goldmont | Andi Kleen
An earlier patch allowed enabling PT and LBR at the same time on Goldmont. However it also allowed enabling BTS and LBR at the same time, which is still not supported. Fix this by bypassing the check only for PT. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alexander.shishkin@intel.com Cc: kan.liang@intel.com Cc: <stable@vger.kernel.org> Fixes: ccbebba4c6bf ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it") Link: http://lkml.kernel.org/r/20161209001417.4713-1-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
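A minimal model of the corrected rule, assuming the three tracing facilities can be reduced to a pairwise compatibility check. `enum excl_class`, `can_coexist()` and the `lbr_pt_coexist` flag are hypothetical stand-ins, not the perf/x86 code itself:

```c
#include <stdbool.h>

/* Hypothetical classes of mutually exclusive branch-tracing users. */
enum excl_class { EXCL_LBR, EXCL_BTS, EXCL_PT };

/*
 * Can users of classes @a and @b be active at the same time?
 * @lbr_pt_coexist models a core (such as Goldmont) that supports
 * running PT and LBR concurrently.
 */
static bool can_coexist(enum excl_class a, enum excl_class b, bool lbr_pt_coexist)
{
    if (a == b)
        return true;    /* users of the same class share the facility */
    if (lbr_pt_coexist &&
        ((a == EXCL_LBR && b == EXCL_PT) || (a == EXCL_PT && b == EXCL_LBR)))
        return true;    /* the only bypass: PT together with LBR */
    return false;       /* everything else, including BTS + LBR, stays exclusive */
}
```

In this model, the earlier behaviour amounts to returning true for every pair once `lbr_pt_coexist` is set, which is how BTS and LBR could end up enabled together.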
2016-12-11 | MIPS: Lantiq: Fix mask of GPE frequency | Hauke Mehrtens
The hardware documentation says bit 11:10 are used for the GPE frequency selection. Fix the mask in the define to match these bits. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Thomas Langer <thomas.langer@intel.com> Cc: linux-mips@linux-mips.org Cc: john@phrozen.org Patchwork: https://patchwork.linux-mips.org/patch/14648/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
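A short sketch of how a field mask for bits 11:10 is built and applied. `GPE_FREQ_SHIFT`, `GPE_FREQ_MASK` and `gpe_freq_sel()` are hypothetical names for illustration, not the actual Lantiq define:

```c
#include <stdint.h>

/* Hypothetical names; the field occupies bits 11:10 per the hardware docs. */
#define GPE_FREQ_SHIFT 10
#define GPE_FREQ_MASK  (0x3u << GPE_FREQ_SHIFT)   /* 0x00000c00 */

/* Extract the 2-bit GPE frequency selector from a register value. */
static unsigned int gpe_freq_sel(uint32_t reg)
{
    return (reg & GPE_FREQ_MASK) >> GPE_FREQ_SHIFT;
}
```

Deriving the mask from a width and a shift keeps it aligned with the bit range named in the documentation; neighbouring bits such as 9:8 are ignored.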
2016-12-11 | MIPS: Return -ENODEV from weak implementation of rtc_mips_set_time | Luuk Paulussen
The sync_cmos_clock function in kernel/time/ntp.c first tries to update the internal clock of the cpu by calling the "update_persistent_clock64" architecture-specific function. If this returns -ENODEV, it then tries to update an external RTC using "rtc_set_ntp_time". On the mips architecture, the weak implementation of the underlying function would return 0 if it wasn't overridden. This meant that the sync_cmos_clock function would never try to update an external RTC (if both CONFIG_GENERIC_CMOS_UPDATE and CONFIG_RTC_SYSTOHC are configured). Returning -ENODEV instead means that an external RTC will be tried. Signed-off-by: Luuk Paulussen <luuk.paulussen@alliedtelesis.co.nz> Reviewed-by: Richard Laing <richard.laing@alliedtelesis.co.nz> Reviewed-by: Scott Parlane <scott.parlane@alliedtelesis.co.nz> Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14649/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
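A simplified userspace model of the fallback, assuming GCC/Clang `__attribute__((weak))` semantics. The path sync_cmos_clock → update_persistent_clock64 → rtc_mips_set_time is collapsed into one hypothetical `sync_clock()`, and `set_external_rtc()` is a hypothetical stand-in for rtc_set_ntp_time:

```c
#include <errno.h>
#include <stdbool.h>

static bool external_rtc_written;

int rtc_mips_set_time(unsigned long sec);

/* Weak default: no CPU-internal RTC, so report -ENODEV (it returned 0 before the fix). */
__attribute__((weak)) int rtc_mips_set_time(unsigned long sec)
{
    (void)sec;
    return -ENODEV;
}

/* Hypothetical stand-in for rtc_set_ntp_time(): pretend to update an external RTC. */
static int set_external_rtc(unsigned long sec)
{
    (void)sec;
    external_rtc_written = true;
    return 0;
}

/* Simplified model of the fallback in sync_cmos_clock(). */
static int sync_clock(unsigned long sec)
{
    int err = rtc_mips_set_time(sec);     /* arch hook first */

    if (err == -ENODEV)
        err = set_external_rtc(sec);      /* only then the external RTC */
    return err;
}
```

A board that provides a strong `rtc_mips_set_time()` overrides the weak default at link time. In that case the external-RTC branch is skipped unless the board's version itself returns -ENODEV.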
2016-12-10 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
2016-12-10 | x86/ldt: Make all size computations unsigned | Thomas Gleixner
ldt->size can never be negative. The helper functions take 'unsigned int' arguments which are assigned from ldt->size. The related user space user_desc struct member entry_number is unsigned as well. But ldt->size itself and a few local variables which are related to ldt->size are type 'int' which makes no sense whatsoever and results in typecasts which make the eyes bleed. Clean it up and convert everything which is related to ldt->size to unsigned int. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dan Carpenter <dan.carpenter@oracle.com>
2016-12-10 | x86/ldt: Make a size argument unsigned | Dan Carpenter
My static checker complains that we put an upper bound on the "size" argument but not a lower bound. The checker is not smart enough to know the possible ranges of "old_mm->context.ldt->size" from init_new_context_ldt() so it thinks maybe it could be negative. Let's make it unsigned to silence the warning and future proof the code a bit. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: kernel-janitors@vger.kernel.org Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161208105602.GA11382@elgon.mountain Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
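The checker's complaint can be reproduced in a few lines. The sketch uses `LDT_ENTRIES` (8192 on x86) as the upper bound. The two helper functions are hypothetical and only contrast a signed size with an unsigned one:

```c
#include <stdbool.h>

#define LDT_ENTRIES 8192   /* upper bound on LDT entries on x86 */

/* Signed size: only an upper bound is checked, so a negative value slips through. */
static bool size_ok_signed(int size)
{
    return size <= LDT_ENTRIES;
}

/* Unsigned size: the single upper-bound check covers the whole range. */
static bool size_ok_unsigned(unsigned int size)
{
    return size <= LDT_ENTRIES;
}
```

With an unsigned type, a value that would have been negative wraps to a very large number and fails the existing bound. No separate `size < 0` test is needed.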
2016-12-09 | x86: Remove empty idle.h header | Thomas Gleixner
One include less is always a good thing(tm). Good riddance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 | x86/amd: Simplify AMD E400 aware idle routine | Borislav Petkov
Reorganize the E400 detection now that we have everything in place: switch the CPUs to broadcast mode after the LAPIC has been initialized and remove the facilities that were used previously on the idle path. Unfortunately static_cpu_has_bug() cannot be used in the E400 idle routine because alternatives have been applied when the actual detection happens, so the static switching does not take effect and the test will stay false. Use boot_cpu_has_bug() instead which is definitely an improvement over the RDMSR and the cpumask handling. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 | x86/amd: Check for the C1E bug post ACPI subsystem init | Thomas Gleixner
AMD CPUs affected by the E400 erratum suffer from the issue that the local APIC timer stops when the CPU goes into C1E. Unfortunately there is no way to detect the affected CPUs on early boot. It's only possible to determine the range of possibly affected CPUs from the family/model range. The actual decision whether to enter C1E and thus cause the bug is done by the firmware and we need to detect that case late, after ACPI has been initialized. The current solution is to check in the idle routine whether the CPU is affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is affected and the system is switched into forced broadcast mode. This is inefficient, because on non-affected CPUs every entry to idle does the extra RDMSR. After doing some research it turns out that the bits are visible on the boot CPU right after the ACPI subsystem is initialized in the early boot process. So instead of polling for the bits in the idle loop, add a detection function after acpi_subsystem_init() and check for the MSR bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it will stop in C1E state as well. The switch to broadcast mode cannot be done at this point because the boot CPU still uses HPET as a clockevent device and the local APIC timer is not yet calibrated and installed. The switch to broadcast mode on the affected CPUs needs to be done when the local APIC timer is actually set up. This allows cleaning up the amd_e400_idle() function in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 | x86/bugs: Separate AMD E400 erratum and C1E bug | Thomas Gleixner
The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E state) is a two step process: - Selection of the E400 aware idle routine - Detection whether the platform is affected The idle routine selection happens for possibly affected CPUs depending on family/model/stepping information. This range of CPUs is not necessarily affected as the decision whether to enable the C1E feature is made by the firmware. Unfortunately there is no way to query this at early boot. The current implementation polls an MSR in the E400 aware idle routine to detect whether the CPU is affected. This is inefficient on non-affected CPUs because every idle entry has to do the MSR read. There is a better way to detect this before going idle for the first time which requires separating the bug flags: X86_BUG_AMD_E400 - Selects the E400 aware idle routine and enables the detection X86_BUG_AMD_APIC_C1E - Set when the platform is affected by E400 Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400 bug bit to select the idle routine which currently does an unconditional detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches to remove the MSR polling and simplify the handling of this misfeature. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
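A rough model of how the two bug bits and the late, post-ACPI check described in these commits fit together. `struct cpu_bugs` and `check_c1e()` are hypothetical. The value 0x18000000 (bits 27 and 28) for K8_INTP_C1E_ACTIVE_MASK is an assumption here and should be checked against the kernel headers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of K8_INTP_C1E_ACTIVE_MASK (bits 27 and 28 of the MSR). */
#define C1E_ACTIVE_MASK 0x18000000u

struct cpu_bugs {
    bool amd_e400;       /* X86_BUG_AMD_E400: possibly affected, use the E400-aware idle routine */
    bool amd_apic_c1e;   /* X86_BUG_AMD_APIC_C1E: firmware actually enabled C1E */
    bool tsc_unstable;
};

/*
 * One-shot check, modelled on running once after ACPI init:
 * @int_pending_msg is the value read from MSR_K8_INT_PENDING_MSG,
 * @nonstop_tsc mirrors X86_FEATURE_NONSTOP_TSC.
 */
static void check_c1e(struct cpu_bugs *b, uint32_t int_pending_msg, bool nonstop_tsc)
{
    if (!b->amd_e400)
        return;                          /* family/model outside the E400 range */
    if (int_pending_msg & C1E_ACTIVE_MASK) {
        b->amd_apic_c1e = true;          /* the LAPIC timer will stop in C1E */
        if (!nonstop_tsc)
            b->tsc_unstable = true;      /* the TSC stops in C1E as well */
    }
}
```

The MSR is read once at boot instead of on every idle entry. The broadcast-mode switch is deliberately not modelled, because the commits defer it until the local APIC timer is set up.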
2016-12-09 | x86/cpufeature: Provide helper to set bugs bits | Borislav Petkov
Will be used in a later patch to set bug bits for bugs which need late detection. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 | Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc | Linus Torvalds
Pull ARM SoC fixes from Olof Johansson: "Final batch of SoC fixes A few fixes that have trickled in over the last week, all fixing minor errors in devicetrees -- UART pin assignment on Allwinner H3, correcting number of SATA ports on a Marvell-based Linkstation platform and a display clock fix for Freescale/NXP i.MX7D that fixes a freeze when starting up X"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: orion5x: fix number of sata port for linkstation ls-gl
  ARM: dts: imx7d: fix LCDIF clock assignment
  dts: sun8i-h3: correct UART3 pin definitions
2016-12-09 | Merge tag 'm68k-for-v4.9-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k | Linus Torvalds
Pull m68k fixes from Geert Uytterhoeven:
 - build fix for drivers calling ndelay() in a conditional block without curly braces
 - defconfig updates
* tag 'm68k-for-v4.9-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Fix ndelay() macro
  m68k/defconfig: Update defconfigs for v4.9-rc1
2016-12-09 | arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest | Marc Zyngier
The ARMv8 architecture allows the cycle counter to be configured by setting PMSELR_EL0.SEL==0x1f and then accessing PMXEVTYPER_EL0, hence accessing PMCCFILTR_EL0. But it disallows the use of PMSELR_EL0.SEL==0x1f to access the cycle counter itself through PMXEVCNTR_EL0. Linux itself doesn't violate this rule, but we may end up with PMSELR_EL0.SEL being set to 0x1f when we enter a guest. If that guest accesses PMXEVCNTR_EL0, the access may UNDEF at EL1, despite the guest not having done anything wrong. In order to avoid this unfortunate course of events (haha!), let's sanitize PMSELR_EL0 on guest entry. This ensures that the guest won't explode unexpectedly. Cc: stable@vger.kernel.org #4.6+ Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-09 | xen/x86: Increase xen_e820_map to E820_X_MAX possible entries | Alex Thorlton
On systems with sufficiently large e820 tables, and several IOAPICs, it is possible for the XENMEM_machine_memory_map callback (and its counterpart, XENMEM_memory_map) to attempt to return an e820 table with more than 128 entries. This callback adds entries to the BIOS-provided e820 table to account for IOAPIC registers, which, on sufficiently large systems, can result in an e820 table that is too large to copy back into xen_e820_map. This change simply increases the size of xen_e820_map to E820_X_MAX to ensure that there is enough room to store the entire e820 map returned from this callback. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-12-09 | x86: Make E820_X_MAX unconditionally larger than E820MAX | Alex Thorlton
It's really not necessary to limit E820_X_MAX to 128 in the non-EFI case. This commit drops E820_X_MAX's dependency on CONFIG_EFI, so that E820_X_MAX is always at least slightly larger than E820MAX. The real motivation behind this is actually to prevent some issues in the Xen kernel, where the XENMEM_machine_memory_map hypercall can produce an e820 map larger than 128 entries, even on systems where the original e820 table was quite a bit smaller than that, depending on how many IOAPICs are installed on the system. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Ingo Molnar <mingo@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
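A sketch of why a larger fixed-size destination matters when a hypervisor can return more than 128 entries. The definition `E820_X_MAX = E820MAX + 3 * MAX_NUMNODES` and `MAX_NUMNODES = 64` are assumptions for illustration (MAX_NUMNODES depends on the kernel configuration). The struct and `copy_e820()` are hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define E820MAX      128                       /* legacy limit */
#define MAX_NUMNODES 64                        /* assumption: config dependent */
#define E820_X_MAX   (E820MAX + 3 * MAX_NUMNODES)

/* Hypothetical, simplified e820 entry. */
struct e820_entry_model { uint64_t addr, size; uint32_t type; };

static struct e820_entry_model map[E820_X_MAX];
static size_t map_entries;

/* Copy a hypervisor-provided table, refusing rather than overflowing. */
static int copy_e820(const struct e820_entry_model *src, size_t n)
{
    if (n > E820_X_MAX)
        return -E2BIG;
    memcpy(map, src, n * sizeof(*src));
    map_entries = n;
    return 0;
}
```

A table with, for example, 150 entries (BIOS entries plus IOAPIC ranges) would not fit into an E820MAX-sized array but fits into the E820_X_MAX-sized one.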
2016-12-09 | m68k/atari: Use seq_puts() in atari_get_hardware_list() | Markus Elfring
A string which did not contain a data format specification should be put into a sequence. Thus use the corresponding function "seq_puts". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2016-12-09 | m68k/amiga: Use seq_puts() in amiga_get_hardware_list() | Markus Elfring
A string which did not contain a data format specification should be put into a sequence. Thus use the corresponding function "seq_puts". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
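A userspace analogy of the seq_puts()/seq_printf() distinction. `struct seqbuf`, `seqbuf_puts()` and `seqbuf_printf()` are hypothetical stand-ins for the kernel's seq_file helpers. They show that a constant string without conversion specifiers can be copied verbatim instead of being parsed as a format:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal stand-in for a seq_file output buffer. */
struct seqbuf { char buf[128]; size_t len; };

/* Formatted append: the string is scanned for conversion specifiers. */
static void seqbuf_printf(struct seqbuf *m, const char *fmt, ...)
{
    size_t room = sizeof(m->buf) - m->len;   /* includes space for the NUL */
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(m->buf + m->len, room, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if ((size_t)n >= room)
        n = (int)(room - 1);                 /* output was truncated */
    m->len += (size_t)n;
}

/* Plain append: the bytes are copied verbatim, with no format parsing. */
static void seqbuf_puts(struct seqbuf *m, const char *s)
{
    size_t room = sizeof(m->buf) - m->len - 1;
    size_t n = strlen(s);

    if (n > room)
        n = room;
    memcpy(m->buf + m->len, s, n);
    m->len += n;
    m->buf[m->len] = '\0';
}
```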
2016-12-08 | Merge branch 'parisc-4.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux | Linus Torvalds
Pull parisc fixes from Helge Deller: "Three important fixes for the parisc architecture. Dave provided two patches: One which purges the TLB before setting a PTE entry and a second one which drops unnecessary TLB flushes. Both patches have been tested for one week on the debian buildd servers and prevent random segmentation faults. The patch from me fixes a crash at boot inside the TLB measuring code on SMP machines with PA8000-PA8700 CPUs (specifically A500-44 and J5000 servers)"
* 'parisc-4.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix TLB related boot crash on SMP machines
  parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asm
  parisc: Purge TLB before setting PTE
2016-12-08 | parisc: Fix TLB related boot crash on SMP machines | Helge Deller
At bootup we run measurements to calculate the best threshold for when we should be using full TLB flushes instead of just flushing a specific amount of TLB entries. This performance test is run over the kernel text segment. But running this TLB performance test on the kernel text segment turned out to crash some SMP machines when the kernel text pages were mapped as huge pages. To avoid those crashes this patch simply skips this test on some SMP machines and calculates an optimal threshold based on the maximum number of available TLB entries and number of online CPUs. On a technical side, this seems to happen: The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge instruction seems to work without problems even if the pages were mapped as huge pages. But on SMP systems the TLB purge instruction is broadcast to other CPUs. Those CPUs then crash the machine because the page size is not as expected. C8000 machines with PA8800/PA8900 CPUs were not affected by this problem, because the required cache coherency prohibits the use of huge pages altogether. Sadly I didn't find any documentation about this behaviour, so this finding is purely based on testing with physical SMP machines (A500-44 and J5000, both were 2-way boxes). Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: Helge Deller <deller@gmx.de>
2016-12-08 | bpf: xdp: Allow head adjustment in XDP prog | Martin KaFai Lau
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
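A userspace model of what head adjustment does to the packet window, assuming the helper moves `data` by `delta` and rejects moves outside the headroom or past a minimal Ethernet header. `struct xdp_model` and `xdp_adjust_head()` are hypothetical and only mimic the documented behaviour of bpf_xdp_adjust_head():

```c
#include <errno.h>
#include <stddef.h>

#define ETH_HLEN 14   /* Ethernet header length */

/* Hypothetical model of an XDP buffer: offsets into one frame. */
struct xdp_model {
    size_t data_hard_start;  /* start of headroom */
    size_t data;             /* current start of packet data */
    size_t data_end;         /* end of packet data */
};

/*
 * Negative @delta grows the packet at the head (push a header),
 * positive @delta shrinks it (pop a header).
 */
static int xdp_adjust_head(struct xdp_model *x, long delta)
{
    long data = (long)x->data + delta;

    if (data < (long)x->data_hard_start ||
        data > (long)x->data_end - ETH_HLEN)
        return -EINVAL;   /* would leave the headroom or go below an Ethernet header */
    x->data = (size_t)data;
    return 0;
}
```

For example, pushing a 20-byte outer header corresponds to `xdp_adjust_head(x, -20)`, and popping it again corresponds to `xdp_adjust_head(x, 20)`.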
2016-12-08 | ARM: dts: orion5x: fix number of sata port for linkstation ls-gl | Roger Shimizu
A bug report from Debian [0] shows that there is a slightly changed model of the Linkstation LS-GL that uses the 2nd SATA port of the SoC. So it's necessary to enable two SATA ports, though for that specific model only the 2nd one is used. [0] https://bugs.debian.org/845611 Fixes: b1742ffa9ddb ("ARM: dts: orion5x: add device tree for buffalo linkstation ls-gl") Reported-by: Ryan Tandy <ryan@nardis.ca> Tested-by: Ryan Tandy <ryan@nardis.ca> Signed-off-by: Roger Shimizu <rogershimizu@gmail.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
2016-12-08 | KVM: x86: Handle the kthread worker using the new API | Petr Mladek
Use the new API to create and destroy the "kvm-pit" kthread worker. The API hides some implementation details. In particular, kthread_create_worker() allocates and initializes struct kthread_worker. It runs the kthread the right way and stores task_struct into the worker structure. kthread_destroy_worker() flushes all pending works, stops the kthread and frees the structure. This patch does not change the existing behavior except for dynamically allocating struct kthread_worker and storing only the pointer of this structure. It is compile tested only because I did not find an easy way to run the code. Well, it should be pretty safe given the nature of the change. Signed-off-by: Petr Mladek <pmladek@suse.com> Message-Id: <1476877847-11217-1-git-send-email-pmladek@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-08 | KVM: nVMX: invvpid handling improvements | Jan Dakinevich
- Expose all invalidation types to the L1 - Reject invvpid instruction, if L1 passed zero vpid value to single context invalidations Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
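A sketch of the validation this commit describes, using the INVVPID type numbering from the Intel SDM (0 individual-address, 1 single-context, 2 all-context, 3 single-context-retaining-globals). `invvpid_reject()` and the advertised-types mask are hypothetical. The sketch only covers the zero-VPID rule for the single-context types named in the commit:

```c
#include <stdbool.h>
#include <stdint.h>

/* INVVPID types as numbered in the Intel SDM. */
enum invvpid_type {
    INVVPID_INDIVIDUAL_ADDR        = 0,
    INVVPID_SINGLE_CONTEXT         = 1,
    INVVPID_ALL_CONTEXT            = 2,
    INVVPID_SINGLE_CONTEXT_GLOBALS = 3,
};

/* Returns true if L1's INVVPID should fail. Bit n of @supported_types = type n allowed. */
static bool invvpid_reject(uint32_t supported_types, unsigned int type, uint16_t vpid)
{
    if (type > 3 || !(supported_types & (1u << type)))
        return true;                      /* type not advertised to L1 */
    if ((type == INVVPID_SINGLE_CONTEXT ||
         type == INVVPID_SINGLE_CONTEXT_GLOBALS) && vpid == 0)
        return true;                      /* single-context with VPID 0 is invalid */
    return false;
}
```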
2016-12-08 | KVM: nVMX: check host CR3 on vmentry and vmexit | Ladi Prosek
This commit adds missing host CR3 checks. Before entering guest mode, the value of CR3 is checked for reserved bits. After returning, nested_vmx_load_cr3 is called to set the new CR3 value and check and load PDPTRs. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
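A minimal version of a reserved-bit check for CR3, assuming the rule that bits at and above the processor's physical-address width (MAXPHYADDR) must be zero. `cr3_valid()` is a hypothetical helper and not KVM's implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* CR3 bits at or above MAXPHYADDR are reserved and must be 0. */
static bool cr3_valid(uint64_t cr3, unsigned int maxphyaddr)
{
    uint64_t reserved = ~0ULL << maxphyaddr;   /* assumes maxphyaddr < 64 */

    return (cr3 & reserved) == 0;
}
```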
2016-12-08 | KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry | Ladi Prosek
Loading CR3 as part of emulating vmentry is different from regular CR3 loads, as implemented in kvm_set_cr3, in several ways. * different rules are followed to check CR3 and it is desirable for the caller to distinguish between the possible failures * PDPTRs are not loaded if PAE paging and nested EPT are both enabled * many MMU operations are not necessary This patch introduces nested_vmx_load_cr3 suitable for CR3 loads as part of nested vmentry and vmexit, and makes use of it on the nested vmentry path. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: nVMX: propagate errors from prepare_vmcs02 | Ladi Prosek
It is possible that prepare_vmcs02 fails to load the guest state. This patch adds the proper error handling for such a case. L1 will receive an INVALID_STATE vmexit with the appropriate exit qualification if it happens. A failure to set guest CR3 is the only error propagated from prepare_vmcs02 at the moment. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT | Ladi Prosek
KVM does not correctly handle L1 hypervisors that emulate L2 real mode with PAE and EPT, such as Hyper-V. In this mode, the L1 hypervisor populates guest PDPTE VMCS fields and leaves guest CR3 uninitialized because it is not used (see 26.3.2.4 Loading Page-Directory-Pointer-Table Entries). KVM always dereferences CR3 and tries to load PDPTEs if PAE is on. This leads to two related issues: 1) On the first nested vmentry, the guest PDPTEs, as populated by L1, are overwritten in ept_load_pdptrs because the registers are believed to have been loaded in load_pdptrs as part of kvm_set_cr3. This is incorrect. L2 is running with PAE enabled but PDPTRs have been set up by L1. 2) When L2 is about to enable paging and loads its CR3, we, again, attempt to load PDPTEs in load_pdptrs called from kvm_set_cr3. There are no guarantees that this will succeed (it's just a CR3 load, paging is not enabled yet) and if it doesn't, kvm_set_cr3 returns early without persisting the CR3 which is then lost and L2 crashes right after it enables paging. This patch replaces the kvm_set_cr3 call with a simple register write if PAE and EPT are both on. CR3 is not to be interpreted in this case. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry | David Matlack
vmx_set_cr0() modifies GUEST_EFER and "IA-32e mode guest" in the current VMCS. Call vmx_set_efer() after vmx_set_cr0() so that emulated VM-entry is more faithful to VMCS12. This patch correctly causes VM-entry to fail when "IA-32e mode guest" is 1 and GUEST_CR0.PG is 0. Previously this configuration would succeed and "IA-32e mode guest" would silently be disabled by KVM. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
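The guest-state rule this change now enforces can be stated as one check. `guest_state_ok()` is a hypothetical model, and the real VM-entry checks include more conditions (the SDM also requires CR4.PAE, for instance):

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG (1ULL << 31)   /* paging enable */

/* Hypothetical model: "IA-32e mode guest" = 1 requires GUEST_CR0.PG = 1. */
static bool guest_state_ok(bool ia32e_mode_guest, uint64_t guest_cr0)
{
    if (ia32e_mode_guest && !(guest_cr0 & X86_CR0_PG))
        return false;   /* VM-entry must fail instead of silently clearing IA-32e mode */
    return true;
}
```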
2016-12-08 | KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID | David Matlack
MSR_IA32_CR{0,4}_FIXED1 define which bits in CR0 and CR4 are allowed to be 1 during VMX operation. Since the set of allowed-1 bits is the same in and out of VMX operation, we can generate these MSRs entirely from the guest's CPUID. This lets userspace avoid having to save/restore these MSRs. This patch also initializes MSR_IA32_CR{0,4}_FIXED1 from the CPU's MSRs by default. This is saner than the current default of -1ull, which includes bits that the host CPU does not support. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation | David Matlack
KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning all CR0 and CR4 bits are allowed to be 1 during VMX operation. This does not match real hardware, which disallows the high 32 bits of CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits which are defined in the SDM but missing according to CPUID). A guest can induce a VM-entry failure by setting these bits in GUEST_CR0 and GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are valid. Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing checks on these registers do not verify must-be-0 bits. Fix these checks to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1. This patch should introduce no change in behavior in KVM, since these MSRs are still -1ULL. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
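The FIXED0/FIXED1 semantics reduce to two bitwise tests. A bit set in FIXED0 must be 1, and a bit clear in FIXED1 must be 0. `fixed_bits_valid()` is a hypothetical helper. The test values use 0x80000021 (PG, NE, PE), which is assumed to be a typical CR0 FIXED0 value:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A control-register value is valid in VMX operation if every bit set in
 * FIXED0 is 1 (must-be-1) and every bit clear in FIXED1 is 0 (must-be-0).
 */
static bool fixed_bits_valid(uint64_t val, uint64_t fixed0, uint64_t fixed1)
{
    return (val & fixed0) == fixed0 &&   /* all must-be-1 bits present */
           (val & ~fixed1) == 0;         /* no must-be-0 bits set */
}
```

With FIXED1 set to -1ULL, the second test is a no-op, which is why the existing checks never had to look for must-be-0 bits.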
2016-12-08 | KVM: nVMX: support restore of VMX capability MSRs | David Matlack
The VMX capability MSRs advertise the set of features the KVM virtual CPU can support. This set of features varies across different host CPUs and KVM versions. This patch aims to address both sources of differences, allowing VMs to be migrated across CPUs and KVM versions without guest-visible changes to these MSRs. Note that cross-KVM-version migration is only supported from this point forward. When the VMX capability MSRs are restored, they are audited to check that the set of features advertised are a subset of what KVM and the CPU support. Since the VMX capability MSRs are read-only, they do not need to be on the default MSR save/restore lists. The userspace hypervisor can set the values of these MSRs or read them from KVM at VCPU creation time, and restore the same value after every save/restore. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
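A sketch of the "advertise only a subset" audit for a VMX control MSR. It assumes the SDM layout: bits 31:0 are allowed-0 settings, where 1 means the control must be 1, and bits 63:32 are allowed-1 settings, where 0 means the control must be 0. The helper names are hypothetical and this is not KVM's exact code:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if every bit set in @subset is also set in @superset. */
static bool is_bitwise_subset(uint64_t superset, uint64_t subset)
{
    return (subset & ~superset) == 0;
}

/*
 * A restored control MSR advertises no more than @supported if it requires
 * at least the host's must-be-1 controls (low half) and allows no
 * may-be-1 controls the host lacks (high half).
 */
static bool control_msr_ok(uint64_t restored, uint64_t supported)
{
    uint32_t r_lo = (uint32_t)restored, s_lo = (uint32_t)supported;
    uint32_t r_hi = (uint32_t)(restored >> 32), s_hi = (uint32_t)(supported >> 32);

    return is_bitwise_subset(r_lo, s_lo) &&   /* host must-be-1 bits stay must-be-1 */
           is_bitwise_subset(s_hi, r_hi);     /* no new may-be-1 bits */
}
```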
2016-12-08 | KVM: nVMX: generate non-true VMX MSRs based on true versions | David Matlack
The "non-true" VMX capability MSRs can be generated from their "true" counterparts, by OR-ing the default1 bits. The default1 bits are fixed and defined in the SDM. Since we can generate the non-true VMX MSRs from the true versions, there's no need to store both in struct nested_vmx. This also lets userspace avoid having to restore the non-true MSRs. Note this does not preclude emulating MSR_IA32_VMX_BASIC[55]=0. To do so, we simply need to set all the default1 bits in the true MSRs (such that the true MSRs and the generated non-true MSRs are equal). Signed-off-by: David Matlack <dmatlack@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
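The derivation is a single OR into the allowed-0 half. The default1 set used here for the pin-based controls (bits 1, 2 and 4, i.e. 0x16) is taken from the SDM's description and should be treated as an assumption. `nontrue_from_true()` is hypothetical:

```c
#include <stdint.h>

/* Assumed default1 class for the pin-based VM-execution controls: bits 1, 2, 4. */
#define PINBASED_DEFAULT1 0x00000016u

/*
 * Derive a non-true capability MSR from its true counterpart: the default1
 * controls become must-be-1 in the allowed-0 half (bits 31:0); the
 * allowed-1 half (bits 63:32) is unchanged.
 */
static uint64_t nontrue_from_true(uint64_t true_msr, uint32_t default1)
{
    return true_msr | default1;
}
```

As the commit notes, if all default1 bits are already set in the true MSR, the derived non-true value is identical to it.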
2016-12-08 | KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. | Kyle Huey
The trap flag stays set until software clears it. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: x86: Add kvm_skip_emulated_instruction and use it. | Kyle Huey
kvm_skip_emulated_instruction calls both kvm_x86_ops->skip_emulated_instruction and kvm_vcpu_check_singlestep, skipping the emulated instruction and generating a trap if necessary. Replacing skip_emulated_instruction calls with kvm_skip_emulated_instruction is straightforward, except for: - ICEBP, which is already inside a trap, so avoid triggering another trap. - Instructions that can trigger exits to userspace, such as the IO insns, MOVs to CR8, and HALT. If kvm_skip_emulated_instruction does trigger a KVM_GUESTDBG_SINGLESTEP exit, and the handling code for IN/OUT/MOV CR8/HALT also triggers an exit to userspace, the latter will take precedence. The singlestep will be triggered again on the next instruction, which is the current behavior. - Task switch instructions which would require additional handling (e.g. the task switch bit) and are instead left alone. - Cases where VMLAUNCH/VMRESUME do not proceed to the next instruction, which do not trigger singlestep traps as mentioned previously. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
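A much-simplified model of skipping an emulated instruction and then checking for a single-step trap. `struct vcpu_model` and `skip_emulated_instruction()` are hypothetical. The return convention (1 = keep running the guest, 0 = exit to userspace) follows the usual KVM exit-handler style and is an assumption here:

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_TF (1u << 8)   /* trap flag */

/* Hypothetical, much-simplified vCPU state. */
struct vcpu_model {
    uint64_t rip;
    uint32_t rflags;
    bool guestdbg_singlestep;  /* userspace debugger asked for single-stepping */
    bool singlestep_exit;      /* models a KVM_GUESTDBG_SINGLESTEP exit */
    bool db_pending;           /* models a #DB trap queued for the guest */
};

/* Advance RIP past the emulated instruction, then raise a single-step trap if due. */
static int skip_emulated_instruction(struct vcpu_model *v, unsigned int insn_len)
{
    v->rip += insn_len;

    if (v->guestdbg_singlestep) {
        v->singlestep_exit = true;
        return 0;                  /* exit to userspace */
    }
    if (v->rflags & X86_EFLAGS_TF)
        v->db_pending = true;      /* TF stays set; software must clear it */
    return 1;                      /* keep running the guest */
}
```

Because the function can now return 0, callers such as kvm_emulate_cpuid need a return value to pass that exit request upward.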
2016-12-08 | KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 | Kyle Huey
We can't return both the pass/fail boolean for the vmcs and the upcoming continue/exit-to-userspace boolean for skip_emulated_instruction out of nested_vmx_check_vmcs, so move skip_emulated_instruction out of it instead. Additionally, VMENTER/VMRESUME only trigger singlestep exceptions when they advance the IP to the following instruction, not when they a) succeed, b) fail MSR validation or c) throw an exception. Add a separate call to skip_emulated_instruction that will later not be converted to the variant that checks the singlestep flag. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: VMX: Reorder some skip_emulated_instruction calls | Kyle Huey
The functions being moved ahead of skip_emulated_instruction here don't need updated IPs, and skipping the emulated instruction at the end will make it easier to return its value. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | KVM: x86: Add a return value to kvm_emulate_cpuid | Kyle Huey
Once skipping the emulated instruction can potentially trigger an exit to userspace (via KVM_GUESTDBG_SINGLESTEP) kvm_emulate_cpuid will need to propagate a return value. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-08 | cris: No need to append -O2 and $(LINUXINCLUDE) | Paul Bolle
The make variables asflags-y and ccflags-y are appended with -O2 and $(LINUXINCLUDE). But the build already picks up -O2 from the top Makefile and $(LINUXINCLUDE) from scripts/Makefile.lib. The net effect is that -O2 and the (long) list of include directories are used twice. This is harmless but pointless. So stop appending to these flags. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jesper Nilsson <jespern@axis.com>
2016-12-08 | xen/pci: Bubble up error and fix description. | Konrad Rzeszutek Wilk
The function is never called under PV guests, and only shows up when MSI (or MSI-X) cannot be allocated. Convert the message to include the error value. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-12-07 | arm/xen: Use alloc_percpu rather than __alloc_percpu | Julien Grall
The function xen_guest_init is using __alloc_percpu with an alignment which is not a power of two. However, the percpu allocator never supported alignments which are not a power of two and has always behaved incorrectly in this case. Commit 3ca45a4 "percpu: ensure requested alignment is power of two" introduced a check which triggers a warning [1] when booting linux-next on Xen. But in reality this bug was always present. This can be fixed by replacing the call to __alloc_percpu with alloc_percpu. The latter will use an alignment which is a power of two.
[1]
[ 0.023921] illegal size (48) or align (48) for percpu allocation
[ 0.024167] ------------[ cut here ]------------
[ 0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0
[ 0.024584] Modules linked in:
[ 0.024708]
[ 0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc7-next-20161128 #473
[ 0.025012] Hardware name: Foundation-v8A (DT)
[ 0.025162] task: ffff80003d870000 task.stack: ffff80003d844000
[ 0.025351] PC is at pcpu_alloc+0x88/0x6c0
[ 0.025490] LR is at pcpu_alloc+0x88/0x6c0
[ 0.025624] pc : [<ffff00000818e678>] lr : [<ffff00000818e678>] pstate: 60000045
[ 0.025830] sp : ffff80003d847cd0
[ 0.025946] x29: ffff80003d847cd0 x28: 0000000000000000
[ 0.026147] x27: 0000000000000000 x26: 0000000000000000
[ 0.026348] x25: 0000000000000000 x24: 0000000000000000
[ 0.026549] x23: 0000000000000000 x22: 00000000024000c0
[ 0.026752] x21: ffff000008e97000 x20: 0000000000000000
[ 0.026953] x19: 0000000000000030 x18: 0000000000000010
[ 0.027155] x17: 0000000000000a3f x16: 00000000deadbeef
[ 0.027357] x15: 0000000000000006 x14: ffff000088f79c3f
[ 0.027573] x13: ffff000008f79c4d x12: 0000000000000041
[ 0.027782] x11: 0000000000000006 x10: 0000000000000042
[ 0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c
[ 0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84
[ 0.028419] x5 : 0000000000000005 x4 : 0000000000000000
[ 0.028628] x3 : 0000000000000000 x2 : 000000000000017f
[ 0.028840] x1 : ffff80003d870000 x0 : 0000000000000035
[ 0.029056]
[ 0.029152] ---[ end trace 0000000000000000 ]---
[ 0.029297] Call trace:
[ 0.029403] Exception stack(0xffff80003d847b00 to 0xffff80003d847c30)
[ 0.029621] 7b00: 0000000000000030 0001000000000000 ffff80003d847cd0 ffff00000818e678
[ 0.029901] 7b20: 0000000000000002 0000000000000004 ffff000008f7c060 0000000000000035
[ 0.030153] 7b40: ffff000008f79000 ffff000008c4cd88 ffff80003d847bf0 ffff000008101778
[ 0.030402] 7b60: 0000000000000030 0000000000000000 ffff000008e97000 00000000024000c0
[ 0.030647] 7b80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.030895] 7ba0: 0000000000000035 ffff80003d870000 000000000000017f 0000000000000000
[ 0.031144] 7bc0: 0000000000000000 0000000000000005 ffff000008f79c84 6120757063726570
[ 0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40 0000000000000042 0000000000000006
[ 0.031643] 7c00: 0000000000000041 ffff000008f79c4d ffff000088f79c3f 0000000000000006
[ 0.031877] 7c20: 00000000deadbeef 0000000000000a3f
[ 0.032051] [<ffff00000818e678>] pcpu_alloc+0x88/0x6c0
[ 0.032229] [<ffff00000818ece8>] __alloc_percpu+0x18/0x20
[ 0.032409] [<ffff000008d9606c>] xen_guest_init+0x174/0x2f4
[ 0.032591] [<ffff0000080830f8>] do_one_initcall+0x38/0x130
[ 0.032783] [<ffff000008d90c34>] kernel_init_freeable+0xe0/0x248
[ 0.032995] [<ffff00000899a890>] kernel_init+0x10/0x100
[ 0.033172] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
Reported-by: Wei Chen <wei.chen@arm.com> Link: https://lkml.org/lkml/2016/11/28/669 Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Cc: stable@vger.kernel.org
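The warning above reports a size and an alignment of 48, which suggests the structure size was also passed as the alignment. The sketch contrasts that with an alignment taken from the type, similar to how alloc_percpu() derives it with `__alignof__`. `struct vcpu_info_model` and both helper functions are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The power-of-two test an allocator applies to a requested alignment. */
static bool is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical 48-byte per-CPU payload; its natural alignment is much smaller. */
struct vcpu_info_model {
    uint64_t words[6];
};

/* Pattern that triggered the warning: the size used as the alignment. */
static size_t size_as_align(void) { return sizeof(struct vcpu_info_model); }

/* Type-derived alignment, which the C standard requires to be a power of two. */
static size_t type_align(void) { return _Alignof(struct vcpu_info_model); }
```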
2016-12-07ARM: dts: imx7d: fix LCDIF clock assignmentStefan Agner
The eLCDIF IP of the i.MX 7 SoC has multiple clocks, which are listed separately:
Clock      Clock Root            Description
apb_clk    MAIN_AXI_CLK_ROOT     AXI clock
pix_clk    LCDIF_PIXEL_CLK_ROOT  Pixel clock
ipg_clk_s  MAIN_AXI_CLK_ROOT     Peripheral access clock
All of them are switched by a single gate, which is part of the IMX7D_LCDIF_PIXEL_ROOT_CLK clock. Hence, also using that clock for the AXI bus clock (clock-name "axi") makes sure the gate is enabled when registers are accessed. There seems to be no separate AXI display clock, and that clock is optional. Hence, remove the dummy clock. This fixes kernel freezes when starting the X server (which disables and re-enables the display controller). Fixes: e8ed73f691bd ("ARM: dts: imx7d: add lcdif support") Signed-off-by: Stefan Agner <stefan@agner.ch> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Acked-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-12-07dts: sun8i-h3: correct UART3 pin definitionsJorik Jonker
In a previous commit, I made a copy/paste error in the pinmux definitions of UART3: PG{13,14} instead of PA{13,14}. This commit takes care of that. I have tested this commit on Orange Pi PC and Orange Pi Plus, and it works for these boards. Fixes: e3d11d3c45c5 ("dts: sun8i-h3: add pinmux definitions for UART2-3") Signed-off-by: Jorik Jonker <jorik@kippendief.biz> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-12-07Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 fixes from Ingo Molnar: "Misc fixes: a core dumping crash fix, a guess-unwinder regression fix, plus three build warning fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind: Fix guess-unwinder regression x86/build: Annotate die() with noreturn to fix build warning on clang x86/platform/olpc: Fix resume handler build warning x86/apic/uv: Silence a shift wrapping warning x86/coredump: Always use user_regs_struct for compat_elf_gregset_t
2016-12-07Merge branch 'pl061' into develLinus Walleij
2016-12-07gpio: pl061: move platform data into driverLinus Walleij
No boardfile defines any PL061 platform data anymore: the Integrator IM/PD-1 includes the file but does not make use of the struct. Let's delete the include and all references, then move the platform data into the driver for later consolidation into the driver state container. The only resource defined by the IM/PD-1 is the IRQ, which is passed through the AMBA PrimeCell bus abstraction struct amba_device. Cc: arm@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-12-07s390/sysinfo: show partition extended name and UUID if availableViktor Mihajlovski
Extract extended name and UUID from SYSIB 2.2.2 data. As the code to convert the raw extended name into a printable format can be reused by stsi_2_2_2, we're moving the conversion code into a separate function, convert_ext_name. Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-07parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asmJohn David Anglin
We have four routines in pacache.S that use temporary alias pages: copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and flush_icache_page_asm(). copy_user_page_asm() and clear_user_page_asm() don't purge the TLB entry used for the operation. flush_dcache_page_asm() and flush_icache_page_asm() do purge the entry. Presumably, this was thought to optimize TLB use. However, the operation is quite heavyweight on PA 1.X processors, as we need to take the TLB lock and a TLB broadcast is sent to all processors. This patch removes the purges from flush_dcache_page_asm() and flush_icache_page_asm(). Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Helge Deller <deller@gmx.de>
2016-12-07parisc: Purge TLB before setting PTEJohn David Anglin
The attached change interchanges the order of purging the TLB and setting the corresponding page table entry. TLB purges are strongly ordered. It occurred to me one night that setting the PTE first might have subtle ordering issues on SMP machines and cause random memory corruption. A TLB lock guards the insertion of user TLB entries. So after the TLB is purged, a new entry can't be inserted until the lock is released. This ensures that the new PTE value is used when the lock is released. Since making this change, no random segmentation faults have been observed on the Debian hppa buildd servers. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Helge Deller <deller@gmx.de>