path: root/arch
2016-03-09  powerpc/eeh: Support error recovery for VF PE  (Wei Yang)
PFs are enumerated on the PCI bus, while VFs are created by the PF's driver. EEH recovery therefore has two cases:
1. The device and driver are EEH aware: the error handlers are called.
2. The device and driver are not EEH aware: the device is unplugged and then plugged back in by re-enumerating it.
The second case is the special one. For a PF we can use the regular PCI core to enumerate the bus, but for VFs we need to record which VFs were unplugged so that they can be plugged back in afterwards. The patch also caches the VF index in pci_dn, which can be used to calculate a VF's bus, device and function number. That information helps to locate the VF's PCI device instance when hotplugging during EEH recovery, if necessary. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
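The bus/device/function arithmetic behind the cached VF index follows the SR-IOV First VF Offset / VF Stride rules, which the kernel already exposes through pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(). A minimal sketch of re-locating a VF from its PF and a cached index; the helper name find_vf_pdev is illustrative, not part of the patch:

  #include <linux/pci.h>

  /* Sketch only: re-locate a VF's pci_dev from its PF and a cached VF
   * index.  pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() compute the VF's
   * routing ID per the SR-IOV offset/stride rules.
   */
  static struct pci_dev *find_vf_pdev(struct pci_dev *pf, int vf_index)
  {
          int busnr = pci_iov_virtfn_bus(pf, vf_index);
          int devfn = pci_iov_virtfn_devfn(pf, vf_index);

          return pci_get_domain_bus_and_slot(pci_domain_nr(pf->bus),
                                             busnr, devfn);
  }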
2016-03-09  powerpc/powernv: Support PCI config restore for VFs  (Wei Yang)
After a PE reset, the OPAL API opal_pci_reinit() is called on all devices contained in the PE to reinitialize them. However, skiboot is not aware of VFs, so we have to implement the reinitialization of VFs after a PE reset in the kernel. The two functions introduced here, pnv_pci_fixup_vf_mps() and pnv_eeh_restore_vf_config(), both manipulate a VF's MPS, because there are three cases for a VF:
1. Normal creation of a VF: pnv_pci_fixup_vf_mps() is called to set the MPS to a proper value relative to its parent.
2. EEH recovery without the VF being removed: the MPS is stored in pci_dn, and pnv_eeh_restore_vf_config() is called to restore it and reinitialize the other parts.
3. EEH recovery with the VF removed: the VF is removed and then re-created, so both functions are called. First pnv_pci_fixup_vf_mps() stores the proper MPS in pci_dn, then pnv_eeh_restore_vf_config() does the rest.
In short, this introduces two functions: pnv_pci_fixup_vf_mps(), which fixes up the VF's MPS to make sure it equals the parent's and stores that value in pci_dn for future use, and pnv_eeh_restore_vf_config(), which reinitializes a VF by restoring the MPS, disabling completion timeout, enabling SERR, etc. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09  powerpc/powernv: Support EEH reset for VF PE  (Wei Yang)
PEs for VFs don't have a primary bus, so they need their own reset backend for use during EEH recovery. The patch implements the reset backend for a VF's PE by issuing an FLR or AF FLR to the VFs contained in the PE. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
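Issuing a Function Level Reset through the PCIe capability looks roughly like the following; a sketch of the generic FLR sequence, not the patch's exact backend (which also handles AF FLR for functions that only expose the Advanced Features capability):

  #include <linux/pci.h>
  #include <linux/delay.h>

  /* Sketch: the generic PCIe FLR sequence for one VF. */
  static void vf_do_flr(struct pci_dev *vf)
  {
          /* Request the reset ... */
          pcie_capability_set_word(vf, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
          /* ... then give the function the spec-mandated 100ms to recover. */
          msleep(100);
  }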
2016-03-09  powerpc/eeh: Create PE for VFs  (Wei Yang)
This creates PEs for VFs in the weak function pcibios_bus_add_device(). Those PEs for VFs are identified with the newly introduced flag EEH_PE_VF so that they are treated differently during EEH recovery. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09  powerpc/eeh: EEH device for VF  (Wei Yang)
VFs and their corresponding pdn are created and released dynamically as their PF's SRIOV capability is enabled and disabled. This creates and releases EEH devices for VFs when creating and releasing their pdn instances, which means EEH devices and pdn instances have the same life cycle. A VF's EEH device is identified by its physical function pointer (struct eeh_dev::physfn). Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09  powerpc/eeh: Cache normal BARs, not windows or IOV BARs  (Wei Yang)
This restricts the EEH address cache to the first 7 BARs, which makes __eeh_addr_cache_insert_dev() ignore PCI bridge windows and IOV BARs. As a result of this change, eeh_addr_cache_get_dev() will return VFs for addresses in a VF's resources instead of the parent PF. This also removes the PCI bridge check: limiting __eeh_addr_cache_insert_dev() to 7 BARs effectively excludes PCI bridges from being cached. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09  powerpc/pci: Remove VFs prior to PF  (Wei Yang)
As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates, VFs that are on the same PCI bus as their PF should be removed before the PF. Otherwise, we might run into a kernel crash when unplugging PCI devices. This applies the same pattern to the powerpc PCI hotplug path. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09  powerpc/eeh: Reworked eeh_pe_bus_get()  (Gavin Shan)
The original implementation is ugly: unnecessary if statements and an "out" label. This reworks the function to avoid the above weaknesses. No functional changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-08  PCI: Include pci/hotplug/Kconfig directly from pci/Kconfig  (Bjorn Helgaas)
Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't have to source both pci/Kconfig and pci/hotplug/Kconfig. Note that this effectively adds pci/hotplug/Kconfig to the following arches, because they already sourced drivers/pci/Kconfig but they previously did not source drivers/pci/hotplug/Kconfig: alpha arm avr32 frv m68k microblaze mn10300 sparc unicore32 Inspired-by-patch-from: Bogicevic Sasa <brutallesale@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08  PCI: Include pci/pcie/Kconfig directly from pci/Kconfig  (Bogicevic Sasa)
Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't have to source both pci/Kconfig and pci/pcie/Kconfig. Note that this effectively adds pci/pcie/Kconfig to the following arches, because they already sourced drivers/pci/Kconfig but they previously did not source drivers/pci/pcie/Kconfig: alpha avr32 blackfin frv m32r m68k microblaze mn10300 parisc sparc unicore32 xtensa [bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace] Signed-off-by: Sasa Bogicevic <brutallesale@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08  microblaze/PCI: Support generic Xilinx AXI PCIe Host Bridge IP driver  (Bharat Kumar Gogada)
Modify the Microblaze PCI subsystem to work with the generic drivers/pci/host/pcie-xilinx.c driver on Microblaze and Zynq. [bhelgaas: changelog] Signed-off-by: Bharat Kumar Gogada <bharatku@xilinx.com> Signed-off-by: Ravi Kiran Gummaluri <rgummal@xilinx.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michal Simek <michal.simek@xilinx.com>
2016-03-08  Input: ad7879 - move header to platform_data directory  (Stefan Agner)
The header file is used by the SPI and I2C variant of the driver. Therefore, move it to a more generic place under platform_data. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-03-08  PCI: Set ROM shadow location in arch code, not in PCI core  (Bjorn Helgaas)
IORESOURCE_ROM_SHADOW means there is a copy of a device's option ROM in RAM. The existence of such a copy and its location are arch-specific. Previously the IORESOURCE_ROM_SHADOW flag was set in arch code, but the 0xC0000-0xDFFFF location was hard-coded into the PCI core.
If we're using a shadow copy in RAM, disable the ROM BAR and release the address space it was consuming. Move the location information from the PCI core to the arch code that sets IORESOURCE_ROM_SHADOW, and save the location of the RAM copy in the struct resource for PCI_ROM_RESOURCE.
After this change, pci_map_rom() will call pci_assign_resource() and pci_enable_rom() for these IORESOURCE_ROM_SHADOW resources, which we did not do before. This is safe because:
- pci_assign_resource() will do nothing because the resource is marked IORESOURCE_PCI_FIXED, which means we can't move it, and
- pci_enable_rom() will not turn on the ROM BAR's enable bit because the resource is marked IORESOURCE_ROM_SHADOW, which means it is in RAM rather than in PCI memory space.
Storing the location in the struct resource means "lspci" will show the shadow location, not the value from the ROM BAR.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
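In arch code the change amounts to recording the shadow range in the ROM resource itself. A hedged sketch of what such a fixup looks like; the function name is illustrative and the exact hook differs per arch:

  #include <linux/ioport.h>
  #include <linux/pci.h>

  /* Sketch: mark a device's ROM resource as shadowed in RAM at the legacy
   * VGA shadow range.  Arch code owns both the flag and the address.
   */
  static void set_rom_shadow(struct pci_dev *pdev)
  {
          struct resource *res = &pdev->resource[PCI_ROM_RESOURCE];

          res->start = 0xC0000;
          res->end = 0xDFFFF;
          res->flags = IORESOURCE_MEM | IORESOURCE_ROM_SHADOW |
                       IORESOURCE_PCI_FIXED;
  }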
2016-03-08  PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED  (Bjorn Helgaas)
A shadow copy of an option ROM is placed by the BIOS at a fixed address. Set IORESOURCE_PCI_FIXED to indicate that we can't move the shadow copy. This prevents warnings like the following when we assign resources:

  BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment

This warning is emitted by pdev_sort_resources(), which already ignores IORESOURCE_PCI_FIXED resources. Link: http://lkml.kernel.org/r/CA+55aFyVMfTBB0oz_yx8+eQOEJnzGtCsYSj9QuhEpdZ9BHdq5A@mail.gmail.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-08  x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs  (Bjorn Helgaas)
The Home Agent and PCU PCI devices in Broadwell-EP have a non-BAR register where a BAR should be. We don't know what the side effects of sizing the "BAR" would be, and we don't know what address space the "BAR" might appear to describe. Mark these devices as having non-compliant BARs so the PCI core doesn't touch them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> CC: stable@vger.kernel.org
2016-03-08  unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition  (Bjorn Helgaas)
HAVE_ARCH_PCI_SET_DMA_MASK is unused, so remove the #define for it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: GUAN Xuetao <gxt@mprc.pku.edu.cn>
2016-03-08  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08  x86/mm, x86/mce: Add memcpy_mcsafe()  (Tony Luck)
Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. The prime use case for this is copying from large arrays of non-volatile memory used as storage.
We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify.
The return type is a "bool": true means that we copied OK, false means that we didn't.
Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@gmail.com> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
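Callers are expected to branch on the return value. A sketch of the intended use, per the commit message's description of the return semantics; the surrounding function and names (copy_from_pmem, dst, src, len) are illustrative:

  #include <linux/string.h>
  #include <linux/errno.h>

  /* Sketch: copy from NVDIMM-backed storage without dying on a machine
   * check.  Per the commit message, true means the copy succeeded.
   */
  static int copy_from_pmem(void *dst, const void *src, size_t len)
  {
          if (!memcpy_mcsafe(dst, src, len))
                  return -EIO;    /* source contained a poisoned line */
          return 0;
  }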
2016-03-08  perf/x86/intel/rapl: Simplify quirk handling even more  (Borislav Petkov)
Drop the quirk() function pointer in favor of a simple boolean which says whether the quirk should be applied or not. Update comment while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-tip-commits@vger.kernel.org Link: http://lkml.kernel.org/r/20160308164041.GF16568@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08  s390: Fix misspellings in comments  (Adam Buchbinder)
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08  s390/mm: split arch/s390/mm/pgtable.c  (Martin Schwidefsky)
The pgtable.c file is quite big; before it grows any larger, split it into pgtable.c, pgalloc.c and gmap.c. In addition, move the gmap-related header definitions into the new gmap.h header, and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08  s390/mm: uninline pmdp_xxx functions from pgtable.h  (Martin Schwidefsky)
The pmdp_xxx functions are smaller than their ptep_xxx counterparts, but to keep things symmetrical uninline them as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08  s390/mm: uninline ptep_xxx functions from pgtable.h  (Martin Schwidefsky)
The code in the various ptep_xxx functions has grown quite large; consolidate them into four out-of-line functions:
- ptep_xchg_direct to exchange a pte with another with immediate flushing
- ptep_xchg_lazy to exchange a pte with another in a batched update
- ptep_modify_prot_start to begin a protection flags update
- ptep_modify_prot_commit to commit a protection flags update
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08  arm64: dts: juno/vexpress: fix node name unit-address presence warnings  (Sudeep Holla)
Commit fa38a82096a1 ("scripts/dtc: Update to upstream version 53bf130b1cdd") added warnings on node-name/unit-address presence/absence mismatches in device trees. This patch fixes those warnings on all the juno/vexpress platforms where a unit-address is present in the node name while the reg/ranges property is not. It also adds a unit-address to all smb bus nodes. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2016-03-08  Merge branch 'linus' into x86/fpu, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08  x86/asm-offsets: Remove PARAVIRT_enabled  (Andy Lutomirski)
It no longer has any users. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled  (Andy Lutomirski)
x86_64 has very clean espfix handling on paravirt: espfix64 is set up in native_iret, so paravirt systems that override iret bypass espfix64 automatically. This is robust and straightforward.
x86_32 is messier. espfix is set up before the IRET paravirt patch point, so it can't be directly conditionalized on whether we use native_iret. We also can't easily move it into native_iret without regressing performance due to a bizarre consideration. Specifically, on 64-bit kernels, the logic is:

  if (regs->ss & 0x4)
          setup_espfix;

On 32-bit kernels, the logic is:

  if ((regs->ss & 0x4) &&
      (regs->cs & 0x3) == 3 &&
      (regs->flags & X86_EFLAGS_VM) == 0)
          setup_espfix;

The performance of setup_espfix itself is essentially irrelevant, but the comparison happens on every IRET so its performance matters. On x86_64, there's no need for any registers except flags to implement the comparison, so we fold the whole thing into native_iret. On x86_32, we don't do that because we need a free register to implement the comparison efficiently. We therefore do espfix setup before restoring registers on x86_32.
This patch gets rid of the explicit paravirt_enabled check by introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE to skip espfix on paravirt systems where iret != native_iret. This is also messy, but it's at least in line with other things we do.
This improves espfix performance by removing a branch, but no one cares. More importantly, it removes a paravirt_enabled user, which is good because paravirt_enabled is ill-defined and is going away.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
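For illustration, the C side of such a scheme amounts to setting the bug bit only when the native IRET path is in use, so the ALTERNATIVE in the entry code keeps the espfix setup. A schematic sketch; using_native_iret() is a hypothetical predicate, the real check compares the paravirt iret pointer against native_iret:

  /* Schematic: flag the CPU as wanting espfix.  Paravirt kernels with a
   * non-native iret never set the bit, and the espfix setup is patched
   * out of the IRET path.
   */
  static void __init detect_espfix(void)
  {
          if (using_native_iret())        /* hypothetical predicate */
                  set_cpu_bug(&boot_cpu_data, X86_BUG_ESPFIX);
  }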
2016-03-08  KVM: s390: allocate only one DMA page per VM  (David Hildenbrand)
We can fit the 2k for the STFLE interpretation and the crypto control block into one DMA page. As we now only have to allocate one DMA page, we can clean up the code a bit. As a nice side effect, this also fixes a problem with crycbd alignment in case special allocation debug options are enabled, debugged by Sascha Silbe. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: enable STFLE interpretation only if enabled for the guest  (David Hildenbrand)
Not setting the facility list designation disables STFLE interpretation, which is what we want if the guest was told not to have it. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: wake up when the VCPU cpu timer expires  (David Hildenbrand)
When the VCPU cpu timer expires, we have to wake up just like when the ckc triggers. Until now, setting up a cpu timer in the guest and going into enabled wait would never lead to a wakeup; this patch fixes that problem. Just as for the ckc, we have to take care of waking up too early: we recalculate the sleep time and go back to sleep.
Please note that the timer callback calls kvm_s390_get_cpu_timer() from interrupt context. As the timer is canceled when leaving handle_wait(), and we don't do any VCPU cpu timer writes/updates in that function, we can be sure that we will never try to read the VCPU cpu timer from the same cpu that is currently updating the timer (deadlock).
Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Tested-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: step the VCPU timer while in enabled wait  (David Hildenbrand)
The cpu timer is a means to measure task execution time. We want to account to a VCPU everything it is responsible for; therefore, if the VCPU wants to sleep, that sleep shall be accounted to it. We can easily get this done by not disabling cpu timer accounting when scheduled out while sleeping because of enabled wait. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: protect VCPU cpu timer with a seqcount  (David Hildenbrand)
For now, only the owning VCPU thread (that has loaded the VCPU) can get a consistent cpu timer value when calculating the delta. However, other threads might also be interested in a more recent, consistent value. Of special interest will be the timer callback of a VCPU that executes without having the VCPU loaded and could run in parallel with the VCPU thread.
The cpu timer has a nice property: it is only updated by the owning VCPU thread. And speaking about accounting, a consistent value can only be calculated by looking at cputm_start and the cpu timer itself in one shot; otherwise the result might be wrong.
As we only have one writing thread at a time (the owning VCPU thread), we can use a seqcount instead of a seqlock and retry if the VCPU refreshed its cpu timer. This avoids any heavy locking and only introduces a counter update/check plus a handful of smp_wmb(). The owning VCPU thread should never have to retry on reads, and for other threads retries should also be a very rare scenario.
Please note that we have to use the raw_* variants for locking the seqcount, as lockdep would produce false warnings otherwise. The rq->lock held during vcpu_load/put is also acquired from hardirq context. Lockdep cannot know that we avoid potential deadlocks by disabling preemption and thereby disabling concurrent write locking attempts (via vcpu_put/load).
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
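The reader side follows the standard seqcount retry pattern. A sketch, with illustrative field names (cputm_seqcount, cputm_start, cputm_enabled); the writer brackets its updates with raw_write_seqcount_begin()/raw_write_seqcount_end():

  #include <linux/seqlock.h>

  /* Sketch: lock-free consistent read of the VCPU cpu timer.  The owning
   * VCPU thread is the only writer; any reader retries if it raced with
   * an update.
   */
  static u64 read_vcpu_cpu_timer(struct kvm_vcpu *vcpu)
  {
          unsigned int seq;
          u64 value;

          do {
                  seq = read_seqcount_begin(&vcpu->arch.cputm_seqcount);
                  value = vcpu->arch.sie_block->cputm;
                  if (vcpu->arch.cputm_enabled)   /* timer being stepped? */
                          value -= get_tod_clock_fast() -
                                   vcpu->arch.cputm_start;
          } while (read_seqcount_retry(&vcpu->arch.cputm_seqcount, seq));

          return value;
  }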
2016-03-08  KVM: s390: step VCPU cpu timer during kvm_run ioctl  (David Hildenbrand)
Architecturally we should only provide steal time if we are scheduled away, and not if the host interprets a guest exit. We have to step the guest CPU timer in these cases. In this first shot, we will step the VCPU timer only during the kvm_run ioctl; therefore all time spent e.g. in interception handlers or on irq delivery will be accounted to that VCPU.
We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync; therefore we have to extend kvm_clock_sync() and disable preemption accordingly.
- During any call to disable/enable/start/stop we could get preempted and therefore get start/stop calls. Therefore we have to make sure we don't get into an inconsistent state.
Whenever a VCPU is scheduled out, sleeping, in user space or just about to enter the SIE, the guest cpu timer isn't stepped. Please note that all primitives are prepared to be called from both environments (cpu timer accounting enabled or not), although not completely used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called while cpu timer accounting is enabled).
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: abstract access to the VCPU cpu timer  (David Hildenbrand)
We want to manually step the cpu timer in certain scenarios in the future. Let's abstract any access to the cpu timer, so we can hide the complexity internally. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: store cpu id in vcpu->cpu when scheduled in  (David Hildenbrand)
By storing the cpu id, we have a way to verify if the current cpu is owning a VCPU. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  KVM: s390: Add diag "watchdog functions" to trace event decoding  (Alexander Yarygin)
DIAG 0x288 may occur now. Let's add its code to the diag table in sie.h. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08  x86/nmi: Mark 'ignore_nmis' as __read_mostly  (Kostenzer Felix)
ignore_nmis is used in two distinct places:
1. It is modified through {stop,restart}_nmi() by alternative_instructions().
2. It is read by do_nmi() to determine whether default_do_nmi() should be called or not.
Thus the access pattern conforms to __read_mostly, and do_nmi() is a fast path. Signed-off-by: Kostenzer Felix <fkostenzer@live.at> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
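The change itself is essentially a one-line annotation; something like the following (sketch, the variable lives in arch/x86/kernel/nmi.c):

  /* Written only on the slow {stop,restart}_nmi() path, but read on every
   * NMI: place it in .data..read_mostly so it doesn't share a cache line
   * with frequently written data.
   */
  static int ignore_nmis __read_mostly;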
2016-03-08  KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS  (David Hildenbrand)
With MACHINE_HAS_VX, we convert the floating point registers from the vector registers when storing the status. For other VCPUs, these are stored to vcpu->run->s.regs.vrs, but we were using current->thread.fpu.vxrs, which resolves to the currently loaded VCPU. So kvm_s390_store_status_unloaded() currently writes the wrong floating point registers (converted from the vector registers) when called from another VCPU on a z13.
This is only the case for old user space not handling SIGP STORE STATUS and SIGP STOP AND STORE STATUS itself, but relying on the kernel implementation. All other calls come from the loaded VCPU via kvm_s390_store_status().
Fixes: 9abc2a08a7d6 ("KVM: s390: fix memory overwrites when vx is disabled") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD  (Paolo Bonzini)
2016-03-08  KVM: VMX: disable PEBS before a guest entry  (Radim Krčmář)
Linux guests on Haswell (and also Sandy Bridge and Broadwell, at least) would crash if you decided to run a host command that uses PEBS, like

  perf record -e 'cpu/mem-stores/pp' -a

This happens because KVM is using VMX MSR switching to disable PEBS, but SDM [2015-12] 18.4.4.4 "Re-configuring PEBS Facilities" explains why that isn't safe:

  When software needs to reconfigure PEBS facilities, it should allow a
  quiescent period between stopping the prior event counting and setting up
  a new PEBS event. The quiescent period is to allow any latent residual
  PEBS records to complete its capture at their previously specified buffer
  address (provided by IA32_DS_AREA).

There might not be a quiescent period after the MSR switch, so a CPU ends up using the host's MSR_IA32_DS_AREA to access an area in guest memory. (Or MSR switching is just buggy on some models.)
The guest can learn something about the host this way: if the guest doesn't map the address pointed to by MSR_IA32_DS_AREA, it results in a #PF where we leak the host's MSR_IA32_DS_AREA through CR2. After that, a malicious guest can map and configure memory where MSR_IA32_DS_AREA is pointing and can therefore get an output from the host's tracing.
This is not a critical leak, as the host must initiate PEBS tracing first, and I have not been able to get a record from more than one instruction before vmentry in vmx_vcpu_run() (that place has most registers already overwritten with the guest's).
We could disable PEBS just a few instructions before vmentry, but disabling it earlier shouldn't affect host tracing too much. We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that optimization isn't worth its code, IMO. (If you are implementing PEBS for guests, be sure to handle the case where both host and guest enable PEBS, because this patch doesn't.)
Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.") Cc: <stable@vger.kernel.org> Reported-by: Jiří Olša <jolsa@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: micro-optimize gpte_access  (Paolo Bonzini)
Avoid AND-NOT: most x86 processors lack an instruction for it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: simplify last_pte_bitmap  (Paolo Bonzini)
Branch-free code is fun and everybody knows how much Avi loves it, but last_pte_bitmap takes it a bit to the extreme. Since the code is simply doing a range check, like

  (level == 1 || ((gpte & PT_PAGE_SIZE_MASK) && level < N))

we can make it branch-free without storing the entire truth table; it is enough to cache N. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
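A schematic of what caching N buys, written with the short-circuit operators for readability; the field name last_nonleaf_level is illustrative, and the patch itself evaluates this same range test with branch-free arithmetic:

  /* Schematic: with N cached per MMU context, the per-level truth table
   * reduces to a single range test.
   */
  static bool is_last_gpte(struct kvm_mmu *mmu, unsigned int level,
                           unsigned int gpte)
  {
          return level == 1 ||
                 ((gpte & PT_PAGE_SIZE_MASK) &&
                  level < mmu->last_nonleaf_level);
  }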
2016-03-08  KVM: MMU: coalesce more page zapping in mmu_sync_children  (Paolo Bonzini)
mmu_sync_children can only process up to 16 pages at a time. Check if we need to reschedule, and do not bother zapping the pages until that happens. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
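The batching follows the usual "commit accumulated work only when the lock must be dropped" idiom. A schematic fragment of the loop, with declarations elided; helper names follow the surrounding MMU code and this patch series (kvm_mmu_flush_or_zap is introduced a few commits below), but the exact shape is a sketch:

  /* Schematic: keep zapped pages on invalid_list and only commit (flush
   * TLBs, free pages) when we actually need to drop mmu_lock and
   * reschedule.
   */
  while (mmu_unsync_walk(parent, &pages)) {
          for_each_sp(pages, sp, parents, i) {
                  flush |= kvm_sync_page(vcpu, sp, &invalid_list);
                  mmu_pages_clear_parents(&parents);
          }
          if (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock)) {
                  kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
                  cond_resched_lock(&vcpu->kvm->mmu_lock);
                  flush = false;
          }
  }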
2016-03-08  KVM: MMU: move zap/flush to kvm_mmu_get_page  (Paolo Bonzini)
kvm_mmu_get_page is the only caller of kvm_sync_page_transient and kvm_sync_pages. Moving the handling of the invalid_list there removes the need for the underdocumented kvm_sync_page_transient function. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page*  (Paolo Bonzini)
Return true if the page was synced (and the TLB must be flushed) and false if the page was zapped. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: cleanup __kvm_sync_page and its callers  (Paolo Bonzini)
Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes things unnecessarily tricky. If kvm_mmu_prepare_zap_page is called, it will call kvm_unlink_unsync_page too. So kvm_unlink_unsync_page can be called just as well at the beginning or the end of __kvm_sync_page... which means that we might do it in kvm_sync_page too and remove the parameter. kvm_sync_page ends up being the same code that kvm_sync_pages used to have before the previous patch. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: use kvm_sync_page in kvm_sync_pages  (Paolo Bonzini)
If the last argument is true, kvm_unlink_unsync_page is called anyway in __kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page itself). Therefore, kvm_sync_pages can just call kvm_sync_page, instead of going through kvm_unlink_unsync_page+__kvm_sync_page. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: move TLB flush out of __kvm_sync_page  (Paolo Bonzini)
By doing this, kvm_sync_pages can use __kvm_sync_page instead of reinventing it. Because of kvm_mmu_flush_or_zap, the code does not end up being more complex than before, and more cleanups to kvm_sync_pages will come in the next patches. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: introduce kvm_mmu_flush_or_zap  (Paolo Bonzini)
This is a generalization of mmu_pte_write_flush_tlb, that also takes care of calling kvm_mmu_commit_zap_page. The next patches will introduce more uses. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  x86/apic: Deinline _flat_send_IPI_mask, save ~150 bytes  (Denys Vlasenko)
_flat_send_IPI_mask: 157 bytes, 3 callsites

      text     data      bss       dec      hex  filename
  96183823 20860520 36122624 153166967  9212477  vmlinux1_before
  96183699 20860520 36122624 153166843  92123fb  vmlinux

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Borislav Petkov <bp@alien.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1457287876-6001-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>