linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2012-03-08	TTY: serialP, remove unused material	Jiri Slaby
	First, remove unused macro and rs_multiport_struct structure. Nobody uses them at all. Further, the 2 drivers (they are below) which use the rest of structures from serialP.h (async_struct and serial_state) do not use all the members. Remove the members: * which are unused or * which are only initialized and never used for something real. Everybody should avoid the structures with a looong distance. Finally, remove the ALPHA kludge MCR quirks. They are 1:1 copy from 8250.h. No need to redefine them here. The 2 promised users of the structures: arch/ia64/hp/sim/simserial.c drivers/tty/amiserial.c Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	TTY: remove serialP.h inclusion from some files	Jiri Slaby
	All of them do not use the ugly interface defined in that header. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	TTY: iss/console, use tty_port	Jiri Slaby
	Even though the port is not used for anything real there yet, this will change as tty buffers will be in tty_port in the near future. So the port will be needed in all drivers. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	XTENSA: iss/console, fix potential deadlock	Jiri Slaby
	If the timer ticks while we are holding the spinlock, the system deadlocks. It is due to synchronous del_timer. So to fix that, use spinlocks that properly disable bottom halves. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	XTENSA: iss/console, use setup_timer	Jiri Slaby
	Use setup_timer instead of explicit assignments. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	TTY: remove unneeded tty->index checks	Jiri Slaby
	Checking if tty->index is in bounds is not needed. The tty has the index set in the initial open. This is done in get_tty_driver. And it can be only in interval <0,driver->num). So remove the tests which check exactly this interval. Some are left untouched as they check against the current backing device count. (Leaving apart that the check is racy in most of the cases.) Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	TTY: srmcons, convert to use tty_port	Jiri Slaby
	This is needed because the tty buffer will become a tty_port member later. That will help us to wipe out most of the races and checks for the tty pointer in hot paths. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	ALPHA: srmcons, fix racy singleton structure	Jiri Slaby
	The test and the assignment were racy. Make it really a singleton. This is achieved by one global variable initialized at the module init. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	ALPHA: srmcons, use timer functions	Jiri Slaby
	It makes the code more readable. We move the setup to the allocation location because we need to initialize timers only once. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	TTY: remove re-assignments to tty_driver members	Jiri Slaby
	All num, magic and owner are set by alloc_tty_driver. No need to re-set them on each allocation site. pti driver sets something different to what it passes to alloc_tty_driver. It is not a bug, since we don't use the lines parameter in any way. Anyway this is fixed, and now we do the right thing. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08	Merge branch 'fixes' of ↵	Olof Johansson
	git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: Fix module build errors with CONFIG_OMAP4_ERRATA_I688 ARM: OMAP: id: Add missing break statement in omap3xxx_check_revision ARM: OMAP2+: Remove apply_uV constraints for fixed regulator ARM: OMAP: irqs: Fix NR_IRQS value to handle PRCM interrupts
2012-03-08	ARM: OMAP2+: Fix build for omap4 only builds with missing include of linux/bug.h	Tony Lindgren
	Found few more with randconfig generated .configs: In file included from arch/arm/mach-omap2/prm-regbits-34xx.h:17, from arch/arm/mach-omap2/vc.c:18: arch/arm/mach-omap2/prm2xxx_3xxx.h: In function ‘omap2_prm_read_mod_reg’: arch/arm/mach-omap2/prm2xxx_3xxx.h:239: error: implicit declaration of function ‘WARN’ In file included from arch/arm/mach-omap2/powerdomain44xx.c:22: arch/arm/mach-omap2/prm2xxx_3xxx.h: In function ‘omap2_prm_read_mod_reg’: arch/arm/mach-omap2/prm2xxx_3xxx.h:239: error: implicit declaration of function ‘WARN’ This is because omap2_prm functions are currently just stubs for omap4 only builds. Signed-off-by: Tony Lindgren <tony@atomide.com>
2012-03-08	Merge branch 'regulator' of git://github.com/hzhuang1/linux into next/drivers	Olof Johansson
	* 'regulator' of git://github.com/hzhuang1/linux: (2 commits) regulator: Remove bq24022 regulator driver pxa: magician/hx4700: Convert to gpio-regulator from bq24022 (plus update to v3.3-rc6)
2012-03-08	Merge branch 'dt' of git://github.com/hzhuang1/linux into next/dt	Olof Johansson
	* 'dt' of git://github.com/hzhuang1/linux: (6 commits) Document: devicetree: add OF documents for arch-mmp ARM: dts: append DTS file of pxa168 ARM: mmp: append OF support on pxa168 ARM: mmp: enable rtc clk in pxa168 i2c: pxa: add OF support serial: pxa: add OF support (plus update to v3.3-rc6)
2012-03-08	Merge tag 'imx35-imx5-aips-setup' of ↵	Olof Johansson
	git://git.pengutronix.de/git/imx/linux-2.6 into next/soc i.MX35/5 AIPS setup Includes sync up to 3.3-rc6 * tag 'imx35-imx5-aips-setup' of git://git.pengutronix.de/git/imx/linux-2.6: ARM: mx35: Setup the AIPS registers ARM: mx5: Use common function for configuring AIPS
2012-03-08	ARM: S3C2440: Fixed build error for s3c244x	Kukjin Kim
	Fixed following: arch/arm/mach-s3c2440/s3c244x.c: In function 's3c244x_restart': arch/arm/mach-s3c2440/s3c244x.c:209: error: expected declaration or statement at end of input make[1]: * [arch/arm/mach-s3c24xx/s3c244x.o] Error 1 make: * [arch/arm/mach-s3c24xx] Error 2 Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2012-03-08	Merge branch 'next/cleanup-s3c24xx' of ↵	Olof Johansson
	git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/cleanup * 'next/cleanup-s3c24xx' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: (24 commits) ARM: S3C24XX: remove call to s3c24xx_setup_clocks ARM: S3C24XX: add get_rate for clk_p on S3C2416/2443 ARM: S3C24XX: add get_rate for clk_h on S3C2416/2443 ARM: S3C24XX: remove XXX_setup_clocks method from S3C2443 ARM: S3C24XX: remove obsolete S3C2416_DMA option ARM: S3C24XX: Reuse S3C2443 dma for S3C2416 ARM: S3C24XX: Fix indentation of dma-s3c2443 ARM: S3C24XX: Move device setup files to mach directory ARM: S3C24XX: Consolidate Simtec extensions ARM: S3C24XX: move simtec-specific code to mach directory ARM: S3C24XX: Move common-smdk code to mach directory ARM: S3C24XX: Move s3c2443-clock.c to mach-s3c24xx ARM: s3c2410_defconfig: update s3c2410_defconfig ARM: S3C2443: move mach-s3c2443/* into mach-s3c24xx/ ARM: S3C2440: move mach-s3c2440/* into mach-s3c24xx/ ARM: S3C2416: move mach-s3c2416/* into mach-s3c24xx/ ARM: S3C2412: move mach-s3c2412/* into mach-s3c24xx/ ARM: S3C2410: move mach-s3c2410/* into mach-s3c24xx/ ARM: S3C24XX: change the ARCH_S3C2410 to ARCH_S3C24XX ARM: S3C2410: move s3c2410_baseclk_add to clock.h ...
2012-03-08	DMA: PL330: Merge PL330 driver into drivers/dma/	Boojin Kim
	Currently there were two part of DMAC PL330 driver for support old styled s3c-pl330 which has been merged into drivers/dma/pl330.c driver. Actually, there is no reason to separate them now. Basically this patch merges arch/arm/common/pl330.c into drivers/dma/pl330.c driver and removes useless exported symbol, externed function and so on. The newer pl330 driver tested on SMDKV310 and SMDK4212 boards Cc: Jassi Brar <jassisinghbrar@gmail.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Boojin Kim <boojin.kim@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Acked-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
2012-03-08	KVM: nVMX: Fix erroneous exception bitmap check	Nadav Har'El
	The code which checks whether to inject a pagefault to L1 or L2 (in nested VMX) was wrong, incorrect in how it checked the PF_VECTOR bit. Thanks to Dan Carpenter for spotting this. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Ignore the writes to MSR_K7_HWCR(3)	Nicolae Mogoreanu
	When CPUID Fn8000_0001_EAX reports 0x00100f22 Windows 7 x64 guest tries to set bit 3 in MSRC001_0015 in nt!KiDisableCacheErrataSource and fails. This patch will ignore this step and allow things to move on without having to fake CPUID value. Signed-off-by: Nicolae Mogoreanu <mogoreanu@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask	Davidlohr Bueso
	The reset_rsvds_bits_mask() function can use the guest walker's root level number instead of using a separate 'level' variable. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: PMU: add proper support for fixed counter 2	Gleb Natapov
	Currently pmu emulation emulates fixed counter 2 as bus cycles architectural counter, but since commit 9c1497ea591b25d perf has pseudo encoding for it. Use it. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: PMU: Fix raw event check	Gleb Natapov
	If eventsel has EDGE, INV or CMASK set we should create raw counter for it, but the check is done on a wrong variable. Fix it. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: PMU: warn when pin control is set in eventsel msr	Gleb Natapov
	Print warning once if pin control bit is set in eventsel msr since emulation does not support it yet. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: VMX: Fix delayed load of shared MSRs	Avi Kivity
	Shared MSRs (MSR_*STAR and related) are stored in both vmx->guest_msrs and in the CPU registers, but vmx_set_msr() only updated memory. Prior to 46199f33c2953, this didn't matter, since we called vmx_load_host_state(), which scheduled a vmx_save_host_state(), which re-synchronized the CPU state, but now we don't, so the CPU state will not be synchronized until the next exit to host userspace. This mostly affects nested vmx workloads, which play with these MSRs a lot. Fix by loading the MSR eagerly. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Allow host IRQ sharing for assigned PCI 2.3 devices	Jan Kiszka
	PCI 2.3 allows to generically disable IRQ sources at device level. This enables us to share legacy IRQs of such devices with other host devices when passing them to a guest. The new IRQ sharing feature introduced here is optional, user space has to request it explicitly. Moreover, user space can inform us about its view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the interrupt and signaling it if the guest masked it via the virtualized PCI config space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Ensure all vcpus are consistent with in-kernel irqchip settings	Avi Kivity
	If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: x86 emulator: Allow PM/VM86 switch during task switch	Kevin Wolf
	Task switches can switch between Protected Mode and VM86. The current mode must be updated during the task switch emulation so that the new segment selectors are interpreted correctly. In order to let privilege checks succeed, rflags needs to be updated in the vcpu struct as this causes a CPL update. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: SVM: Fix CPL updates	Kevin Wolf
	Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use RPL rather than DPL of the code segment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: x86 emulator: VM86 segments must have DPL 3	Kevin Wolf
	Setting the segment DPL to 0 for at least the VM86 code segment makes the VM entry fail on VMX. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: x86 emulator: Fix task switch privilege checks	Kevin Wolf
	Currently, all task switches check privileges against the DPL of the TSS. This is only correct for jmp/call to a TSS. If a task gate is used, the DPL of this take gate is used for the check instead. Exceptions, external interrupts and iret shouldn't perform any check. [avi: kill kvm-kmod remnants] Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice	Danny Kukawka
	arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice, remove the duplicate. Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation	Gleb Natapov
	Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Introduce kvm_memory_slot::arch and move lpage_info into it	Takuya Yoshikawa
	Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Introduce gfn_to_index() which returns the index for a given level	Takuya Yoshikawa
	This patch cleans up the code and removes the "(void)level;" warning suppressor. Note that we can also use this for PT_PAGE_TABLE_LEVEL to treat every level uniformly later. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: s390: provide control registers via kvm_run	Christian Borntraeger
	There are several cases were we need the control registers for userspace. Lets also provide those in kvm_run. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: s390: add stop_on_stop flag when doing stop and store	Jens Freimann
	When we do a stop and store status we need to pass ACTION_STOP_ON_STOP flag to __sigp_stop(). Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: s390: ignore sigp stop overinitiative	Jens Freimann
	In __inject_sigp_stop() do nothing when the CPU is already in stopped state. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: s390: make sigp restart return busy when stop pending	Jens Freimann
	On reboot the guest sends in smp_send_stop() a sigp stop to all CPUs except for current CPU. Then the guest switches to the IPL cpu by sending a restart to the IPL CPU, followed by a sigp stop to the current cpu. Since restart is handled by userspace it's possible that the restart is delivered before the old stop. This means that the IPL CPU isn't restarted and we have no running CPUs. So let's make sure that there is no stop action pending when we do the restart. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: s390: do store status after handling STOP_ON_STOP bit	Jens Freimann
	In handle_stop() handle the stop bit before doing the store status as described for "Stop and Store Status" in the Principles of Operation. We have to give up the local_int.lock before calling kvm store status since it calls gmap_fault() which might sleep. Since local_int.lock only protects local_int.* and not guest memory we can give up the lock. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: s390: Sanitize fpc registers for KVM_SET_FPU	Christian Borntraeger
	commit 7eef87dc99e419b1cc051e4417c37e4744d7b661 (KVM: s390: fix register setting) added a load of the floating point control register to the KVM_SET_FPU path. Lets make sure that the fpc is valid. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Fix write protection race during dirty logging	Takuya Yoshikawa
	This patch fixes a race introduced by: commit 95d4c16ce78cb6b7549a09159c409d52ddd18dae KVM: Optimize dirty logging by rmap_write_protect() During protecting pages for dirty logging, other threads may also try to protect a page in mmu_sync_children() or kvm_mmu_get_page(). In such a case, because get_dirty_log releases mmu_lock before flushing TLB's, the following race condition can happen: A (get_dirty_log) B (another thread) lock(mmu_lock) clear pte.w unlock(mmu_lock) lock(mmu_lock) pte.w is already cleared unlock(mmu_lock) skip TLB flush return ... TLB flush Though thread B assumes the page has already been protected when it returns, the remaining TLB entry will break that assumption. This patch fixes this problem by making get_dirty_log hold the mmu_lock until it flushes the TLB's. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: VMX: remove yield_on_hlt	Raghavendra K T
	yield_on_hlt was introduced for CPU bandwidth capping. Now it is redundant with CFS hardlimit. yield_on_hlt also complicates the scenario in paravirtual environment, that needs to trap halt. for e.g. paravirtualized ticket spinlocks. Acked-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Track TSC synchronization in generations	Zachary Amsden
	This allows us to track the original nanosecond and counter values at each phase of TSC writing by the guest. This gets us perfect offset matching for stable TSC systems, and perfect software computed TSC matching for machines with unstable TSC. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Dont mark TSC unstable due to S4 suspend	Zachary Amsden
	During a host suspend, TSC may go backwards, which KVM interprets as an unstable TSC. Technically, KVM should not be marking the TSC unstable, which causes the TSC clocksource to go bad, but we need to be adjusting the TSC offsets in such a case. Dealing with this issue is a little tricky as the only place we can reliably do it is before much of the timekeeping infrastructure is up and running. On top of this, we are not in a KVM thread context, so we may not be able to safely access VCPU fields. Instead, we compute our best known hardware offset at power-up and stash it to be applied to all VCPUs when they actually start running. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Allow adjust_tsc_offset to be in host or guest cycles	Marcelo Tosatti
	Redefine the API to take a parameter indicating whether an adjustment is in host or guest cycles. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Add last_host_tsc tracking back to KVM	Zachary Amsden
	The variable last_host_tsc was removed from upstream code. I am adding it back for two reasons. First, it is unnecessary to use guest TSC computation to conclude information about the host TSC. The guest may set the TSC backwards (this case handled by the previous patch), but the computation of guest TSC (and fetching an MSR) is significanlty more work and complexity than simply reading the hardware counter. In addition, we don't actually need the guest TSC for any part of the computation, by always recomputing the offset, we can eliminate the need to deal with the current offset and any scaling factors that may apply. The second reason is that later on, we are going to be using the host TSC value to restore TSC offsets after a host S4 suspend, so we need to be reading the host values, not the guest values here. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Fix last_guest_tsc / tsc_offset semantics	Zachary Amsden
	The variable last_guest_tsc was being used as an ad-hoc indicator that guest TSC has been initialized and recorded correctly. However, it may not have been, it could be that guest TSC has been set to some large value, the back to a small value (by, say, a software reboot). This defeats the logic and causes KVM to falsely assume that the guest TSC has gone backwards, marking the host TSC unstable, which is undesirable behavior. In addition, rather than try to compute an offset adjustment for the TSC on unstable platforms, just recompute the whole offset. This allows us to get rid of one callsite for adjust_tsc_offset, which is problematic because the units it takes are in guest units, but here, the computation was originally being done in host units. Doing this, and also recording last_guest_tsc when the TSC is written allow us to remove the tricky logic which depended on last_guest_tsc being zero to indicate a reset of uninitialized value. Instead, we now have the guarantee that the guest TSC offset is always at least something which will get us last_guest_tsc. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Leave TSC synchronization window open with each new sync	Zachary Amsden
	Currently, when the TSC is written by the guest, the variable ns is updated to force the current write to appear to have taken place at the time of the first write in this sync phase. This leaves a cliff at the end of the match window where updates will fall of the end. There are two scenarios where this can be a problem in practe - first, on a system with a large number of VCPUs, the sync period may last for an extended period of time. The second way this can happen is if the VM reboots very rapidly and we catch a VCPU TSC synchronization just around the edge. We may be unaware of the reboot, and thus the first VCPU might synchronize with an old set of the timer (at, say 0.97 seconds ago, when first powered on). The second VCPU can come in 0.04 seconds later to try to synchronize, but it misses the window because it is just over the threshold. Instead, stop doing this artificial setback of the ns variable and just update it with every write of the TSC. It may be observed that doing so causes values computed by compute_guest_tsc to diverge slightly across CPUs - note that the last_tsc_ns and last_tsc_write variable are used here, and now they last_tsc_ns will be different for each VCPU, reflecting the actual time of the update. However, compute_guest_tsc is used only for guests which already have TSC stability issues, and further, note that the previous patch has caused last_tsc_write to be incremented by the difference in nanoseconds, converted back into guest cycles. As such, only boundary rounding errors should be visible, which given the resolution in nanoseconds, is going to only be a few cycles and only visible in cross-CPU consistency tests. The problem can be fixed by adding a new set of variables to track the start offset and start write value for the current sync cycle. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08	KVM: Improve TSC offset matching	Zachary Amsden
	There are a few improvements that can be made to the TSC offset matching code. First, we don't need to call the 128-bit multiply (especially on a constant number), the code works much nicer to do computation in nanosecond units. Second, the way everything is setup with software TSC rate scaling, we currently have per-cpu rates. Obviously this isn't too desirable to use in practice, but if for some reason we do change the rate of all VCPUs at runtime, then reset the TSCs, we will only want to match offsets for VCPUs running at the same rate. Finally, for the case where we have an unstable host TSC, but rate scaling is being done in hardware, we should call the platform code to compute the TSC offset, so the math is reorganized to recompute the base instead, then transform the base into an offset using the existing API. [avi: fix 64-bit division on i386] Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> KVM: Fix 64-bit division in kvm_write_tsc() Breaks i386 build. Signed-off-by: Avi Kivity <avi@redhat.com>