Age | Commit message | Author
2010-05-19 | KVM: Add cpuid.txt file | Glauber Costa
This file documents cpuid bits used by KVM. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: x86: Tell the guest we'll warn it about tsc stability | Glauber Costa
This patch puts up the flag that tells the guest that we'll warn it about the tsc being trustworthy or not. For now, we also say it is not. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | x86, paravirt: don't compute pvclock adjustments if we trust the tsc | Glauber Costa
If the HV told us we can fully trust the TSC, skip any correction. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | x86: KVM guest: Try using new kvm clock msrs | Glauber Costa
We have now added a new set of clock-related msrs to replace the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a nonexistent MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of msrs is supported. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
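The selection described above can be sketched in plain C. This is only an illustration, not the patch itself: the helper `select_kvmclock_msrs` and the struct `kvmclock_msrs` are hypothetical names, and the feature-bit and MSR numbers are the values I recall from the kernel's kvm_para.h, so treat them as assumptions to be checked against the header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits reported by the host in the KVM features CPUID leaf
 * (values assumed from kvm_para.h). */
#define KVM_FEATURE_CLOCKSOURCE   0
#define KVM_FEATURE_CLOCKSOURCE2  3

/* Old and new kvmclock MSR numbers (assumed from kvm_para.h). */
#define MSR_KVM_WALL_CLOCK        0x11
#define MSR_KVM_SYSTEM_TIME       0x12
#define MSR_KVM_WALL_CLOCK_NEW    0x4b564d00
#define MSR_KVM_SYSTEM_TIME_NEW   0x4b564d01

/* Hypothetical container for the chosen MSR pair. */
struct kvmclock_msrs {
    uint32_t wall_clock;
    uint32_t system_time;
};

/*
 * Hypothetical helper: decide from the feature word alone which MSR set to
 * use, so that no write to a nonexistent MSR is ever attempted.
 * Returns false when the host advertises no kvmclock at all.
 */
static bool select_kvmclock_msrs(uint32_t features, struct kvmclock_msrs *out)
{
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE2)) {
        out->wall_clock = MSR_KVM_WALL_CLOCK_NEW;
        out->system_time = MSR_KVM_SYSTEM_TIME_NEW;
        return true;
    }
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE)) {
        out->wall_clock = MSR_KVM_WALL_CLOCK;
        out->system_time = MSR_KVM_SYSTEM_TIME;
        return true;
    }
    return false;
}
```

Preferring the new set when both bits are present matches the intent stated above that the old MSRs are being replaced.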
2010-05-19 | KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID | Glauber Costa
Until now, we were using individual KVM_CAP entities to tell userspace which cpuids we support. This is suboptimal, since it introduces a delay between a feature arriving in the host and being available in the guest, and every new cpuid bit added in the future requires doing that again, which creates some complexity and delay in feature adoption. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: x86: add new KVMCLOCK cpuid feature | Glauber Costa
This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest that kvmclock is available through a new set of MSRs. The old ones are deprecated. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: x86: change msr numbers for kvmclock | Glauber Costa
Avi pointed out a while ago that those MSRs fall into the Pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | x86, paravirt: Add a global synchronization point for pvclock | Glauber Costa
In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using Ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5 million warps a minute in some systems (to be fair, it wasn't that bad in most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in their tsc flags, especially on multi-socket ones. Two reads of the same kernel timestamp at approximately the same time will likely have their tsc timestamped on different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading the clock on a forward occasion. Some adjustments on the host could make this window less likely to happen, but still, it is pretty much an intrinsic problem of the mechanism. A while ago, I thought about using a shared variable anyway, to hold the clock's last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary. We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detect that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This does cause contention on big boxes (but big here means *BIG*), but it seems like a good trade-off, since it provides us with a time source guaranteed to be stable wrt time warps. After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them on before.
Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Avi Kivity <avi@redhat.com> CC: Marcelo Tosatti <mtosatti@redhat.com> CC: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
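The lock-free scheme argued for above can be sketched with C11 atomics. This is a userspace illustration under stated assumptions, not the kernel code: `last_value` and `clamp_to_last` are hypothetical names, and the kernel would use its own atomic primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Highest clock value handed out so far, shared by all readers
 * (hypothetical name; zero-initialized as a static object). */
static _Atomic uint64_t last_value;

/*
 * Take a freshly computed per-cpu reading and make it globally monotonic.
 * If another reader already published a later value, return that value.
 * Otherwise try to publish ours with a compare-and-exchange and retry if
 * someone else raced with us. No lock is taken at any point.
 */
static uint64_t clamp_to_last(uint64_t ret)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (ret < last)
            return last;    /* we are behind: hand out the newer value */
        /* on failure, 'last' is refreshed with the current global value */
    } while (!atomic_compare_exchange_weak(&last_value, &last, ret));

    return ret;             /* our reading is now the global maximum */
}
```

The loop relies on the property stated in the commit message: the global value only ever grows, so a reader that loses the race either wins on retry or returns the newer value.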
2010-05-19 | x86, paravirt: Enable pvclock flags in vcpu_time_info structure | Glauber Costa
This patch removes one padding byte and transforms it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid is usually determined via HV negotiation. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
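A rough sketch of the idea follows, assuming a layout along the lines of the pvclock ABI structure. The struct name `vcpu_time_info_sketch` and the helpers `set_valid_flags` and `effective_flags` are hypothetical stand-ins (the commit message only names pvclock_valid_flags), and the field layout is an assumption to be checked against the real header.

```c
#include <stdint.h>

/* Assumed layout; 'flags' occupies what used to be one of three pad bytes. */
struct vcpu_time_info_sketch {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;     /* formerly padding */
    uint8_t  pad[2];
};

/* Flags the guest has agreed to honour; nothing is honoured by default. */
static uint8_t valid_flags;

/* Hypothetical counterpart of the function named in the commit message. */
static void set_valid_flags(uint8_t flags)
{
    valid_flags = flags;
}

/* Flags are read on every clock read, but only negotiated bits count. */
static uint8_t effective_flags(const struct vcpu_time_info_sketch *info)
{
    return info->flags & valid_flags;
}
```

Masking on the guest side means a host that sets a bit the guest never negotiated cannot change the guest's behaviour.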
2010-05-19 | KVM: x86: Inject #GP with the right rip on efer writes | Roedel, Joerg
This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: SVM: Don't allow nested guest to VMMCALL into host | Joerg Roedel
This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: x86: Fix exception reinjection forced to true | Joerg Roedel
The recently merged patch that allowed marking an exception as reinjected has a bug: it always marks the exception as reinjected. This breaks the nested-svm shadow-on-shadow implementation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: Fix wallclock version writing race | Avi Kivity
Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots | Avi Kivity
On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) | Shane Wang
Per the documentation, for the feature control MSR: Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch enables this kind of check with SMX for VMXON in KVM. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
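The two-bit rule quoted above can be written down as a small predicate. This is a sketch only: the macro names and the helper `vmxon_allowed` are hypothetical, and how the caller learns whether the CPU is in SMX operation is left out and represented by the `in_smx` parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the commit message (names are illustrative). */
#define FC_VMXON_ENABLED_INSIDE_SMX   (1ull << 1)
#define FC_VMXON_ENABLED_OUTSIDE_SMX  (1ull << 2)

/*
 * Hypothetical helper: given the feature control MSR value and whether the
 * CPU is in SMX operation, report whether VMXON would be permitted instead
 * of raising a general-protection exception.
 */
static bool vmxon_allowed(uint64_t feature_control, bool in_smx)
{
    uint64_t required = in_smx ? FC_VMXON_ENABLED_INSIDE_SMX
                               : FC_VMXON_ENABLED_OUTSIDE_SMX;

    return (feature_control & required) != 0;
}
```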
2010-05-19 | KVM: x86: properly update ready_for_interrupt_injection | Marcelo Tosatti
The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt when irqchips are emulated in userspace. Fix by always updating state before returning to userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: VMX: Atomically switch efer if EPT && !EFER.NX | Avi Kivity
When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page tables. This causes accesses through ptes with bit 63 set to succeed instead of failing a reserved bit check. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit | Avi Kivity
Some guest msr values cannot be used on the host (for example, EFER.NX=0), so we need to switch them atomically during guest entry or exit. Add a facility to program the vmx msr autoload registers accordingly. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries | Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: VMX: Add definition for msr autoload entry | Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: Let vcpu structure alignment be determined at runtime | Avi Kivity
vmx and svm vcpus have different contents and therefore may have different alignment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19 | KVM: MMU: cleanup invlpg code | Xiao Guangrong
Use is_last_spte() to clean up the invlpg code. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: MMU: move unsync/sync tracpoints to proper place | Xiao Guangrong
Move the unsync/sync tracepoints to the proper place; this lets us obtain the lifetime of unsync pages. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: MMU: convert mmu tracepoints | Xiao Guangrong
Convert the mmu tracepoints to use DECLARE_EVENT_CLASS. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: MMU: fix for calculating gpa in invlpg code | Xiao Guangrong
If the guest is 32-bit, we should use 'quadrant' to adjust the gpa offset. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: powerpc: use of kzalloc/kfree requires including slab.h | Stephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: Fix mmu shrinker error | Gui Jianfeng
kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() reclaims only one sp, but that's not the case. This causes the mmu shrinker to return a wrong number. This patch fixes the counting error. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | KVM: MMU: fix hashing for TDP and non-paging modes | Eric Northup
For TDP mode, avoid creating multiple page table roots for the single guest-to-host physical address map by fixing the inputs used for the shadow page table hash in mmu_alloc_roots(). Signed-off-by: Eric Northup <digitaleric@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19 | ds2760_battery: Document ABI change | Daniel Mack
Add some documentation for the newly added writeable properties. Suggested-by: Greg KH <gregkh@suse.de> Signed-off-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
2010-05-19 | ds2760_battery: Make charge_now and charge_full writeable | Daniel Mack
For userspace tools and daemons, it might be necessary to adjust the charge_now and charge_full properties of the ds2760 battery monitor, for example for unavoidable corrections due to aging batteries. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Matt Reimer <mreimer@vpop.net> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alexey Starikovskiy <astarikovskiy@suse.de> Cc: Len Brown <len.brown@intel.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
2010-05-19 | power_supply: Add support for writeable properties | Daniel Mack
This patch adds support for writeable power supply properties and exposes them as writeable to sysfs. A power supply implementation must implement two new function calls in order to use that feature: int set_property(struct power_supply *psy, enum power_supply_property psp, const union power_supply_propval *val); int property_is_writeable(struct power_supply *psy, enum power_supply_property psp); Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alexey Starikovskiy <astarikovskiy@suse.de> Cc: Len Brown <len.brown@intel.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Matt Reimer <mreimer@vpop.net> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
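To show how the two callbacks named above fit together, here is a userspace mock. The two function signatures are taken from the commit message; everything else is a hypothetical stand-in: the trimmed `struct power_supply`, the shortened property enum, the `demo_*` functions, and the choice of -EPERM for rejected writes.

```c
#include <errno.h>

/* Trimmed stand-ins for the kernel types (hypothetical, for illustration). */
enum power_supply_property {
    POWER_SUPPLY_PROP_VOLTAGE_NOW,
    POWER_SUPPLY_PROP_CHARGE_FULL,
    POWER_SUPPLY_PROP_CHARGE_NOW,
};

union power_supply_propval {
    int intval;
    const char *strval;
};

struct power_supply {
    /* The two new callbacks, with the signatures from the commit message. */
    int (*set_property)(struct power_supply *psy,
                        enum power_supply_property psp,
                        const union power_supply_propval *val);
    int (*property_is_writeable)(struct power_supply *psy,
                                 enum power_supply_property psp);
    /* Demo-only state standing in for a real battery monitor. */
    int charge_full;
    int charge_now;
};

/* Only the two charge properties are writeable in this mock. */
static int demo_property_is_writeable(struct power_supply *psy,
                                      enum power_supply_property psp)
{
    (void)psy;
    return psp == POWER_SUPPLY_PROP_CHARGE_FULL ||
           psp == POWER_SUPPLY_PROP_CHARGE_NOW;
}

static int demo_set_property(struct power_supply *psy,
                             enum power_supply_property psp,
                             const union power_supply_propval *val)
{
    switch (psp) {
    case POWER_SUPPLY_PROP_CHARGE_FULL:
        psy->charge_full = val->intval;
        return 0;
    case POWER_SUPPLY_PROP_CHARGE_NOW:
        psy->charge_now = val->intval;
        return 0;
    default:
        return -EPERM;
    }
}

/* Hypothetical store path: consult property_is_writeable before writing. */
static int demo_store(struct power_supply *psy,
                      enum power_supply_property psp, int value)
{
    union power_supply_propval val = { .intval = value };

    if (!psy->set_property || !psy->property_is_writeable ||
        !psy->property_is_writeable(psy, psp))
        return -EPERM;
    return psy->set_property(psy, psp, &val);
}
```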
2010-05-19 | power_supply: Use attribute groups | Anton Vorontsov
This fixes a race between power supply device and initial attributes creation, plus makes it possible to implement writable properties. [Daniel Mack - removed superfluous return statement and dropped .mode attribute from POWER_SUPPLY_ATTR] Suggested-by: Greg KH <gregkh@suse.de> Suggested-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com> Tested-by: Daniel Mack <daniel@caiaq.de>
2010-05-19 | module: drop the lock while waiting for module to complete initialization. | Rusty Russell
This fixes "gave up waiting for init of module libcrc32c." which happened at boot time due to multiple parallel module loads. The problem was a deadlock: we wait for a module to finish initializing, but we keep the module_lock mutex so it can't complete. In particular, this could reasonably happen if a module does a request_module() in its initialization routine. So we change use_module() to return an errno rather than a bool, and if it's -EBUSY we drop the lock and wait in the caller, then reacquire the lock. Reported-by: Brandon Philips <brandon@ifup.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Tested-by: Brandon Philips <brandon@ifup.org>
2010-05-19 | MODULE_DEVICE_TABLE(isapnp, ...) does nothing | Ondrej Zary
On Monday 23 November 2009 04:29:53 Rusty Russell wrote:
> On Mon, 23 Nov 2009 07:31:57 am Ondrej Zary wrote:
> > The problem is that
> > scripts/mod/file2alias.c simply ignores isapnp.
>
> AFAICT it always has, and noone has complained until now. Perhaps
> something was still reading /lib/modules/`uname -r`/modules.isapnpmap?

The patch below works fine (at least with Debian). It needs your first patch that moves the definitions to mod_devicetable.h.

Verified that aliases for these modules are generated correctly:
drivers/media/radio/radio-sf16fmi.c
drivers/net/ne.c
drivers/net/3c515.c
drivers/net/smc-ultra.c
drivers/pcmcia/i82365.c
drivers/scsi/aha1542.c
drivers/scsi/aha152x.c
drivers/scsi/sym53c416.c
drivers/scsi/g_NCR5380.c

Tested with RTL8019AS (ne), AVA-1505AE (aha152x) and dtc436e (g_NCR5380) cards - they now work automatically.

Generate pnp:d aliases for isapnp_device_tables. This allows udev to load these modules automatically.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 | hisax_fcpcipnp: fix broken isapnp device table. | Rusty Russell
Found that drivers/isdn/hisax/hisax_fcpcipnp.c has a broken pnp device table: the type is wrong (isapnp instead of pnp) and the ending record is missing. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (split patch)
2010-05-19 | isapnp: move definitions to mod_devicetable.h so file2alias can reach them. | Rusty Russell
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 | intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables | Ben Hutchings
We now know how to deal with these tables so that they are harmless. Set TAINT_FIRMWARE_WORKAROUND instead of the default TAINT_WARN. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-05-19 | intel-iommu: Combine the BIOS DMAR table warning messages | Ben Hutchings
We have nearly the same code for warnings repeated four times. Move it into a separate function. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-05-19 | panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I') | Ben Hutchings
This taint flag will initially be used when warning about invalid ACPI DMAR tables. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-05-19 | panic: Allow warnings to set different taint flags | Ben Hutchings
WARN() is used in some places to report firmware or hardware bugs that are then worked around. These bugs do not affect the stability of the kernel and should not set the flag for TAINT_WARN. To allow for this, add WARN_TAINT() and WARN_TAINT_ONCE() macros that take a taint number as argument. Architectures that implement warnings using trap instructions instead of calls to warn_slowpath_*() now implement __WARN_TAINT(taint) instead of __WARN(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Helge Deller <deller@gmx.de> Tested-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
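The shape of a warning macro that takes a taint number can be sketched in userspace C. Nothing below is the kernel implementation: the `_SKETCH` names, the `tainted_mask` variable, and the message-only argument list are hypothetical, and the two taint numbers are the values I believe the kernel uses, stated here as assumptions.

```c
#include <stdio.h>

/* Taint bit numbers (assumed values; names are illustrative). */
enum {
    TAINT_WARN_SKETCH = 9,
    TAINT_FIRMWARE_WORKAROUND_SKETCH = 11,
};

/* Stand-in for the kernel's global taint state. */
static unsigned long tainted_mask;

static void add_taint_sketch(unsigned int flag)
{
    tainted_mask |= 1UL << flag;
}

/*
 * Warn and set the caller-chosen taint bit when 'cond' is true.
 * Evaluates to 1 if the warning fired and 0 otherwise.
 */
#define WARN_TAINT_SKETCH(cond, taint, msg)                      \
    ((cond) ? (fprintf(stderr, "WARNING: %s\n", (msg)),          \
               add_taint_sketch(taint), 1)                       \
            : 0)

/* The plain warning keeps its old behaviour by passing the default taint. */
#define WARN_SKETCH(cond, msg) \
    WARN_TAINT_SKETCH(cond, TAINT_WARN_SKETCH, msg)
```

Defining the plain warning in terms of the tainting variant keeps one code path, which mirrors the commit's move from __WARN() to __WARN_TAINT(taint) on trap-based architectures.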
2010-05-19 | drm/nouveau: fix i2c-related init table handlers | Ben Skeggs
Multiple issues. INIT_ZM_I2C_BYTE/INIT_I2C_BYTE didn't even try to use the register value, and all the handlers were using the wrong slave address. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-05-19 | drm/nouveau: support init table i2c device identifier 0x81 | Ben Skeggs
It appears to be meant to reference the second "default index". Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-05-19 | drm/nouveau: ensure we've parsed i2c table entry for INIT_*I2C* handlers | Ben Skeggs
We may not have parsed the entry yet if the i2c_index is for an i2c bus that's not referenced by a DCB encoder. This could be done oh so much more nicely, except we have to care about prehistoric DCB tables too, and they make life painful. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-05-19 | drm/nouveau: display error message for any failed init table opcode | Ben Skeggs
Some handlers don't report specific errors, but we still *really* want to know if we failed to parse a complete init table. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-05-19 | drm/nouveau: fix init table handlers to return proper error codes | Ben Skeggs
We really want to be able to distinguish between INIT_DONE and an actual error sometimes. This commit fixes up several lazy "return 0;" to be actual error codes, and explicitly reserves "0" as "success, but stop parsing this table". Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
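The return-code convention described above can be illustrated with a toy table parser. Only the meaning of 0 ("success, but stop parsing this table") and of negative values (a real error) comes from the commit message. The opcodes, the function names, and the convention that a positive return is the number of bytes consumed are assumptions made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for the toy table format. */
enum {
    OP_SKIP2 = 0x01,    /* opcode byte plus one operand byte */
    OP_DONE  = 0x71,    /* end of table */
};

/*
 * Toy handler dispatch.
 *   > 0  bytes consumed, keep parsing (assumed convention)
 *   == 0 success, but stop parsing this table
 *   < 0  a real error, expressed as a negative errno
 */
static int handle_opcode(const uint8_t *table, size_t len, size_t offset)
{
    switch (table[offset]) {
    case OP_SKIP2:
        return offset + 2 <= len ? 2 : -EINVAL;
    case OP_DONE:
        return 0;
    default:
        return -ENOENT;     /* unknown opcode is an error, not "done" */
    }
}

/* Run the table; 'ops_run' counts the opcodes that were handled. */
static int parse_table(const uint8_t *table, size_t len, size_t *ops_run)
{
    size_t offset = 0;

    *ops_run = 0;
    while (offset < len) {
        int ret = handle_opcode(table, len, offset);

        if (ret < 0)
            return ret;     /* distinguishable from a normal stop */
        (*ops_run)++;
        if (ret == 0)
            return 0;       /* handler asked us to stop cleanly */
        offset += (size_t)ret;
    }
    return 0;
}
```

With 0 reserved for a clean stop, a caller can tell a table that ended normally from one that was cut short by a failing handler.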
2010-05-19 | drm/nv50: support fractional feedback divider on newer chips | Ben Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2010-05-18 | Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ | David S. Miller
Conflicts: include/linux/mod_devicetable.h scripts/mod/file2alias.c
2010-05-18 | qlcnic: adding co maintainer | Amit Kumar Salecha
Add Anirban as co-maintainer. Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-19 | crypto: n2 - Add Niagara2 crypto driver | David S. Miller
Current deficiencies: 1) No HMAC hash support yet. 2) Although the algs are registered as ASYNC they always run synchronously. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-05-19 | crypto: skcipher - Add ablkcipher_walk interfaces | David S. Miller
These are akin to the blkcipher_walk helpers. The main differences in the async variant are: 1) Only physical walking is supported. We can't hold on to kmap mappings across the async operation to support virtual ablkcipher_walk operations anyway. 2) Bounce buffers used for async mode need to be persistent and freed at a later point in time when the async op completes. Therefore we maintain a list of writeback buffers and require that the ablkcipher_walk user call the 'complete' operation so we can copy the bounce buffers out to the real buffers and free up the bounce buffer chunks. These interfaces will be used by the new Niagara2 crypto driver. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>