summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-12-06Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull watchdog fix from Thomas Gleixner: "Trivial CPU hotplug regression fix for the watchdog code" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog: Fix CPU hotplug regression
2012-12-06KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdumpZhang Yanfei
The vmclear function will be assigned to the callback function pointer when loading kvm-intel module. And the bitmap indicates whether we should do VMCLEAR operation in kdump. The bits in the bitmap are set/unset according to different conditions. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessaryZhang Yanfei
This patch provides a way to VMCLEAR VMCSs related to guests on all cpus before executing the VMXOFF when doing kdump. This is used to ensure the VMCSs in the vmcore updated and non-corrupted. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06tcm_vhost: remove unused variable in vhost_scsi_allocate_cmd()Wei Yongjun
The variable se_sess is initialized but never used otherwise, so remove the unused variable. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06vhost-net: enable zerocopy tx by defaultMichael S. Tsirkin
Zero copy TX has been around for a while now. We seem to be down to eliminating theoretical bugs and performance tuning at this point: it's probably time to enable it by default so that most users get the benefit. Keep the flag around meanwhile so users can experiment with disabling this if they experience regressions. I expect that we will remove it in the future. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06vhost-net: skip head management if no outstandingMichael S. Tsirkin
For short packets zerocopy mode adds overhead of managing heads which isn't necessary: we could simly update used ring directly same as with zerocopy disabled. Things seem to run a bit faster if we detect and bypass head management when zcopy isn't used. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06vhost-net: flush outstanding DMAs on memory changeMichael S. Tsirkin
When memory map changes, we need to flush outstanding DMAs as they might in theory reference old memory addresses. To do this simply stop initiating new DMAs and wait for ubufs ref count to drop to 0. Afterwards reset the count back to 1. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06vhost: avoid backend flush on vring opsMichael S. Tsirkin
vring changes already do a flush internally where appropriate, so we do not need a second flush. It's currently not very expensive but a follow-up patch makes flush more heavy-weight, so remove the extra flush here to avoid regressing performance if call or kick fds are changed on data path. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06tools:virtio: fix compilation warningCong Ding
We do not allow old-style function definition. Always spell foo(void) if a function does not take any parameters. Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06mac80211: cancel work instead of waiting for it to do nothingJohannes Berg
If the sdata work is pending while the interface is stopped, we currently flush it. If it's not running this means waiting for it to run, which could take a while if the workqueue is backlogged. However, the work exits right away if it starts to run while the interface is already stopping. There's no point in waiting for that, so use cancel_work_sync() instead. Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-06wireless: fix VHT max AMPDU exponent definitionJohannes Berg
This is really a 3-bit field, not a single bit, so declare a mask and shift. Also fix hwsim, it advertises the maximum possible. While at it reindent all the defines using tabs instead of spaces. Change-Id: I7cd81c0d72f76deb5010aba5bfa3dd312006e898 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-06mac80211: don't drop mesh peering frames from unknown STAMarco Porsch
Previously, mesh peering frames from a STA without a station entry were being dropped. Mesh Peering Open and other frames (WLAN_CATEGORY_SELF_PROTECTED) are valid mesh peering frames even if received from a yet unknown station; the STA entry will be created in mesh_peer_init later. The problem didn't occur previously since both STAs receive each other's beacons which created the STA entry. However, this causes an unnecessary delay and beacons might not be received if either node is in PS mode. Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de> [reword commit log a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-06HID: i2c-hid: fix ret_count checkJiri Kosina
ret_count has to be at least 3, as we have to count the 2 bytes that are used for the size of the reply. Without this, memcpy() might be called with zero or negative count. Reported-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06HID: i2c-hid: fix i2c_hid_get_raw_report count mismatchesBenjamin Tissoires
The previous memcpy implementation relied on the size advertized by the device. There were no guarantees that buf was big enough. Some gymnastic is also required with the +2/-2 to take into account the first 2 bytes of the returned buffer where the total returned length is supplied by the device. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06HID: i2c-hid: remove extra .irq field in struct i2c_hidBenjamin Tissoires
There is no point in keeping the irq in i2c_hid as it's already there in client. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Reviewed-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06HID: i2c-hid: reorder allocation/free of buffersBenjamin Tissoires
Simplifies i2c_hid_alloc_buffers tests, and makes this function responsible of the assignment of ihid->bufsize. The condition for the reallocation in i2c_hid_start is then simpler. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Reviewed-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06HID: i2c-hid: fix memory corruption due to missing hid declarationBenjamin Tissoires
HID descriptors contains 4 bytes of reserved field. The previous implementation was overriding the next fields in struct i2c_hid. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Reviewed-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06devicetree/bindings: Move gpio-leds binding into leds directoryGrant Likely
Merely reorganizing documentation. No functional changes. It makes more sense for the gpio-leds binding to be grouped with other led bindings than with gpio drivers. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-12-06propagate name change to comments in kernel sourceNadia Yvette Chambers
I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06ASoC: arizona: Make FLL lock timeout very highMark Brown
Provide robustness against low quality FLL sync clocks by increasing the timeout for lock to an absurdly high point; we should never get anywhere near hitting the timeout in a real system unless it is failing. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-12-06KVM: MMU: optimize for set_spteXiao Guangrong
There are two cases we need to adjust page size in set_spte: 1): the one is other vcpu creates new sp in the window between mapping_level() and acquiring mmu-lock. 2): the another case is the new sp is created by itself (page-fault path) when guest uses the target gfn as its page table. In current code, set_spte drop the spte and emulate the access for these case, it works not good: - for the case 1, it may destroy the mapping established by other vcpu, and do expensive instruction emulation. - for the case 2, it may emulate the access even if the guest is accessing the page which not used as page table. There is a example, 0~2M is used as huge page in guest, in this huge page, only page 3 used as page table, then guest read/writes on other pages can cause instruction emulation. Both of these cases can be fixed by allowing guest to retry the access, it will refault, then we can establish the mapping by using small page Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06lib/Makefile: Fix oid_registry build dependencyTim Gardner
It is $(obj)/oid_registry.o that is dependent on $(obj)/oid_registry_data.c. The object file cannot be built until $(obj)/oid_registry_data.c has been generated. A periodic and hard to reproduce parallel build failure is due to this incorrect lib/Makefile dependency. The compile error is completely disingenuous. GEN lib/oid_registry_data.c Compiling 49 OIDs CC lib/oid_registry.o gcc: error: lib/oid_registry.c: No such file or directory gcc: fatal error: no input files compilation terminated. make[3]: *** [lib/oid_registry.o] Error 4 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Michel Lespinasse <walken@google.com> Cc: David Howells <dhowells@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-12-06regulator: gpio-regulator: Add ifdef CONFIG_OF guard for regulator_gpio_of_matchAxel Lin
Use of_match_ptr and add ifdef CONFIG_OF guard for regulator_gpio_of_match. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-12-06regulator: palmas: Convert palmas_ops_smps to ↵Axel Lin
regulator_[get|set]_voltage_sel_regmap Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-12-06regulator: palmas: Return raw register values as the selectors in ↵Axel Lin
[get|set]_voltage_sel Don't adjust the selector in [get|set]_voltage_sel, fix it in list_voltage() instead. For smps*(except smps10), the vsel reg-value and voltage mapping as below: reg-value volt (uV) ( Assume RANGE is x1 ) 0 0 1 500000 2 500000 3 500000 4 500000 5 500000 6 500000 (0.49V + 1 * 0.01V) * RANGE 7 510000 (0.49V + 2 * 0.01V) * RANGE 8 520000 (0.49V + 3 * 0.01V) * RANGE 9 530000 (0.49V + 4 * 0.01V) * RANGE .... The linear mapping is start from selector 6. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-12-06regulators: add regulator_can_change_voltage() functionMarek Szyprowski
Introduce a regulator_can_change_voltage() function for the subsytems or drivers which might check if applying voltage change is possible and use special workaround code when the driver is used with fixed regulators or regulators with disabled ability to change the voltage. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-12-06regmap: Cache register and value sizes for debugfsMark Brown
No point in calculating them every time. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-12-05net: fix some compiler warning in net/core/neighbour.cCong Wang
net/core/neighbour.c:65:12: warning: 'zero' defined but not used [-Wunused-variable] net/core/neighbour.c:66:12: warning: 'unres_qlen_max' defined but not used [-Wunused-variable] These variables are only used when CONFIG_SYSCTL is defined, so move them under #ifdef CONFIG_SYSCTL. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-06KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interfaceMihai Caraman
Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to the list of ONE_REG PPC supported registers. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: remove HV dependency, use get/put_user] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulationMihai Caraman
Add EPCR support in booke mtspr/mfspr emulation. EPCR register is defined only for 64-bit and HV categories, we will expose it at this point only to 64-bit virtual processors running on 64-bit HV hosts. Define a reusable setter function for vcpu's EPCR. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: move HV dependency in the code] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: bookehv: Add guest computation mode for irq deliveryMihai Caraman
When delivering guest IRQs, update MSR computation mode according to guest interrupt computation mode found in EPCR. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: remove HV dependency in the code] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Make EPCR a valid field for booke64 and bookehvAlexander Graf
In BookE, EPCR is defined and valid when either the HV or the 64bit category are implemented. Reflect this in the field definition. Today the only KVM target on 64bit is HV enabled, so there is no change in actual source code, but this keeps the code closer to the spec and doesn't build up artificial road blocks for a PR KVM on 64bit. Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: booke: Extend MAS2 EPN mask for 64-bitMihai Caraman
Extend MAS2 EPN mask to retain most significant bits on 64-bit hosts. Use this mask in tlb effective address accessor. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulationMihai Caraman
Mask high 32 bits of MAS2's effective page number in tlbwe emulation for guests running in 32-bit mode. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulationMihai Caraman
Mask high 32 bits of effective address in emulation layer for guests running in 32-bit mode. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: fix indent] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: e500: Add emulation helper for getting instruction eaMihai Caraman
Add emulation helper for getting instruction ea and refactor tlb instruction emulation to use it. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: keep rt variable around] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: bookehv64: Add support for interrupt handlingMihai Caraman
Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit implementations using a common stack layout and a common execution flow starting from kvm_handler_common macro. Update documentation for 64-bit input register values. This patch only address the bolted TLB miss exception handlers version. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: bookehv: Remove GET_VCPU macro from exception handlerMihai Caraman
GET_VCPU define will not be implemented for 64-bit for performance reasons so get rid of it also on 32-bit. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: booke: Fix get_tb() compile error on 64-bitMihai Caraman
Include header file for get_tb() declaration. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: e500: Silence bogus GCC warning in tlb codeMihai Caraman
64-bit GCC 4.5.1 warns about an uninitialized variable which was guarded by a flag. Initialize the variable to make it happy. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> [agraf: reword comment] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without ↵Paul Mackerras
panicking Currently, if a machine check interrupt happens while we are in the guest, we exit the guest and call the host's machine check handler, which tends to cause the host to panic. Some machine checks can be triggered by the guest; for example, if the guest creates two entries in the SLB that map the same effective address, and then accesses that effective address, the CPU will take a machine check interrupt. To handle this better, when a machine check happens inside the guest, we call a new function, kvmppc_realmode_machine_check(), while still in real mode before exiting the guest. On POWER7, it handles the cases that the guest can trigger, either by flushing and reloading the SLB, or by flushing the TLB, and then it delivers the machine check interrupt directly to the guest without going back to the host. On POWER7, the OPAL firmware patches the machine check interrupt vector so that it gets control first, and it leaves behind its analysis of the situation in a structure pointed to by the opal_mc_evt field of the paca. The kvmppc_realmode_machine_check() function looks at this, and if OPAL reports that there was no error, or that it has handled the error, we also go straight back to the guest with a machine check. We have to deliver a machine check to the guest since the machine check interrupt might have trashed valid values in SRR0/1. If the machine check is one we can't handle in real mode, and one that OPAL hasn't already handled, or on PPC970, we exit the guest and call the host's machine check handler. We do this by jumping to the machine_check_fwnmi label, rather than absolute address 0x200, because we don't want to re-execute OPAL's handler on POWER7. On PPC970, the two are equivalent because address 0x200 just contains a branch. Then, if the host machine check handler decides that the system can continue executing, kvmppc_handle_exit() delivers a machine check interrupt to the guest -- once again to let the guest know that SRR0/1 have been modified. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix checkpatch warnings] Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidationsPaul Mackerras
When we change or remove a HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu. There are a number of problems on POWER7 with this as it stands: - The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads. - With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest. - The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL. - The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu. (None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.) To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set. On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or the previous contents of the HPT. Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06MAINTAINERS: Add git tree link for PPC KVMMichael Ellerman
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3SPaul Mackerras
The mask of MSR bits that get transferred from the guest MSR to the shadow MSR included MSR_DE. In fact that bit only exists on Book 3E processors, and it is assigned the same bit used for MSR_BE on Book 3S processors. Since we already had MSR_BE in the mask, this just removes MSR_DE. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S PR: Fix VSX handlingPaul Mackerras
This fixes various issues in how we were handling the VSX registers that exist on POWER7 machines. First, we were running off the end of the current->thread.fpr[] array. Ultimately this was because the vcpu->arch.vsr[] array is sized to be able to store both the FP registers and the extra VSX registers (i.e. 64 entries), but PR KVM only uses it for the extra VSX registers (i.e. 32 entries). Secondly, calling load_up_vsx() from C code is a really bad idea, because it jumps to fast_exception_return at the end, rather than returning with a blr instruction. This was causing it to jump off to a random location with random register contents, since it was using the largely uninitialized stack frame created by kvmppc_load_up_vsx. In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx, since giveup_fpu and load_up_fpu handle the extra VSX registers as well as the standard FP registers on machines with VSX. Also, since VSX instructions can access the VMX registers and the FP registers as well as the extra VSX registers, we have to load up the FP and VMX registers before we can turn on the MSR_VSX bit for the guest. Conversely, if we save away any of the VSX or FP registers, we have to turn off MSR_VSX for the guest. To handle all this, it is more convenient for a single call to kvmppc_giveup_ext() to handle all the state saving that needs to be done, so we make it take a set of MSR bits rather than just one, and the switch statement becomes a series of if statements. Similarly kvmppc_handle_ext needs to be able to load up more than one set of registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registersPaul Mackerras
This adds basic emulation of the PURR and SPURR registers. We assume we are emulating a single-threaded core, so these advance at the same rate as the timebase. A Linux kernel running on a POWER7 expects to be able to access these registers and is not prepared to handle a program interrupt on accessing them. This also adds a very minimal emulation of the DSCR (data stream control register). Writes are ignored and reads return zero. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S HV: Don't give the guest RW access to RO pagesPaul Mackerras
Currently, if the guest does an H_PROTECT hcall requesting that the permissions on a HPT entry be changed to allow writing, we make the requested change even if the page is marked read-only in the host Linux page tables. This is a problem since it would for instance allow a guest to modify a page that KSM has decided can be shared between multiple guests. To fix this, if the new permissions for the page allow writing, we need to look up the memslot for the page, work out the host virtual address, and look up the Linux page tables to get the PTE for the page. If that PTE is read-only, we reduce the HPTE permissions to read-only. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPTPaul Mackerras
This fixes a bug in the code which allows userspace to read out the contents of the guest's hashed page table (HPT). On the second and subsequent passes through the HPT, when we are reporting only those entries that have changed, we were incorrectly initializing the index field of the header with the index of the first entry we skipped rather than the first changed entry. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S HV: Reset reverse-map chains when resetting the HPTPaul Mackerras
With HV-style KVM, we maintain reverse-mapping lists that enable us to find all the HPT (hashed page table) entries that reference each guest physical page, with the heads of the lists in the memslot->arch.rmap arrays. When we reset the HPT (i.e. when we reboot the VM), we clear out all the HPT entries but we were not clearing out the reverse mapping lists. The result is that as we create new HPT entries, the lists get corrupted, which can easily lead to loops, resulting in the host kernel hanging when it tries to traverse those lists. This fixes the problem by zeroing out all the reverse mapping lists when we zero out the HPT. This incidentally means that we are also zeroing our record of the referenced and changed bits (not the bits in the Linux PTEs, used by the Linux MM subsystem, but the bits used by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and kvm_test_age_hva()). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-06KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPTPaul Mackerras
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the "bolted" entries (those with the bolted bit, 0x10, set in the first doubleword). This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes. The format of the data provides a simple run-length compression of the invalid entries. Each block of data starts with a header that indicates the index (position in the HPT, which is just an array), the number of valid entries starting at that index (may be zero), and the number of invalid entries following those valid entries. The valid entries, 16 bytes each, follow the header. The invalid entries are not explicitly represented. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix documentation] Signed-off-by: Alexander Graf <agraf@suse.de>