summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-07-24KVM: MMU: filter out the mmio pfn from the fault pfnXiao Guangrong
If the page fault is caused by mmio, the gfn can not be found in memslots, and 'bad_pfn' is returned on gfn_to_hva path, so we can use 'bad_pfn' to identify the mmio page fault. And, to clarify the meaning of mmio pfn, we return fault page instead of bad page when the gfn is not allowd to prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: remove bypass_guest_pfXiao Guangrong
The idea is from Avi: | Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as | it used to be, since unsync pages always use shadow_trap_nonpresent_pte, | and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: split kvm_mmu_free_pageXiao Guangrong
Split kvm_mmu_free_page to kvm_mmu_isolate_page and kvm_mmu_free_page One is used to remove the page from cache under mmu lock and the other is used to free page table out of mmu lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: count used shadow pages on prepareing pathXiao Guangrong
Move counting used shadow pages from commiting path to preparing path to reduce tlb flush on some paths Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: rename 'pt_write' to 'emulate'Xiao Guangrong
If 'pt_write' is true, we need to emulate the fault. And in later patch, we need to emulate the fault even though it is not a pt_write event, so rename it to better fit the meaning Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cleanup for FNAME(fetch)Xiao Guangrong
gw->pte_access is the final access permission, since it is unified with gw->pt_access when we walked guest page table: FNAME(walk_addr_generic): pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true); Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: optimize to handle dirty bitXiao Guangrong
If dirty bit is not set, we can make the pte access read-only to avoid handing dirty bit everywhere Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cache mmio info on page fault pathXiao Guangrong
If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the codeXiao Guangrong
Introduce vcpu_mmio_gva_to_gpa to translate the gva to gpa, we can use it to cleanup the code between read emulation and write emulation Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: do not update slot bitmap if spte is nonpresentXiao Guangrong
Set slot bitmap only if the spte is present Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: fix walking shadow page tableXiao Guangrong
Properly check the last mapping, and do not walk to the next level if last spte is met Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM guest: KVM Steal time registrationGlauber Costa
This patch implements the kvm bits of the steal time infrastructure. The most important part of it, is the steal time clock. It is an continuous clock that shows the accumulated amount of steal time since vcpu creation. It is supposed to survive cpu offlining/onlining. [marcelo: fix build with CONFIG_KVM_GUEST=n] Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Avi Kivity <avi@redhat.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-24[S390] use siginfo for sigtrap signalsMartin Schwidefsky
Provide additional information on SIGTRAP by using a sig_info signal. Use TRAP_BRKPT for breakpoints via illegal operation and TRAP_HWBKPT for breakpoints via program event recording. Provide the address of the instruction that caused the breakpoint via si_addr. While we are at it get rid of tracehook_consider_fatal_signal. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] dasd: add enhanced DASD statistics interfaceStefan Weinhuber
This patch extends the DASD statistics to allow for a more detailed analysis of DASD I/O operations. In particular we want the statistics to provide answers to the following questions: - How many requests used a PAV alias? - How many requests used High Performance FICON? - How do read request perform versus write requests? The existing DASD statistics interface has several shortcomings - The interface for global data is a formatted text table in procfs (/proc/dasd/statistics). The layout is meant for human readers and is not to easy to parse. If values get to large for the table layout, they get scaled down. - The statistics which are collected per block device can be accessed via an ioctl interface, which can only be extended by defining a new ioctl. - There is no statistics interface for individual PAV base and alias devices. To overcome theses shortcomings we create a new DASD statistics interface in debugfs. This interface will contain one entry for global data, one per DASD block device, and one per DASD base and alias device. Each file contains the statistic data in easy to parse name/value and name/array pairs. The existing interfaces will remain functional, but they will not be extended. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm: make sigp emerg smp capableChristian Ehrhardt
SIGP emerg needs to pass the source vpu adress into __LC_CPU_ADDRESS of the target guest. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] disable cpu measurement alerts on a dying cpuJan Glauber
The cpu measurement alerts that are used for instance by oprofile for hardware sampling are not turned off on a cpu that is going offline. Add the appropriate control register bit that should be disabled to the list. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] initial cr0 bitsMartin Schwidefsky
Remove outdated bits from the initial cr0 register. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] iucv cr0 enablement bitMartin Schwidefsky
Do not set the cr0 enablement bit for iucv by default in head[31|64].S, move the enablement to iucv_init in the iucv base layer. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] race safe external interrupt registrationJan Glauber
The (un-)register_external_interrupt functions are not race safe if more than one interrupt handler is added or deleted for an external interrupt concurrently. Make the registration / unregistration of external interrupts race safe by using RCU and a spinlock. RCU is used to avoid a performance penalty in the external interrupt handler, the register and unregister functions are protected by the spinlock and are not performance critical. call_rcu must be used since the SCLP driver uses the interface with IRQs disabled. Also use the generic list implementation rather than homebrewn list code. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] remove tape block docuMartin Schwidefsky
After git commit 66ceed5ad1318863c21710f316942bcefff8081c removed the tape block device driver, remove its documentation as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] ap: toleration support for ap device type 10Holger Dengler
Add toleration support for ap devices with device type 10. Signed-off-by: Holger Dengler <hd@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] cleanup program check handler prototypesMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] remove kvm mmu reload on s390Carsten Otte
This patch removes the mmu reload logic for kvm on s390. Via Martin's new gmap interface, we can safely add or remove memory slots while guest CPUs are in-flight. Thus, the mmu reload logic is not needed anymore. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] Use gmap translation for accessing guest memoryCarsten Otte
This patch removes kvm-s390 internal assumption of a linear mapping of guest address space to user space. Previously, guest memory was translated to user addresses using a fixed offset (gmsor). The new code uses gmap_fault to resolve guest addresses. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] use gmap address spaces for kvm guest imagesCarsten Otte
This patch switches kvm from using (Qemu's) user address space to Martin's gmap address space. This way QEMU does not have to use a linker script in order to fit large guests at low addresses in its address space. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm guest address space mappingMartin Schwidefsky
Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty, KVM can map any individual 1MB segments from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] fix s390 assembler code alignmentsJan Glauber
The alignment is missing for various global symbols in s390 assembly code. With a recent gcc and an instruction like stgrl this can lead to a specification exception if the instruction uses such a mis-aligned address. Specify the alignment explicitely and while add it define __ALIGN for s390 and use the ENTRY define to save some lines of code. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] move sie code to entry.SMartin Schwidefsky
The entry to / exit from sie has subtle dependencies to the first level interrupt handler. Move the sie assembler code to entry64.S and replace the SIE_HOOK callback with a test and the new _TIF_SIE bit. In addition this patch fixes several problems in regard to the check for the_TIF_EXIT_SIE bits. The old code checked the TIF bits before executing the interrupt handler and it only modified the instruction address if it pointed directly to the sie instruction. In both cases it could miss a TIF bit that normally would cause an exit from the guest and would reenter the guest context. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm: handle tprot interceptsChristian Borntraeger
When running a kvm guest we can get intercepts for tprot, if the host page table is read-only or not populated. This patch implements the most common case (linux memory detection). This also allows host copy on write for guest memory on newer systems. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] qdio: clear shared DSCI before scheduling the queue handlerJan Glauber
The following race can occur with qdio devices that use the shared device state change indicator: Device (Shared DSCI) CPU0 CPU1 =============================================================================== 1. DSCI 0 => 1, INT pending 2. Thinint handler * si_used = 1 * Inbound tasklet_schedule * DSCI 1 => 0 3. DSCI 0 => 1, INT pending 4. Thinint handler * si_used = 1 * Inbound tasklet_schedu le => NOP 5. Inbound tasklet run 6. DSCI = 1, INT surpressed 7. DSCI 1 => 0 The race would lead to a stall where new data in the input queue is not recognized so the device stops working in case of no further traffic. Fix the race by resetting the DSCI before scheduling the inbound tasklet so the device generates an interrupt if new data arrives in the above scenario in step 6. Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] reference bit testing for unmapped pagesMartin Schwidefsky
On x86 a page without a mapper is by definition not referenced / old. The s390 architecture keeps the reference bit in the storage key and the current code will check the storage key for page without a mapper. This leads to an interesting effect: the first time an s390 system needs to write pages to swap it only finds referenced pages. This causes a lot of pages to get added and written to the swap device. To avoid this behaviour change page_referenced to query the storage key only if there is a mapper of the page. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] irqs: Do not trace arch_local_{*,irq_*} functionsSteven Rostedt
Do not trace arch_local_save_flags(), arch_local_irq_*() and friends. Although they are marked inline, gcc may still make a function out of them and add it to the pool of functions that are traced by the function tracer. This can cause undesirable results (kernel panic, triple faults, etc). Add the notrace notation to prevent them from ever being traced. Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kconfig: remove tape interface support commentMartin Schwidefsky
There is nothing below the menu entry "S/390 tape interface support". Remove it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-23of: fix missing include from of_pci.cGrant Likely
of_pci.c references symbols from linux/of.h. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-23gre: fix improper error handlingxeb@mail.ru
Fix improper protocol err_handler, current implementation is fully unapplicable and may cause kernel crash due to double kfree_skb. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23ipv4: use RT_TOS after some rt_tos conversionsJulian Anastasov
rt_tos was changed to iph->tos but it must be filtered by RT_TOS Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23via-velocity: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in drivers/net/via-velocity.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23qlge: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in drivers/net/qlge/qlge_main.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23igb: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in drivers/net/igb/igb_main.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23can: c_can: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in drivers/net/can/c_can/c_can.c drivers/net/can/c_can/c_can_platform.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23bnad: remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in drivers/net/bna/bnad.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-24mm: properly reflect task dirty limits in dirty_exceeded logicJan Kara
We set bdi->dirty_exceeded (and thus ratelimiting code starts to call balance_dirty_pages() every 8 pages) when a per-bdi limit is exceeded or global limit is exceeded. But per-bdi limit also depends on the task. Thus different tasks reach the limit on that bdi at different levels of dirty pages. The result is that with current code bdi->dirty_exceeded ping-ponged between 1 and 0 depending on which task just got into balance_dirty_pages(). We fix the issue by clearing bdi->dirty_exceeded only when per-bdi amount of dirty pages drops below the threshold (7/8 * bdi_dirty_limit) where task limits already do not have any influence. Impact: The end result is, the dirty pages are kept more tightly under control, with the average number slightly lowered than before. This reduces the risk to throttle light dirtiers and hence more responsive. However it may add overheads by enforcing balance_dirty_pages() calls on every 8 pages when there are 2+ heavy dirtiers. CC: Andrew Morton <akpm@linux-foundation.org> CC: Christoph Hellwig <hch@infradead.org> CC: Dave Chinner <david@fromorbit.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-24writeback: don't busy retry writeback on new/freeing inodesWu Fengguang
Fix a system hang bug introduced by commit b7a2441f9966 ("writeback: remove writeback_control.more_io") and e8dfc3058 ("writeback: elevate queue_io() into wb_writeback()") easily reproducible with high memory pressure and lots of file creation/deletions, for example, a kernel build in limited memory. It hangs when some inode is in the I_NEW, I_FREEING or I_WILL_FREE state, the flusher will get stuck busy retrying that inode, never releasing wb->list_lock. The lock in turn blocks all kinds of other tasks when they are trying to grab it. As put by Jan, it's a safe change regarding data integrity. I_FREEING or I_WILL_FREE inodes are written back by iput_final() and it is reclaim code that is responsible for eventually removing them. So writeback code can safely ignore them. I_NEW inodes should move out of this state when they are fully set up and in the writeback round following that, we will consider them for writeback. So the change makes sense. CC: Jan Kara <jack@suse.cz> Reported-by: Hugh Dickins <hughd@google.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-23ata: PATA_ARASAN_CF depends on DMADEVICESRandy Dunlap
Fix kconfig unmet dependency warning: warning: (PATA_ARASAN_CF && VIDEO_TIMBERDALE && SND_SOC_SH4_SIU) selects DMA_ENGINE which has unmet direct dependencies (DMADEVICES) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23ata: remove unnecessary codeGreg Dietsche
Compile tested. remove unnecessary code that matches this coccinelle pattern if (...) return ret; return ret; Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23[libata] Prevent warning during PMP error recoveryGwendal Grignou
Cleanup sff_pio_task_link when a command is cancel while the pio_task thread has been scheduled. Signed-off-by: Gwendal Grignou <gwendal@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23ahci: RAID-mode SATA patch for Intel Panther Point DeviceIDsSeth Heasley
This patch adds an additional SATA RAID controller DeviceID for the Intel Panther Point PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23pata_it821x: Fix RAID type display, by adding missing commaJean Delvare
The missing comma causes the wrong RAID type to be displayed. Introduced by commit 963e4975c6f93c148ca809d986d412201df9af89 three years ago, odd that nobody noticed before. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23sata_dwc_460ex: fix error pathVasiliy Kulikov
Fixed hsdev memleak on sata_dwc_probe() error. As dma_dwc_exit() can be called multiple times without sata_dma_regs and irq_dma changes, it might lead to double free on sequential dma_dwc_exit() calls. So, zero these fields after free calls. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23ahci: Enable SB600 64bit DMA on Asus M3AMark Nelson
Like e65cc194f7628ecaa02462f22f42fb09b50dcd49 this patch enables 64bit DMA for the AHCI SATA controller of a board that has the SB600 southbridge. In this case though we're enabling 64bit DMA for the Asus M3A motherboard. It is a new enough board that all of the BIOS releases since the initial release (0301 from 2007-10-22) work correctly with 64bit DMA enabled. Signed-off-by: Mark Nelson <mdnelson8@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>