summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2011-07-14net: unexport netdev_fix_features()Michał Mirosław
It is not used anywhere except net/core/dev.c now. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14tracing: Have dynamic size event stack tracesSteven Rostedt
Currently the stack trace per event in ftace is only 8 frames. This can be quite limiting and sometimes useless. Especially when the "ignore frames" is wrong and we also use up stack frames for the event processing itself. Change this to be dynamic by adding a percpu buffer that we can write a large stack frame into and then copy into the ring buffer. For interrupts and NMIs that come in while another event is being process, will only get to use the 8 frame stack. That should be enough as the task that it interrupted will have the full stack frame anyway. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14ARM / shmobile: Use genpd_queue_power_off_work()Rafael J. Wysocki
Make pd_power_down_a3rv() use genpd_queue_power_off_work() to queue up the powering off of the A4LC domain to avoid queuing it up when it is pending. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Magnus Damm <damm@opensource.se>
2011-07-14Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c
2011-07-14net: Embed hh_cache inside of struct neighbour.David S. Miller
Now that there is a one-to-one correspondance between neighbour and hh_cache entries, we no longer need: 1) dynamic allocation 2) attachment to dst->hh 3) refcounting Initialization of the hh_cache entry is indicated by hh_len being non-zero, and such initialization is always done with the neighbour's lock held as a writer. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14KVM: Steal time implementationGlauber Costa
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of delayacct/schedstats infrastructure, that counts time spent in a runqueue but not running. Steal time is a per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the memory area address containing information about steal time This patch contains the hypervisor part of the steal time infrasructure, and can be backported independently of the guest portion. [avi, yongjie: export delayacct_on, to avoid build failures in some configs] Signed-off-by: Glauber Costa <glommer@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Rik van Riel <riel@redhat.com> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-14Merge branch 'drm-intel-next' of ↵Dave Airlie
ssh://master.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6 into drm-core-next * 'drm-intel-next' of ssh://master.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6: (52 commits) drm/i915: provide module parameter description drm/i915: add module parameter compiler hints drm/i915/bios: Avoid temporary allocation whilst searching for downclock drm/i915: Cache GT fifo count for SandyBridge i915: Fix opregion notifications drm/i915: TVDAC_STATE_CHG does not indicate successful load-detect drm/i915: Select correct pipe during TV detect drm/i915/ringbuffer: Idling requires waiting for the ring to be empty Revert "drm/i915: enable rc6 by default" drm/i915: Clean up i915_driver_load failure path drm/i915: Enable i915 frame buffer compression by default drm/i915: Share the common work of disabling active FBC before updating drm/i915: Perform intel_enable_fbc() from a delayed task drm/i915: Disable FBC across page-flipping drm/i915: Set persistent-mode for ILK/SNB framebuffer compression drm/i915: Use of a CPU fence is mandatory to update FBC regions upon CPU writes drm/i915: Remove vestigial pitch from post-gen2 FBC control routines drm/i915: Replace direct calls to vfunc.disable_fbc with intel_disable_fbc() drm/i915: Only export the generic intel_disable_fbc() interface drm/i915: Enable GPU reset on Ivybridge. ...
2011-07-14Merge branches 'd3cold', 'bugzilla-37412' and 'bugzilla-38152' into releaseLen Brown
2011-07-13ACPI: Fixes device power states array overflowLin Ming
Commit 28c2103 added new state ACPI_STATE_D3_COLD, so the device power states array must be expanded by one also. v2: Use ACPI_D_STATE_COUNT instead of number 5 for the array size. Reported-by: Dan Carpenter <error27@gmail.com> Suggested-by: Oldřich Jedlička <oldium.pro@seznam.cz> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: core: Bus width testing needs to handle suspend/resume
2011-07-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) slip: fix wrong SLIP6 ifdef-endif placing natsemi: fix another dma-debug report sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket net: Fix default in docs for tcp_orphan_retries. hso: fix a use after free condition net/natsemi: Fix module parameter permissions XFRM: Fix memory leak in xfrm_state_update sctp: Enforce retransmission limit during shutdown mac80211: fix TKIP replay vulnerability mac80211: fix ie memory allocation for scheduled scans ssb: fix init regression of hostmode PCI core rtlwifi: rtl8192cu: Add new USB ID for Netgear WNA1000M ath9k: Fix tx throughput drops for AR9003 chips with AES encryption carl9170: add NEC WL300NU-AG usbid cfg80211: fix deadlock with rfkill/sched_scan by adding new mutex ath5k: fix incorrect use of drvdata in PCI suspend/resume code ath5k: fix incorrect use of drvdata in sysfs code Bluetooth: Fix memory leak under page timeouts Bluetooth: Fix regression with incoming L2CAP connections Bluetooth: Fix hidp disconnect deadlocks and lost wakeup ...
2011-07-13block: reorder request_queue to remove 64 bit alignment paddingRichard Kennedy
Reorder request_queue to remove 16 bytes of alignment padding in 64 bit builds. On my config this shrinks the size of this structure from 1608 to 1592 bytes and therefore needs one fewer cachelines. Also trivially move the open bracket { to be on the same line as the structure name to make it easier to grep. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-13mmc: core: Bus width testing needs to handle suspend/resumePhilip Rakity
On reading the ext_csd for the first time (in 1 bit mode), save the ext_csd information needed for bus width compare. On every pass we make re-reading the ext_csd, compare the data against the saved ext_csd data. This fixes a regression introduced in 3.0-rc1 by 08ee80cc397ac1a3 ("mmc: core: eMMC bus width may not work on all platforms"), which incorrectly assumed we would be re-reading the ext_csd at resume- time. Signed-off-by: Philip Rakity <prakity@marvell.com> Tested-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-07-13mac80211: allow driver to disconnect after resumeJohannes Berg
In WoWLAN, devices may use crypto keys for TX/RX and could also implement GTK rekeying. If the driver isn't able to retrieve replay counters and similar information from the device upon resume, or if the device isn't responsive due to platform issues, it isn't safe to keep the connection up as GTK rekey messages from during the sleep time could be replayed against it. The only protection against that is disconnecting from the AP. Modifying mac80211 to do that while it is resuming would be very complex and invasive in the case that the driver requires a reconfig, so do it after it has resumed completely. In that case, however, packets might be replayed since it can then only happen after TX/RX are up again, so mark keys for interfaces that need to disconnect as "tainted" and drop all packets that are sent or received with those keys. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-13ACPI: Fix lockdep false positives in acpi_power_off()Rafael J. Wysocki
All ACPICA locks are allocated by the same function, acpi_os_create_lock(), with the help of a local variable called "lock". Thus, when lockdep is enabled, it uses "lock" as the name of all those locks and regards them as instances of the same lock, which causes it to report possible locking problems with them when there aren't any. To work around this problem, define acpi_os_create_lock() as a macro and make it pass its argument to spin_lock_init(), so that lockdep uses it as the name of the new lock. Define this macron in a Linux-specific file, to minimize the resulting modifications of the OS-independent ACPICA parts. This change is based on an earlier patch from Andrea Righi and it addresses a regression from 2.6.39 tracked as https://bugzilla.kernel.org/show_bug.cgi?id=38152 Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Andrea Righi <andrea@betterlinux.com> Reviewed-by: Florian Mickler <florian@mickler.org> Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-13clocksource: Replace vread with generic arch dataAndy Lutomirski
The vread field was bloating struct clocksource everywhere except x86_64, and I want to change the way this works on x86_64, so let's split it out into per-arch data. Cc: x86@kernel.org Cc: Clemens Ladisch <clemens@ladisch.de> Cc: linux-ia64@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13i915: Fix opregion notificationsMatthew Garrett
opregion-based platforms will send ACPI video event 0x80 for a range of notification types for legacy compatibility. This is interpreted as a display switch event, which may not be appropriate in the circumstances. When we receive such an event we should make sure that the platform is genuinely requesting a display switch before passing that event through to userspace. Signed-off-by: Matthew Garrett <mjg@redhat.com> Tested-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Keith Packard <keithp@keithp.com>
2011-07-13PM / Domains: Introduce function to power off all unused PM domainsRafael J. Wysocki
Add a new function pm_genpd_poweroff_unused() queuing up the execution of pm_genpd_poweroff() for every initialized generic PM domain. Calling it will cause every generic PM domain without devices in use to be powered off. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Magnus Damm <damm@opensource.se>
2011-07-13net: Kill support for multiple hh_cache entries per neighbourDavid S. Miller
This never, ever, happens. Neighbour entries are always tied to one address family, and therefore one set of dst_ops, and therefore one dst_ops->protocol "hh_type" value. This capability was blindly imported by Alexey Kuznetsov when he wrote the neighbour layer. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-13net: Push protocol type directly down to header_ops->cache()David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-13ipv4: Inline neigh binding.David Miller
Get rid of all of the useless and costly indirection by doing the neigh hash table lookup directly inside of the neighbour binding. Rename from arp_bind_neighbour to rt_bind_neighbour. Use new helpers {__,}ipv4_neigh_lookup() In rt_bind_neighbour() get rid of useless tests which are never true in the context this function is called, namely dev is never NULL and the dst->neighbour is always NULL. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-13Merge 3.0-rc7 into drm-core-nextDave Airlie
This pulls in all the drm fixes up to this point which are needed for some -next patches to work.
2011-07-13DRM: remove drm_pci_device_is_pcieJon Mason
drm_pci_device_is_pcie duplicates the funcationality of pci_is_pcie. Convert callers of the former to the latter. This has the side benefit of removing an unnecessary search in the PCI configuration space due to using a saved PCIe capability offset. [airlied: update for new callsite] Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-07-12x86, numa: Implement pfn -> nid mapping granularity checkTejun Heo
SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use sections array to map pfn to nid which is limited in granularity. If NUMA nodes are laid out such that the mapping cannot be accurate, boot will fail triggering BUG_ON() in mminit_verify_page_links(). On 32bit, it's 512MiB w/ PAE and SPARSEMEM. This seems to have been granular enough until commit 2706a0bf7b (x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too). Apparently, there is a machine which aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT. This led to the following BUG_ON(). On node 0 totalpages: 2096615 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 3927 pages, LIFO batch:0 Normal zone: 1740 pages used for memmap Normal zone: 220978 pages, LIFO batch:31 HighMem zone: 16405 pages used for memmap HighMem zone: 1853533 pages, LIFO batch:31 BUG: Int 6: CR2 (null) EDI (null) ESI 00000002 EBP 00000002 ESP c1543ecc EBX f2400000 EDX 00000006 ECX (null) EAX 00000001 err (null) EIP c16209aa CS 00000060 flg 00010002 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000 (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe (null) f7200b80 c16395f0 00200a02 f7200a80 (null) 000375fe 00000002 (null) Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17 Call Trace: [<c136b1e5>] ? early_fault+0x2e/0x2e [<c16209aa>] ? mminit_verify_page_links+0x12/0x42 [<c1620613>] ? memmap_init_zone+0xaf/0x10c [<c1620929>] ? free_area_init_node+0x2b9/0x2e3 [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451 [<c1601d80>] ? paging_init+0x112/0x118 [<c15f578d>] ? setup_arch+0x791/0x82f [<c15f43d9>] ? start_kernel+0x6a/0x257 This patch implements node_map_pfn_alignment() which determines maximum internode alignment and update numa_register_memblks() to reject NUMA configuration if alignment exceeds the pfn -> nid mapping granularity of the memory model as determined by PAGES_PER_SECTION. This makes the problematic machine boot w/ flatmem by rejecting the NUMA config and provides protection against crazy NUMA configurations. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com> Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Conny Seidel <conny.seidel@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/mm: Fix memory_block_size_bytes() for non-pseries mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
2011-07-12Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: pcmcia: pxa2xx/vpac270: free gpios on exist rather than requesting ARM: pxa/raumfeld: fix device name for codec ak4104 ARM: pxa/raumfeld: display initialisation fixes ARM: pxa/raumfeld: adapt to upcoming hardware change ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd) arm: mach-vt8500: add forgotten irq_data conversion ARM: pxa168: correct nand pmu setting ARM: pxa910: correct nand pmu setting ARM: pxa: fix PGSR register address calculation
2011-07-12netdevice: Kill 'feature' test macros.David S. Miller
Almost all of these have long outstayed their welcome. And for every one of these macros, there are 10 features for which we didn't add macros. Let's just delete them all, and get out of habit of doing things this way. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
2011-07-12CFQ: move think time check variables to a separate structShaohua Li
Move the variables to do think time check to a sepatate struct. This is to prepare adding think time check for service tree and group. No functional change. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-12KVM: introduce kvm_read_guest_cachedGleb Natapov
Introduce kvm_read_guest_cached() function in addition to write one we already have. [ by glauber: export function signature in kvm header ] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric Munson <emunson@mgebm.net> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guestsPaul Mackerras
This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Allow book3s_hv guests to use SMT processor modesPaul Mackerras
This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Accelerate H_PUT_TCE by implementing it in real modeDavid Gibson
This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Add support for Book3S processors in hypervisor modePaul Mackerras
This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentationJan Kiszka
Neither host_irq nor the guest_msi struct are used anymore today. Tag the former, drop the latter to avoid confusion. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12PM / Runtime: Add new helper function: pm_runtime_status_suspended()Kevin Hilman
This boolean function simply returns whether or not the runtime status of the device is 'suspended'. Unlike pm_runtime_suspended(), this function returns the runtime status whether or not runtime PM for the device has been disabled or not. Also add entry to Documentation/power/runtime.txt Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12fixlet: Remove fs_excl from struct task.Justin TerAvest
fs_excl is a poor man's priority inheritance for filesystems to hint to the block layer that an operation is important. It was never clearly specified, not widely adopted, and will not prevent starvation in many cases (like across cgroups). fs_excl was introduced with the time sliced CFQ IO scheduler, to indicate when a process held FS exclusive resources and thus needed a boost. It doesn't cover all file systems, and it was never fully complete. Lets kill it. Signed-off-by: Justin TerAvest <teravest@google.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-11net: introduce __netdev_alloc_skb_ip_alignEric Dumazet
RX rings should use GFP_KERNEL allocations if possible, add __netdev_alloc_skb_ip_align() helper to ease this. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-12mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a headerBenjamin Herrenschmidt
The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c files, and I need it in a third one to fix a powerpc bug, so let's first move it into a header Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Ingo Molnar <mingo@elte.hu>
2011-07-11Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: [media] msp3400: fill in v4l2_tuner based on vt->type field [media] tuner-core.c: don't change type field in g_tuner or g_frequency [media] cx18/ivtv: fix g_tuner support [media] tuner-core: power up tuner when called with s_power(1) [media] v4l2-ioctl.c: check for valid tuner type in S_HW_FREQ_SEEK [media] tuner-core: simplify the standard fixup [media] tuner-core/v4l2-subdev: document that the type field has to be filled in [media] v4l2-subdev.h: remove unused s_mode tuner op [media] feature-removal-schedule: change in how radio device nodes are handled [media] bttv: fix s_tuner for radio [media] pvrusb2: fix g/s_tuner support [media] v4l2-ioctl.c: prefill tuner type for g_frequency and g/s_tuner [media] tuner-core: fix tuner_resume: use t->mode instead of t->type [media] tuner-core: fix s_std and s_tuner
2011-07-12PM / Domains: Allow callbacks to execute all runtime PM helpersRafael J. Wysocki
A deadlock may occur if one of the PM domains' .start_device() or .stop_device() callbacks or a device driver's .runtime_suspend() or .runtime_resume() callback executed by the core generic PM domain code uses a "wrong" runtime PM helper function. This happens, for example, if .runtime_resume() from one device's driver calls pm_runtime_resume() for another device in the same PM domain. A similar situation may take place if a device's parent is in the same PM domain, in which case the runtime PM framework may execute pm_genpd_runtime_resume() automatically for the parent (if it is suspended at the moment). This, of course, is undesirable, so the generic PM domains code should be modified to prevent it from happening. The runtime PM framework guarantees that pm_genpd_runtime_suspend() and pm_genpd_runtime_resume() won't be executed in parallel for the same device, so the generic PM domains code need not worry about those cases. Still, it needs to prevent the other possible race conditions between pm_genpd_runtime_suspend(), pm_genpd_runtime_resume(), pm_genpd_poweron() and pm_genpd_poweroff() from happening and it needs to avoid deadlocks at the same time. To this end, modify the generic PM domains code to relax synchronization rules so that: * pm_genpd_poweron() doesn't wait for the PM domain status to change from GPD_STATE_BUSY. If it finds that the status is not GPD_STATE_POWER_OFF, it returns without powering the domain on (it may modify the status depending on the circumstances). * pm_genpd_poweroff() returns as soon as it finds that the PM domain's status changed from GPD_STATE_BUSY after it's released the PM domain's lock. * pm_genpd_runtime_suspend() doesn't wait for the PM domain status to change from GPD_STATE_BUSY after executing the domain's .stop_device() callback and executes pm_genpd_poweroff() only if pm_genpd_runtime_resume() is not executed in parallel. * pm_genpd_runtime_resume() doesn't wait for the PM domain status to change from GPD_STATE_BUSY after executing pm_genpd_poweron() and sets the domain's status to GPD_STATE_BUSY and increments its counter of resuming devices (introduced by this change) immediately after acquiring the lock. The counter of resuming devices is then decremented after executing __pm_genpd_runtime_resume() for the device and the domain's status is reset to GPD_STATE_ACTIVE (unless there are more resuming devices in the domain, in which case the status remains GPD_STATE_BUSY). This way, for example, if a device driver's .runtime_resume() callback executes pm_runtime_resume() for another device in the same PM domain, pm_genpd_poweron() called by pm_genpd_runtime_resume() invoked by the runtime PM framework will not block and it will see that there's nothing to do for it. Next, the PM domain's lock will be acquired without waiting for its status to change from GPD_STATE_BUSY and the device driver's .runtime_resume() callback will be executed. In turn, if pm_runtime_suspend() is executed by one device driver's .runtime_resume() callback for another device in the same PM domain, pm_genpd_poweroff() executed by pm_genpd_runtime_suspend() invoked by the runtime PM framework as a result will notice that one of the devices in the domain is being resumed, so it will return immediately. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12PM / Domains: Do not execute device callbacks under locksRafael J. Wysocki
Currently, the .start_device() and .stop_device() callbacks from struct generic_pm_domain() as well as the device drivers' runtime PM callbacks used by the generic PM domains code are executed under the generic PM domain lock. This, unfortunately, is prone to deadlocks, for example if a device and its parent are boths members of the same PM domain. For this reason, it would be better if the PM domains code didn't execute device callbacks under the lock. Rework the locking in the generic PM domains code so that the lock is dropped for the execution of device callbacks. To this end, introduce PM domains states reflecting the current status of a PM domain and such that the PM domain lock cannot be acquired if the status is GPD_STATE_BUSY. Make threads attempting to acquire a PM domain's lock wait until the status changes to either GPD_STATE_ACTIVE or GPD_STATE_POWER_OFF. This change by itself doesn't fix the deadlock problem mentioned above, but the mechanism introduced by it will be used for for this purpose by a subsequent patch. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-11cfg80211: fix docbookJohannes Berg
Looks like I forgot to document the "gfp" parameter to cfg80211_gtk_rekey_notify, add it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-11mac80211: add driver RSSI threshold eventsMeenakshi Venkataraman
mac80211 maintains a running average of the RSSI when a STA is associated to an AP. Report threshold events to any driver that has registered callbacks for getting RSSI measurements. Implement callbacks in mac80211 so that driver can set thresholds. Add callbacks in mac80211 which is invoked when an RSSI threshold event occurs. mac80211: add tracing to rssi_reports api and remove extraneous fn argument mac80211: scale up rssi thresholds from driver by 16 before storing Signed-off-by: Meenakshi Venkataraman <meenakshi.venkataraman@intel.com> Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-11Merge branch 'master' of ↵John W. Linville
master.kernel.org:/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 Conflicts: net/bluetooth/l2cap_core.c
2011-07-11xen/pci: Remove 'xen_allocate_pirq_gsi'.Konrad Rzeszutek Wilk
In the past (2.6.38) the 'xen_allocate_pirq_gsi' would allocate an entry in a Linux IRQ -> {XEN_IRQ, type, event, ..} array. All of that has been removed in 2.6.39 and the Xen IRQ subsystem uses an linked list that is populated when the call to 'xen_allocate_irq_gsi' (universally done from any of the xen_bind_* calls) is done. The 'xen_allocate_pirq_gsi' is a NOP and there is no need for it anymore so lets remove it. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11ftrace: Fix warning when CONFIG_FUNCTION_TRACER is not definedSteven Rostedt
The struct ftrace_hash was declared within CONFIG_FUNCTION_TRACER but was referenced outside of it. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-11Merge branch 'master' into for-nextJiri Kosina
Sync with Linus' tree to be able to apply pending patches that are based on newer code already present upstream.
2011-07-11ipv4: Use universal hash for ARP.David S. Miller
We need to make sure the multiplier is odd. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-11neigh: Store hash shift instead of mask.David S. Miller
And mask the hash function result by simply shifting down the "->hash_shift" most significant bits. Currently which bits we use is arbitrary since jhash produces entropy evenly across the whole hash function result. But soon we'll be using universal hashing functions, and in those cases more entropy exists in the higher bits than the lower bits, because they use multiplies. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-11Bluetooth: Fixes l2cap "command reject" reply according to specIlia Kolomisnky
There can 3 reasons for the "command reject" reply produced by the stack. Each such reply should be accompanied by the relevand data ( as defined in spec. ). Currently there is one instance of "command reject" reply with reason "invalid cid" wich is fixed. Also, added clean-up definitions related to the "command reject" replies. Signed-off-by: Ilia Kolomisnky <iliak@ti.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>