summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2011-08-22usb: add usb_endpoint_maxp() macrokuninori.morimoto.gx@renesas.com
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22drivers:misc:ti-st: platform hooks for chip statesPavan Savoy
Certain platform specific or Host-WiLink Interface specific actions would be required to be taken when the chip is being enabled and after the chip is disabled such as configuration of the mux modes for the GPIO of host connected to the nshutdown of the chip or relinquishing UART after the chip is disabled. Similar actions can also be taken when the chip is in deep sleep or when the chip is awake. Performance enhancements such as configuring the host to run faster when chip is awake and slower when chip is asleep can also be made here. Signed-off-by: Pavan Savoy <pavan_savoy@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0Nicholas Bellinger
This patch changes target_emulate_inquiry_std() to set the 'not connected' (0x35) bit in standard INQUIRY response data when we are processing a request to a virtual LUN=0 mapping from struct se_device *g_lun0_dev that have been setup for us in transport_lookup_cmd_lun(). This addresses an issue where qla2xxx FC clients need to be able to create demo-mode I_T FC Nexuses by default, but should not be exposing the default set of TPG LUNs to all FC clients. This includes adding an new optional target_core_fabric_ops->tpg_check_demo_mode_login_only() caller to allow demo_mode nexuses to skip the old default of bulding a demo-mode MappedLUNs list via core_tpg_add_node_to_devs(). (roland: Add missing tpg_check_demo_mode_login_only check in core_dev_add_lun) Reported-by: Roland Dreier <roland@purestorage.com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
2011-08-22mac80211: update mesh path selection frame formatThomas Pedersen
Make mesh path selection frames Mesh Action category, remove outdated Mesh Path Selection category and defines, use updated reason codes, add mesh_action_is_path_sel for readability, and update/correct path selection IEs. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22ieee80211: add mesh action codesThomas Pedersen
Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22mac80211: update mesh peering frame formatThomas Pedersen
This patch updates the mesh peering frames to the format specified in the recently ratified 802.11s standard. Several changes took place to make this happen: - Change RX path to handle new self-protected frames - Add new Peering management IE - Remove old Peer Link IE - Remove old plink_action field in ieee80211_mgmt header These changes by themselves would either break peering, or work by coincidence, so squash them all into this patch. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22ieee80211: introduce Self Protected Action codesThomas Pedersen
802.11s introduces a new action frame category, add action codes as well as an entry in ieee80211_mgmt. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22bcma: implement BCM4331 workaround for external PA linesRafał Miłecki
We need to disable ext. PA lines for reading SPROM. It's disabled by default, but this patch allows using bcma after loading wl, which leaves workaround enabled. Cc: Arend van Spriel <arend@broadcom.com> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22wireless: Introduce defines for BAR TID_INFO & MULTI_TID fieldsHelmut Schaa
While at it also fix the indention of the other IEEE80211_BAR_CTRL_ defines. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/staging/ath6kl/miscdrv/ar3kps/ar3kpsparser.c drivers/staging/ath6kl/os/linux/ar6000_drv.c
2011-08-22mac80211: fix suspend/resume races with unregister hwStanislaw Gruszka
Do not call ->suspend, ->resume methods after we unregister wiphy. Also delete sta_clanup timer after we finish wiphy unregister to avoid this: WARNING: at lib/debugobjects.c:262 debug_print_object+0x85/0xa0() Hardware name: 6369CTO ODEBUG: free active (active state 0) object type: timer_list hint: sta_info_cleanup+0x0/0x180 [mac80211] Modules linked in: aes_i586 aes_generic fuse bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq mperf ext2 dm_mod uinput thinkpad_acpi hwmon sg arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 cfg80211 i2c_i801 iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom yenta_socket ahci libahci pata_acpi ata_generic ata_piix i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: microcode] Pid: 5663, comm: pm-hibernate Not tainted 3.1.0-rc1-wl+ #19 Call Trace: [<c0454cfd>] warn_slowpath_common+0x6d/0xa0 [<c05e05e5>] ? debug_print_object+0x85/0xa0 [<c05e05e5>] ? debug_print_object+0x85/0xa0 [<c0454dae>] warn_slowpath_fmt+0x2e/0x30 [<c05e05e5>] debug_print_object+0x85/0xa0 [<f8a808e0>] ? sta_info_alloc+0x1a0/0x1a0 [mac80211] [<c05e0bd2>] debug_check_no_obj_freed+0xe2/0x180 [<c051175b>] kfree+0x8b/0x150 [<f8a126ae>] cfg80211_dev_free+0x7e/0x90 [cfg80211] [<f8a13afd>] wiphy_dev_release+0xd/0x10 [cfg80211] [<c068d959>] device_release+0x19/0x80 [<c05d06ba>] kobject_release+0x7a/0x1c0 [<c07646a8>] ? rtnl_unlock+0x8/0x10 [<f8a13adb>] ? wiphy_resume+0x6b/0x80 [cfg80211] [<c05d0640>] ? kobject_del+0x30/0x30 [<c05d1a6d>] kref_put+0x2d/0x60 [<c05d056d>] kobject_put+0x1d/0x50 [<c08015f4>] ? mutex_lock+0x14/0x40 [<c068d60f>] put_device+0xf/0x20 [<c069716a>] dpm_resume+0xca/0x160 [<c04912bd>] hibernation_snapshot+0xcd/0x260 [<c04903df>] ? freeze_processes+0x3f/0x90 [<c049151b>] hibernate+0xcb/0x1e0 [<c048fdc0>] ? pm_async_store+0x40/0x40 [<c048fe60>] state_store+0xa0/0xb0 [<c048fdc0>] ? pm_async_store+0x40/0x40 [<c05d0200>] kobj_attr_store+0x20/0x30 [<c0575ea4>] sysfs_write_file+0x94/0xf0 [<c051e26a>] vfs_write+0x9a/0x160 [<c0575e10>] ? sysfs_open_file+0x200/0x200 [<c051e3fd>] sys_write+0x3d/0x70 [<c080959f>] sysenter_do_call+0x12/0x28 Cc: stable@kernel.org Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-08-22Merge branch 'regmap-mfd' into regmap-nextMark Brown
2011-08-22mfd: Convert WM8400 to regmap APIMark Brown
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-08-22mfd: Convert WM8994 to use new register map APIMark Brown
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Samuel Ortiz <sameo@linux.intel.com>
2011-08-22mfd: Convert WM831x to use regmap APIMark Brown
Factor out the register read/write code to use the register map API. We still need some wm831x specific code and locking in place to check that the user key is handled correctly but only on the write side, reads are not affected by the key. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Samuel Ortiz <sameo@linux.intel.com>
2011-08-21Merge branch 'regmap-interface' into regmap-nextMark Brown
2011-08-21regmap: Allow drivers to specify register defaultsMark Brown
It is useful for the register cache code to be able to specify the default values for the device registers. The major use is when restoring the register cache after suspend, knowing the register defaults allows us to skip registers that are at their default values when we resume which can be a substantial win on larger modern devices. For some devices (mostly older ones) the hardware does not support readback so the only way we can know the values is from code and so initializing the cache with default values makes it much easier for drivers work with read/modify/write updates. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-08-20Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-08-19PM / Runtime: Add macro to test for runtime PM eventsAlan Stern
This patch (as1482) adds a macro for testing whether or not a pm_message value represents an autosuspend or autoresume (i.e., a runtime PM) event. Encapsulating this notion seems preferable to open-coding the test all over the place. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-19Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits) Revert "cfq: Remove special treatment for metadata rqs." block: fix flush machinery for stacking drivers with differring flush flags block: improve rq_affinity placement blktrace: add FLUSH/FUA support Move some REQ flags to the common bio/request area allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH xen/blkback: Make description more obvious. cfq-iosched: Add documentation about idling block: Make rq_affinity = 1 work as expected block: swim3: fix unterminated of_device_id table block/genhd.c: remove useless cast in diskstats_show() drivers/cdrom/cdrom.c: relax check on dvd manufacturer value drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse bsg-lib: add module.h include cfq-iosched: Reduce linked group count upon group destruction blk-throttle: correctly determine sync bio loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices loop: add management interface for on-demand device allocation loop: replace linked list of allocated devices with an idr index ...
2011-08-19sunrpc: use better NUMA affinitiesEric Dumazet
Use NUMA aware allocations to reduce latencies and increase throughput. sunrpc kthreads can use kthread_create_on_node() if pool_mode is "percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can also take into account NUMA node affinity for memory allocations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: "J. Bruce Fields" <bfields@fieldses.org> CC: Neil Brown <neilb@suse.de> CC: David Miller <davem@davemloft.net> Reviewed-by: Greg Banks <gnb@fastmail.fm> [bfields@redhat.com: fix up caller nfs41_callback_up] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19locks: fix tracking of inprogress lease breaksJ. Bruce Fields
We currently use a bit in fl_flags to record whether a lease is being broken, and set fl_type to the type (RDLCK or UNLCK) that it will eventually have. This means that once the lease break starts, we forget what the lease's type *used* to be. Breaking a read lease will then result in blocking read opens, even though there's no conflict--because the lease type is now F_UNLCK and we can no longer tell whether it was previously a read or write lease. So, instead keep fl_type as the original type (the type which we enforce), and keep track of whether we're unlocking or merely downgrading by replacing the single FL_INPROGRESS flag by FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags. To get this right we also need to track separate downgrade and break times, to handle the case where a write-leased file gets conflicting opens first for read, then later for write. (I first considered just eliminating the downgrade behavior completely--nfsv4 doesn't need it, and nobody as far as I can tell actually uses it currently--but Jeremy Allison tells me that Windows oplocks do behave this way, so Samba will probably use this some day.) Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19locks: move F_INPROGRESS from fl_type to fl_flags fieldJ. Bruce Fields
F_INPROGRESS isn't exposed to userspace. To me it makes more sense in fl_flags.... Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: OF: Don't crash when bridge parent is NULL. PCI: export pcie_bus_configure_settings symbol PCI: code and comments cleanup PCI: make cardbus-bridge resources optional PCI: make SRIOV resources optional PCI : ability to relocate assigned pci-resources PCI: honor child buses add_size in hot plug configuration PCI: Set PCI-E Max Payload Size on fabric
2011-08-19slub: per cpu cache for partial pagesChristoph Lameter
Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to partial pages. The partial page list is used in slab_free() to avoid per node lock taking. In __slab_alloc() we can then take multiple partial pages off the per node partial list in one go reducing node lock pressure. We can also use the per cpu partial list in slab_alloc() to avoid scanning partial lists for pages with free objects. The main effect of a per cpu partial list is that the per node list_lock is taken for batches of partial pages instead of individual ones. Potential future enhancements: 1. The pickup from the partial list could be perhaps be done without disabling interrupts with some work. The free path already puts the page into the per cpu partial list without disabling interrupts. 2. __slab_free() may have some code paths that could use optimization. Performance: Before After ./hackbench 100 process 200000 Time: 1953.047 1564.614 ./hackbench 100 process 20000 Time: 207.176 156.940 ./hackbench 100 process 20000 Time: 204.468 156.940 ./hackbench 100 process 20000 Time: 204.879 158.772 ./hackbench 10 process 20000 Time: 20.153 15.853 ./hackbench 10 process 20000 Time: 20.153 15.986 ./hackbench 10 process 20000 Time: 19.363 16.111 ./hackbench 1 process 20000 Time: 2.518 2.307 ./hackbench 1 process 20000 Time: 2.258 2.339 ./hackbench 1 process 20000 Time: 2.864 2.163 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-08-19squeeze max-pause area and drop pass-good areaWu Fengguang
Revert the pass-good area introduced in ffd1f609ab10 ("writeback: introduce max-pause and pass-good dirty limits") and make the max-pause area smaller and safe. This fixes ~30% performance regression in the ext3 data=writeback fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are 12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes. Using deadline scheduler also has a regression, but not that big as CFQ, so this suggests we have some write starvation. The test logs show that - the disks are sometimes under utilized - global dirty pages sometimes rush high to the pass-good area for several hundred seconds, while in the mean time some bdi dirty pages drop to very low value (bdi_dirty << bdi_thresh). Then suddenly the global dirty pages dropped under global dirty threshold and bdi_dirty rush very high (for example, 2 times higher than bdi_thresh). During which time balance_dirty_pages() is not called at all. So the problems are 1) The random writes progress so slow that they break the assumption of the max-pause logic that "8 pages per 200ms is typically more than enough to curb heavy dirtiers". 2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh) and then (bdi_thresh >> bdi_dirty) for others. 3) The higher max-pause/pass-good thresholds somehow leads to the bad swing of dirty pages. The fix is to allow the task to slightly dirty over task_bdi_thresh, but no way to exceed bdi_dirty and/or global dirty_thresh. Tests show that it fixed the JBOD regression completely (both behavior and performance), while still being able to cut down large pause times in balance_dirty_pages() for single-disk cases. Reported-by: Li Shaohua <shaohua.li@intel.com> Tested-by: Li Shaohua <shaohua.li@intel.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-08-19net: add the comment for skb->l4_rxhashChangli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17net: remove ndo_set_multicast_list callbackJiri Pirko
Remove no longer used operation. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17net: introduce IFF_UNICAST_FLT private flagJiri Pirko
Use IFF_UNICAST_FTL to find out if driver handles unicast address filtering. In case it does not, promisc mode is entered. Patch also fixes following drivers: stmmac, niu: support uc filtering and yet it propagated ndo_set_multicast_list bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set ndo_set_rx_mode but do not support uc filtering Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17rps: Add flag to skb to indicate rxhash is based on L4 tupleTom Herbert
The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
2011-08-17mm: fix __page_to_pfn for a const struct page argumentIan Campbell
This allows the cast in lowmem_page_address (introduced as a warning fixup to 33dd4e0ec911 "mm: make some struct page's const") to be removed. Propagate const'ness to page_to_section() as well since it is required by __page_to_pfn. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.Ian Campbell
Followup to 33dd4e0ec911 "mm: make some struct page's const" which missed the HASHED_PAGE_VIRTUAL case. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rtc: Limit RTC PIE frequency rtc: Fix hrtimer deadlock rtc: Handle errors correctly in rtc_irq_set_state() Fixup trivial conflicts in drivers/rtc/interface.c due to slightly trivially versions of the same patch coming in two different ways.
2011-08-17Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Track the owner of irq descriptor irq: Always set IRQF_ONESHOT if no primary handler is specified genirq: Fix wrong bit operation
2011-08-16ethtool: Note common alternate exit condition for interrupt coalescingBen Hutchings
Many implementations ignore the value of max_frames and do not treat usecs == 0 as special. Document this as deprecated. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16ethtool: Explicitly state the exit condition for interrupt coalescingBen Hutchings
Also explicitly state how to disable interrupt coalescing. Remove the now-redundant text from field descriptions. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16ethtool: Correct description of 'max_coalesced_frames' fieldsBen Hutchings
The current descriptions state that these fields specify 'How many packets to delay ... after a packet ...' which implies that the hardware should wait for (max_coalesced_frames + 1) completions before generating an interrupt. It is also stated that setting both this field and the corresponding 'coalesce_usecs' field to 0 is invalid. Together, this implies that the hardware must always be configured to delay a completion IRQ for at least 1 usec or 1 more completion. I believe that the addition of 1 is not intended, and David Miller confirms that the original implementation (in tg3) does not do this. Clarify the descriptions of these fields to avoid this interpretation. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16ethtool: Specify what kind of coalescing struct ethtool_coalesce coversBen Hutchings
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16ethtool: Reformat struct ethtool_coalesce comments into kernel-doc formatBen Hutchings
This reorders and duplicates some wording, but should make no substantive changes. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16net: Fix sungem_phy sharing.David S. Miller
Since sungem_phy is used by multiple, unrelated, drivers make it build as a real module under drivers/net. depmod will pick up the symbol dependency and make sure sungem_phy.ko gets loaded any time sungem.ko or spider_net.ko is loaded. Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16evm: fix build problemsMimi Zohar
- Make the previously missing security_old_inode_init_security() stub function definition static inline. - The stub security_inode_init_security() function previously returned -EOPNOTSUPP and relied on the callers to change it to 0. The stub security/security_old_inode_init_security() functions now return 0. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-08-15block: fix flush machinery for stacking drivers with differring flush flagsJeff Moyer
Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement FLUSH/FUA to support merge, introduced a performance regression when running any sort of fsyncing workload using dm-multipath and certain storage (in our case, an HP EVA). The test I ran was fs_mark, and it dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out that dm-multipath always advertised flush+fua support, and passed commands on down the stack, where those flags used to get stripped off. The above commit changed that behavior: static inline struct request *__elv_next_request(struct request_queue *q) { struct request *rq; while (1) { - while (!list_empty(&q->queue_head)) { + if (!list_empty(&q->queue_head)) { rq = list_entry_rq(q->queue_head.next); - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || - (rq->cmd_flags & REQ_FLUSH_SEQ)) - return rq; - rq = blk_do_flush(q, rq); - if (rq) - return rq; + return rq; } Note that previously, a command would come in here, have REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush: struct request *blk_do_flush(struct request_queue *q, struct request *rq) { unsigned int fflags = q->flush_flags; /* may change, cache it */ bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); unsigned skip = 0; ... if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { rq->cmd_flags &= ~REQ_FLUSH; if (!has_fua) rq->cmd_flags &= ~REQ_FUA; return rq; } So, the flush machinery was bypassed in such cases (q->flush_flags == 0 && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)). Now, however, we don't get into the flush machinery at all. Instead, __elv_next_request just hands a request with flush and fua bits set to the scsi_request_fn, even if the underlying request_queue does not support flush or fua. The agreed upon approach is to fix the flush machinery to allow stacking. While this isn't used in practice (since there is only one request-based dm target, and that target will now reflect the flush flags of the underlying device), it does future-proof the solution, and make it function as designed. In order to make this work, I had to add a field to the struct request, inside the flush structure (to store the original req->end_io). Shaohua had suggested overloading the union with rb_node and completion_data, but the completion data is used by device mapper and can also be used by other drivers. So, I didn't see a way around the additional field. I tested this patch on an HP EVA with both ext4 and xfs, and it recovers the lost performance. Comments and other testers, as always, are appreciated. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-14net: Move sungem_phy.h under include/linuxDavid S. Miller
Fixes build failures of the spider_net driver because it tries to use a convoluted path to include this header. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: remove unused "ddr" parameter in struct mmc_ios mmc: dw_mmc: Fix DDR mode support. mmc: core: use defined R1_STATE_PRG macro for card status mmc: sdhci: use f_max instead of host->clock for timeouts mmc: sdhci: move timeout_clk calculation farther down mmc: sdhci: check host->clock before using it as a denominator mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK" mmc: tmio: eliminate unused variable 'mmc' warning mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC mmc: sdhci-s3c: Fix build for header change mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro mmc: cb710: fix possible pci_dev leak in cb710_pci_configure() mmc: core: Detect eMMC v4.5 ext_csd entries mmc: mmc_test: avoid stalled file in debugfs mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
2011-08-14PM / Domains: Fix build for CONFIG_PM_RUNTIME unsetRafael J. Wysocki
Function genpd_queue_power_off_work() is not defined for CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build error to happen in that case. Fix the problem by making pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-08-14sched: Accumulate per-cfs_rq cpu usage and charge against bandwidthPaul Turner
Account bandwidth usage on the cfs_rq level versus the task_groups to which they belong. Whether we are tracking bandwidth on a given cfs_rq is maintained under cfs_rq->runtime_enabled. cfs_rq's which belong to a bandwidth constrained task_group have their runtime accounted via the update_curr() path, which withdraws bandwidth from the global pool as desired. Updates involving the global pool are currently protected under cfs_bandwidth->lock, local runtime is protected by rq->lock. This patch only assigns and tracks quota, no action is taken in the case that cfs_rq->runtime_used exceeds cfs_rq->runtime_assigned. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Nikhil Rao <ncrao@google.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110721184757.179386821@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ASoC: Fix compile warning in wm8750.c ASoC: omap: Update e-mail address of Jarkko Nikula ASoC: SAMSUNG: Add I2S0 internal dma driver ASoC: Terminate WM8750 SPI device ID table ASoC: Add missing break in WM8994 probe ALSA: snd-usb-caiaq: Correct offset fields of outbound iso_frame_desc ALSA: azt3328 - adjust error handling code to include debugging code ALSA: hda - Add CONFIG_SND_HDA_POWER_SAVE to stac_vrefout_set() ALSA: usb-audio - Add quirk for BOSS Micro BR-80 ASoC: Fix typo in wm8750 spi_ids ASoC: Fix warning in Speyside WM8962 ASoC: Fix SPI driver binding for WM8987 ASoC: Fix binding of WM8750 on Jive ASoC: WM8903: Free IRQ on device removal ASoC: Tegra: wm8903 machine driver: Allow re-insertion of module ASoC: Tegra: tegra_pcm_deallocate_dma_buffer: Don't OOPS
2011-08-13mmc: remove unused "ddr" parameter in struct mmc_iosJaehoon Chung
"mmc: dw_mmc: Fix DDR mode support" removed the last user. Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2011-08-13af_iucv: add HiperSockets transportUrsula Braun
The current transport mechanism for af_iucv is the z/VM offered communications facility IUCV. To provide equivalent support when running Linux in an LPAR, HiperSockets transport is added to the AF_IUCV address family. It requires explicit binding of an AF_IUCV socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV is announced. An af_iucv specific transport header is defined preceding the skb data. A small protocol is implemented for connecting and for flow control/congestion management. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>