summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2011-07-10skbuff: update struct sk_buff members commentsDaniel Baluta
Rearrange struct sk_buff members comments to follow their definition order. Also, add missing comments for ooo_okay and dropcount members. Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-10PM / Domains: Export pm_genpd_poweron() in headerMagnus Damm
Allow SoC-specific code to call pm_genpd_poweron(). Signed-off-by: Magnus Damm <damm@opensource.se> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-09writeback: scale IO chunk size up to half device bandwidthWu Fengguang
Originally, MAX_WRITEBACK_PAGES was hard-coded to 1024 because of a concern of not holding I_SYNC for too long. (At least, that was the comment previously.) This doesn't make sense now because the only time we wait for I_SYNC is if we are calling sync or fsync, and in that case we need to write out all of the data anyway. Previously there may have been other code paths that waited on I_SYNC, but not any more. -- Theodore Ts'o So remove the MAX_WRITEBACK_PAGES constraint. The writeback pages will adapt to as large as the storage device can write within 500ms. XFS is observed to do IO completions in a batch, and the batch size is equal to the write chunk size. To avoid dirty pages to suddenly drop out of balance_dirty_pages()'s dirty control scope and create large fluctuations, the chunk size is also limited to half the control scope. The balance_dirty_pages() control scrope is [(background_thresh + dirty_thresh) / 2, dirty_thresh] which is by default [15%, 20%] of global dirty pages, whose range size is dirty_thresh / DIRTY_FULL_SCOPE. The adpative write chunk size will be rounded to the nearest 4MB boundary. http://bugzilla.kernel.org/show_bug.cgi?id=13930 CC: Theodore Ts'o <tytso@mit.edu> CC: Dave Chinner <david@fromorbit.com> CC: Chris Mason <chris.mason@oracle.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-09writeback: trace global_dirty_stateWu Fengguang
Add trace event balance_dirty_state for showing the global dirty page counts and thresholds at each global_dirty_limits() invocation. This will cover the callers throttle_vm_writeout(), over_bground_thresh() and each balance_dirty_pages() loop. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-09writeback: introduce max-pause and pass-good dirty limitsWu Fengguang
The max-pause limit helps to keep the sleep time inside balance_dirty_pages() within MAX_PAUSE=200ms. The 200ms max sleep means per task rate limit of 8pages/200ms=160KB/s when dirty exceeded, which normally is enough to stop dirtiers from continue pushing the dirty pages high, unless there are a sufficient large number of slow dirtiers (eg. 500 tasks doing 160KB/s will still sum up to 80MB/s, exceeding the write bandwidth of a slow disk and hence accumulating more and more dirty pages). The pass-good limit helps to let go of the good bdi's in the presence of a blocked bdi (ie. NFS server not responding) or slow USB disk which for some reason build up a large number of initial dirty pages that refuse to go away anytime soon. For example, given two bdi's A and B and the initial state bdi_thresh_A = dirty_thresh / 2 bdi_thresh_B = dirty_thresh / 2 bdi_dirty_A = dirty_thresh / 2 bdi_dirty_B = dirty_thresh / 2 Then A get blocked, after a dozen seconds bdi_thresh_A = 0 bdi_thresh_B = dirty_thresh bdi_dirty_A = dirty_thresh / 2 bdi_dirty_B = dirty_thresh / 2 The (bdi_dirty_B < bdi_thresh_B) test is now useless and the dirty pages will be effectively throttled by condition (nr_dirty < dirty_thresh). This has two problems: (1) we lose the protections for light dirtiers (2) balance_dirty_pages() effectively becomes IO-less because the (bdi_nr_reclaimable > bdi_thresh) test won't be true. This is good for IO, but balance_dirty_pages() loses an important way to break out of the loop which leads to more spread out throttle delays. DIRTY_PASSGOOD_AREA can eliminate the above issues. The only problem is, DIRTY_PASSGOOD_AREA needs to be defined as 2 to fully cover the above example while this patch uses the more conservative value 8 so as not to surprise people with too many dirty pages than expected. The max-pause limit won't noticeably impact the speed dirty pages are knocked down when there is a sudden drop of global/bdi dirty thresholds. Because the heavy dirties will be throttled below 160KB/s which is slow enough. It does help to avoid long dirty throttle delays and especially will make light dirtiers more responsive. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-09writeback: introduce smoothed global dirty limitWu Fengguang
The start of a heavy weight application (ie. KVM) may instantly knock down determine_dirtyable_memory() if the swap is not enabled or full. global_dirty_limits() and bdi_dirty_limit() will in turn get global/bdi dirty thresholds that are _much_ lower than the global/bdi dirty pages. balance_dirty_pages() will then heavily throttle all dirtiers including the light ones, until the dirty pages drop below the new dirty thresholds. During this _deep_ dirty-exceeded state, the system may appear rather unresponsive to the users. About "deep" dirty-exceeded: task_dirty_limit() assigns 1/8 lower dirty threshold to heavy dirtiers than light ones, and the dirty pages will be throttled around the heavy dirtiers' dirty threshold and reasonably below the light dirtiers' dirty threshold. In this state, only the heavy dirtiers will be throttled and the dirty pages are carefully controlled to not exceed the light dirtiers' dirty threshold. However if the threshold itself suddenly drops below the number of dirty pages, the light dirtiers will get heavily throttled. So introduce global_dirty_limit for tracking the global dirty threshold with policies - follow downwards slowly - follow up in one shot global_dirty_limit can effectively mask out the impact of sudden drop of dirtyable memory. It will be used in the next patch for two new type of dirty limits. Note that the new dirty limits are not going to avoid throttling the light dirtiers, but could limit their sleep time to 200ms. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-09writeback: bdi write bandwidth estimationWu Fengguang
The estimation value will start from 100MB/s and adapt to the real bandwidth in seconds. It tries to update the bandwidth only when disk is fully utilized. Any inactive period of more than one second will be skipped. The estimated bandwidth will be reflecting how fast the device can writeout when _fully utilized_, and won't drop to 0 when it goes idle. The value will remain constant at disk idle time. At busy write time, if not considering fluctuations, it will also remain high unless be knocked down by possible concurrent reads that compete for the disk time and bandwidth with async writes. The estimation is not done purely in the flusher because there is no guarantee for write_cache_pages() to return timely to update bandwidth. The bdi->avg_write_bandwidth smoothing is very effective for filtering out sudden spikes, however may be a little biased in long term. The overheads are low because the bdi bandwidth update only occurs at 200ms intervals. The 200ms update interval is suitable, because it's not possible to get the real bandwidth for the instance at all, due to large fluctuations. The NFS commits can be as large as seconds worth of data. One XFS completion may be as large as half second worth of data if we are going to increase the write chunk to half second worth of data. In ext4, fluctuations with time period of around 5 seconds is observed. And there is another pattern of irregular periods of up to 20 seconds on SSD tests. That's why we are not only doing the estimation at 200ms intervals, but also averaging them over a period of 3 seconds and then go further to do another level of smoothing in avg_write_bandwidth. CC: Li Shaohua <shaohua.li@intel.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-09writeback: account per-bdi accumulated written pagesJan Kara
Introduce the BDI_WRITTEN counter. It will be used for estimating the bdi's write bandwidth. Peter Zijlstra <a.p.zijlstra@chello.nl>: Move BDI_WRITTEN accounting into __bdi_writeout_inc(). This will cover and fix fuse, which only calls bdi_writeout_inc(). CC: Michael Rubin <mrubin@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-09writeback: make writeback_control.nr_to_write straightWu Fengguang
Pass struct wb_writeback_work all the way down to writeback_sb_inodes(), and initialize the struct writeback_control there. struct writeback_control is basically designed to control writeback of a single file, but we keep abuse it for writing multiple files in writeback_sb_inodes() and its callers. It immediately clean things up, e.g. suddenly wbc.nr_to_write vs work->nr_pages starts to make sense, and instead of saving and restoring pages_skipped in writeback_sb_inodes it can always start with a clean zero value. It also makes a neat IO pattern change: large dirty files are now written in the full 4MB writeback chunk size, rather than whatever remained quota in wbc->nr_to_write. Acked-by: Jan Kara <jack@suse.cz> Proposed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-07-08w1: ds1wm: add a reset recovery parameterJean-François Dagenais
This fixes a regression in 3.0 reported by Paul Parsons regarding the removal of the msleep(1) in the ds1wm_reset() function: : The linux-3.0-rc4 DS1WM 1-wire driver is logging "bus error, retrying" : error messages on an HP iPAQ hx4700 PDA (XScale-PXA270): : : <snip> : Driver for 1-wire Dallas network protocol. : DS1WM w1 busmaster driver - (c) 2004 Szabolcs Gyurko : 1-Wire driver for the DS2760 battery monitor chip - (c) 2004-2005, Szabolcs Gyurko : ds1wm ds1wm: pass: 1 bus error, retrying : ds1wm ds1wm: pass: 2 bus error, retrying : ds1wm ds1wm: pass: 3 bus error, retrying : ds1wm ds1wm: pass: 4 bus error, retrying : ds1wm ds1wm: pass: 5 bus error, retrying : ... : : The visible result is that the battery charging LED is erratic; sometimes : it works, mostly it doesn't. : : The linux-2.6.39 DS1WM 1-wire driver worked OK. I haven't tried 3.0-rc1, : 3.0-rc2, or 3.0-rc3. This sleep should not be required on normal circuitry provided the pull-ups on the bus are correctly adapted to the slaves. Unfortunately, this is not always the case. The sleep is restored but as a parameter to the probe function in the pdata. [akpm@linux-foundation.org: coding-style fixes] Reported-by: Paul Parsons <lost.distance@yahoo.com> Tested-by: Paul Parsons <lost.distance@yahoo.com> Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-08Merge branch 'for-next' of ↵Greg Kroah-Hartman
master.kernel.org:/pub/scm/linux/kernel/git/balbi/usb into usb-next * 'for-next' of master.kernel.org:/pub/scm/linux/kernel/git/balbi/usb: usb: gadget: m66592-udc: add pullup function usb: gadget: m66592-udc: add function for external controller usb: gadget: r8a66597-udc: add pullup function usb: gadget: zero: add superspeed support usb: gadget: add SS descriptors to Ethernet gadget usb: gadget: r8a66597-udc: add support for TEST_MODE usb: gadget: m66592-udc: add support for TEST_MODE usb: gadget: r8a66597-udc: Make BUSWAIT configurable through platform data usb: gadget: r8a66597-udc: fix cannot connect after rmmod gadget driver usb: update email address in r8a66597-udc and m66592-udc usb: musb: restore INDEX register in resume path usb: gadget: fix up depencies usb: gadget: fusb300_udc: fix compile warnings usb: gadget: ci13xx_udc.c: fix compile warning usb: gadget: net2272: fix compile warnings usb: gadget: langwell_udc: fix compile warnings usb: gadget: fusb300_udc: drop dead code
2011-07-08amba pl011: workaround for uart registers lockupShreshtha Kumar Sahu
This workaround aims to break the deadlock situation which raises during continuous transfer of data for long duration over uart with hardware flow control. It is observed that CTS interrupt cannot be cleared in uart interrupt register (ICR). Hence further transfer over uart gets blocked. It is seen that during such deadlock condition ICR don't get cleared even on multiple write. This leads pass_counter to decrease and finally reach zero. This can be taken as trigger point to run this UART_BT_WA. Workaround backups the register configuration, does soft reset of UART using BIT-0 of PRCC_K_SOFTRST_SET/CLEAR registers and restores the registers. This patch also provides support for uart init and exit function calls if present. Signed-off-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-09usb: gadget: m66592-udc: add function for external controllerYoshihiro Shimoda
M66592 has the pin of WR0 and WR1. So, if one write-pin of CPU connects to the pins, we have to change the setting of FIFOSEL register in the controller. If we don't change the setting, the controller cannot send the data of odd length. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-07-08usb: update email address in ohci-sh and r8a66597-hcdYoshihiro Shimoda
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-08usb: r8a66597-hcd: add function for external controllerYoshihiro Shimoda
R8A66597 has the pin of WR0 and WR1. So, if one write-pin of CPU connects to the pins, we have to change the setting of FIFOSEL register in the controller. If we don't change the setting, the controller cannot send the data of odd length. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-08Bluetooth: Add support for returning the encryption key sizeVinicius Costa Gomes
This will be useful when userspace wants to restrict some kinds of operations based on the length of the key size used to encrypt the link. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-08Bluetooth: Add support for storing the key sizeVinicius Costa Gomes
In some cases it will be useful having the key size used for encrypting the link. For example, some profiles may restrict some operations depending on the key length. The key size is stored in the key that is passed to userspace using the pin_length field in the key structure. For now this field is only valid for LE controllers. 3.0+HS controllers define the Read Encryption Key Size command, this field is intended for storing the value returned by that command. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-08Bluetooth: Remove unused field in hci_connVinicius Costa Gomes
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-08Bluetooth: Add functions to manipulate the link key list for SMPVinicius Costa Gomes
As the LTK (the new type of key being handled now) has more data associated with it, we need to store this extra data and retrieve the keys based on that data. Methods for searching for a key and for adding a new LTK are introduced here. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-08Bluetooth: Add new structures for supporting SM key distributionVinicius Costa Gomes
We need these changes because SMP keys may have more information associated with them, for example, in the LTK case, it has an encrypted diversifier (ediv) and a random number (rand). Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-08rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_checkMichal Hocko
Since ca5ecddf (rcu: define __rcu address space modifier for sparse) rcu_dereference_check use rcu_read_lock_held as a part of condition automatically so callers do not have to do that as well. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-07-08Bluetooth: Add support for SMP phase 3 (key distribution)Vinicius Costa Gomes
This adds support for generating and distributing all the keys specified in the third phase of SMP. This will make possible to re-establish secure connections, resolve private addresses and sign commands. For now, the values generated are random. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-08Merge branch 'xen-tmem-selfballoon-v8' of ↵Konrad Rzeszutek Wilk
git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem into stable/drivers * 'xen-tmem-selfballoon-v8' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem: xen: tmem: self-ballooning and frontswap-selfshrinking
2011-07-08dt: add empty of_property_read_u32[_array] for non-dtShawn Guo
The patch adds empty functions of_property_read_u32 and of_property_read_u32_array for non-dt build, so that drivers migrating to dt can save some '#ifdef CONFIG_OF'. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> [grant.likely: Moved things around so only one new static inline is needed] [grant.likely: Added _string variant] Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-08xen: tmem: self-ballooning and frontswap-selfshrinkingDan Magenheimer
This patch introduces two in-kernel drivers for Xen transcendent memory ("tmem") functionality that complement cleancache and frontswap. Both use control theory to dynamically adjust and optimize memory utilization. Selfballooning controls the in-kernel Xen balloon driver, targeting a goal value (vm_committed_as), thus pushing less frequently used clean page cache pages (through the cleancache code) into Xen tmem where Xen can balance needs across all VMs residing on the physical machine. Frontswap-selfshrinking controls the number of pages in frontswap, driving it towards zero (effectively doing a partial swapoff) when in-kernel memory pressure subsides, freeing up RAM for other VMs. More detail is provided in the header comment of xen-selfballooning.c. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> [v8: konrad.wilk@oracle.com: set default enablement depending on frontswap] [v7: konrad.wilk@oracle.com: fix capitalization and punctuation in comments] [v6: fix frontswap-selfshrinking initialization] [v6: konrad.wilk@oracle.com: fix init pr_infos; add comments about swap] [v5: konrad.wilk@oracle.com: add NULL to attr list; move inits up to decls] [v4: dkiper@net-space.pl: use strict_strtoul plus a few syntactic nits] [v3: konrad.wilk@oracle.com: fix potential divides-by-zero] [v3: konrad.wilk@oracle.com: add many more comments, fix nits] [v2: rebased to linux-3.0-rc1] [v2: Ian.Campbell@citrix.com: reorganize as new file (xen-selfballoon.c)] [v2: dkiper@net-space.pl: proper access to vm_committed_as] [v2: dkiper@net-space.pl: accounting fixes] Cc: Jan Beulich <JBeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: <xen-devel@lists.xensource.com>
2011-07-08sctp: ABORT if receive, reassmbly, or reodering queue is not empty while ↵Thomas Graf
closing socket Trigger user ABORT if application closes a socket which has data queued on the socket receive queue or chunks waiting on the reassembly or ordering queue as this would imply data being lost which defeats the point of a graceful shutdown. This behavior is already practiced in TCP. We do not check the input queue because that would mean to parse all chunks on it to look for unacknowledged data which seems too much of an effort. Control chunks or duplicated chunks may also be in the input queue and should not be stopping a graceful shutdown. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-08cfg80211: return -ENOENT when stopping sched_scan while not runningLuciano Coelho
If we try to stop a scheduled scan while it is not running, we should return -ENOENT instead of simply ignoring the command and returning success. This is more consistent with other parts of the code. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Luciano Coelho <coelho@ti.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-08mac80211: allow driver to generate P1K for IV32Johannes Berg
In order to support pre-populating the P1K cache in iwlwifi hardware for WoWLAN, we need to calculate the P1K for the current IV32. Allow drivers to get the P1K for any given IV32 instead of for a given packet, but keep the packet-based version around as an inline. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-08mac80211: allow drivers to access key sequence counterJohannes Berg
In order to implement GTK rekeying, the device needs to be able to encrypt frames with the right PN/IV and check the PN/IV in RX frames. To be able to tell it about all those counters, we need to be able to get them from mac80211, this adds the required API. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-08mac80211: fix TKIP races, make API easier to useJohannes Berg
Our current TKIP code races against itself on TX since we can process multiple packets at the same time on different ACs, but they all share the TX context for TKIP. This can lead to bad IVs etc. Also, the crypto offload helper code just obtains the P1K/P2K from the cache, and can update it as well, but there's no guarantee that packets are really processed in order. To fix these issues, first introduce a spinlock that will protect the IV16/IV32 values in the TX context. This first step makes sure that we don't assign the same IV multiple times or get confused in other ways. Secondly, change the way the P1K cache works. I add a field "p1k_iv32" that stores the value of the IV32 when the P1K was last recomputed, and if different from the last time, then a new P1K is recomputed. This can cause the P1K computation to flip back and forth if packets are processed out of order. All this also happens under the new spinlock. Finally, because there are argument differences, split up the ieee80211_get_tkip_key() API into ieee80211_get_tkip_p1k() and ieee80211_get_tkip_p2k() and give them the correct arguments. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-07-08Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
2011-07-08plist: Remove the need to supply locks to plist headsDima Zavin
This was legacy code brought over from the RT tree and is no longer necessary. Signed-off-by: Dima Zavin <dima@android.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Walker <dwalker@codeaurora.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1310084879-10351-2-git-send-email-dima@android.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-08usb: gadget: r8a66597-udc: Make BUSWAIT configurable through platform dataYoshihiro Shimoda
BUSWAIT is a 4-bit-wide value that controls the number of access waits from the CPU to on-chip USB module. b'0000 inserts 0 wait (2 access cycles) and b'1111 inserts 15 waits (17 access cycles, hardware initial value), respectively. BUSWAIT value depends on peripheral clock frequency supplied to on-chip of each CPU, hence should be configurable through platform data. Note that this patch assumes that b'0000 (0 wait, 2 access cycles) is rerely used and considered as invalid. If valid 'buswait' data is not provided by platform, initial b'1111 (15 waits, 17 access cycles) will be applied as a safe default. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-07-08block: document blk_plug list accessShaohua Li
I'm often confused why not disable preempt when changing blk_plug list. It would be better to add comments here in case others have the similar concerns. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-08block: avoid building too big plug listShaohua Li
When I test fio script with big I/O depth, I found the total throughput drops compared to some relative small I/O depth. The reason is the thread accumulates big requests in its plug list and causes some delays (surely this depends on CPU speed). I thought we'd better have a threshold for requests. When a threshold reaches, this means there is no request merge and queue lock contention isn't severe when pushing per-task requests to queue, so the main advantages of blk plug don't exist. We can force a plug list flush in this case. With this, my test throughput actually increases and almost equals to small I/O depth. Another side effect is irq off time decreases in blk_flush_plug_list() for big I/O depth. The BLK_MAX_REQUEST_COUNT is choosen arbitarily, but 16 is efficiently to reduce lock contention to me. But I'm open here, 32 is ok in my test too. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-08driver core: Add ability for arch code to setup pdev_archdataKumar Gala
On some architectures we need to setup pdev_archdata before we add the device. Waiting til a bus_notifier is too late since we might need the pdev_archdata in the bus notifier. One example is setting up of dma_mask pointers such that it can be used in a bus_notifier. We add weak noop version of arch_setup_pdev_archdata() and allow the arch code to override with access the full definitions of struct device, struct platform_device, and struct pdev_archdata. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-07-08drivers/virt: introduce Freescale hypervisor management driverTimur Tabi
Add the drivers/virt directory, which houses drivers that support virtualization environments, and add the Freescale hypervisor management driver. The Freescale hypervisor management driver provides several services to drivers and applications related to the Freescale hypervisor: 1. An ioctl interface for querying and managing partitions 2. A file interface to reading incoming doorbells 3. An interrupt handler for shutting down the partition upon receiving the shutdown doorbell from a manager partition 4. A kernel interface for receiving callbacks when a managed partition shuts down. Signed-off-by: Timur Tabi <timur@freescale.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-07-07sctp: Enforce retransmission limit during shutdownThomas Graf
When initiating a graceful shutdown while having data chunks on the retransmission queue with a peer which is in zero window mode the shutdown is never completed because the retransmission error count is reset periodically by the following two rules: - Do not timeout association while doing zero window probe. - Reset overall error count when a heartbeat request has been acknowledged. The graceful shutdown will wait for all outstanding TSN to be acknowledged before sending the SHUTDOWN request. This never happens due to the peer's zero window not acknowledging the continuously retransmitted data chunks. Although the error counter is incremented for each failed retransmission, the receiving of the SACK announcing the zero window clears the error count again immediately. Also heartbeat requests continue to be sent periodically. The peer acknowledges these requests causing the error counter to be reset as well. This patch changes behaviour to only reset the overall error counter for the above rules while not in shutdown. After reaching the maximum number of retransmission attempts, the T5 shutdown guard timer is scheduled to give the receiver some additional time to recover. The timer is stopped as soon as the receiver acknowledges any data. The issue can be easily reproduced by establishing a sctp association over the loopback device, constantly queueing data at the sender while not reading any at the receiver. Wait for the window to reach zero, then initiate a shutdown by killing both processes simultaneously. The association will never be freed and the chunks on the retransmission queue will be retransmitted indefinitely. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-07Merge branch 'drm-intel-fixes' into drm-intel-nextKeith Packard
2011-07-07Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block: drbd: we should write meta data updates with FLUSH FUA drbd: fix limit define, we support 1 PiByte now drbd: when receive times out on meta socket, also check last receive time on data socket drbd: account bitmap IO during resync as resync-(related-)-io drbd: don't cond_resched_lock with IRQs disabled drbd: add missing spinlock to bitmap receive drbd: Use the correct max_bio_size when creating resync requests cfq-iosched: make code consistent cfq-iosched: fix a rcu warning
2011-07-07FS-Cache: Add a helper to bulk uncache pages on an inodeDavid Howells
Add an FS-Cache helper to bulk uncache pages on an inode. This will only work for the circumstance where the pages in the cache correspond 1:1 with the pages attached to an inode's page cache. This is required for CIFS and NFS: When disabling inode cookie, we were returning the cookie and setting cifsi->fscache to NULL but failed to invalidate any previously mapped pages. This resulted in "Bad page state" errors and manifested in other kind of errors when running fsstress. Fix it by uncaching mapped pages when we disable the inode cookie. This patch should fix the following oops and "Bad page state" errors seen during fsstress testing. ------------[ cut here ]------------ kernel BUG at fs/cachefiles/namei.c:201! invalid opcode: 0000 [#1] SMP Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles] RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282 RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282 RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300 R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840 R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0 FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0) Stack: 0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00 ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380 ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56 Call Trace: cachefiles_lookup_object+0x78/0xd4 [cachefiles] fscache_lookup_object+0x131/0x16d [fscache] fscache_object_work_func+0x1bc/0x669 [fscache] process_one_work+0x186/0x298 worker_thread+0xda/0x15d kthread+0x84/0x8c kernel_thread_helper+0x4/0x10 RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles] ---[ end trace 1d481c9af1804caa ]--- I tested the uncaching by the following means: (1) Create a big file on my NFS server (104857600 bytes). (2) Read the file into the cache with md5sum on the NFS client. Look in /proc/fs/fscache/stats: Pages : mrk=25601 unc=0 (3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc again: Pages : mrk=25601 unc=25601 Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-07Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds
'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debugobjects: Fix boot crash when kmemleak and debugobjects enabled * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: jump_label: Fix jump_label update for modules oprofile, x86: Fix race in nmi handler while starting counters * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Disable (revert) SCHED_LOAD_SCALE increase sched, cgroups: Fix MIN_SHARES on 64-bit boxen
2011-07-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits) sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe it net: refine {udp|tcp|sctp}_mem limits vmxnet3: round down # of queues to power of two net: sh_eth: fix the parameter for the ETHER of SH7757 net: sh_eth: fix cannot work half-duplex mode net: vlan: enable soft features regardless of underlying device vmxnet3: fix starving rx ring whenoc_skb kb fails bridge: Always flood broadcast packets greth: greth_set_mac_add would corrupt the MAC address. net: bind() fix error return on wrong address family natsemi: silence dma-debug warnings net: 8139too: Initial necessary vlan_features to support vlan Fix call trace when interrupts are disabled while sleeping function kzalloc is called qlge:Version change to v1.00.00.29 qlge: Fix printk priority so chip fatal errors are always reported. qlge:Fix crash caused by mailbox execution on wedged chip. xfrm4: Don't call icmp_send on local error ipv4: Don't use ufo handling on later transformed packets xfrm: Remove family arg from xfrm_bundle_ok ipv6: Don't put artificial limit on routing table size. ...
2011-07-07slub: Add method to verify memory is not freedBen Greear
This is for tracking down suspect memory usage. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-07-07Bluetooth: Remove L2CAP busy queueMat Martineau
The ERTM receive buffer is now handled in a way that does not require the busy queue and the associated polling code. Signed-off-by: Mat Martineau <mathewm@codeaurora.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-07Bluetooth: Use event-driven approach for handling ERTM receive bufferMat Martineau
This change moves most L2CAP ERTM receive buffer handling out of the L2CAP core and in to the socket code. It's up to the higher layer (the socket code, in this case) to tell the core when its buffer is full or has space available. The recv op should always accept incoming ERTM data or else the connection will go down. Within the socket layer, an skb that does not fit in the socket receive buffer will be temporarily stored. When the socket is read from, that skb will be placed in the receive buffer if possible. Once adequate buffer space becomes available, the L2CAP core is informed and the ERTM local busy state is cleared. Receive buffer management for non-ERTM modes is unchanged. Signed-off-by: Mat Martineau <mathewm@codeaurora.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-07-07asm-generic: adapt delay.h to common implementationJonas Bonn
Several architectures are using a common delay.h implementation that appears to have originated with the x86 architecture. This common implementation is a bit fuller than the current asm-generic version and has some compile-time checks that should be interesting for all architectures. This patch takes the common delay.h version and replaces the rather trivial asm-generic version with it. As no architecture was actually using asm-generic/delay.h, this change is rather innocuous; it will, however, allow us to switch at least four architectures over to using the asm-generic version. Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Arnd Bergmann <arnd@arndb.de>
2011-07-07[media] tuner-core/v4l2-subdev: document that the type field has to be filled inHans Verkuil
The tuner ops g_frequency, g_tuner and s_tuner require that the tuner type field is filled in. Document this. The tuner-core doc is based on a patch from Mauro Carvalho Chehab <mchehab@redhat.com>. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-07-07slab allocators: Provide generic description of alignment definesChristoph Lameter
Provide description for alignment defines. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-07-07[media] v4l2-subdev.h: remove unused s_mode tuner opHans Verkuil
s_mode is no longer used, so remove it. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>