summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-05-26mfd: Use mfd cell platform_data for ab3100 cells platform bitsSamuel Ortiz
With the addition of a platform device mfd_cell pointer, MFD drivers can go back to passing platform data back to their sub drivers. This allows for an mfd_cell->mfd_data removal and thus keep the sub drivers MFD agnostic. This is mostly needed for non MFD aware sub drivers. Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-05-26mfd: Use mfd cell platform_data for ab3550 cells platform bitsSamuel Ortiz
With the addition of a platform device mfd_cell pointer, MFD drivers can go back to passing platform data back to their sub drivers. This allows for an mfd_cell->mfd_data removal and thus keep the sub drivers MFD agnostic. This is mostly needed for non MFD aware sub drivers. Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-05-26mfd: Add platform data pointer backSamuel Ortiz
Now that we have a way to pass MFD cells down to the sub drivers, we can gradually get rid of mfd_data by putting the platform pointer back in place. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-05-26x86, amd: Do not enable ARAT feature on AMD processors below family 0x12Boris Ostrovsky
Commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 added support for ARAT (Always Running APIC timer) on AMD processors that are not affected by erratum 400. This erratum is present on certain processor families and prevents APIC timer from waking up the CPU when it is in a deep C state, including C1E state. Determining whether a processor is affected by this erratum may have some corner cases and handling these cases is somewhat complicated. In the interest of simplicity we won't claim ARAT support on processor families below 0x12 and will go back to broadcasting timer when going idle. Signed-off-by: Boris Ostrovsky <ostr@amd64.org> Link: http://lkml.kernel.org/r/1306423192-19774-1-git-send-email-ostr@amd64.org Tested-by: Boris Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: stable@kernel.org # 32.x, 38.x, 39.x Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-26netfilter: Fix several warnings in compat_mtw_from_user().David Miller
Kill set but not used 'entry_offset'. Add a default case to the switch statement so the compiler can see that we always initialize off and size_kern before using them. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-26netfilter: ipset: fix ip_set_flush return codeJozsef Kadlecsik
ip_set_flush returned -EPROTO instead of -IPSET_ERR_PROTOCOL, fixed Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-26netfilter: ipset: remove unused variable from type_pf_tdel()Jozsef Kadlecsik
Variable 'ret' is set in type_pf_tdel() but not used, remove. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-26netfilter: ipset: Use proper timeout value to jiffies conversionJozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-26Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits) jbd2: Add MAINTAINERS entry jbd2: fix a potential leak of a journal_head on an error path ext4: teach ext4_ext_split to calculate extents efficiently ext4: Convert ext4 to new truncate calling convention ext4: do not normalize block requests from fallocate() ext4: enable "punch hole" functionality ext4: add "punch hole" flag to ext4_map_blocks() ext4: punch out extents ext4: add new function ext4_block_zero_page_range() ext4: add flag to ext4_has_free_blocks ext4: reserve inodes and feature code for 'quota' feature ext4: add support for multiple mount protection ext4: ensure f_bfree returned by ext4_statfs() is non-negative ext4: protect bb_first_free in ext4_trim_all_free() with group lock ext4: only load buddy bitmap in ext4_trim_fs() when it is needed jbd2: Fix comment to match the code in jbd2__journal_start() ext4: fix waiting and sending of a barrier in ext4_sync_file() jbd2: Add function jbd2_trans_will_send_data_barrier() jbd2: fix sending of data flush on journal commit ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly ...
2011-05-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits) cifs: remove unnecessary dentry_unhash on rmdir/rename_dir ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir exofs: remove unnecessary dentry_unhash on rmdir/rename_dir nfs: remove unnecessary dentry_unhash on rmdir/rename_dir ext2: remove unnecessary dentry_unhash on rmdir/rename_dir ext3: remove unnecessary dentry_unhash on rmdir/rename_dir ext4: remove unnecessary dentry_unhash on rmdir/rename_dir btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir ceph: remove unnecessary dentry_unhash calls vfs: clean up vfs_rename_other vfs: clean up vfs_rename_dir vfs: clean up vfs_rmdir vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems libfs: drop unneeded dentry_unhash vfs: update dentry_unhash() comment vfs: push dentry_unhash on rename_dir into file systems vfs: push dentry_unhash on rmdir into file systems vfs: remove dget() from dentry_unhash() vfs: dentry_unhash immediately prior to rmdir vfs: Block mmapped writes while the fs is frozen ...
2011-05-26rcu: Decrease memory-barrier usage based on semi-formal proofPaul E. McKenney
(Note: this was reverted, and is now being re-applied in pieces, with this being the fifth and final piece. See below for the reason that it is now felt to be safe to re-apply this.) Commit d09b62d fixed grace-period synchronization, but left some smp_mb() invocations in rcu_process_callbacks() that are no longer needed, but sheer paranoia prevented them from being removed. This commit removes them and provides a proof of correctness in their absence. It also adds a memory barrier to rcu_report_qs_rsp() immediately before the update to rsp->completed in order to handle the theoretical possibility that the compiler or CPU might move massive quantities of code into a lock-based critical section. This also proves that the sheer paranoia was not entirely unjustified, at least from a theoretical point of view. In addition, the old dyntick-idle synchronization depended on the fact that grace periods were many milliseconds in duration, so that it could be assumed that no dyntick-idle CPU could reorder a memory reference across an entire grace period. Unfortunately for this design, the addition of expedited grace periods breaks this assumption, which has the unfortunate side-effect of requiring atomic operations in the functions that track dyntick-idle state for RCU. (There is some hope that the algorithms used in user-level RCU might be applied here, but some work is required to handle the NMIs that user-space applications can happily ignore. For the short term, better safe than sorry.) This proof assumes that neither compiler nor CPU will allow a lock acquisition and release to be reordered, as doing so can result in deadlock. The proof is as follows: 1. A given CPU declares a quiescent state under the protection of its leaf rcu_node's lock. 2. If there is more than one level of rcu_node hierarchy, the last CPU to declare a quiescent state will also acquire the ->lock of the next rcu_node up in the hierarchy, but only after releasing the lower level's lock. The acquisition of this lock clearly cannot occur prior to the acquisition of the leaf node's lock. 3. Step 2 repeats until we reach the root rcu_node structure. Please note again that only one lock is held at a time through this process. The acquisition of the root rcu_node's ->lock must occur after the release of that of the leaf rcu_node. 4. At this point, we set the ->completed field in the rcu_state structure in rcu_report_qs_rsp(). However, if the rcu_node hierarchy contains only one rcu_node, then in theory the code preceding the quiescent state could leak into the critical section. We therefore precede the update of ->completed with a memory barrier. All CPUs will therefore agree that any updates preceding any report of a quiescent state will have happened before the update of ->completed. 5. Regardless of whether a new grace period is needed, rcu_start_gp() will propagate the new value of ->completed to all of the leaf rcu_node structures, under the protection of each rcu_node's ->lock. If a new grace period is needed immediately, this propagation will occur in the same critical section that ->completed was set in, but courtesy of the memory barrier in #4 above, is still seen to follow any pre-quiescent-state activity. 6. When a given CPU invokes __rcu_process_gp_end(), it becomes aware of the end of the old grace period and therefore makes any RCU callbacks that were waiting on that grace period eligible for invocation. If this CPU is the same one that detected the end of the grace period, and if there is but a single rcu_node in the hierarchy, we will still be in the single critical section. In this case, the memory barrier in step #4 guarantees that all callbacks will be seen to execute after each CPU's quiescent state. On the other hand, if this is a different CPU, it will acquire the leaf rcu_node's ->lock, and will again be serialized after each CPU's quiescent state for the old grace period. On the strength of this proof, this commit therefore removes the memory barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp(). The effect is to reduce the number of memory barriers by one and to reduce the frequency of execution from about once per scheduling tick per CPU to once per grace period. This was reverted do to hangs found during testing by Yinghai Lu and Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that located the underlying problem, and Frederic also provided the fix. The underlying problem was that the HARDIRQ_ENTER() macro from lib/locking-selftest.c invoked irq_enter(), which in turn invokes rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which does not invoke rcu_irq_exit(). This situation resulted in calls to rcu_irq_enter() that were not balanced by the required calls to rcu_irq_exit(). Therefore, after these locking selftests completed, RCU's dyntick-idle nesting count was a large number (for example, 72), which caused RCU to to conclude that the affected CPU was not in dyntick-idle mode when in fact it was. RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting in hangs. In contrast, with Frederic's patch, which replaces the irq_enter() in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU running the test is already marked as not being in dyntick-idle mode. This means that the rcu_irq_enter() and rcu_irq_exit() calls and RCU then has no problem working out which CPUs are in dyntick-idle mode and which are not. The reason that the imbalance was not noticed before the barrier patch was applied is that the old implementation of rcu_enter_nohz() ignored the nesting depth. This could still result in delays, but much shorter ones. Whenever there was a delay, RCU would IPI the CPU with the unbalanced nesting level, which would eventually result in rcu_enter_nohz() being called, which in turn would force RCU to see that the CPU was in dyntick-idle mode. The reason that very few people noticed the problem is that the mismatched irq_enter() vs. __irq_exit() occured only when the kernel was built with CONFIG_DEBUG_LOCKING_API_SELFTESTS. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-26rcu: Make rcu_enter_nohz() pay attention to nestingPaul E. McKenney
The old version of rcu_enter_nohz() forced RCU into nohz mode even if the nesting count was non-zero. This change causes rcu_enter_nohz() to hold off for non-zero nesting counts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-26rcu: Don't do reschedule unless in irqPaul E. McKenney
Condition the set_need_resched() in rcu_irq_exit() on in_irq(). This should be a no-op, because rcu_irq_exit() should only be called from irq. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-26rcu: Remove old memory barriers from rcu_process_callbacks()Paul E. McKenney
Second step of partitioning of commit e59fb3120b. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-26rcu: Add memory barriersPaul E. McKenney
Add the memory barriers added by e59fb3120b. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-26rcu: Fix unpaired rcu_irq_enter() from locking selftestsFrederic Weisbecker
HARDIRQ_ENTER() maps to irq_enter() which calls rcu_irq_enter(). But HARDIRQ_EXIT() maps to __irq_exit() which doesn't call rcu_irq_exit(). So for every locking selftest that simulates hardirq disabled, we create an imbalance in the rcu extended quiescent state internal state. As a result, after the first missing rcu_irq_exit(), subsequent irqs won't exit dyntick-idle mode after leaving the interrupt handler. This means that RCU won't see the affected CPU as being in an extended quiescent state, resulting in long grace-period delays (as in grace periods extending for hours). To fix this, just use __irq_enter() to simulate the hardirq context. This is sufficient for the locking selftests as we don't need to exit any extended quiescent state or perform any check that irqs normally do when they wake up from idle. As a side effect, this patch makes it possible to restore "rcu: Decrease memory-barrier usage based on semi-formal proof", which eventually helped finding this bug. Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stable <stable@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-26mm: don't access vm_flags as 'int'KOSAKI Motohiro
The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor 'unsigned int'. This patch fixes such misuse. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [ Changed to use a typedef - we'll extend it to cover more cases later, since there has been discussion about making it a 64-bit type.. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26xen: cleancache shim to Xen Transcendent MemoryDan Magenheimer
This patch provides a shim between the kernel-internal cleancache API (see Documentation/mm/cleancache.txt) and the Xen Transcendent Memory ABI (see http://oss.oracle.com/projects/tmem). Xen tmem provides "hypervisor RAM" as an ephemeral page-oriented pseudo-RAM store for cleancache pages, shared cleancache pages, and frontswap pages. Tmem provides enterprise-quality concurrency, full save/restore and live migration support, compression and deduplication. A presentation showing up to 8% faster performance and up to 52% reduction in sectors read on a kernel compile workload, despite aggressive in-kernel page reclamation ("self-ballooning") can be found at: http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26ocfs2: add cleancache supportDan Magenheimer
This eighth patch of eight in this cleancache series "opts-in" cleancache for ocfs2. Clustered filesystems must explicitly enable cleancache by calling cleancache_init_shared_fs anytime an instance of the filesystem is mounted. Ocfs2 is currently the only user of the clustered filesystem interface but nevertheless, the cleancache hooks in the VFS layer are sufficient for ocfs2 including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v8: trivial merge conflict update] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Tso <tytso@mit.edu> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26ext4: add cleancache supportDan Magenheimer
This seventh patch of eight in this cleancache series "opts-in" cleancache for ext4. Filesystems must explicitly enable cleancache by calling cleancache_init_fs anytime an instance of the filesystem is mounted. For ext4, all other cleancache hooks are in the VFS layer including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v6-v8: no changes] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26btrfs: add cleancache supportDan Magenheimer
This sixth patch of eight in this cleancache series "opts-in" cleancache for btrfs. Filesystems must explicitly enable cleancache by calling cleancache_init_fs anytime an instance of the filesystem is mounted. Btrfs uses its own readpage which must be hooked, but all other cleancache hooks are in the VFS layer including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v6-v8: no changes] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26ext3: add cleancache supportDan Magenheimer
This fifth patch of eight in this cleancache series "opts-in" cleancache for ext3. Filesystems must explicitly enable cleancache by calling cleancache_init_fs anytime an instance of the filesystem is mounted. For ext3, all other cleancache hooks are in the VFS layer including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v6-v8: no changes] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26mm/fs: add hooks to support cleancacheDan Magenheimer
This fourth patch of eight in this cleancache series provides the core hooks in VFS for: initializing cleancache per filesystem; capturing clean pages reclaimed by page cache; attempting to get pages from cleancache before filesystem read; and ensuring coherency between pagecache, disk, and cleancache. Note that the placement of these hooks was stable from 2.6.18 to 2.6.38; a minor semantic change was required due to a patchset in 2.6.39. All hooks become no-ops if CONFIG_CLEANCACHE is unset, or become a check of a boolean global if CONFIG_CLEANCACHE is set but no cleancache "backend" has claimed cleancache_ops. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v8: minchan.kim@gmail.com: adapt to new remove_from_page_cache function] Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26mm: cleancache core ops functions and configDan Magenheimer
This third patch of eight in this cleancache series provides the core code for cleancache that interfaces between the hooks in VFS and individual filesystems and a cleancache backend. It also includes build and config patches. Two new files are added: mm/cleancache.c and include/linux/cleancache.h. Note that CONFIG_CLEANCACHE can default to on; in systems that do not provide a cleancache backend, all hooks devolve to a simple check of a global enable flag, so performance impact should be negligible but can be reduced to zero impact if config'ed off. However for this first commit, it defaults to off. Details and a FAQ can be found in Documentation/vm/cleancache.txt Credits: Cleancache_ops design derived from Jeremy Fitzhardinge design for tmem [v8: dan.magenheimer@oracle.com: fix exportfs call affecting btrfs] [v8: akpm@linux-foundation.org: use static inline function, not macro] [v7: dan.magenheimer@oracle.com: cleanup sysfs and remove cleancache prefix] [v6: JBeulich@novell.com: robustly handle buggy fs encode_fh actor definition] [v5: jeremy@goop.org: clean up global usage and static var names] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] [v5: hch@infradead.org: cleaner non-global interface for ops registration] [v4: adilger@sun.com: interface must support exportfs FS's] [v4: hch@infradead.org: interface must support 64-bit FS on 32-bit kernel] [v3: akpm@linux-foundation.org: use one ops struct to avoid pointer hops] [v3: akpm@linux-foundation.org: document and ensure PageLocked reqts are met] [v3: ngupta@vflare.org: fix success/fail codes, change funcs to void] [v2: viro@ZenIV.linux.org.uk: use sane types] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Nitin Gupta <ngupta@vflare.org> Acked-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Andreas Dilger <adilger@sun.com> Acked-by: Jan Beulich <JBeulich@novell.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com>
2011-05-26fs: add field to superblock to support cleancacheDan Magenheimer
This second patch of eight in this cleancache series adds a field to the generic superblock to squirrel away a pool identifier that is dynamically provided by cleancache-enabled filesystems at mount time to uniquely identify files and pages belonging to this mounted filesystem. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v8: trivial merge conflict update] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26mm/fs: cleancache documentationDan Magenheimer
This patchset introduces cleancache, an optional new feature exposed by the VFS layer that potentially dramatically increases page cache effectiveness for many workloads in many environments at a negligible cost. It does this by providing an interface to transcendent memory, which is memory/storage that is not otherwise visible to and/or directly addressable by the kernel. Instead of being discarded, hooks in the reclaim code "put" clean pages to cleancache. Filesystems that "opt-in" may "get" pages from cleancache that were previously put, but pages in cleancache are "ephemeral", meaning they may disappear at any time. And the size of cleancache is entirely dynamic and unknowable to the kernel. Filesystems currently supported by this patchset include ext3, ext4, btrfs, and ocfs2. Other filesystems (especially those built entirely on VFS) should be easy to add, but should first be thoroughly tested to ensure coherency. Details and a FAQ are provided in Documentation/vm/cleancache.txt This first patch of eight in this cleancache series only adds two new documentation files. [v8: minor documentation changes by author] [v3: akpm@linux-foundation.org: document sysfs API] [v3: hch@infradead.org: move detailed description to Documentation/vm] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
2011-05-26Merge tag 'v2.6.39' of ↵Chris Metcalf
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus
2011-05-26ASoC: Fix power down for widgetless per-card DAPM context caseJarkko Nikula
Commit 52ba67b ("ASoC: Force all DAPM contexts into the same bias state") powers up all the DAPM contexts in a card if any DAPM context becomes active. Unfortunately power down newer happens if per-card DAPM context doesn't have any widgets. Reason for this is that power state of per-card DAPM context without widgets is never cleared and thus all the DAPM contexts remain permanently active. Test for widgetless calling DAPM context in dapm_power_widgets() doesn't work for per-card DAPM context since power change is never originating from widgetless per-card DAPM context. Fix this by pre-clearing power state flag of non-codec DAPM context at the beginning of power sequence. Signed-off-by: Jarkko Nikula <jhnikula@gmail.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-05-26perf tools: Fix build on older systemsArnaldo Carvalho de Melo
Where /usr/include/linux/const.h is not present, e.g. RHEL5. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Link: http://lkml.kernel.org/n/tip-ypcw2mu0w7dl1rrc6ncz3pee@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-26perf symbols: Handle /proc/sys/kernel/kptr_restrictArnaldo Carvalho de Melo
Perf uses /proc/modules to figure out where kernel modules are loaded. With the advent of kptr_restrict, non root users get zeroes for all module start addresses. So check if kptr_restrict is non zero and don't generate the syntethic PERF_RECORD_MMAP events for them. Warn the user about it in perf record and in perf report. In perf report the reference relocation symbol being zero means that kptr_restrict was set, thus /proc/kallsyms has only zeroed addresses, so don't use it to fixup symbol addresses when using a valid kallsyms (in the buildid cache) or vmlinux (in the vmlinux path) build-id located automatically or specified by the user. Provide an explanation about it in 'perf report' if kernel samples were taken, checking if a suitable vmlinux or kallsyms was found/specified. Restricted /proc/kallsyms don't go to the buildid cache anymore. Example: [acme@emilia ~]$ perf record -F 100000 sleep 1 WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict. Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path. Samples in kernel modules won't be resolved at all. If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.005 MB perf.data (~231 samples) ] [acme@emilia ~]$ [acme@emilia ~]$ perf report --stdio Kernel address maps (/proc/{kallsyms,modules}) were restricted, check /proc/sys/kernel/kptr_restrict before running 'perf record'. If some relocation was applied (e.g. kexec) symbols may be misresolved. Samples in kernel modules can't be resolved as well. # Events: 13 cycles # # Overhead Command Shared Object Symbol # ........ ....... ................. ..................... # 20.24% sleep [kernel.kallsyms] [k] page_fault 20.04% sleep [kernel.kallsyms] [k] filemap_fault 19.78% sleep [kernel.kallsyms] [k] __lru_cache_add 19.69% sleep ld-2.12.so [.] memcpy 14.71% sleep [kernel.kallsyms] [k] dput 4.70% sleep [kernel.kallsyms] [k] flush_signal_handlers 0.73% sleep [kernel.kallsyms] [k] perf_event_comm 0.11% sleep [kernel.kallsyms] [k] native_write_msr_safe # # (For a higher level overview, try: perf report --sort comm,dso) # [acme@emilia ~]$ This is because it found a suitable vmlinux (build-id checked) in /lib/modules/2.6.39-rc7+/build/vmlinux (use -v in perf report to see the long file name). If we remove that file from the vmlinux path: [root@emilia ~]# mv /lib/modules/2.6.39-rc7+/build/vmlinux \ /lib/modules/2.6.39-rc7+/build/vmlinux.OFF [acme@emilia ~]$ perf report --stdio [kernel.kallsyms] with build id 57298cdbe0131f6871667ec0eaab4804dcf6f562 not found, continuing without symbols Kernel address maps (/proc/{kallsyms,modules}) were restricted, check /proc/sys/kernel/kptr_restrict before running 'perf record'. As no suitable kallsyms nor vmlinux was found, kernel samples can't be resolved. Samples in kernel modules can't be resolved as well. # Events: 13 cycles # # Overhead Command Shared Object Symbol # ........ ....... ................. ...... # 80.31% sleep [kernel.kallsyms] [k] 0xffffffff8103425a 19.69% sleep ld-2.12.so [.] memcpy # # (For a higher level overview, try: perf report --sort comm,dso) # [acme@emilia ~]$ Reported-by: Stephane Eranian <eranian@google.com> Suggested-by: David Miller <davem@davemloft.net> Cc: Dave Jones <davej@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Link: http://lkml.kernel.org/n/tip-mt512joaxxbhhp1odop04yit@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-26ASoC: wm1250-ev1: Define "WM1250 Output" with SND_SOC_DAPM_OUTPUTAxel Lin
Codec output pin should be defined with SND_SOC_DAPM_OUTPUT. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-05-26ASoC: Remove duplicate linux/delay.h inclusion.Jesper Juhl
It's enough to include linux/delay.h just once in sound/soc/codecs/wm8915.c, so remove the duplicate. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-05-26ASoC: sam9g20_wm8731: use the proper SYSCKL valueNicolas Ferre
at91sam9g20 is providing master clock to wm8731: not using a crystal but an external MCLK. We can avoid conflict and save power using WM8731_SYSCLK_MCLK as we do not need oscillator to be powered. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-05-26ASoC: wm8731: fix wm8731_check_osc() connected conditionNicolas Ferre
The crystal oscillator is only enabled if the WM8731_SYSCLK_XTAL master clock is specified. Fix the connected() struct snd_soc_dapm_route function to take this into account. Oscillator is not enabled on machine that need it otherwise. Machine drivers have to make sure that they use the proper SYSCLK value. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2011-05-26jbd2: Add MAINTAINERS entryTheodore Ts'o
Create a separate MAINTAINERS entry for jbd2 Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-26ALSA: hda - Reorganize controller quriks with bit flagsTakashi Iwai
Introduce bit-flags indicating the necessary controller quirks, and set them in pci driver_data field. This simplifies the checks in the driver code and avoids the pci-id lookup in different places. Also, this patch adds the PCI ID entry for AMD Hudson. AMD Hudson requires a similar workaround like ATI SB while other generic ATI and AMD controllers don't need but some ATI-HDMI quirks. So, we need a different entry for Hudson. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-05-26ALSA: hda - Use snd_printd() in snd_hda_parse_pin_def_config()Takashi Iwai
Fixed the wrong usage of snd_printdd() for debug prints of input entries. It should be snd_printd() like others. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-05-26x86: Move do_page_fault()'s error path under unlikely()KOSAKI Motohiro
Ingo suggested SIGKILL check should be moved into slowpath function. This will reduce the page fault fastpath impact of this recent commit: 37b23e0525d3: x86,mm: make pagefault killable Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: minchan.kim@gmail.com Cc: willy@linux.intel.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4DDE0B5C.9050907@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26Merge branch 'linus' into x86/urgentIngo Molnar
Merge reason: we want to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26perf: Remove duplicate headersJesper Juhl
Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: trivial@kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1105261011290.17400@swampdragon.chaosbits.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26Merge branch 'linus' into perf/urgentIngo Molnar
Merge reason: Linus applied an overlapping commit: 5f2e8e2b0bf0: kernel/watchdog.c: Use proper ANSI C prototypes So merge it in to make sure we can iterate the file without conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26cifs: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
Cifs has no problems with lingering references to unlinked directory inodes. CC: Steve French <sfrench@samba.org> CC: linux-cifs@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
Ocfs2 has no issues with lingering references to unlinked directory inodes. CC: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26exofs: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
Exofs has no problems with lingering references to unlinked directory inodes. CC: Benny Halevy <bhalevy@panasas.com> CC: osd-dev@open-osd.org Acked-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26nfs: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
NFS has no problems with lingering references to unlinked directory inodes. CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: linux-nfs@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26ext2: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
ext2 has no problems with lingering references to unlinked directory inodes. CC: Jan Kara <jack@suse.cz> CC: linux-ext4@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26ext3: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
ext3 has no problems with lingering references to unlinked directory inodes. CC: Jan Kara <jack@suse.cz> CC: Andrew Morton <akpm@linux-foundation.org> CC: Andreas Dilger <adilger.kernel@dilger.ca> CC: linux-ext4@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26ext4: remove unnecessary dentry_unhash on rmdir/rename_dirSage Weil
ext4 has no problems with lingering references to unlinked directory inodes. CC: "Theodore Ts'o" <tytso@mit.edu> CC: Andreas Dilger <adilger.kernel@dilger.ca> CC: linux-ext4@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26btrfs: remove unnecessary dentry_unhash in rmdir/rename_dirSage Weil
Btrfs has no problems with lingering references to unlinked directory inodes. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26ceph: remove unnecessary dentry_unhash callsSage Weil
Ceph does not need these, and they screw up our use of the dcache as a consistent cache. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>