summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-19FS-Cache: Don't use spin_is_locked() in assertionsDavid Howells
Under certain circumstances, spin_is_locked() is hardwired to 0 - even when the code would normally be in a locked section where it should return 1. This means it cannot be used for an assertion that checks that a spinlock is locked. Remove such usages from FS-Cache. The following oops might otherwise be observed: FS-Cache: Assertion failed BUG: failure at fs/fscache/operation.c:270/fscache_start_operations()! Kernel panic - not syncing: BUG! CPU: 0 PID: 10 Comm: kworker/u2:1 Not tainted 3.10.0-rc1-00133-ge7ebb75 #2 Workqueue: fscache_operation fscache_op_work_func [fscache] 7f091c48 603c8947 7f090000 7f9b1361 7f25f080 00000001 7f26d440 7f091c90 60299eb8 7f091d90 602951c5 7f26d440 3000000008 7f091da0 7f091cc0 7f091cd0 00000007 00000007 00000006 7f091ae0 00000010 0000010e 7f9af330 7f091ae0 Call Trace: 7f091c88: [<60299eb8>] dump_stack+0x17/0x19 7f091c98: [<602951c5>] panic+0xf4/0x1e9 7f091d38: [<6002b10e>] set_signals+0x1e/0x40 7f091d58: [<6005b89e>] __wake_up+0x4e/0x70 7f091d98: [<7f9aa003>] fscache_start_operations+0x43/0x50 [fscache] 7f091da8: [<7f9aa1e3>] fscache_op_complete+0x1d3/0x220 [fscache] 7f091db8: [<60082985>] unlock_page+0x55/0x60 7f091de8: [<7fb25bb0>] cachefiles_read_copier+0x250/0x330 [cachefiles] 7f091e58: [<7f9ab03c>] fscache_op_work_func+0xac/0x120 [fscache] 7f091e88: [<6004d5b0>] process_one_work+0x250/0x3a0 7f091ef8: [<6004edc7>] worker_thread+0x177/0x2a0 7f091f38: [<6004ec50>] worker_thread+0x0/0x2a0 7f091f58: [<60054418>] kthread+0xd8/0xe0 7f091f68: [<6005bb27>] finish_task_switch.isra.64+0x37/0xa0 7f091fd8: [<600185cf>] new_thread_handler+0x8f/0xb0 Reported-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com>
2013-06-19FS-Cache: The retrieval remaining-pages counter needs to be atomic_tDavid Howells
struct fscache_retrieval contains a count of the number of pages that still need some processing (n_pages). This is decremented as the pages are processed. However, this needs to be atomic as fscache_retrieval_complete() (I think) just occasionally may be called from cachefiles_read_backing_file() and cachefiles_read_copier() simultaneously. This happens when an fscache_read_or_alloc_pages() request containing a lot of pages (say a couple of hundred) is being processed. The read on each backing page is dispatched individually because we need to insert a monitor into the waitqueue to catch when the read completes. However, under low-memory conditions, we might be forced to wait in the allocator - and this gives the I/O on the backing page a chance to complete first. When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto the workqueue without waiting for the operation to finish the initial I/O dispatch (we want to release any pages we can as soon as we can), thus both can end up running simultaneously and potentially attempting to partially complete the retrieval simultaneously (ENOMEM may occur, backing pages may already be in the page cache). This was demonstrated by parallelling the non-atomic counter with an atomic counter and printing both of them when the assertion fails. At this point, the atomic counter has reached zero, but the non-atomic counter has not. To fix this, make the counter an atomic_t. This results in the following bug appearing FS-Cache: Assertion failed 3 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:421! or FS-Cache: Assertion failed 3 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:414! With a backtrace like the following: RIP: 0010:[<ffffffffa0211b1d>] fscache_put_operation+0x1ad/0x240 [fscache] Call Trace: [<ffffffffa0213185>] fscache_retrieval_work+0x55/0x270 [fscache] [<ffffffffa0213130>] ? fscache_retrieval_work+0x0/0x270 [fscache] [<ffffffff81090b10>] worker_thread+0x170/0x2a0 [<ffffffff81096d10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810909a0>] ? worker_thread+0x0/0x2a0 [<ffffffff81096966>] kthread+0x96/0xa0 [<ffffffff8100c0ca>] child_rip+0xa/0x20 [<ffffffff810968d0>] ? kthread+0x0/0xa0 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19cachefiles: remove unused macro list_to_page()Haicheng Li
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19FS-Cache: Simplify cookie retention for fscache_objects, fixing oopsDavid Howells
Simplify the way fscache cache objects retain their cookie. The way I implemented the cookie storage handling made synchronisation a pain (ie. the object state machine can't rely on the cookie actually still being there). Instead of the the object being detached from the cookie and the cookie being freed in __fscache_relinquish_cookie(), we defer both operations: (*) The detachment of the object from the list in the cookie now takes place in fscache_drop_object() and is thus governed by the object state machine (fscache_detach_from_cookie() has been removed). (*) The release of the cookie is now in fscache_object_destroy() - which is called by the cache backend just before it frees the object. This means that the fscache_cookie struct is now available to the cache all the way through from ->alloc_object() to ->drop_object() and ->put_object() - meaning that it's no longer necessary to take object->lock to guarantee access. However, __fscache_relinquish_cookie() doesn't wait for the object to go all the way through to destruction before letting the netfs proceed. That would massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the cookie around, in must therefore break all attachments to the netfs - which includes ->def, ->netfs_data and any outstanding page read/writes. To handle this, struct fscache_cookie now has an n_active counter: (1) This starts off initialised to 1. (2) Any time the cache needs to get at the netfs data, it calls fscache_use_cookie() to increment it - if it is not zero. If it was zero, then access is not permitted. (3) When the cache has finished with the data, it calls fscache_unuse_cookie() to decrement it. This does a wake-up on it if it reaches 0. (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to reach 0. The initialisation to 1 in step (1) ensures that we only get wake ups when we're trying to get rid of the cookie. This leaves __fscache_relinquish_cookie() a lot simpler. *** This fixes a problem in the current code whereby if fscache_invalidate() is followed sufficiently quickly by fscache_relinquish_cookie() then it is possible for __fscache_relinquish_cookie() to have detached the cookie from the object and cleared the pointer before a thread is dispatched to process the invalidation state in the object state machine. Since the pending write clearance was deferred to the invalidation state to make it asynchronous, we need to either wait in relinquishment for the stores tree to be cleared in the invalidation state or we need to handle the clearance in relinquishment. Further, if the relinquishment code does clear the tree, then the invalidation state need to make the clearance contingent on still having the cookie to hand (since that's where the tree is rooted) and we have to prevent the cookie from disappearing for the duration. This can lead to an oops like the following: BUG: unable to handle kernel NULL pointer dereference at 000000000000000c ... RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30 ... CR2: 000000000000000c ... ... Process kslowd002 (...) .... Call Trace: [<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache] [<ffffffff810096f0>] ? __switch_to+0xd0/0x320 [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150 [<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180 [<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache] [<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0 [<ffffffff8110e233>] slow_work_execute+0x233/0x310 [<ffffffff8110e515>] slow_work_thread+0x205/0x360 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8110e310>] ? slow_work_thread+0x0/0x360 [<ffffffff81096936>] kthread+0x96/0xa0 [<ffffffff8100c0ca>] child_rip+0xa/0x20 [<ffffffff810968a0>] ? kthread+0x0/0xa0 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 The parameter to fscache_invalidate_writes() was object->cookie which is NULL. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19FS-Cache: Fix object state machine to have separate work and wait statesDavid Howells
Fix object state machine to have separate work and wait states as that makes it easier to envision. There are now three kinds of state: (1) Work state. This is an execution state. No event processing is performed by a work state. The function attached to a work state returns a pointer indicating the next state to which the OSM should transition. Returning NO_TRANSIT repeats the current state, but goes back to the scheduler first. (2) Wait state. This is an event processing state. No execution is performed by a wait state. Wait states are just tables of "if event X occurs, clear it and transition to state Y". The dispatcher returns to the scheduler if none of the events in which the wait state has an interest are currently pending. (3) Out-of-band state. This is a special work state. Transitions to normal states can be overridden when an unexpected event occurs (eg. I/O error). Instead the dispatcher disables and clears the OOB event and transits to the specified work state. This then acts as an ordinary work state, though object->state points to the overridden destination. Returning NO_TRANSIT resumes the overridden transition. In addition, the states have names in their definitions, so there's no need for tables of state names. Further, the EV_REQUEUE event is no longer necessary as that is automatic for work states. Since the states are now separate structs rather than values in an enum, it's not possible to use comparisons other than (non-)equality between them, so use some object->flags to indicate what phase an object is in. The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one (EV_KILL). An object flag now carries the information about retirement. Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged into an KILL_OBJECT state and additional states have been added for handling waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS). A state has also been added for synchronising with parent object initialisation (WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY). Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19FS-Cache: Wrap checks on object stateDavid Howells
Wrap checks on object state (mostly outside of fs/fscache/object.c) with inline functions so that the mechanism can be replaced. Some of the state checks within object.c are left as-is as they will be replaced. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19FS-Cache: Uninline fscache_object_init()David Howells
Uninline fscache_object_init() so as not to expose some of the FS-Cache internals to the cache backend. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19FS-Cache: Don't sleep in page release if __GFP_FS is not setDavid Howells
Don't sleep in __fscache_maybe_release_page() if __GFP_FS is not set. This goes some way towards mitigating fscache deadlocking against ext4 by way of the allocator, eg: INFO: task flush-8:0:24427 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-8:0 D ffff88003e2b9fd8 0 24427 2 0x00000000 ffff88003e2b9138 0000000000000046 ffff880012e3a040 ffff88003e2b9fd8 0000000000011c80 ffff88003e2b9fd8 ffffffff81a10400 ffff880012e3a040 0000000000000002 ffff880012e3a040 ffff88003e2b9098 ffffffff8106dcf5 Call Trace: [<ffffffff8106dcf5>] ? __lock_is_held+0x31/0x53 [<ffffffff81219b61>] ? radix_tree_lookup_element+0xf4/0x12a [<ffffffff81454bed>] schedule+0x60/0x62 [<ffffffffa01d349c>] __fscache_wait_on_page_write+0x8b/0xa5 [fscache] [<ffffffff810498a8>] ? __init_waitqueue_head+0x4d/0x4d [<ffffffffa01d393a>] __fscache_maybe_release_page+0x30c/0x324 [fscache] [<ffffffffa01d369a>] ? __fscache_maybe_release_page+0x6c/0x324 [fscache] [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170 [<ffffffffa01fd7b2>] nfs_fscache_release_page+0x68/0x94 [nfs] [<ffffffffa01ef73e>] nfs_release_page+0x7e/0x86 [nfs] [<ffffffff810aa553>] try_to_release_page+0x32/0x3b [<ffffffff810b6c70>] shrink_page_list+0x535/0x71a [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170 [<ffffffff810b7352>] shrink_inactive_list+0x20a/0x2dd [<ffffffff81071a13>] ? mark_held_locks+0xbe/0xea [<ffffffff810b7a65>] shrink_lruvec+0x34c/0x3eb [<ffffffff810b7bd3>] do_try_to_free_pages+0xcf/0x355 [<ffffffff810b7fc8>] try_to_free_pages+0x9a/0xa1 [<ffffffff810b08d2>] __alloc_pages_nodemask+0x494/0x6f7 [<ffffffff810d9a07>] kmem_getpages+0x58/0x155 [<ffffffff810dc002>] fallback_alloc+0x120/0x1f3 [<ffffffff8106db23>] ? trace_hardirqs_off+0xd/0xf [<ffffffff810dbed3>] ____cache_alloc_node+0x177/0x186 [<ffffffff81162a6c>] ? ext4_init_io_end+0x1c/0x37 [<ffffffff810dc403>] kmem_cache_alloc+0xf1/0x176 [<ffffffff810b17ac>] ? test_set_page_writeback+0x101/0x113 [<ffffffff81162a6c>] ext4_init_io_end+0x1c/0x37 [<ffffffff81162ce4>] ext4_bio_write_page+0x20f/0x3af [<ffffffff8115cc02>] mpage_da_submit_io+0x26e/0x2f6 [<ffffffff811088e5>] ? __find_get_block_slow+0x38/0x133 [<ffffffff81161348>] mpage_da_map_and_submit+0x3a7/0x3bd [<ffffffff81161a60>] ext4_da_writepages+0x30d/0x426 [<ffffffff810b3359>] do_writepages+0x1c/0x2a [<ffffffff81102f4d>] __writeback_single_inode+0x3e/0xe5 [<ffffffff81103995>] writeback_sb_inodes+0x1bd/0x2f4 [<ffffffff81103b3b>] __writeback_inodes_wb+0x6f/0xb4 [<ffffffff81103c81>] wb_writeback+0x101/0x195 [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170 [<ffffffff811043aa>] ? wb_do_writeback+0xaa/0x173 [<ffffffff8110434a>] wb_do_writeback+0x4a/0x173 [<ffffffff81071bbc>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81038554>] ? del_timer+0x4b/0x5b [<ffffffff811044e0>] bdi_writeback_thread+0x6d/0x147 [<ffffffff81104473>] ? wb_do_writeback+0x173/0x173 [<ffffffff81048fbc>] kthread+0xd0/0xd8 [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55 [<ffffffff81456aac>] ret_from_fork+0x7c/0xb0 [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55 2 locks held by flush-8:0/24427: #0: (&type->s_umount_key#41){.+.+..}, at: [<ffffffff810e3b73>] grab_super_passive+0x4c/0x76 #1: (jbd2_handle){+.+...}, at: [<ffffffff81190d81>] start_this_handle+0x475/0x4ea The problem here is that another thread, which is attempting to write the to-be-stored NFS page to the on-ext4 cache file is waiting for the journal lock, eg: INFO: task kworker/u:2:24437 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u:2 D ffff880039589768 0 24437 2 0x00000000 ffff8800395896d8 0000000000000046 ffff8800283bf040 ffff880039589fd8 0000000000011c80 ffff880039589fd8 ffff880039f0b040 ffff8800283bf040 0000000000000006 ffff8800283bf6b8 ffff880039589658 ffffffff81071a13 Call Trace: [<ffffffff81071a13>] ? mark_held_locks+0xbe/0xea [<ffffffff81455e73>] ? _raw_spin_unlock_irqrestore+0x3a/0x50 [<ffffffff81071b53>] ? trace_hardirqs_on_caller+0x114/0x170 [<ffffffff81071bbc>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81454bed>] schedule+0x60/0x62 [<ffffffff81190c23>] start_this_handle+0x317/0x4ea [<ffffffff810498a8>] ? __init_waitqueue_head+0x4d/0x4d [<ffffffff81190fcc>] jbd2__journal_start+0xb3/0x12e [<ffffffff81176606>] __ext4_journal_start_sb+0xb2/0xc6 [<ffffffff8115f137>] ext4_da_write_begin+0x109/0x233 [<ffffffff810a964d>] generic_file_buffered_write+0x11a/0x264 [<ffffffff811032cf>] ? __mark_inode_dirty+0x2d/0x1ee [<ffffffff810ab1ab>] __generic_file_aio_write+0x2a5/0x2d5 [<ffffffff810ab24a>] generic_file_aio_write+0x6f/0xd0 [<ffffffff81159a2c>] ext4_file_write+0x38c/0x3c4 [<ffffffff810e0915>] do_sync_write+0x91/0xd1 [<ffffffffa00a17f0>] cachefiles_write_page+0x26f/0x310 [cachefiles] [<ffffffffa01d470b>] fscache_write_op+0x21e/0x37a [fscache] [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e [<ffffffffa01d2479>] fscache_op_work_func+0x78/0xd7 [fscache] [<ffffffff8104455a>] process_one_work+0x232/0x3a8 [<ffffffff810444ff>] ? process_one_work+0x1d7/0x3a8 [<ffffffff81044ee0>] worker_thread+0x214/0x303 [<ffffffff81044ccc>] ? manage_workers+0x245/0x245 [<ffffffff81048fbc>] kthread+0xd0/0xd8 [<ffffffff81455eb2>] ? _raw_spin_unlock_irq+0x29/0x3e [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55 [<ffffffff81456aac>] ret_from_fork+0x7c/0xb0 [<ffffffff81048eec>] ? __init_kthread_worker+0x55/0x55 4 locks held by kworker/u:2/24437: #0: (fscache_operation){.+.+.+}, at: [<ffffffff810444ff>] process_one_work+0x1d7/0x3a8 #1: ((&op->work)){+.+.+.}, at: [<ffffffff810444ff>] process_one_work+0x1d7/0x3a8 #2: (sb_writers#14){.+.+.+}, at: [<ffffffff810ab22c>] generic_file_aio_write+0x51/0xd0 #3: (&sb->s_type->i_mutex_key#19){+.+.+.}, at: [<ffffffff810ab236>] generic_file_aio_write+0x5b/0x fscache already tries to cancel pending stores, but it can't cancel a write for which I/O is already in progress. An alternative would be to accept writing garbage to the cache under extreme circumstances and to kill the afflicted cache object if we have to do this. However, we really need to know how strapped the allocator is before deciding to do that. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19CacheFiles: name i_mutex lock class explicitlyJ. Bruce Fields
Just some cleanup. (And note the caller of this function may, for example, call vfs_unlink on a child, so the "1" (I_MUTEX_PARENT) really was what was intended here.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19fs/fscache: remove spin_lock() from the condition in while()Sebastian Andrzej Siewior
The spinlock() within the condition in while() will cause a compile error if it is not a function. This is not a problem on mainline but it does not look pretty and there is no reason to do it that way. That patch writes it a little differently and avoids the double condition. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Milosz Tanski <milosz@adfin.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2013-06-19gpio-rcar: Remove #ifdef CONFIG_OF around OF-specific sectionsLaurent Pinchart
All functions and data types used by OF-specific code paths are declared in <linux/of.h> regardless of CONFIG_OF. Replace the #ifdef CONFIG_OF guard with a if(IS_ENABLED(CONFIG_OF)) and let the compiler optimize the unused code away. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2013-06-19gpio-rcar: Reference core gpio documentation in the DT bindingsLaurent Pinchart
Replaced the detailed gpio-ranges documentation with a reference to the code gpio DT bindings, and mention the gpio flags symbolic names. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2013-06-19ARM: shmobile: irqpin: add a DT property to enable masking on parentGuennadi Liakhovetski
To disable spurious interrupts, that get triggered on certain hardware, the irqpin driver masks them on the parent interrupt controller. To specify such broken devices a .control_parent parameter can be provided in the platform data. In the DT case we need a property, to do the same. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Magnus Damm <damm@opensource.se> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2013-06-19[media] exynos4-is: Fix FIMC-IS clocks initializationSylwester Nawrocki
The ISP clock register content is not preserved over the ISP power domain off/on cycle. Instead of setting the clock frequencies once at probe time the clock rates set up is moved to the runtime_resume handler, which is invoked after the related power domain is already enabled, ensuring the clocks are properly configured when the device is actively used. This fixes the FIMC-IS malfunctions and STREAM ON timeout errors accuring on some boards: [ 59.860000] fimc_is_general_irq_handler:583 ISR_NDONE: 5: 0x800003e8, IS_ERROR_UNKNOWN [ 59.860000] fimc_is_general_irq_handler:586 IS_ERROR_TIME_OUT Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-06-19tracing/context-tracking: Add preempt_schedule_context() for tracingSteven Rostedt
Dave Jones hit the following bug report: =============================== [ INFO: suspicious RCU usage. ] 3.10.0-rc2+ #1 Not tainted ------------------------------- include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 2 locks held by cc1/63645: #0: (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0 CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28 ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645 Call Trace: [<ffffffff816ae383>] dump_stack+0x19/0x1b [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130 [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0 [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0 [<ffffffff8108dffc>] update_curr+0xec/0x240 [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480 [<ffffffff816b3a71>] __schedule+0x161/0x9b0 [<ffffffff816b4721>] preempt_schedule+0x51/0x80 [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60 [<ffffffff816b6824>] ? retint_careful+0x12/0x2e [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210 [<ffffffff816be280>] ftrace_call+0x5/0x2f [<ffffffff816b681d>] ? retint_careful+0xb/0x2e [<ffffffff816b4805>] ? schedule_user+0x5/0x70 [<ffffffff816b4805>] ? schedule_user+0x5/0x70 [<ffffffff816b6824>] ? retint_careful+0x12/0x2e ------------[ cut here ]------------ What happened was that the function tracer traced the schedule_user() code that tells RCU that the system is coming back from userspace, and to add the CPU back to the RCU monitoring. Because the function tracer does a preempt_disable/enable_notrace() calls the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set, then preempt_schedule() is called. But this is called before the user_exit() function can inform the kernel that the CPU is no longer in user mode and needs to be accounted for by RCU. The fix is to create a new preempt_schedule_context() that checks if the kernel is still in user mode and if so to switch it to kernel mode before calling schedule. It also switches back to user mode coming back from schedule in need be. The only user of this currently is the preempt_enable_notrace(), which is only used by the tracing subsystem. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19sched: Fix clear NOHZ_BALANCE_KICKVincent Guittot
I have faced a sequence where the Idle Load Balance was sometime not triggered for a while on my platform, in the following scenario: CPU 0 and CPU 1 are running tasks and CPU 2 is idle CPU 1 kicks the Idle Load Balance CPU 1 selects CPU 2 as the new Idle Load Balancer CPU 2 sets NOHZ_BALANCE_KICK for CPU 2 CPU 2 sends a reschedule IPI to CPU 2 While CPU 3 wakes up, CPU 0 or CPU 1 migrates a waking up task A on CPU 2 CPU 2 finally wakes up, runs task A and discards the Idle Load Balance task A quickly goes back to sleep (before a tick occurs on CPU 2) CPU 2 goes back to idle with NOHZ_BALANCE_KICK set Whenever CPU 2 will be selected as the ILB, no reschedule IPI will be sent because NOHZ_BALANCE_KICK is already set and no Idle Load Balance will be performed. We must wait for the sched softirq to be raised on CPU 2 thanks to another part the kernel to come back to clear NOHZ_BALANCE_KICK. The proposed solution clears NOHZ_BALANCE_KICK in schedule_ipi if we can't raise the sched_softirq for the Idle Load Balance. Change since V1: - move the clear of NOHZ_BALANCE_KICK in got_nohz_idle_kick if the ILB can't run on this CPU (as suggested by Peter) Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1370419991-13870-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EPStephane Eranian
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP. For some reason, the LDLAT extra reg definition for snb_ep showed up as duplicate in the snb table. This patch moves the definition of LDLAT back into the snb_ep table. Thanks to Don Zickus for tracking this one down. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19perf: Fix mmap() accounting holePeter Zijlstra
Vince's fuzzer once again found holes. This time it spotted a leak in the locked page accounting. When an event had redirected output and its close() was the last reference to the buffer we didn't have a vm context to undo accounting. Change the code to destroy the buffer on the last munmap() and detach all redirected events at that time. This provides us the right context to undo the vm accounting. Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130604084421.GI8923@twins.programming.kicks-ass.net Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19ARM: dts: AM43x EPOS EVM supportAfzal Mohammed
Add AM43x ePOS EVM minimal DT source - this is a minimal one to get it booting. Also include it in omap2plus dtbs and document bindings. The hardware is under development. Signed-off-by: Afzal Mohammed <afzal@ti.com> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: OMAP5: Add bandgap DT entryEduardo Valentin
Add bandgap device DT entry for OMAP5 dtsi. Cc: Tony Lindgren <tony@atomide.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: J Keerthy <j-keerthy@ti.com> [benoit.cousson@linaro.org: Fix alignement and use the macros for IRQ attributes] Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19x86: kvmclock: zero initialize pvclock shared memory areaIgor Mammedov
kernel might hung in pvclock_clocksource_read() due to uninitialized memory might contain odd version value in following cycle: do { version = __pvclock_read_cycles(src, &ret, &flags); } while ((src->version & 1) || version != src->version); if secondary kvmclock is accessed before it's registered with kvm. Clear garbage in pvclock shared memory area right after it's allocated to avoid this issue. Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521 Signed-off-by: Igor Mammedov <imammedo@redhat.com> [See BZ for analysis. We may want a different fix for 3.11, but this is the safest for now - Paolo] Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19kvm/ppc/booke: Delay kvmppc_lazy_ee_enableScott Wood
kwmppc_lazy_ee_enable() should be called as late as possible, or else we get things like WARN_ON(preemptible()) in enable_kernel_fp() in configurations where preemptible() works. Note that book3s_pr already waits until just before __kvmppc_vcpu_run to call kvmppc_lazy_ee_enable(). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19ARM: dts: AM33XX: Add pinmux configuration for CPSW to am335x EVMMugunthan V N
Add pinmux configurations for RGMII based CPSW ethernet to am335x-evm. Default mode is nothing but the values required for the module during active state. With this configurations module is functional as expected. Sleep mode is nothing but the values required for the module during inactive state. The pins are configured to its reset state to optimize energy usage for the pins for the suspend/resume cycle Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: AM33XX: Add pinmux configuration for CPSW to EVMskMugunthan V N
Add pinmux configurations for RGMII based CPSW ethernet to AM335x EVMsk. Default mode is nothing but the values required for the module during active state. With this configurations module is functional as expected. Sleep mode is nothing but the values required for the module during inactive state. The pins are configured to its reset state to optimize energy usage for the pins for the suspend/resume cycle Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: AM33XX: Add pinmux configuration for CPSW to beagleboneMugunthan V N
Add pinmux configurations for MII based CPSW ethernet to am335x-bone. Default mode is nothing but the values required for the module during active state. With this configurations module is functional as expected. Sleep mode is nothing but the values required for the module during inactive state. The pins are configured to its reset state to optimize energy usage for the pins for the suspend/resume cycle Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: omap3-overo: Add default trigger for TWL4030 LEDFlorian Vaussard
Commit c971ff1 'leds: leds-pwm: Defer led_pwm_set() if PWM can sleep' fixed a crash when using a trigger with a pwm-led provided by an external chip. Now it is safe to add the default trigger according to board-overo.c. Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: omap3-tobi: Correct polarity for GPIO LEDFlorian Vaussard
The LED is active low, not active high. Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: omap3-tobi: Add SMSC911X nodeFlorian Vaussard
The Tobi expansion boards embeds a SMSC LAN8700 PHY. Add the corresponding node into the DT. The regulators are not designed to be turned off. Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: OMAP3: Include IRQ headerFlorian Vaussard
Some nodes in OMAP3 DTS now use edge or level sensitive interrupts. Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19ARM: dts: Protect pinctrl headers against multiple inclusionsFlorian Vaussard
Pinctrl headers were not protected with #ifndef. Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benoit Cousson <benoit.cousson@linaro.org>
2013-06-19sparc: tsb must be flushed before tlbDave Kleikamp
This fixes a race where a cpu may re-load a tlb from a stale tsb right after it has been flushed by a remote function call. I still see some instability when stressing the system with parallel kernel builds while creating memory pressure by writing to /proc/sys/vm/nr_hugepages, but this patch improves the stability significantly. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc,leon: Convert to use devm_ioremap_resourceTushar Behera
Commit 75096579c3ac ("lib: devres: Introduce devm_ioremap_resource()") introduced devm_ioremap_resource() and deprecated the use of devm_request_and_ioremap(). While at it, also remove the error message as devm_ioremap_resource() also prints similar error message. Signed-off-by: Tushar Behera <tushar.behera@linaro.org> CC: sparclinux@vger.kernel.org CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc64 address-congruence propertybob picco
The Machine Description (MD) property "address-congruence-offset" is optional. According to the MD specification the value is assumed 0UL when not present. This caused early boot failure on T5. Signed-off-by: Bob Picco <bob.picco@oracle.com> CC: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc32, leon: Enable interrupts before going idle to avoid getting stuckAndreas Larsson
This enables interrupts for Leon before having the CPU enter power-down mode. Commit 87fa05aeb3a5e8e21b1a5510eef6983650eff092, "sparc: Use generic idle loop", gets the CPU stuck on idle for Leon systems. On Leon, disabling interrupts and powering down the processor will get the processor stuck waiting for an interrupt that will never be reacted to. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc32, leon: Remove separate "ticker" timer for SMPAndreas Larsson
This reduces the need from two timers to one timer. Moreover, without this patch, when the "ticker" timer triggers timer_cs_read via tick_periodic it reads the value of the usual timer it can get an wrapped timer value without timer_cs_internal_counter having been updated leading to the clock going backwards. This effectively hangs one cpu that gets stuck in update_wall_time with an offset slightly smaller than 0xffffffffffffffff. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc: kernel: using strlcpy() instead of strcpy()Zhao Hongjiang
'boot_command_line' and 'full_boot_str' has a fix length, 'cmdline_p' and 'boot_command' maybe larger than them. So use strlcpy() instead of strcpy() to avoid memory overflow. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19arch: sparc: prom: looping issue, need additional length check in the ↵Chen Gang
outside looping When "cp >= barg_buf + BARG_LEN-2", it breaks internel looping 'while', but outside loop 'for' still has effect, so "*cp++ = ' '" will continue repeating which may cause memory overflow. So need additional length check for it in the outside looping. Also beautify the related code which found by "./scripts/checkpatch.pl" Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc: remove inline marking of EXPORT_SYMBOL functionsDenis Efremov
EXPORT_SYMBOL and inline directives are contradictory to each other. The patch fixes this inconsistency. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Denis Efremov <yefremov.denis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19sparc: Switch to asm-generic/linkage.hGeert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19ARM: multi_v7: Enable Allwinner EMAC in multi_v7_defconfigMaxime Ripard
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2013-06-19ARM: sunxi: Add Olimex A10s-Olinuxino-micro device treeMaxime Ripard
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Tested-by: Emilio López <emilio@elopez.com.ar> Acked-by: Emilio López <emilio@elopez.com.ar>
2013-06-19GFS2: aggressively issue revokes in gfs2_log_flushBenjamin Marzinski
This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block's full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-19mconsole: we'd better initialize pos before passing it to vfs_read()...Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-19vxlan: fix check for migration of static entrystephen hemminger
The check introduced by: commit 26a41ae604381c5cc0caf1c3261ca6b298b5fe69 Author: stephen hemminger <stephen@networkplumber.org> Date: Mon Jun 17 12:09:58 2013 -0700 vxlan: only migrate dynamic FDB entries was not correct because it is checking flag about type of FDB entry, rather than the state (dynamic versus static). The confusion arises because vxlan is reusing values from bridge, and bridge is reusing values from neighbour table, and easy to get lost in translation. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19s390/dma: support debug_dma_mapping_errorSebastian Ott
Without this patch drivers will get blamed (CONFIG_DMA_API_DEBUG=y) for not calling dma_mapping_error (even if they do). Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-19s390/dma: fix mapping_error detectionSebastian Ott
The map_page implementation of s390 returns DMA_ERROR_CODE in an error situation. Correctly test if a mapping was erroneous (DMA_ERROR_CODE is defined as ~0). Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-19s390/irq: Only define synchronize_irq() on SMPBen Hutchings
In uniprocessor configurations, synchronize_irq() is defined in <linux/hardirq.h> as a macro, and this function definition fails to compile. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org # 3.9 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-19can: usb_8dev: unregister netdev before free()ingMarc Kleine-Budde
The usb_8dev hardware has problems on some xhci USB hosts. The driver fails to read the firmware revision in the probe function. This leads to the following Oops: [ 3356.635912] kernel BUG at net/core/dev.c:5701! The driver tries to free the netdev, which has already been registered, without unregistering it. This patch fixes the problem by unregistering the netdev in the error path. Reported-by: Michael Olbrich <m.olbrich@pengutronix.de> Reviewed-by: Bernd Krumboeck <krumboeck@universalnet.at> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-06-18Input: xpad - fix for "Mad Catz Street Fighter IV FightPad" controllersShawn Joseph
Added MAP_TRIGGERS_TO_BUTTONS for Mad Catz Street Fighter IV FightPad device. This controller model was already supported by the xpad driver, but none of the buttons work correctly without this change. Tested on kernel version 3.9.5. Signed-off-by: Shawn Joseph <jms.576@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-06-18Input: wacom - add a new stylus (0x100802) for Intuos5 and CintiqsPing Cheng
Signed-off-by: Ping Cheng <pingc@wacom.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>