summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-10-09fs/proc/task_mmu.c: introduce m_next_vma() helperOleg Nesterov
Extract the tail_vma/vm_next calculation from m_next() into the new trivial helper, m_next_vma(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: simplify m_start() to make it readableOleg Nesterov
Now that m->version is gone we can cleanup m_start(). In particular, - Remove the "unsigned long" typecast, m->index can't be negative or exceed ->map_count. But lets use "unsigned int pos" to make it clear that "pos < map_count" is safe. - Remove the unnecessary "vma != NULL" check in the main loop. It can't be NULL unless we have a vm bug. - This also means that "pos < map_count" case can simply return the valid vma and avoid "goto" and subsequent checks. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logicOleg Nesterov
m_start() carefully documents, checks, and sets "m->version = -1" if we are going to return NULL. The only problem is that we will be never called again if m_start() returns NULL, so this is simply pointless and misleading. Otoh, ->show() methods m->version = 0 if vma == tail_vma and this is just wrong, we want -1 in this case. And in fact we also want -1 if ->vm_next == NULL and ->tail_vma == NULL. And it is not used consistently, the "scan vmas" loop in m_start() should update last_addr too. Finally, imo the whole "last_addr" logic in m_start() looks horrible. find_vma(last_addr) is called unconditionally even if we are not going to use the result. But the main problem is that this code participates in tail_vma-or-NULL mess, and this looks simply unfixable. Remove this optimization. We will add it back after some cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop()Oleg Nesterov
1. There is no reason to reset ->tail_vma in m_start(), if we return IS_ERR_OR_NULL() it won't be used. 2. m_start() also clears priv->task to ensure that m_stop() won't use the stale pointer if we fail before get_task_struct(). But this is ugly and confusing, move this initialization in m_stop(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next()Oleg Nesterov
1. Kill the first "vma != NULL" check. Firstly this is not possible, m_next() won't be called if ->start() or the previous ->next() returns NULL. And if it was possible the 2nd "vma != tail_vma" check is buggy, we should not wrongly return ->tail_vma. 2. Make this function readable. The logic is very simple, we should return check "vma != tail" once and return "vm_next || tail_vma". Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: simplify the vma_stop() logicOleg Nesterov
m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall vma. This is because in this case m_stop()->vma_stop() obviously can't use gate_vma->vm_mm. Now that we have proc_maps_private->mm we can simplify this logic: - Change m_start() to return with ->mmap_sem held unless it returns IS_ERR_OR_NULL(). - Change vma_stop() to use priv->mm and avoid the ugly vma checks, this makes "vm_area_struct *vma" unnecessary. - This also allows m_start() to use vm_stop(). - Cleanup m_next() to follow the new locking rule. Note: m_stop() looks very ugly, and this temporary uglifies it even more. Fixed by the next change. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()Oleg Nesterov
A simple test-case from Kirill Shutemov cat /proc/self/maps >/dev/null chmod +x /proc/self/net/packet exec /proc/self/net/packet makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in the opposite order. It's a false positive and probably we should not allow "chmod +x" on proc files. Still I think that we should avoid mm_access() and cred_guard_mutex in sys_read() paths, security checking should happen at open time. Besides, this doesn't even look right if the task changes its ->mm between m_stop() and m_start(). Add the new "mm_struct *mm" member into struct proc_maps_private and change proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof) otherwise. The only complication is that proc_maps_open() users should additionally do mmdrop() in fop->release(), add the new proc_map_release() helper for that. Note: this is the user-visible change, if the task execs after open("maps") the new ->mm won't be visible via this file. I hope this is fine, and this matches /proc/pid/mem bahaviour. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09proc: introduce proc_mem_open()Oleg Nesterov
Extract the mm_access() code from __mem_open() into the new helper, proc_mem_open(), the next patch will add another caller. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()Oleg Nesterov
do_maps_open() and numa_maps_open() are overcomplicated, they could use __seq_open_private(). Plus they do the same, just sizeof(*priv) Change them to use a new simple helper, proc_maps_open(ops, psize). This simplifies the code and allows us to do the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()Oleg Nesterov
get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be NULL or it can changed by exec right after mm_access(). And in theory this race is not harmless, the task can exec and then later exit and free the new mm_struct. In this case get_task_mm(oldmm) can't help, get_gate_vma(task->mm) can read the freed/unmapped memory. I think that priv->task should simply die and hold_task_mempolicy() logic can be simplified. tail_vma logic asks for cleanups too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09softlockup: make detector be aware of task switch of processes hogging cpuchai wen
For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of "warn only once for a process", as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the task pointer of the hogging process and use that to reset soft_watchdog_warn too. [dzickus@redhat.com: update comment] Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2: fix deadlock due to wrong locking orderJunxiao Bi
For commit ocfs2 journal, ocfs2 journal thread will acquire the mutex osb->journal->j_trans_barrier and wake up jbd2 commit thread, then it will wait until jbd2 commit thread done. In order journal mode, jbd2 needs flushing dirty data pages first, and this needs get page lock. So osb->journal->j_trans_barrier should be got before page lock. But ocfs2_write_zero_page() and ocfs2_write_begin_inline() obey this locking order, and this will cause deadlock and hung the whole cluster. One deadlock catched is the following: PID: 13449 TASK: ffff8802e2f08180 CPU: 31 COMMAND: "oracle" #0 [ffff8802ee3f79b0] __schedule at ffffffff8150a524 #1 [ffff8802ee3f7a58] schedule at ffffffff8150acbf #2 [ffff8802ee3f7a68] rwsem_down_failed_common at ffffffff8150cb85 #3 [ffff8802ee3f7ad8] rwsem_down_read_failed at ffffffff8150cc55 #4 [ffff8802ee3f7ae8] call_rwsem_down_read_failed at ffffffff812617a4 #5 [ffff8802ee3f7b50] ocfs2_start_trans at ffffffffa0498919 [ocfs2] #6 [ffff8802ee3f7ba0] ocfs2_zero_start_ordered_transaction at ffffffffa048b2b8 [ocfs2] #7 [ffff8802ee3f7bf0] ocfs2_write_zero_page at ffffffffa048e9bd [ocfs2] #8 [ffff8802ee3f7c80] ocfs2_zero_extend_range at ffffffffa048ec83 [ocfs2] #9 [ffff8802ee3f7ce0] ocfs2_zero_extend at ffffffffa048edfd [ocfs2] #10 [ffff8802ee3f7d50] ocfs2_extend_file at ffffffffa049079e [ocfs2] #11 [ffff8802ee3f7da0] ocfs2_setattr at ffffffffa04910ed [ocfs2] #12 [ffff8802ee3f7e70] notify_change at ffffffff81187d29 #13 [ffff8802ee3f7ee0] do_truncate at ffffffff8116bbc1 #14 [ffff8802ee3f7f50] sys_ftruncate at ffffffff8116bcbd #15 [ffff8802ee3f7f80] system_call_fastpath at ffffffff81515142 RIP: 00007f8de750c6f7 RSP: 00007fffe786e478 RFLAGS: 00000206 RAX: 000000000000004d RBX: ffffffff81515142 RCX: 0000000000000000 RDX: 0000000000000200 RSI: 0000000000028400 RDI: 000000000000000d RBP: 00007fffe786e040 R8: 0000000000000000 R9: 000000000000000d R10: 0000000000000000 R11: 0000000000000206 R12: 000000000000000d R13: 00007fffe786e710 R14: 00007f8de70f8340 R15: 0000000000028400 ORIG_RAX: 000000000000004d CS: 0033 SS: 002b crash64> bt PID: 7610 TASK: ffff88100fd56140 CPU: 1 COMMAND: "ocfs2cmt" #0 [ffff88100f4d1c50] __schedule at ffffffff8150a524 #1 [ffff88100f4d1cf8] schedule at ffffffff8150acbf #2 [ffff88100f4d1d08] jbd2_log_wait_commit at ffffffffa01274fd [jbd2] #3 [ffff88100f4d1d98] jbd2_journal_flush at ffffffffa01280b4 [jbd2] #4 [ffff88100f4d1dd8] ocfs2_commit_cache at ffffffffa0499b14 [ocfs2] #5 [ffff88100f4d1e38] ocfs2_commit_thread at ffffffffa0499d38 [ocfs2] #6 [ffff88100f4d1ee8] kthread at ffffffff81090db6 #7 [ffff88100f4d1f48] kernel_thread_helper at ffffffff81516284 crash64> bt PID: 7609 TASK: ffff88100f2d4480 CPU: 0 COMMAND: "jbd2/dm-20-86" #0 [ffff88100def3920] __schedule at ffffffff8150a524 #1 [ffff88100def39c8] schedule at ffffffff8150acbf #2 [ffff88100def39d8] io_schedule at ffffffff8150ad6c #3 [ffff88100def39f8] sleep_on_page at ffffffff8111069e #4 [ffff88100def3a08] __wait_on_bit_lock at ffffffff8150b30a #5 [ffff88100def3a58] __lock_page at ffffffff81110687 #6 [ffff88100def3ab8] write_cache_pages at ffffffff8111b752 #7 [ffff88100def3be8] generic_writepages at ffffffff8111b901 #8 [ffff88100def3c48] journal_submit_data_buffers at ffffffffa0120f67 [jbd2] #9 [ffff88100def3cf8] jbd2_journal_commit_transaction at ffffffffa0121372[jbd2] #10 [ffff88100def3e68] kjournald2 at ffffffffa0127a86 [jbd2] #11 [ffff88100def3ee8] kthread at ffffffff81090db6 #12 [ffff88100def3f48] kernel_thread_helper at ffffffff81516284 Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Alex Chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2: fix deadlock between o2hb thread and o2net_wqJoseph Qi
The following case may lead to o2net_wq and o2hb thread deadlock on o2hb_callback_sem. Currently there are 2 nodes say N1, N2 in the cluster. And N2 down, at the same time, N3 tries to join the cluster. So N1 will handle node down (N2) and join (N3) simultaneously. o2hb o2net_wq ->o2hb_do_disk_heartbeat ->o2hb_check_slot ->o2hb_run_event_list ->o2hb_fire_callbacks ->down_write(&o2hb_callback_sem) ->o2net_hb_node_down_cb ->flush_workqueue(o2net_wq) ->o2net_process_message ->dlm_query_join_handler ->o2hb_check_node_heartbeating ->o2hb_fill_node_map ->down_read(&o2hb_callback_sem) No need to take o2hb_callback_sem in dlm_query_join_handler, o2hb_live_lock is enough to protect live node map. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: xMark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: jiangyiwen <jiangyiwen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2: don't fire quorum before connection establishedJunxiao Bi
Firing quorum before connection established can cause unexpected node to reboot. Assume there are 3 nodes in the cluster, Node 1, 2, 3. Node 2 and 3 have wrong ip address of Node 1 in cluster.conf and global heartbeat is enabled in the cluster. After the heatbeats are started on these three nodes, Node 1 will reboot due to quorum fencing. It is similar case if Node 1's networking is not ready when starting the global heartbeat. The reboot is not friendly as customer is not fully ready for ocfs2 to work. Fix it by not allowing firing quorum before the connection is established. In this case, ocfs2 will wait until the wrong configuration is fixed or networking is up to continue. Also update the log to guide the user where to check when connection is not built for a long time. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/ocfs2/dlmglue.c: use __seq_open_private() not seq_open()Rob Jones
Reduce boilerplate code by using seq_open_private() instead of seq_open() Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/ocfs2/cluster/netdebug.c: use seq_open_private() not seq_open()Rob Jones
Reduce boilerplate code by using seq_open_private() instead of seq_open() Note that the code in and using sc_common_open() has been quite extensively changed. Not least because there was a latent memory leak in the code as was: if sc_common_open() failed, the previously allocated buffer was not freed. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/ocfs2/dlm/dlmdebug.c: use seq_open_private() not seq_open()Rob Jones
Reduce boilerplate code by using seq_open_private() instead of seq_open() Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2: remove unused code in dlm_new_lockres()Xue jiufei
Remove the branch that free res->lockname.name because the condition is never satisfied when jump to label error. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2/dlm: call dlm_lockres_put without resource spinlockalex chen
dlm_lockres_put() should be called without &res->spinlock, otherwise a deadlock case may happen. spin_lock(&res->spinlock) ... dlm_lockres_put ->dlm_lockres_release ->dlm_print_one_lock_resource ->spin_lock(&res->spinlock) Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2: call o2quo_exit() if malloc failed in o2net_init()Joseph Qi
In o2net_init, if malloc failed, it directly returns -ENOMEM. Then o2quo_exit won't be called in init_o2nm. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2: fix shift left operations overflowJoseph Qi
ocfs2_inode_info->ip_clusters and ocfs2_dinode->id1.bitmap1.i_total are defined as type u32, so the shift left operations may overflow if volume size is large, for example, 2TB and cluster size is 1MB. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2/dlm: refactor error handling in dlm_alloc_ctxtJoseph Qi
Refactoring error handling in dlm_alloc_ctxt to simplify code. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/ocfs2/stack_user.c: fix typo in ocfs2_control_release()Andrew Morton
It is supposed to zero pv_minor. Reported-by: Himangi Saraogi <himangi774@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09score: use Kbuild logic to include <asm-generic/sections.h>Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Lennox Wu <lennox.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ntfs: remove bogus spaceAndrea Gelmini
fs/ntfs/debug.c:124: WARNING: space prohibited between function name and open parenthesis '(' Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ntfs: use find_get_page_flags() to mark page accessed as it is no longer ↵Anton Altaparmakov
marked later on Mel Gorman's commit 2457aec63745 ("mm: non-atomically mark page accessed during page cache allocation where possible") removed mark_page_accessed() calls from NTFS without updating the matching find_lock_page() to find_get_page_flags(GFP_LOCK | FGP_ACCESSED) thus causing the page to never be marked accessed. This patch fixes that. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09m32r: remove deprecated IRQF_DISABLEDMichael Opdenacker
This patch removes the use of the IRQF_DISABLED flag from arch/m32r/kernel/time.c It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09m32r: use Kbuild logic to include <asm-generic/sections.h>Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fanotify: enable close-on-exec on events' fd when requested in fanotify_init()Yann Droneaud
According to commit 80af258867648 ("fanotify: groups can specify their f_flags for new fd"), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2)[1]. Unfortunately O_CLOEXEC is currently silently ignored. Indeed, event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). It's a pity, since, according to some lookup on various search engines and http://codesearch.debian.net/, there's already some userspace code which use O_CLOEXEC: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Additionally, since commit 48149e9d3a7e ("fanotify: check file flags passed in fanotify_init"). having O_CLOEXEC as part of fanotify_init() second argument is expressly allowed. So it seems expected to set close-on-exec flag on the file descriptors if userspace is allowed to request it with O_CLOEXEC. But Andrew Morton raised[6] the concern that enabling now close-on-exec might break existing applications which ask for O_CLOEXEC but expect the file descriptor to be inherited across exec(). In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file descriptor returned as part of file access notify can break applications due to deadlock. So close-on-exec is needed for most applications. More, applications asking for close-on-exec are likely expecting it to be enabled, relying on O_CLOEXEC being effective. If not, it might weaken their security, as noted by Jan Kara[8]. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted and close-on-exec get enabled when requested. [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Mihai Don\u021bu <mihai.dontu@gmail.com> Cc: Pádraig Brady <P@draigBrady.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Jan Kara <jack@suse.cz> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Richard Guy Briggs <rgb@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fsnotify: don't put user context if it was never assignedSasha Levin
On some failure paths we may attempt to free user context even if it wasn't assigned yet. This will cause a NULL ptr deref and a kernel BUG. The path I was looking at is in inotify_new_group(): oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL); if (unlikely(!oevent)) { fsnotify_destroy_group(group); return ERR_PTR(-ENOMEM); } fsnotify_destroy_group() would get called here, but group->inotify_data.user is only getting assigned later: group->inotify_data.user = get_current_user(); Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09fs/notify/group.c: make fsnotify_final_destroy_group() staticAndrew Morton
No callers outside this file. Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09cris: use Kbuild logic to include <asm-generic/sections.h>Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09mn10300: use Kbuild logic to include <asm-generic/sections.h>Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09net_sched: restore qdisc quota fairness limits after bulk dequeueJesper Dangaard Brouer
Restore the quota fairness between qdisc's, that we broke with commit 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"). Before that commit, the quota in __qdisc_run() were in packets as dequeue_skb() would only dequeue a single packet, that assumption broke with bulk dequeue. We choose not to account for the number of packets inside the TSO/GSO packets (accessable via "skb_gso_segs"). As the previous fairness also had this "defect". Thus, GSO/TSO packets counts as a single packet. Further more, we choose to slack on accuracy, by allowing a bulk dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only limited by the BQL bytelimit. This is done because BQL prefers to get its full budget for appropriate feedback from TX completion. In future, we might consider reworking this further and, if it allows, switch to a time-based model, as suggested by Eric. Right now, we only restore old semantics. Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the first patch in cooperation with Daniel and Jesper. Eric rewrote the patch. Fixes: 5772e9a346 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09Merge branch 'r8152'David S. Miller
Hayes Wang says: ==================== r8152: use mutex for hw settings v2: Make sure the autoresume wouldn't occur inside the mutex, otherwise the dead lock would happen. For the purpose, adjust some code about autosuspend/autoresume. v1: Use mutex to avoid that the serial hw settings would be interrupted by other settings. Although there is no problem now, it makes the driver more safe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09r8152: add mutex for hw settingshayeswang
Use the mutex to avoid the settings are interrupted by other ones. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09r8152: adjust usb_autopm_xxxhayeswang
Add usb_autopm_xxx for rtl8152_get_settings() ,and remove usb_autopm_xxx from read_mii_word() and write_mii_word(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09r8152: autoresume before setting featurehayeswang
Resume the device before setting the feature. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09net: Missing @ before descriptions cause make xmldocs warningMasanari Iida
This patch fix following warning. Warning(.//net/core/skbuff.c:4142): No description found for parameter 'header_len' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'data_len' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'max_page_order' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'errcode' Warning(.//net/core/skbuff.c:4142): No description found for parameter 'gfp_mask' Acutually the descriptions exist, but missing "@" in front. This problem start to happen when following commit was merged into Linus's tree during 3.18-rc1 merge period. commit 2e4e44107176d552f8bb1bb76053e850e3809841 net: add alloc_skb_with_frags() helper Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09Merge branch 'cxgb4'David S. Miller
Hariprasad Shenai says: ==================== cxgb4/cxgb4vf: Misc fixes and 40G support for cxgb4vf This patch series adds 40G support for cxgb4vf driver. Update the LSO length for cxgb4vf, fix macro. Wait for device to get ready before reading PL_WHOAMI register. The patches series is created against 'net-next' tree. And includes patches on cxgb4 and cxgb4vf driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09cxgb4: Wait for device to get ready before reading any registerHariprasad Shenai
Call t4_wait_dev_ready() before attempting to read the PL_WHOAMI register (to determine which function we have been attached to). This prevents us from failing on that read if it comes right after a RESET. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09cxgb4vf: Add 40G support for cxgb4vf driverHariprasad Shenai
Add 40G support for cxgb4vf driver. ethtool speed values are just numbers of megabits and there is no SPEED_40000 in ethtool speed values. To be consistent, use integer constants directly for all speeds. Use is_x_10g_port()("is 10Gb/s or higher") in cfg_queues() instead of is_10g_port() ("is exactly 10Gb/s"). Else we will end up using a single "Queue Set" on 40Gb/s adapters. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09cxgb4/cxgb4vf: Updated the LSO transfer length in CPL_TX_PKT_LSO for T5Hariprasad Shenai
Update the lso length for T5 adapter and fix PIDX_T5 macro Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-09Input: Add Microchip AR1021 i2c touchscreenChristian Gmeiner
This patch adds support for the ar1021 i2c based touchscreen. The driver is quite simple and only supports the Touch Reporting Protocol. This is the final version for an RFC patch send a while ago. http://www.spinics.net/lists/linux-input/msg29419.html Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-10-09Input: cros_ec_keyb - add of match tableSjoerd Simons
To enable the cros_ec_keyb driver to be auto-loaded when build as module add an of match table (and export it) to match the modalias information passed on to userspace as the Cros EC MFD driver registers the MFD subdevices with an of_compatibility string. Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-10-09Input: serio - avoid negative serio device numbersRichard Leitner
Fix the format string for serio device name generation to avoid negative device numbers when the id exceeds the maximum signed integer value. Signed-off-by: Richard Leitner <richard.leitner@skidata.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-10-09Input: avoid negative input device numbersRichard Leitner
Fix the format string for input device name generation to avoid negative device numbers when the id exceeds the maximum signed integer value. Signed-off-by: Richard Leitner <richard.leitner@skidata.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2014-10-09[IA64] Enable bpf syscall for ia64Tony Luck
See commit 99c55f7d47c0dc6fc64729f37bf435abf43f4c60 bpf: introduce BPF syscall and maps Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-10-09Merge tag 'pm+acpi-3.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "Features-wise, to me the most important this time is a rework of wakeup interrupts handling in the core that makes them work consistently across all of the available sleep states, including suspend-to-idle. Many thanks to Thomas Gleixner for his help with this work. Second is an update of the generic PM domains code that has been in need of some care for quite a while. Unused code is being removed, DT support is being added and domains are now going to be attached to devices in bus type code in analogy with the ACPI PM domain. The majority of work here was done by Ulf Hansson who also has been the most active developer this time. Apart from this we have a traditional ACPICA update, this time to upstream version 20140828 and a few ACPI wakeup interrupts handling patches on top of the general rework mentioned above. There also are several cpufreq commits including renaming the cpufreq-cpu0 driver to cpufreq-dt, as this is what implements generic DT-based cpufreq support, and a new DT-based idle states infrastructure for cpuidle. In addition to that, the ACPI LPSS driver is updated, ACPI support for Apple machines is improved, a few bugs are fixed and a few cleanups are made all over. Finally, the Adaptive Voltage Scaling (AVS) subsystem now has a tree maintained by Kevin Hilman that will be merged through the PM tree. Numbers-wise, the generic PM domains update takes the lead this time with 32 non-merge commits, second is cpufreq (15 commits) and the 3rd place goes to the wakeup interrupts handling rework (13 commits). Specifics: - Rework the handling of wakeup IRQs by the IRQ core such that all of them will be switched over to "wakeup" mode in suspend_device_irqs() and in that mode the first interrupt will abort system suspend in progress or wake up the system if already in suspend-to-idle (or equivalent) without executing any interrupt handlers. Among other things that eliminates the wakeup-related motivation to use the IRQF_NO_SUSPEND interrupt flag with interrupts which don't really need it and should not use it (Thomas Gleixner and Rafael Wysocki) - Switch over ACPI to handling wakeup interrupts with the help of the new mechanism introduced by the above IRQ core rework (Rafael Wysocki) - Rework the core generic PM domains code to eliminate code that's not used, add DT support and add a generic mechanism by which devices can be added to PM domains automatically during enumeration (Ulf Hansson, Geert Uytterhoeven and Tomasz Figa). - Add debugfs-based mechanics for debugging generic PM domains (Maciej Matraszek). - ACPICA update to upstream version 20140828. Included are updates related to the SRAT and GTDT tables and the _PSx methods are in the METHOD_NAME list now (Bob Moore and Hanjun Guo). - Add _OSI("Darwin") support to the ACPI core (unfortunately, that can't really be done in a straightforward way) to prevent Thunderbolt from being turned off on Apple systems after boot (or after resume from system suspend) and rework the ACPI Smart Battery Subsystem (SBS) driver to work correctly with Apple platforms (Matthew Garrett and Andreas Noever). - ACPI LPSS (Low-Power Subsystem) driver update cleaning up the code, adding support for 133MHz I2C source clock on Intel Baytrail to it and making it avoid using UART RTS override with Auto Flow Control (Heikki Krogerus). - ACPI backlight updates removing the video_set_use_native_backlight quirk which is not necessary any more, making the code check the list of output devices returned by the _DOD method to avoid creating acpi_video interfaces that won't work and adding a quirk for Lenovo Ideapad Z570 (Hans de Goede, Aaron Lu and Stepan Bujnak) - New Win8 ACPI OSI quirks for some Dell laptops (Edward Lin) - Assorted ACPI code cleanups (Fabian Frederick, Rasmus Villemoes, Sudip Mukherjee, Yijing Wang, and Zhang Rui) - cpufreq core updates and cleanups (Viresh Kumar, Preeti U Murthy, Rasmus Villemoes) - cpufreq driver updates: cpufreq-cpu0/cpufreq-dt (driver name change among other things), ppc-corenet, powernv (Viresh Kumar, Preeti U Murthy, Shilpasri G Bhat, Lucas Stach) - cpuidle support for DT-based idle states infrastructure, new ARM64 cpuidle driver, cpuidle core cleanups (Lorenzo Pieralisi, Rasmus Villemoes) - ARM big.LITTLE cpuidle driver updates: support for DT-based initialization and Exynos5800 compatible string (Lorenzo Pieralisi, Kevin Hilman) - Rework of the test_suspend kernel command line argument and a new trace event for console resume (Srinivas Pandruvada, Todd E Brandt) - Second attempt to optimize swsusp_free() (hibernation core) to make it avoid going through all PFNs which may be way too slow on some systems (Joerg Roedel) - devfreq updates (Paul Bolle, Punit Agrawal, Ãrjan Eide). - rockchip-io Adaptive Voltage Scaling (AVS) driver and AVS entry update in MAINTAINERS (Heiko Stübner, Kevin Hilman) - PM core fix related to clock management (Geert Uytterhoeven) - PM core's sysfs code cleanup (Johannes Berg)" * tag 'pm+acpi-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (105 commits) ACPI / fan: printk replacement PM / clk: Fix crash in clocks management code if !CONFIG_PM_RUNTIME PM / Domains: Rename cpu_data to cpuidle_data cpufreq: cpufreq-dt: fix potential double put of cpu OF node cpufreq: cpu0: rename driver and internals to 'cpufreq_dt' PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free() cpufreq: ppc-corenet: remove duplicate update of cpu_data ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idle PM / sleep: Rename platform suspend/resume functions in suspend.c PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq() ACPICA: Introduce acpi_enable_all_wakeup_gpes() ACPICA: Clear all non-wakeup GPEs in acpi_hw_enable_wakeup_gpe_block() ACPI / video: check _DOD list when creating backlight devices PM / Domains: Move dev_pm_domain_attach|detach() to pm_domain.h cpufreq: Replace strnicmp with strncasecmp cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec cpufreq: powernv: Set the pstate of the last hotplugged out cpu in policy->cpus to minimum cpufreq: Allow stop CPU callback to be used by all cpufreq drivers PM / devfreq: exynos: Enable building exynos PPMU as module PM / devfreq: Export helper functions for drivers ...
2014-10-09blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bioMing Lei
It isn't correct to figure out req->bi_phys_segments from bio->bi_vcnt if the bio is cloned. Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>