sched/core: Introduce set_special_state()

Gaurav reported a perceived problem with TASK_PARKED, which turned out
to be a broken wait-loop pattern in __kthread_parkme(), but the
reported issue can (and does) in fact happen for states that do not do
condition based sleeps.

When the 'current->state = TASK_RUNNING' store of a previous
(concurrent) try_to_wake_up() collides with the setting of a 'special'
sleep state, we can lose the sleep state.

Normal condition based wait-loops are immune to this problem, but sleep
states that are not condition based are subject to it.

There already is a fix for TASK_DEAD. Abstract that and also apply it
to TASK_STOPPED and TASK_TRACED, both of which are also without a
condition based wait-loop.

Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Peter Zijlstra <peterz@infradead.org> 2018-04-30 14:51:01 +0200
committer: Ingo Molnar <mingo@kernel.org> 2018-05-04 07:54:54 +0200
commit: b5bf9a90bbebffba888c9144c5a8a10317b04064 (patch)
tree: 4c059f0785c26ca66df0009544a23b68f7228be3 /kernel/sched
parent: 85f1abe0019fcb3ea10df7029056cf42702283a8 (diff)
1 files changed, 1 insertions, 16 deletions
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7ad60e00a6a8..ffde9eebc846 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3508,23 +3508,8 @@ static void __sched notrace __schedule(bool preempt)
 
 void __noreturn do_task_dead(void)
 {
-	/*
-	 * The setting of TASK_RUNNING by try_to_wake_up() may be delayed
-	 * when the following two conditions become true.
-	 *   - There is race condition of mmap_sem (It is acquired by
-	 *     exit_mm()), and
-	 *   - SMI occurs before setting TASK_RUNINNG.
-	 *     (or hypervisor of virtual machine switches to other guest)
-	 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
-	 *
-	 * To avoid it, we have to wait for releasing tsk->pi_lock which
-	 * is held by try_to_wake_up()
-	 */
-	raw_spin_lock_irq(&current->pi_lock);
-	raw_spin_unlock_irq(&current->pi_lock);
-
 	/* Causes final put_task_struct in finish_task_switch(): */
-	__set_current_state(TASK_DEAD);
+	set_special_state(TASK_DEAD);
 
 	/* Tell freezer to ignore us: */
 	current->flags |= PF_NOFREEZE;
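
Note: the hunk above replaces an open-coded lock/unlock of current->pi_lock with set_special_state(), whose definition is not part of this diff. The following is a minimal userspace sketch of the idea, assuming the helper performs the state store while holding the same lock the waker holds. The struct, the enum values, and demo_final_state() are hypothetical stand-ins using pthreads, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Illustrative state values only; not the kernel's actual constants. */
enum { TASK_RUNNING = 0, TASK_DEAD = 0x80 };

/* Hypothetical userspace stand-in for the relevant part of task_struct. */
struct task {
	pthread_mutex_t pi_lock;   /* models tsk->pi_lock */
	int state;                 /* models tsk->state, guarded by pi_lock here */
	atomic_int waker_has_lock; /* demo scaffolding: wakeup is in flight */
};

/* Models a late try_to_wake_up(): TASK_RUNNING is stored under pi_lock. */
static void *late_wakeup(void *arg)
{
	struct task *t = arg;

	pthread_mutex_lock(&t->pi_lock);
	atomic_store(&t->waker_has_lock, 1);
	t->state = TASK_RUNNING;
	pthread_mutex_unlock(&t->pi_lock);
	return NULL;
}

/* Sketch of the helper: the store is serialized against the waker. */
static void set_special_state(struct task *t, int state)
{
	pthread_mutex_lock(&t->pi_lock);
	t->state = state;
	pthread_mutex_unlock(&t->pi_lock);
}

/* Runs one wakeup concurrently with a special-state store. */
int demo_final_state(void)
{
	struct task t = { .state = TASK_RUNNING };
	pthread_t waker;
	int rc, final;

	pthread_mutex_init(&t.pi_lock, NULL);
	atomic_init(&t.waker_has_lock, 0);

	rc = pthread_create(&waker, NULL, late_wakeup, &t);
	assert(rc == 0);

	/* Wait until the waker holds pi_lock, so its store is still pending. */
	while (!atomic_load(&t.waker_has_lock))
		;

	/* Blocks until the waker drops pi_lock, then overwrites its store. */
	set_special_state(&t, TASK_DEAD);

	pthread_join(waker, NULL);
	final = t.state;
	pthread_mutex_destroy(&t.pi_lock);
	return final;
}
```

In this model demo_final_state() returns TASK_DEAD, because the special-state store cannot begin until the in-flight wakeup has released the lock. A plain unlocked store, by contrast, could be overwritten by the pending TASK_RUNNING store.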