From 1f3ebc8253ee56bfaa883c5114fb5569c56f6197 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 4 Jun 2019 14:05:52 -0700 Subject: rcu: Restore barrier() to rcu_read_lock() and rcu_read_unlock() Commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers") removed the barrier() calls from rcu_read_lock() and rcu_write_lock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels. Within RCU, this commit was OK, but it failed to account for things like get_user() that can pagefault and that can be reordered by the compiler. Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock() can cause these page faults to migrate into RCU read-side critical sections, which in CONFIG_PREEMPT=n kernels could result in too-short grace periods and arbitrary misbehavior. Please see commit 386afc91144b ("spinlocks and preemption points need to be at least compiler barriers") and Linus's commit 66be4e66a7f4 ("rcu: locking and unlocking need to always be at least barriers"), this last of which restores the barrier() call to both rcu_read_lock() and rcu_read_unlock(). This commit removes barrier() calls that are no longer needed given that the addition of them in Linus's commit noted above. The combination of this commit and Linus's commit effectively reverts commit bb73c52bad36 ("rcu: Don't disable preemption for Tiny and Tree RCU readers"). Reported-by: Herbert Xu Reported-by: Linus Torvalds Signed-off-by: Paul E. McKenney [ paulmck: Fix embarrassing typo located by Alan Stern. ] --- .../RCU/Design/Requirements/Requirements.html | 71 ++++++++++++++++++++++ 1 file changed, 71 insertions(+) (limited to 'Documentation/RCU') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 5a9238a2883c..f04c467e55c5 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
  • Hotplug CPU.
  • Scheduler and RCU.
  • Tracing and RCU. +
  • +Accesses to User Memory and RCU.
  • Energy Efficiency.
  • Scheduling-Clock Interrupts and RCU. @@ -2521,6 +2523,75 @@ cannot be used. The tracing folks both located the requirement and provided the needed fix, so this surprise requirement was relatively painless. +

    +Accesses to User Memory and RCU

    + +

    +The kernel needs to access user-space memory, for example, to access +data referenced by system-call parameters. +The get_user() macro does this job. + +

    +However, user-space memory might well be paged out, which means +that get_user() might well page-fault and thus block while +waiting for the resulting I/O to complete. +It would be a very bad thing for the compiler to reorder +a get_user() invocation into an RCU read-side critical +section. +For example, suppose that the source code looked like this: + +

    +
    + 1 rcu_read_lock();
    + 2 p = rcu_dereference(gp);
    + 3 v = p->value;
    + 4 rcu_read_unlock();
    + 5 get_user(user_v, user_p);
    + 6 do_something_with(v, user_v);
    +
    +
    + +

    +The compiler must not be permitted to transform this source code into +the following: + +

    +
    + 1 rcu_read_lock();
    + 2 p = rcu_dereference(gp);
    + 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
    + 4 v = p->value;
    + 5 rcu_read_unlock();
    + 6 do_something_with(v, user_v);
    +
    +
    + +

    +If the compiler did make this transformation in a +CONFIG_PREEMPT=n kernel build, and if get_user() did +page fault, the result would be a quiescent state in the middle +of an RCU read-side critical section. +This misplaced quiescent state could result in line 4 being +a use-after-free access, which could be bad for your kernel's +actuarial statistics. +Similar examples can be constructed with the call to get_user() +preceding the rcu_read_lock(). + +

    +Unfortunately, get_user() doesn't have any particular +ordering properties, and in some architectures the underlying asm +isn't even marked volatile. +And even if it was marked volatile, the above access to +p->value is not volatile, so the compiler would not have any +reason to keep those two accesses in order. + +

    +Therefore, the Linux-kernel definitions of rcu_read_lock() +and rcu_read_unlock() must act as compiler barriers, +at least for outermost instances of rcu_read_lock() and +rcu_read_unlock() within a nested set of RCU read-side critical +sections. +

    Energy Efficiency

    -- cgit From 0500873de968df6fdef5752d7bbdca317ddc220b Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 8 Jul 2019 08:01:50 -0700 Subject: doc: Add rcutree.kthread_prio pointer to stallwarn.txt This commit adds mention of the rcutree.kthread_prio kernel boot parameter to the discussion of how high-priority real-time tasks can result in RCU CPU stall warnings. (However, this does not necessarily help when the high-priority real-time tasks are using dubious deadlines.) Signed-off-by: Paul E. McKenney --- Documentation/RCU/stallwarn.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation/RCU') diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 13e88fc00f01..f48f4621ccbc 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that CONFIG_PREEMPT_RCU case, you might see stall-warning messages. + You can use the rcutree.kthread_prio kernel boot parameter to + increase the scheduling priority of RCU's kthreads, which can + help avoid this problem. However, please note that doing this + can increase your system's context-switch rate and thus degrade + performance. + o A periodic interrupt whose handler takes longer than the time interval between successive pairs of interrupts. This can prevent RCU's kthreads and softirq handlers from running. -- cgit From 0a5b99f57873e233ad42ef71e23c629f6ea1fcfe Mon Sep 17 00:00:00 2001 From: "Joel Fernandes (Google)" Date: Thu, 11 Jul 2019 16:45:41 -0400 Subject: treewide: Rename rcu_dereference_raw_notrace() to _check() The rcu_dereference_raw_notrace() API name is confusing. It is equivalent to rcu_dereference_raw() except that it also does sparse pointer checking. There are only a few users of rcu_dereference_raw_notrace(). This patches renames all of them to be rcu_dereference_raw_check() with the "_check()" indicating sparse checking. Signed-off-by: Joel Fernandes (Google) [ paulmck: Fix checkpatch warnings about parentheses. ] Signed-off-by: Paul E. McKenney --- Documentation/RCU/Design/Requirements/Requirements.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/RCU') diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 5a9238a2883c..bdbc84f1b949 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -2512,7 +2512,7 @@ disabled across the entire RCU read-side critical section.

    It is possible to use tracing on RCU code, but tracing itself uses RCU. -For this reason, rcu_dereference_raw_notrace() +For this reason, rcu_dereference_raw_check() is provided for use by tracing, which avoids the destructive recursion that could otherwise ensue. This API is also used by virtualization in some architectures, -- cgit