summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-04-27rcutorture: Add torture tests for RCU Tasks RudePaul E. McKenney
This commit adds the definitions required to torture the rude flavor of RCU tasks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu-tasks: Add an RCU-tasks rude variantPaul E. McKenney
This commit adds a "rude" variant of RCU-tasks that has as quiescent states schedule(), cond_resched_tasks_rcu_qs(), userspace execution, and (in theory, anyway) cond_resched(). In other words, RCU-tasks rude readers are regions of code with preemption disabled, but excluding code early in the CPU-online sequence and late in the CPU-offline sequence. Updates make use of IPIs and force an IPI and a context switch on each online CPU. This variant is useful in some situations in tracing. Suggested-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply review feedback from Steve Rostedt. ]
2020-04-27rcu-tasks: Refactor RCU-tasks to allow variants to be addedPaul E. McKenney
This commit splits out generic processing from RCU-tasks-specific processing in order to allow additional flavors to be added. It also adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks infrastructure code. This is primarily, but not entirely, a code-movement commit. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcutorture: Add a test for synchronize_rcu_mult()Paul E. McKenney
This commit adds a crude test for synchronize_rcu_mult(). This is currently a smoke test rather than a high-quality stress test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Reinstate synchronize_rcu_mult()Paul E. McKenney
With the advent and likely usage of synchronize_rcu_rude(), there is again a need to wait on multiple types of RCU grace periods, for example, call_rcu_tasks() and call_rcu_tasks_rude(). This commit therefore reinstates synchronize_rcu_mult() in order to allow these grace periods to be straightforwardly waited on concurrently. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu-tasks: Create struct to hold state informationPaul E. McKenney
This commit creates an rcu_tasks struct to hold state information for RCU Tasks. This is a preparation commit for adding additional flavors of Tasks RCU, each of which would have its own rcu_tasks struct. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu-tasks: Move Tasks RCU to its own filePaul E. McKenney
This code-movement-only commit is in preparation for adding an additional flavor of Tasks RCU, which relies on workqueues to detect grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add per-task state to RCU CPU stall warningsPaul E. McKenney
Currently, an RCU-preempt CPU stall warning simply lists the PIDs of those tasks holding up the current grace period. This can be helpful, but more can be even more helpful. To this end, this commit adds the nesting level, whether the task thinks it was preempted in its current RCU read-side critical section, whether RCU core has asked this task for a quiescent state, whether the expedited-grace-period hint is set, and whether the task believes that it is on the blocked-tasks list (it must be, or it would not be printed, but if things are broken, best not to take too much for granted). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27sched/core: Add function to sample state of locked-down taskPaul E. McKenney
A running task's state can be sampled in a consistent manner (for example, for diagnostic purposes) simply by invoking smp_call_function_single() on its CPU, which may be obtained using task_cpu(), then having the IPI handler verify that the desired task is in fact still running. However, if the task is not running, this sampling can in theory be done immediately and directly. In practice, the task might start running at any time, including during the sampling period. Gaining a consistent sample of a not-running task therefore requires that something be done to lock down the target task's state. This commit therefore adds a try_invoke_on_locked_down_task() function that invokes a specified function if the specified task can be locked down, returning true if successful and if the specified function returns true. Otherwise this function simply returns false. Given that the function passed to try_invoke_on_nonrunning_task() might be invoked with a runqueue lock held, that function had better be quite lightweight. The function is passed the target task's task_struct pointer and the argument passed to try_invoke_on_locked_down_task(), allowing easy access to task state and to a location for further variables to be passed in and out. Note that the specified function will be called even if the specified task is currently running. The function can use ->on_rq and task_curr() to quickly and easily determine the task's state, and can return false if this state is not to the function's liking. The caller of the try_invoke_on_locked_down_task() would then see the false return value, and could take appropriate action, for example, trying again later or sending an IPI if matters are more urgent. It is expected that use cases such as the RCU CPU stall warning code will simply return false if the task is currently running. However, there are use cases involving nohz_full CPUs where the specified function might instead fall back to an alternative sampling scheme that relies on heavier synchronization (such as memory barriers) in the target task. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> [ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ] [ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ] Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu-tasks: Use context-switch hook for PREEMPT=y kernelsPaul E. McKenney
Currently, the PREEMPT=y version of rcu_note_context_switch() does not invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks Trace's IPIs down to a dull roar. This commit therefore enables this hook. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add comments marking transitions between RCU watching and notPaul E. McKenney
It is not as clear as it might be just where in RCU's idle entry/exit code RCU stops and starts watching the current CPU. This commit therefore adds comments calling out the transitions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcutorture: Add test of holding scheduler locks across rcu_read_unlock()Paul E. McKenney
Now that it should be safe to hold scheduler locks across rcu_read_unlock(), even in cases where the corresponding RCU read-side critical section might have been preempted and boosted, the commit adds a test of this capability to rcutorture. This has been tested on current mainline (which can deadlock in this situation), and lockdep duly reported the expected deadlock. On -rcu, lockdep is silent, thus far, anyway. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Don't use negative nesting depth in __rcu_read_unlock()Lai Jiangshan
Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_read_unlock() being invoked within such a handler that interrupted an __rcu_read_unlock_special(), in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for __rcu_read_unlock() to set the nesting level negative. This commit therefore removes this recursion-protection code from __rcu_read_unlock(). [ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ] [ paulmck: Adjust other checks given no more negative nesting. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs fieldLai Jiangshan
The ->rcu_read_unlock_special.b.deferred_qs field is set to true in rcu_read_unlock_special() but never set to false. This is not particularly useful, so this commit removes this field. The only possible justification for this field is to ease debugging of RCU deferred quiscent states, but the combination of the other ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course ->rcu_read_lock_nesting should cover debugging needs. And if this last proves incorrect, this patch can always be reverted, along with the required setting of ->rcu_read_unlock_special.b.deferred_qs to false in rcu_preempt_deferred_qs_irqrestore(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs()Lai Jiangshan
Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_preempt_deferred_qs() being invoked within such a handler that interrupted an extended RCU read-side critical section, in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for rcu_preempt_deferred_qs() to set the nesting level negative. This commit therefore removes this recursion-protection code from rcu_preempt_deferred_qs(). [ paulmck: Fix typo in commit log per Steve Rostedt. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Make rcu_read_unlock_special() safe for rq/pi locksPaul E. McKenney
The scheduler is currently required to hold rq/pi locks across the entire RCU read-side critical section or not at all. This is inconvenient and leaves traps for the unwary, including the author of this commit. But now that excessively long grace periods enable scheduling-clock interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in rcu_read_unlock_special() can be dispensed with. In other words, the rcu_read_unlock_special() function can refrain from doing wakeups unless such wakeups are guaranteed safe. This commit therefore avoids unsafe wakeups, freeing the scheduler to hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU read-side critical section might have been preempted. This commit also updates RCU's requirements documentation. This commit is inspired by a patch from Lai Jiangshan: https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com This commit is further intended to be a step towards his goal of permitting the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock(). Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add KCSAN stubs to update.cPaul E. McKenney
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add rcu_gp_might_be_stalled()Paul E. McKenney
This commit adds rcu_gp_might_be_stalled(), which returns true if there is some reason to believe that the RCU grace period is stalled. The use case is where an RCU free-memory path needs to allocate memory in order to free it, a situation that should be avoided where possible. But where it is necessary, there is always the alternative of using synchronize_rcu() to wait for a grace period in order to avoid the allocation. And if the grace period is stalled, allocating memory to asynchronously wait for it is a bad idea of epic proportions: Far better to let others use the memory, because these others might actually be able to free that memory before the grace period ends. Thus, rcu_gp_might_be_stalled() can be used to help decide whether allocating memory on an RCU free path is a semi-reasonable course of action. Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu/tree: Count number of batched kfree_rcu() locklesslyJoel Fernandes (Google)
We can relax the correctness of counting of number of queued objects in favor of not hurting performance, by locklessly sampling per-cpu counters. This should be Ok since under high memory pressure, it should not matter if we are off by a few objects while counting. The shrinker will still do the reclaim. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Remove unused "flags" variable. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu/tree: Add a shrinker to prevent OOM due to kfree_rcu() batchingJoel Fernandes (Google)
To reduce grace periods and improve kfree() performance, we have done batching recently dramatically bringing down the number of grace periods while giving us the ability to use kfree_bulk() for efficient kfree'ing. However, this has increased the likelihood of OOM condition under heavy kfree_rcu() flood on small memory systems. This patch introduces a shrinker which starts grace periods right away if the system is under memory pressure due to existence of objects that have still not started a grace period. With this patch, I do not observe an OOM anymore on a system with 512MB RAM and 8 CPUs, with the following rcuperf options: rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_mult=2 Otherwise it easily OOMs with the above parameters. NOTE: 1. On systems with no memory pressure, the patch has no effect as intended. 2. In the future, we can use this same mechanism to prevent grace periods from happening even more, by relying on shrinkers carefully. Cc: urezki@gmail.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcuperf: Add ability to increase object allocation sizeJoel Fernandes (Google)
This allows us to increase memory pressure dynamically using a new rcuperf boot command line parameter called 'rcumult'. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Convert rcu_nohz_full_cpu() ULONG_CMP_LT() to time_before()Paul E. McKenney
This commit converts the ULONG_CMP_LT() in rcu_nohz_full_cpu() to time_before() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Convert rcu_initiate_boost() ULONG_CMP_GE() to time_after()Paul E. McKenney
This commit converts the ULONG_CMP_GE() in rcu_initiate_boost() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Convert ULONG_CMP_GE() to time_after() for jiffy comparisonPaul E. McKenney
This commit converts the ULONG_CMP_GE() in rcu_gp_fqs_loop() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Replace 1 by trueJules Irenge
Coccinelle reports a warning at use_softirq declaration WARNING: Assignment of 0/1 to bool variable The root cause is use_softirq a variable of bool type is initialised with the integer 1 Replacing 1 with value true solve the issue. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Replace assigned pointer ret value by corresponding boolean valueJules Irenge
Coccinelle reports warnings at rcu_read_lock_held_common() WARNING: Assignment of 0/1 to bool variable To fix this, the assigned pointer ret values are replaced by corresponding boolean value. Given that ret is a pointer of bool type Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Mark rcu_state.gp_seq to detect more concurrent writesPaul E. McKenney
The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. This commit applies KCSAN-specific primitives, so cannot go upstream until KCSAN does. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Get rid of some doc warnings in update.cMauro Carvalho Chehab
This commit escapes *ret, because otherwise the documentation system thinks that this is an incomplete emphasis block: ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:70: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:82: WARNING: Inline emphasis start-string without end-string. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27Revert "rculist: Describe variadic macro argument in a Sphinx-compatible way"Jonathan Neuschäfer
This reverts commit f452ee096d95482892b101bde4fd037fa025d3cc. The workaround became unnecessary with commit 43756e347f21 ("scripts/kernel-doc: Add support for named variable macro arguments"). Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Fix the (t=0 jiffies) false positiveZhaolong Zhang
It is possible that an over-long grace period will end while the RCU CPU stall warning message is printing. In this case, the estimate of the offending grace period's duration can be erroneous due to refetching of rcu_state.gp_start, which will now be the time of the newly started grace period. Computation of this duration clearly needs to use the start time for the old over-long grace period, not the fresh new one. This commit avoids such errors by causing both print_other_cpu_stall() and print_cpu_stall() to reuse the value previously fetched by their caller. Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Expedite first two FQS scans under callback-overload conditionsPaul E. McKenney
Even if some CPUs have excessive numbers of callbacks, RCU's grace-period kthread will still wait normally between successive force-quiescent-state scans. The first two are the most important, as they are the ones that enlist aid from the scheduler when overloaded. This commit therefore omits the wait before the first and the second force-quiescent-state scan under callback-overload conditions. This approach was inspired by a discussion with Jeff Roberson. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27drm: Make drm_dp_mst_dsc_aux_for_port() safe for old compilersPaul E. McKenney
Older compilers either want two extra pairs of curly braces around the initializer for local variable desc, or they want a single pair of curly braces with nothing inside. Because current Linux-kernel practice favors the latter, this commit makes it so. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Joe Perches <joe@perches.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Use data_race() for RCU CPU stall-warning printsPaul E. McKenney
Although the accesses used to determine whether or not a stall should be printed are an integral part of the concurrency algorithm governing use of the corresponding variables, the values that are simply printed are ancillary. As such, it is best to use data_race() for these accesses in order to provide the greatest latitude in the use of KCSAN for the other accesses that are an integral part of the algorithm. This commit therefore changes the relevant uses of READ_ONCE() to data_race(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add WRITE_ONCE() to rcu_node ->boost_tasksPaul E. McKenney
The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27srcu: Add data_race() to ->srcu_lock_count and ->srcu_unlock_count arraysPaul E. McKenney
The srcu_data structure's ->srcu_lock_count and ->srcu_unlock_count arrays are read and written locklessly, so this commit adds the data_race() to the diagnostic-print loads from these arrays in order mark them as known and approved data-racy accesses. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being used only by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add READ_ONCE and data_race() to rcu_node ->boost_tasksPaul E. McKenney
The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the READ_ONCE() to one load in order to avoid destructive compiler optimizations. The other load is from a diagnostic print, so data_race() suffices. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add *_ONCE() and data_race() to rcu_node ->exp_tasks plus lockingPaul E. McKenney
There are lockless loads from the rcu_node structure's ->exp_tasks field, so this commit causes all stores to use WRITE_ONCE() and all lockless loads to use READ_ONCE() or data_race(), with the latter for debug prints. This code also did a unprotected traversal of the linked list pointed into by ->exp_tasks, so this commit also acquires the rcu_node structure's ->lock to properly protect this traversal. This list was traversed unprotected only when printing an RCU CPU stall warning for an expedited grace period, so the odds of seeing this in production are not all that high. This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Mark rcu_state.ncpus to detect concurrent writesPaul E. McKenney
The rcu_state structure's ncpus field is only to be modified by the CPU-hotplug CPU-online code path, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27srcu: Add KCSAN stubsPaul E. McKenney
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27rcu: Add KCSAN stubsPaul E. McKenney
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27Merge branch 'vsock-virtio-fixes-about-packet-delivery-to-monitoring-devices'David S. Miller
Stefano Garzarella says: ==================== vsock/virtio: fixes about packet delivery to monitoring devices During the review of v1, Stefan pointed out an issue introduced by that patch, where replies can appear in the packet capture before the transmitted packet. While fixing my patch, reverting it and adding a new flag in 'struct virtio_vsock_pkt' (patch 2/2), I found that we already had that issue in vhost-vsock, so I fixed it (patch 1/2). v1 -> v2: - reverted the v1 patch, to avoid that replies can appear in the packet capture before the transmitted packet [Stefan] - added patch to fix packet delivering to monitoring devices in vhost-vsock - added patch to check if the packet is already delivered to monitoring devices v1: https://patchwork.ozlabs.org/project/netdev/patch/20200421092527.41651-1-sgarzare@redhat.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27vsock/virtio: fix multiple packet delivery to monitoring devicesStefano Garzarella
In virtio_transport.c, if the virtqueue is full, the transmitting packet is queued up and it will be sent in the next iteration. This causes the same packet to be delivered multiple times to monitoring devices. We want to continue to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. This patch fixes the issue, adding a new flag (tap_delivered) in struct virtio_vsock_pkt, to check if the packet is already delivered to monitoring devices. In vhost/vsock.c, we are splitting packets, so we must set 'tap_delivered' to false when we queue up the same virtio_vsock_pkt to handle the remaining bytes. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27vhost/vsock: fix packet delivery order to monitoring devicesStefano Garzarella
We want to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27drm/dp_mst: Fix drm_dp_send_dpcd_write() return codeLyude Paul
drm_dp_mst_wait_tx_reply() returns > 1 if time elapsed in wait_event_timeout() before check_txmsg_state(mgr, txmsg) evaluated to true. However, we make the mistake of returning this time from drm_dp_send_dpcd_write() on success instead of returning the number of bytes written - causing spontaneous failures during link probing: [drm:drm_dp_send_link_address [drm_kms_helper]] *ERROR* GUID check on 10:01 failed: 3975 Yikes! So, fix this by returning the number of bytes written on success instead. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: cb897542c6d2 ("drm/dp_mst: Fix W=1 warnings") Cc: Benjamin Gaignard <benjamin.gaignard@st.com> Cc: Sean Paul <sean@poorly.run> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20200424190722.775284-1-lyude@redhat.com
2020-04-27drm/i915/gt: Check cacheline is valid before acquiringChris Wilson
The hwsp_cacheline pointer from i915_request is very, very flimsy. The i915_request.timeline (and the hwsp_cacheline) are lost upon retiring (after an RCU grace). Therefore we need to confirm that once we have the right pointer for the cacheline, it is not in the process of being retired and disposed of before we attempt to acquire a reference to the cacheline. <3>[ 547.208237] BUG: KASAN: use-after-free in active_debug_hint+0x6a/0x70 [i915] <3>[ 547.208366] Read of size 8 at addr ffff88822a0d2710 by task gem_exec_parall/2536 <4>[ 547.208547] CPU: 3 PID: 2536 Comm: gem_exec_parall Tainted: G U 5.7.0-rc2-ged7a286b5d02d-kasan_117+ #1 <4>[ 547.208556] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016 <4>[ 547.208564] Call Trace: <4>[ 547.208579] dump_stack+0x96/0xdb <4>[ 547.208707] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208719] print_address_description.constprop.6+0x16/0x310 <4>[ 547.208841] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208963] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208975] __kasan_report+0x137/0x190 <4>[ 547.209106] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.209127] kasan_report+0x32/0x50 <4>[ 547.209257] ? i915_gemfs_fini+0x40/0x40 [i915] <4>[ 547.209376] active_debug_hint+0x6a/0x70 [i915] <4>[ 547.209389] debug_print_object+0xa7/0x220 <4>[ 547.209405] ? lockdep_hardirqs_on+0x348/0x5f0 <4>[ 547.209426] debug_object_assert_init+0x297/0x430 <4>[ 547.209449] ? debug_object_free+0x360/0x360 <4>[ 547.209472] ? lock_acquire+0x1ac/0x8a0 <4>[ 547.209592] ? intel_timeline_read_hwsp+0x4f/0x840 [i915] <4>[ 547.209737] ? i915_active_acquire_if_busy+0x66/0x120 [i915] <4>[ 547.209861] i915_active_acquire_if_busy+0x66/0x120 [i915] <4>[ 547.209990] ? __live_alloc.isra.15+0xc0/0xc0 [i915] <4>[ 547.210005] ? rcu_read_lock_sched_held+0xd0/0xd0 <4>[ 547.210017] ? print_usage_bug+0x580/0x580 <4>[ 547.210153] intel_timeline_read_hwsp+0xbc/0x840 [i915] <4>[ 547.210284] __emit_semaphore_wait+0xd5/0x480 [i915] <4>[ 547.210415] ? i915_fence_get_timeline_name+0x110/0x110 [i915] <4>[ 547.210428] ? lockdep_hardirqs_on+0x348/0x5f0 <4>[ 547.210442] ? _raw_spin_unlock_irq+0x2a/0x40 <4>[ 547.210567] ? __await_execution.constprop.51+0x2e0/0x570 [i915] <4>[ 547.210706] i915_request_await_dma_fence+0x8f7/0xc70 [i915] Fixes: 85bedbf191e8 ("drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200427093038.29219-1-chris@chris-wilson.co.uk (cherry picked from commit 2759e395358b2b909577928894f856ab75bea41a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-04-27drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma()Chris Wilson
While the ggtt vma are protected by their object lifetime, the list continues until it hits a non-ggtt vma, and that vma is not protected and may be freed as we inspect it. Hence, we require the obj->vma.lock to protect the list as we iterate. An example of forgetting to hold the obj->vma.lock is [1642834.464973] general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] SMP PTI [1642834.464977] CPU: 3 PID: 1954 Comm: Xorg Not tainted 5.6.0-300.fc32.x86_64 #1 [1642834.464979] Hardware name: LENOVO 20ARS25701/20ARS25701, BIOS GJET94WW (2.44 ) 09/14/2017 [1642834.465021] RIP: 0010:i915_gem_object_set_tiling+0x2c0/0x3e0 [i915] [1642834.465024] Code: 8b 84 24 18 01 00 00 f6 c4 80 74 59 49 8b 94 24 a0 00 00 00 49 8b 84 24 e0 00 00 00 49 8b 74 24 10 48 8b 92 30 01 00 00 89 c7 <80> ba 0a 06 00 00 03 0f 87 86 00 00 00 ba 00 00 08 00 b9 00 00 10 [1642834.465025] RSP: 0018:ffffa98780c77d60 EFLAGS: 00010282 [1642834.465028] RAX: ffff8d232bfb2578 RBX: 0000000000000002 RCX: ffff8d25873a0000 [1642834.465029] RDX: dead000000000122 RSI: fffff0af8ac6e408 RDI: 000000002bfb2578 [1642834.465030] RBP: ffff8d25873a0000 R08: ffff8d252bfb5638 R09: 0000000000000000 [1642834.465031] R10: 0000000000000000 R11: ffff8d252bfb5640 R12: ffffa987801cb8f8 [1642834.465032] R13: 0000000000001000 R14: ffff8d233e972e50 R15: ffff8d233e972d00 [1642834.465034] FS: 00007f6a3d327f00(0000) GS:ffff8d25926c0000(0000) knlGS:0000000000000000 [1642834.465036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1642834.465037] CR2: 00007f6a2064d000 CR3: 00000002fb57c001 CR4: 00000000001606e0 [1642834.465038] Call Trace: [1642834.465083] i915_gem_set_tiling_ioctl+0x122/0x230 [i915] [1642834.465121] ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915] [1642834.465151] drm_ioctl_kernel+0x86/0xd0 [drm] [1642834.465156] ? avc_has_perm+0x3b/0x160 [1642834.465178] drm_ioctl+0x206/0x390 [drm] [1642834.465216] ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915] [1642834.465221] ? selinux_file_ioctl+0x122/0x1c0 [1642834.465226] ? __do_munmap+0x24b/0x4d0 [1642834.465231] ksys_ioctl+0x82/0xc0 [1642834.465235] __x64_sys_ioctl+0x16/0x20 [1642834.465238] do_syscall_64+0x5b/0xf0 [1642834.465243] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [1642834.465245] RIP: 0033:0x7f6a3d7b047b [1642834.465247] Code: 0f 1e fa 48 8b 05 1d aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed a9 0c 00 f7 d8 64 89 01 48 [1642834.465249] RSP: 002b:00007ffe71adba28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [1642834.465251] RAX: ffffffffffffffda RBX: 000055f99048fa40 RCX: 00007f6a3d7b047b [1642834.465253] RDX: 00007ffe71adba30 RSI: 00000000c0106461 RDI: 000000000000000e [1642834.465254] RBP: 0000000000000002 R08: 000055f98f3f1798 R09: 0000000000000002 [1642834.465255] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000080 [1642834.465257] R13: 000055f98f3f1690 R14: 00000000c0106461 R15: 00007ffe71adba30 Now to take the spinlock during the list iteration, we need to break it down into two phases. In the first phase under the lock, we cannot sleep and so must defer the actual work to a second list, protected by the ggtt->mutex. We also need to hold the spinlock during creation of a new vma to serialise with updates of the tiling on the object. Reported-by: Dave Airlie <airlied@redhat.com> Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422072805.17340-1-chris@chris-wilson.co.uk (cherry picked from commit cb593e5d2b6d3ad489669914d9fd1c64c7a4a6af) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-04-27drm/i915/selftests: Fix i915_address_space refcnt leakXiyu Yang
igt_ppgtt_pin_update() invokes i915_gem_context_get_vm_rcu(), which returns a reference of the i915_address_space object to "vm" with increased refcount. When igt_ppgtt_pin_update() returns, "vm" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in two exception handling paths of igt_ppgtt_pin_update(). When i915_gem_object_create_internal() returns IS_ERR, the refcnt increased by i915_gem_context_get_vm_rcu() is not decreased, causing a refcnt leak. Fix this issue by jumping to "out_vm" label when i915_gem_object_create_internal() returns IS_ERR. Fixes: a4e7ccdac38e ("drm/i915: Move context management under GEM") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/1587361342-83494-1-git-send-email-xiyuyang19@fudan.edu.cn (cherry picked from commit e07c7606a00c4361bad72ff4e72ed0dfbefa23b0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-04-27io_uring: statx must grab the file table for valid fdJens Axboe
Clay reports that OP_STATX fails for a test case with a valid fd and empty path: -- Test 0: statx:fd 3: SUCCEED, file mode 100755 -- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755 -- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor -- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755 This is due to statx not grabbing the process file table, hence we can't lookup the fd in async context. If the fd is valid, ensure that we grab the file table so we can grab the file from async context. Cc: stable@vger.kernel.org # v5.6 Reported-by: Clay Harris <bugs@claycon.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-27dmaengine: dmatest: Fix iteration non-stop logicAndy Shevchenko
Under some circumstances, i.e. when test is still running and about to time out and user runs, for example, grep -H . /sys/module/dmatest/parameters/* the iterations parameter is not respected and test is going on and on until user gives echo 0 > /sys/module/dmatest/parameters/run This is not what expected. The history of this bug is interesting. I though that the commit 2d88ce76eb98 ("dmatest: add a 'wait' parameter") is a culprit, but looking closer to the code I think it simple revealed the broken logic from the day one, i.e. in the commit 0a2ff57d6fba ("dmaengine: dmatest: add a maximum number of test iterations") which adds iterations parameter. So, to the point, the conditional of checking the thread to be stopped being first part of conjunction logic prevents to check iterations. Thus, we have to always check both conditions to be able to stop after given iterations. Since it wasn't visible before second commit appeared, I add a respective Fixes tag. Fixes: 2d88ce76eb98 ("dmatest: add a 'wait' parameter") Cc: Dan Williams <dan.j.williams@intel.com> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20200424161147.16895-1-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-04-27dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronizationDmitry Osipenko
DMA synchronization hook checks whether interrupt is raised by testing corresponding bit in a hardware status register, and thus, clock should be enabled in this case, otherwise CPU may hang if synchronization is invoked while Runtime PM is in suspended state. This patch resumes the RPM during of the DMA synchronization process in order to avoid potential problems. It is a minor clean up of a previous commit, no real problem is fixed by this patch because currently RPM is always in a resumed state while DMA is synchronized, although this may change in the future. Fixes: 6697255f239f ("dmaengine: tegra-apb: Improve DMA synchronization") Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Link: https://lore.kernel.org/r/20200426190835.21950-1-digetx@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org>