Age | Commit message (Collapse) | Author |
|
The rcu_data structure's ->rcu_forced_tick field is read locklessly, so
this commit adds WRITE_ONCE() to all updates and READ_ONCE() to all
lockless reads.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_data structure's ->gpwrap field is read locklessly, and so
this commit adds the required READ_ONCE() to a pair of laods in order
to avoid destructive compiler optimizations.
This data race was reported by KCSAN.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Convert to plural and add a note that this is for Tree RCU.
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity,
and ->gp_activity fields are read locklessly, so they must be updated with
WRITE_ONCE() and, when read locklessly, with READ_ONCE(). This commit makes
these changes.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_segcblist structure's ->tails[] array entries are read
locklessly, so this commit adds the READ_ONCE() to a load in order to
avoid destructive compiler optimizations.
This data race was reported by KCSAN.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rt_mutex structure's ->owner field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
|
|
The rcu_state structure's ->qsmaskinitnext field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely for systems not doing incessant CPU-hotplug
operations.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_state structure's ->gp_req_activity field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_node structure's ->gp_seq field is read locklessly, so this
commit adds the READ_ONCE() to several loads in order to avoid
destructive compiler optimizations.
This data race was reported by KCSAN. Not appropriate for backporting
because this affects only tracing and warnings.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_node structure's ->exp_seq_rq field is read locklessly, so
this commit adds the WRITE_ONCE() to a load in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_node structure's ->qsmask field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds "-g -fno-omit-frame-pointer" to ease interpretation
of KCSAN output, but only for CONFIG_KCSAN=y kerrnels.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The rcu_node structure's ->exp_seq_rq field is accessed locklessly, so
updates must use WRITE_ONCE(). This commit therefore adds the needed
WRITE_ONCE() invocation where it was missed.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The for_each_leaf_node_cpu_mask() and for_each_leaf_node_possible_cpu()
macros must be invoked only on leaf rcu_node structures. Failing to
abide by this restriction can result in infinite loops on systems with
more than 64 CPUs (or for more than 32 CPUs on 32-bit systems). This
commit therefore adds WARN_ON_ONCE() calls to make misuse of these two
macros easier to debug.
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Set CONFIG_BOOT_CONFIG=n by default. This also warns
user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
in the kernel command line.
Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Clear trace_state data structure when starting trace
in __synth_event_trace_start() internal function.
Currently trace_state is initialized only in the
synth_event_trace_start() API, but the trace_state
in synth_event_trace() and synth_event_trace_array()
are on the stack without initialization.
This means those APIs will see wrong parameters and
wil skip closing process in __synth_event_trace_end()
because trace_state->disabled may be !0.
Link: http://lkml.kernel.org/r/158193315899.8868.1781259176894639952.stgit@devnote2
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The tracing seftests checks various aspects of the tracing infrastructure,
and one is filtering. If trace_printk() is active during a self test, it can
cause the filtering to fail, which will disable that part of the trace.
To keep the selftests from failing because of trace_printk() calls,
trace_printk() checks the variable tracing_selftest_running, and if set, it
does not write to the tracing buffer.
As some tracers were registered earlier in boot, the selftest they triggered
would fail because not all the infrastructure was set up for the full
selftest. Thus, some of the tests were post poned to when their
infrastructure was ready (namely file system code). The postpone code did
not set the tracing_seftest_running variable, and could fail if a
trace_printk() was added and executed during their run.
Cc: stable@vger.kernel.org
Fixes: 9afecfbb95198 ("tracing: Postpone tracer start-up tests till the system is more robust")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The test code that tests synthetic event creation pushes in as one of its
test fields the current CPU using "smp_processor_id()". As this is just
something to see if the value is correctly passed in, and the actual CPU
used does not matter, use raw_smp_processor_id(), otherwise with debug
preemption enabled, a warning happens as the smp_processor_id() is called
without preemption enabled.
Link: http://lkml.kernel.org/r/20200220162950.35162579@gandalf.local.home
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Fix a varargs-related bug in print_synth_event() which resulted in
strange output and oopses on 32-bit x86 systems. The problem is that
trace_seq_printf() expects the varargs to match the format string, but
print_synth_event() was always passing u64 values regardless. This
results in unspecified behavior when unpacking with va_arg() in
trace_seq_printf().
Add a function that takes the size into account when calling
trace_seq_printf().
Before:
modprobe-1731 [003] .... 919.039758: gen_synth_test: next_pid_field=777(null)next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=3(null)my_string_field=thneed my_int_field=598(null)
After:
insmod-1136 [001] .... 36.634590: gen_synth_test: next_pid_field=777 next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=1 my_string_field=thneed my_int_field=598
Link: http://lkml.kernel.org/r/a9b59eb515dbbd7d4abe53b347dccf7a8e285657.1581720155.git.zanussi@kernel.org
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Commit 7276531d4036('tracing: Consolidate trace() functions')
inadvertently dropped the synth_event_trace() and
synth_event_trace_array() checks that verify the number of values
passed in matches the number of fields in the synthetic event being
traced, so add them back.
Link: http://lkml.kernel.org/r/32819cac708714693669e0dfe10fe9d935e94a16.1581720155.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
synth_event_trace(), synth_event_trace_array() and
__synth_event_add_val() write directly into the trace buffer and need
to take endianness into account, like trace_event_raw_event_synth()
does.
Link: http://lkml.kernel.org/r/2011354355e405af9c9d28abba430d1f5ff7771a.1581720155.git.zanussi@kernel.org
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
synth_event_trace() is the varargs version of synth_event_trace_array(),
which takes an array of u64, as do synth_event_add_val() et al.
To not only be consistent with those, but also to address the fact
that synth_event_trace() expects every arg to be of the same type
since it doesn't also pass in e.g. a format string, the caller needs
to make sure all args are of the same type, u64. u64 is used because
it needs to accomodate the largest type available in synthetic events,
which is u64.
This fixes the bug reported by the kernel test robot/Rong Chen.
Link: https://lore.kernel.org/lkml/20200212113444.GS12867@shao2-debian/
Link: http://lkml.kernel.org/r/894c4e955558b521210ee0642ba194a9e603354c.1581720155.git.zanussi@kernel.org
Fixes: 9fe41efaca084 ("tracing: Add synth event generation test module")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Capacity-awareness in the wake-up path previously involved disabling
wake_affine in certain scenarios. We have just made select_idle_sibling()
capacity-aware, so this isn't needed anymore.
Remove wake_cap() entirely.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[Changelog tweaks]
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Changelog tweaks]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200206191957.12325-5-valentin.schneider@arm.com
|
|
The last remaining user of this macro has just been removed, get rid of it.
Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lkml.kernel.org/r/20200206191957.12325-4-valentin.schneider@arm.com
|
|
SD_BALANCE_WAKE was previously added to lower sched_domain levels on
asymmetric CPU capacity systems by commit:
9ee1cda5ee25 ("sched/core: Enable SD_BALANCE_WAKE for asymmetric capacity systems")
to enable the use of find_idlest_cpu() and friends to find an appropriate
CPU for tasks.
That responsibility has now been shifted to select_idle_sibling() and
friends, and hence the flag can be removed. Note that this causes
asymmetric CPU capacity systems to no longer enter the slow wakeup path
(find_idlest_cpu()) on wakeups - only on execs and forks (which is aligned
with all other mainline topologies).
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[Changelog tweaks]
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lkml.kernel.org/r/20200206191957.12325-3-valentin.schneider@arm.com
|
|
Issue
=====
On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
drive select_task_rq_fair() towards either:
- its slow-path (find_idlest_cpu()) if either the previous or
current (waking) CPU has too little capacity for the waking task
- its fast-path (select_idle_sibling()) otherwise
Commit:
3273163c6775 ("sched/fair: Let asymmetric CPU configurations balance at wake-up")
points out that this relies on the assumption that "[...]the CPU capacities
within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are homogeneous".
This assumption no longer holds on newer generations of big.LITTLE
systems (DynamIQ), which can accommodate CPUs of different compute capacity
within a single LLC domain. To hopefully paint a better picture, a regular
big.LITTLE topology would look like this:
+---------+ +---------+
| L2 | | L2 |
+----+----+ +----+----+
|CPU0|CPU1| |CPU2|CPU3|
+----+----+ +----+----+
^^^ ^^^
LITTLEs bigs
which would result in the following scheduler topology:
DIE [ ] <- sd_asym_cpucapacity
MC [ ] [ ] <- sd_llc
0 1 2 3
Conversely, a DynamIQ topology could look like:
+-------------------+
| L3 |
+----+----+----+----+
| L2 | L2 | L2 | L2 |
+----+----+----+----+
|CPU0|CPU1|CPU2|CPU3|
+----+----+----+----+
^^^^^ ^^^^^
LITTLEs bigs
which would result in the following scheduler topology:
MC [ ] <- sd_llc, sd_asym_cpucapacity
0 1 2 3
What this means is that, on DynamIQ systems, we could pass the wake_cap()
test (IOW presume the waking task fits on the CPU capacities of some LLC
domain), thus go through select_idle_sibling().
This function operates on an LLC domain, which here spans both bigs and
LITTLEs, so it could very well pick a CPU of too small capacity for the
task, despite there being fitting idle CPUs - it very much depends on the
CPU iteration order, on which we have absolutely no guarantees
capacity-wise.
Implementation
==============
Introduce yet another select_idle_sibling() helper function that takes CPU
capacity into account. The policy is to pick the first idle CPU which is
big enough for the task (task_util * margin < cpu_capacity). If no
idle CPU is big enough, we pick the idle one with the highest capacity.
Unlike other select_idle_sibling() helpers, this one operates on the
sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
known CPU capacities in the system. As such, this will work for both
"legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).
Note that this limits the scope of select_idle_sibling() to
select_idle_capacity() for asymmetric CPU capacity systems - the LLC domain
will not be scanned, and no further heuristic will be applied.
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lkml.kernel.org/r/20200206191957.12325-2-valentin.schneider@arm.com
|
|
A redundant "curr = rq->curr" was added; remove it.
Fixes: ebc0f83c78a2 ("timers/nohz: Update NOHZ load in remote tick")
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1580776558-12882-1-git-send-email-swood@redhat.com
|
|
Fix a mistake in a variable name in a comment.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2020-02-19
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 93 insertions(+), 31 deletions(-).
The main changes are:
1) batched bpf hashtab fixes from Brian and Yonghong.
2) various selftests and libbpf fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 057996380a42 ("bpf: Add batch ops to all htab bpf map")
added lookup_and_delete batch operation for hash table.
The current implementation has bpf_lru_push_free() inside
the bucket lock, which may cause a deadlock.
syzbot reports:
-> #2 (&htab->buckets[i].lock#2){....}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
htab_lru_map_delete_node+0xce/0x2f0 kernel/bpf/hashtab.c:593
__bpf_lru_list_shrink_inactive kernel/bpf/bpf_lru_list.c:220 [inline]
__bpf_lru_list_shrink+0xf9/0x470 kernel/bpf/bpf_lru_list.c:266
bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:340 [inline]
bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
bpf_lru_pop_free+0x87c/0x1670 kernel/bpf/bpf_lru_list.c:499
prealloc_lru_pop+0x2c/0xa0 kernel/bpf/hashtab.c:132
__htab_lru_percpu_map_update_elem+0x67e/0xa90 kernel/bpf/hashtab.c:1069
bpf_percpu_hash_update+0x16e/0x210 kernel/bpf/hashtab.c:1585
bpf_map_update_value.isra.0+0x2d7/0x8e0 kernel/bpf/syscall.c:181
generic_map_update_batch+0x41f/0x610 kernel/bpf/syscall.c:1319
bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
__do_sys_bpf+0x9b7/0x41e0 kernel/bpf/syscall.c:3460
__se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
__x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (&loc_l->lock){....}:
check_prev_add kernel/locking/lockdep.c:2475 [inline]
check_prevs_add kernel/locking/lockdep.c:2580 [inline]
validate_chain kernel/locking/lockdep.c:2970 [inline]
__lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3954
lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:516 [inline]
bpf_lru_push_free+0x250/0x5b0 kernel/bpf/bpf_lru_list.c:555
__htab_map_lookup_and_delete_batch+0x8d4/0x1540 kernel/bpf/hashtab.c:1374
htab_lru_map_lookup_and_delete_batch+0x34/0x40 kernel/bpf/hashtab.c:1491
bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
__do_sys_bpf+0x1f7d/0x41e0 kernel/bpf/syscall.c:3456
__se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
__x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Possible unsafe locking scenario:
CPU0 CPU2
---- ----
lock(&htab->buckets[i].lock#2);
lock(&l->lock);
lock(&htab->buckets[i].lock#2);
lock(&loc_l->lock);
*** DEADLOCK ***
To fix the issue, for htab_lru_map_lookup_and_delete_batch() in CPU0,
let us do bpf_lru_push_free() out of the htab bucket lock. This can
avoid the above deadlock scenario.
Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Reported-by: syzbot+a38ff3d9356388f2fb83@syzkaller.appspotmail.com
Reported-by: syzbot+122b5421d14e68f29cd1@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200219234757.3544014-1-yhs@fb.com
|
|
Grabbing the spinlock for every bucket even if it's empty, was causing
significant perfomance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking first the
bucket_cnt, if the bucket has some entries then we go and grab the
spinlock and proceed with the batching.
Tested with a htab of size 50K and different value of populated entries.
Before:
Benchmark Time(ns) CPU(ns)
---------------------------------------------
BM_DumpHashMap/1 2759655 2752033
BM_DumpHashMap/10 2933722 2930825
BM_DumpHashMap/200 3171680 3170265
BM_DumpHashMap/500 3639607 3635511
BM_DumpHashMap/1000 4369008 4364981
BM_DumpHashMap/5k 11171919 11134028
BM_DumpHashMap/20k 69150080 69033496
BM_DumpHashMap/39k 190501036 190226162
After:
Benchmark Time(ns) CPU(ns)
---------------------------------------------
BM_DumpHashMap/1 202707 200109
BM_DumpHashMap/10 213441 210569
BM_DumpHashMap/200 478641 472350
BM_DumpHashMap/500 980061 967102
BM_DumpHashMap/1000 1863835 1839575
BM_DumpHashMap/5k 8961836 8902540
BM_DumpHashMap/20k 69761497 69322756
BM_DumpHashMap/39k 187437830 186551111
Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com
|
|
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.
We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.
Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
|
|
s390 math emulation was removed with commit 5a79859ae0f3 ("s390:
remove 31 bit support"), rendering ieee_emulation_warnings useless.
The code still built because it was protected by CONFIG_MATHEMU, which
was no longer selectable.
This patch removes the sysctl_ieee_emulation_warnings declaration and
the sysctl entry declaration.
Link: https://lkml.kernel.org/r/20200214172628.3598516-1-steve@sk2.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Pull dma-mapping fixes from Christoph Hellwig:
- give command line cma= precedence over the CONFIG_ option (Nicolas
Saenz Julienne)
- always allow 32-bit DMA, even for weirdly placed ZONE_DMA
- improve the debug printks when memory is not addressable, to help
find problems with swiotlb initialization
* tag 'dma-mapping-5.6' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: improve DMA mask overflow reporting
dma-direct: improve swiotlb error reporting
dma-direct: relax addressability checks in dma_direct_supported
dma-contiguous: CMA: give precedence to cmdline
|
|
bpf_perf_event_read_value() is NMI safe. Enable it for all BPF programs.
This can be used in fentry/fexit to profile BPF program and individual
kernel function with hardware counters.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200214234146.2910011-1-songliubraving@fb.com
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
|
head is traversed using hlist_for_each_entry_rcu outside an RCU read-side
critical section but under the protection of hash_lock.
Hence, add corresponding lockdep expression to silence false-positive
lockdep warnings, and harden RCU lists.
[ tglx: Removed the macro and put the condition right where it's used ]
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200216074330.GA14025@workstation-portable
|
|
When working commit 6dcd5d7a7a29c1e, a mistake was noticed by Linus:
schedule_timeout() was called without setting the task state to anything
particular.
It calls the scheduler, but doesn't delay anything, because the task stays
runnable. That happens because sched_submit_work() does nothing for tasks
in TASK_RUNNING state.
That turned out to be the intended behavior. Adding a WARN() is not useful
as the task could be woken up right after setting the state and before
reaching schedule_timeout().
Improve the comment about schedule_timeout() and describe that more
explicitly.
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200117225900.16340-1-alex.popov@linux.com
|
|
Move the time namespace indicator clock mode to the other ones for
consistency sake.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.656097274@linutronix.de
|
|
If the current clocksource is not VDSO capable there is no point in
updating the high resolution parts of the VDSO data.
Replace the architecture specific check with a check for a VDSO capable
clocksource and skip the update if there is none.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.563379423@linutronix.de
|
|
Now that all architectures are converted to use the generic storage the
helpers and conditionals can be removed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.470699892@linutronix.de
|
|
This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. By erroneously doing a bitwise
AND instead of a logical AND on user- and kernel-space buffer-size can lead
to no data being copied to user-space especially when user-space size is a
power of two and bigger then the kernel-space buffer.
Fixes: fcfb126defda ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
|
|
All architectures which use the generic VDSO code have their own storage
for the VDSO clock mode. That's pointless and just requires duplicate code.
Provide generic storage for it. The new Kconfig symbol is intermediate and
will be removed once all architectures are converted over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/20200207124403.028046322@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes all over the place:
- Fix NUMA over-balancing between lightly loaded nodes. This is
fallout of the big load-balancer rewrite.
- Fix the NOHZ remote loadavg update logic, which fixes anomalies
like reported 150 loadavg on mostly idle CPUs.
- Fix XFS performance/scalability
- Fix throttled groups unbound task-execution bug
- Fix PSI procfs boundary condition
- Fix the cpu.uclamp.{min,max} cgroup configuration write checks
- Fix DocBook annotations
- Fix RCU annotations
- Fix overly CPU-intensive housekeeper CPU logic loop on large CPU
counts"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix kernel-doc warning in attach_entity_load_avg()
sched/core: Annotate curr pointer in rq with __rcu
sched/psi: Fix OOB write when writing 0 bytes to PSI files
sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression
sched/fair: Prevent unlimited runtime on throttled group
sched/nohz: Optimize get_nohz_timer_target()
sched/uclamp: Reject negative values in cpu_uclamp_write()
sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domains
timers/nohz: Update NOHZ load in remote tick
sched/core: Don't skip remote tick for idle CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix three issues related to the handling of wakeup events signaled
through the ACPI SCI while suspended to idle (Rafael Wysocki) and
unexport an internal cpufreq variable (Yangtao Li)"
* tag 'pm-5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system
ACPICA: Introduce acpi_any_gpe_status_set()
ACPI: PM: s2idle: Avoid possible race related to the EC GPE
ACPI: EC: Fix flushing of pending work
cpufreq: Make cpufreq_global_kobject static
|
|
A few archs (x86, arm, arm64) don't rely anymore on TIF_NOHZ to call
into context tracking on user entry/exit but instead use static keys
(or not) to optimize those calls. Ideally every arch should migrate to
that behaviour in the long run.
Settle a config option to let those archs remove their TIF_NOHZ
definitions.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
|
|
Because cpuidle is the only user of the effective constraint coming
from the CPU latency QoS, add #ifdef CONFIG_CPU_IDLE around that code
to avoid building it unnecessarily.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
|
|
Update the file information comments in include/linux/pm_qos.h
and kernel/power/qos.c by adding titles along with copyright and
authors information to them and changing the qos.c description to
better reflect its contents (outdated information is dropped from
it in particular).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
|
|
Drop the PM QoS classes enum including PM_QOS_CPU_DMA_LATENCY,
drop the wrappers around pm_qos_request(), pm_qos_request_active(),
and pm_qos_add/update/remove_request() introduced previously, rename
these functions, respectively, to cpu_latency_qos_limit(),
cpu_latency_qos_request_active(), and
cpu_latency_qos_add/update/remove_request(), and update their
kerneldoc comments. [While at it, drop some useless comments from
these functions.]
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
|
|
Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code,
which catches cases where the affinity setting is not done on the CPU which
is the current target of the interrupt, triggers during CPU hotplug stress
testing.
It turns out that the warning which was added with the commit addressing
the MSI affinity race unearthed yet another long standing bug.
If user space writes a bogus affinity mask, i.e. it contains no online CPUs,
then it calls irq_select_affinity_usr(). This was introduced for ALPHA in
eee45269b0f5 ("[PATCH] Alpha: convert to generic irq framework (generic part)")
and subsequently made available for all architectures in
18404756765c ("genirq: Expose default irq affinity mask (take 3)")
which introduced the circumvention of the affinity setting restrictions for
interrupt which cannot be moved in process context.
The whole exercise is bogus in various aspects:
1) If the interrupt is already started up then there is absolutely
no point to honour a bogus interrupt affinity setting from user
space. The interrupt is already assigned to an online CPU and it
does not make any sense to reassign it to some other randomly
chosen online CPU.
2) If the interupt is not yet started up then there is no point
either. A subsequent startup of the interrupt will invoke
irq_setup_affinity() anyway which will chose a valid target CPU.
So the only correct solution is to just return -EINVAL in case user space
wrote an affinity mask which does not contain any online CPUs, except for
ALPHA which has it's own magic sauce for this.
Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de
|