Age | Commit message (Collapse) | Author |
|
syzbot found a flush_work() caller who forgot to call INIT_WORK()
because that work_struct was allocated by kzalloc() [1]. But the message
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
by lock_map_acquire() is failing to tell that INIT_WORK() is missing.
Since flush_work() without INIT_WORK() is a bug, and INIT_WORK() should
set ->func field to non-zero, let's warn if ->func field is zero.
[1] https://syzkaller.appspot.com/bug?id=a5954455fcfa51c29ca2ab55b203076337e1c770
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Let offload JITs know when instructions are replaced and optimized
out, so they can update their state appropriately. The optimizations
are best effort, if JIT returns an error from any callback verifier
will stop notifying it as state may now be out of sync, but the
verifier continues making progress.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The communication between the verifier and advanced JITs is based
on instruction indexes. We have to keep them stable throughout
the optimizations otherwise referring to a particular instruction
gets messy quickly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Unconditional branches by 0 instructions are basically noops
but they can result from earlier optimizations, e.g. a conditional
jumps which would never be taken or a conditional jump around
dead code.
Remove those branches.
v0.2:
- s/opt_remove_dead_branches/opt_remove_nops/ (Jiong).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Instead of overwriting dead code with jmp -1 instructions
remove it completely for root. Adjust verifier state and
line info appropriately.
v2:
- adjust func_info (Alexei);
- make sure first instruction retains line info (Alexei).
v4: (Yonghong)
- remove unnecessary if (!insn to remove) checks;
- always keep last line info if first live instruction lacks one.
v5: (Martin Lau)
- improve and clarify comments.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Loading programs with dead code becomes more and more
common, as people begin to patch constants at load time.
Turn conditional jumps to unconditional ones, to avoid
potential branch misprediction penalty.
This optimization is enabled for privileged users only.
For branches which just fall through we could just mark
them as not seen and have dead code removal take care of
them, but that seems less clean.
v0.2:
- don't call capable(CAP_SYS_ADMIN) twice (Jiong).
v3:
- fix GCC warning;
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In preparation for code removal change parameters to branch
and call adjustment functions to be more universal. The
current parameters assume we are patching a single instruction
with a longer set.
A diagram may help reading the change, this is for the patch
single case, patching instruction 1 with a replacement of 4:
____
0 |____|
1 |____| <-- pos ^
2 | | <-- end old ^ |
3 | | | delta | len
4 |____| | | (patch region)
5 | | <-- end new v v
6 |____|
end_old = pos + 1
end_new = pos + delta + 1
If we are before the patch region - curr variable and the target
are fully in old coordinates (hence comparing against end_old).
If we are after the region curr is in new coordinates (hence
the comparison to end_new) but target is in mixed coordinates,
so we just check if it falls before end_new, and if so it needs
the adjustment.
Note that we will not fix up branches which land in removed region
in case of removal, which should be okay, as we are only going to
remove dead code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The recently introduced Energy Model (EM) framework manages power cost
tables of CPUs. These tables are currently only visible from kernel
space. However, in order to debug the behaviour of subsystems that use
the EM (EAS for example), it is often required to know what the power
costs are from userspace.
For this reason, introduce under /sys/kernel/debug/energy_model a set of
directories representing the performance domains of the system. Each
performance domain contains a set of sub-directories representing the
different capacity states (cs) and their attributes, as well as a file
exposing the related CPUs.
The resulting hierarchy is as follows on Arm juno r0 for example:
/sys/kernel/debug/energy_model
├── pd0
│ ├── cpus
│ ├── cs:450000
│ │ ├── cost
│ │ ├── frequency
│ │ └── power
│ ├── cs:575000
│ │ ├── cost
│ │ ├── frequency
│ │ └── power
│ ├── cs:700000
│ │ ├── cost
│ │ ├── frequency
│ │ └── power
│ ├── cs:775000
│ │ ├── cost
│ │ ├── frequency
│ │ └── power
│ └── cs:850000
│ ├── cost
│ ├── frequency
│ └── power
└── pd1
├── cpus
├── cs:1100000
│ ├── cost
│ ├── frequency
│ └── power
├── cs:450000
│ ├── cost
│ ├── frequency
│ └── power
├── cs:625000
│ ├── cost
│ ├── frequency
│ └── power
├── cs:800000
│ ├── cost
│ ├── frequency
│ └── power
└── cs:950000
├── cost
├── frequency
└── power
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).
In this particular case change put the fall through comment on a single
line so as to match the regular expression expected by GCC.
This commit remove the following warning:
kernel/capability.c:95:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
Sync to Linux 5.0-rc3 to pull in the VFS changes which impacted a lot
of the LSM code.
|
|
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
Kernel:
Stephane Eranian:
- Fix perf_proc_update_handler() bug.
perf script:
Andi Kleen:
- Fix crash with printing mixed trace point and other events.
Tony Jones:
- Fix crash when processing recorded stat data.
perf top:
He Kuang:
- Fix wrong hottest instruction highlighted.
perf python:
Arnaldo Carvalho de Melo:
- Remove -fstack-clash-protection when building with some clang versions.
perf ordered_events:
Jiri Olsa:
- Fix out of buffers crash in ordered_events__free().
perf cpu_map:
Stephane Eranian:
- Handle TOPOLOGY headers with no CPU.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With this patch, /proc/kallsyms will show BPF programs as
<addr> t bpf_prog_<tag>_<name> [bpf]
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-10-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For better performance analysis of BPF programs, this patch introduces
PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
load/unload information to user space.
Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
The following example shows kernel symbols for a BPF program with 7 sub
programs:
ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi
When a bpf program is loaded, PERF_RECORD_KSYMBOL is generated for each
of these sub programs. Therefore, PERF_RECORD_BPF_EVENT is not needed
for simple profiling.
For annotation, user space need to listen to PERF_RECORD_BPF_EVENT and
gather more information about these (sub) programs via sys_bpf.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradeaed.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-4-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For better performance analysis of dynamically JITed and loaded kernel
functions, such as BPF programs, this patch introduces
PERF_RECORD_KSYMBOL, a new perf_event_type that exposes kernel symbol
register/unregister information to user space.
The following data structure is used for PERF_RECORD_KSYMBOL.
/*
* struct {
* struct perf_event_header header;
* u64 addr;
* u32 len;
* u16 ksym_type;
* u16 flags;
* char name[];
* struct sample_id sample_id;
* };
*/
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For the original mode of operation it isn't needed, since we report back
errors via PERF_RECORD_LOST records in the ring buffer, but for use in
bpf_perf_event_output() it is convenient to return the errors, basically
-ENOSPC.
Currently bpf_perf_event_output() returns an error indication, the last
thing it does, which is to push it to the ring buffer is that can fail
and if so, this failure won't be reported back to its users, fix it.
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20190118150938.GN5823@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It makes the code more self-explanatory and tells throughout the code
what magic number refers to:
- state (Hardirq/Softirq)
- direction (used in or enabled above state)
- read or write
We can even remove some comments that were compensating for the lack of
those constant names.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1545973321-24422-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The enum mark_type appears a bit artificial here. We can directly pass
the base enum lock_usage_bit value to mark_held_locks(). All we need
then is to add the read index for each lock if necessary. It makes the
code clearer.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1545973321-24422-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This reverts commit 40fa3780bac2b654edf23f6b13f4e2dd550aea10.
Now that we have a system-wide muting of hotplug lockdep during init,
this is no longer needed.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: cai@gmx.us
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: longman@redhat.com
Cc: marc.zyngier@arm.com
Cc: mark.rutland@arm.com
Link: https://lkml.kernel.org/r/1545243796-23224-3-git-send-email-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since we've had:
commit cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
we've been getting some lockdep warnings during init, such as on HiKey960:
[ 0.820495] WARNING: CPU: 4 PID: 0 at kernel/cpu.c:316 lockdep_assert_cpus_held+0x3c/0x48
[ 0.820498] Modules linked in:
[ 0.820509] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S 4.20.0-rc5-00051-g4cae42a #34
[ 0.820511] Hardware name: HiKey960 (DT)
[ 0.820516] pstate: 600001c5 (nZCv dAIF -PAN -UAO)
[ 0.820520] pc : lockdep_assert_cpus_held+0x3c/0x48
[ 0.820523] lr : lockdep_assert_cpus_held+0x38/0x48
[ 0.820526] sp : ffff00000a9cbe50
[ 0.820528] x29: ffff00000a9cbe50 x28: 0000000000000000
[ 0.820533] x27: 00008000b69e5000 x26: ffff8000bff4cfe0
[ 0.820537] x25: ffff000008ba69e0 x24: 0000000000000001
[ 0.820541] x23: ffff000008fce000 x22: ffff000008ba70c8
[ 0.820545] x21: 0000000000000001 x20: 0000000000000003
[ 0.820548] x19: ffff00000a35d628 x18: ffffffffffffffff
[ 0.820552] x17: 0000000000000000 x16: 0000000000000000
[ 0.820556] x15: ffff00000958f848 x14: 455f3052464d4d34
[ 0.820559] x13: 00000000769dde98 x12: ffff8000bf3f65a8
[ 0.820564] x11: 0000000000000000 x10: ffff00000958f848
[ 0.820567] x9 : ffff000009592000 x8 : ffff00000958f848
[ 0.820571] x7 : ffff00000818ffa0 x6 : 0000000000000000
[ 0.820574] x5 : 0000000000000000 x4 : 0000000000000001
[ 0.820578] x3 : 0000000000000000 x2 : 0000000000000001
[ 0.820582] x1 : 00000000ffffffff x0 : 0000000000000000
[ 0.820587] Call trace:
[ 0.820591] lockdep_assert_cpus_held+0x3c/0x48
[ 0.820598] static_key_enable_cpuslocked+0x28/0xd0
[ 0.820606] arch_timer_check_ool_workaround+0xe8/0x228
[ 0.820610] arch_timer_starting_cpu+0xe4/0x2d8
[ 0.820615] cpuhp_invoke_callback+0xe8/0xd08
[ 0.820619] notify_cpu_starting+0x80/0xb8
[ 0.820625] secondary_start_kernel+0x118/0x1d0
We've also had a similar warning in sched_init_smp() for every
asymmetric system that would enable the sched_asym_cpucapacity static
key, although that was singled out in:
commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()")
Those warnings are actually harmless, since we cannot have hotplug
operations at the time they appear. Instead of starting to sprinkle
useless hotplug lock operations in the init codepaths, mute the
warnings until they start warning about real problems.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: cai@gmx.us
Cc: daniel.lezcano@linaro.org
Cc: dietmar.eggemann@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: longman@redhat.com
Cc: marc.zyngier@arm.com
Cc: mark.rutland@arm.com
Link: https://lkml.kernel.org/r/1545243796-23224-2-git-send-email-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
warning right after a previous lockdep warning. It is likely that the
previous warning turned off lock debugging causing the lockdep to have
inconsistency states leading to the lock downgrade warning.
Fix that by add a check for debug_locks at the beginning of
__lock_downgrade().
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The cmpxchg() will fail when the task is already in the process
of waking up, and as such is an extremely rare occurrence.
Micro-optimize the call and put an unlikely() around it.
To no surprise, when using CONFIG_PROFILE_ANNOTATED_BRANCHES
under a number of workloads the incorrect rate was a mere 1-2%.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yongji Xie <elohimes@gmail.com>
Cc: andrea.parri@amarulasolutions.com
Cc: lilin24@baidu.com
Cc: liuqi16@baidu.com
Cc: nixun@baidu.com
Cc: xieyongji@baidu.com
Cc: yuanlinsi01@baidu.com
Cc: zhangyu31@baidu.com
Link: https://lkml.kernel.org/r/20181203053130.gwkw6kg72azt2npb@linux-r8p5
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Because wake_q_add() can imply an immediate wakeup (cmpxchg failure
case), we must not rely on the wakeup being delayed. However, commit:
e38513905eea ("locking/rwsem: Rework zeroing reader waiter->task")
relies on exactly that behaviour in that the wakeup must not happen
until after we clear waiter->task.
[ peterz: Added changelog. ]
Signed-off-by: Xie Yongji <xieyongji@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: e38513905eea ("locking/rwsem: Rework zeroing reader waiter->task")
Link: https://lkml.kernel.org/r/1543495830-2644-1-git-send-email-xieyongji@baidu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We must not rely on wake_q_add() to delay the wakeup; in particular
commit:
1d0dcb3ad9d3 ("futex: Implement lockless wakeups")
moved wake_q_add() before smp_store_release(&q->lock_ptr, NULL), which
could result in futex_wait() waking before observing ->lock_ptr ==
NULL and going back to sleep again.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Notable cmpxchg() does not provide ordering when it fails, however
wake_q_add() requires ordering in this specific case too. Without this
it would be possible for the concurrent wakeup to not observe our
prior state.
Andrea Parri provided:
C wake_up_q-wake_q_add
{
int next = 0;
int y = 0;
}
P0(int *next, int *y)
{
int r0;
/* in wake_up_q() */
WRITE_ONCE(*next, 1); /* node->next = NULL */
smp_mb(); /* implied by wake_up_process() */
r0 = READ_ONCE(*y);
}
P1(int *next, int *y)
{
int r1;
/* in wake_q_add() */
WRITE_ONCE(*y, 1); /* wake_cond = true */
smp_mb__before_atomic();
r1 = cmpxchg_relaxed(next, 1, 2);
}
exists (0:r0=0 /\ 1:r1=0)
This "exists" clause cannot be satisfied according to the LKMM:
Test wake_up_q-wake_q_add Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:r0=0 /\ 1:r1=0)
Observation wake_up_q-wake_q_add Never 0 3
Reported-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The only guarantee provided by wake_q_add() is that a wakeup will
happen after it, it does _NOT_ guarantee the wakeup will be delayed
until the matching wake_up_q().
If wake_q_add() fails the cmpxchg() a concurrent wakeup is pending and
that can happen at any time after the cmpxchg(). This means we should
not rely on the wakeup happening at wake_q_up(), but should be ready
for wake_q_add() to issue the wakeup.
The delay; if provided (most likely); should only result in more efficient
behaviour.
Reported-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For some peculiar reason rcuwait_wake_up() has the right barrier in
the comment, but not in the code.
This mistake has been observed to cause a deadlock in the following
situation:
P1 P2
percpu_up_read() percpu_down_write()
rcu_sync_is_idle() // false
rcu_sync_enter()
...
__percpu_up_read()
[S] ,- __this_cpu_dec(*sem->read_count)
| smp_rmb();
[L] | task = rcu_dereference(w->task) // NULL
|
| [S] w->task = current
| smp_mb();
| [L] readers_active_check() // fail
`-> <store happens here>
Where the smp_rmb() (obviously) fails to constrain the store.
[ peterz: Added changelog. ]
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8f95c90ceb54 ("sched/wait, RCU: Introduce rcuwait machinery")
Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the events attribute flags. This approach is error prone and
often inconsistent.
Let's instead allow PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: robin.murphy@arm.com
Cc: suzuki.poulose@arm.com
Link: https://lkml.kernel.org/r/1547128414-50693-4-git-send-email-andrew.murray@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull networking fixes from David Miller:
1) Fix endless loop in nf_tables, from Phil Sutter.
2) Fix cross namespace ip6_gre tunnel hash list corruption, from
Olivier Matz.
3) Don't be too strict in phy_start_aneg() otherwise we might not allow
restarting auto negotiation. From Heiner Kallweit.
4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.
5) Memory leak in act_tunnel_key, from Davide Caratti.
6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.
7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.
8) Missing udplite rehash callbacks, from Alexey Kodanev.
9) Log dirty pages properly in vhost, from Jason Wang.
10) Use consume_skb() in neigh_probe() as this is a normal free not a
drop, from Yang Wei. Likewise in macvlan_process_broadcast().
11) Missing device_del() in mdiobus_register() error paths, from Thomas
Petazzoni.
12) Fix checksum handling of short packets in mlx5, from Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
bpf: in __bpf_redirect_no_mac pull mac only if present
virtio_net: bulk free tx skbs
net: phy: phy driver features are mandatory
isdn: avm: Fix string plus integer warning from Clang
net/mlx5e: Fix cb_ident duplicate in indirect block register
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
net/mlx5e: Fix wrong error code return on FEC query failure
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
tools: bpftool: Cleanup license mess
bpf: fix inner map masking to prevent oob under speculation
bpf: pull in pkt_sched.h header for tooling to fix bpftool build
selftests: forwarding: Add a test case for externally learned FDB entries
selftests: mlxsw: Test FDB offload indication
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
net: bridge: Mark FDB entries that were added by user as such
mlxsw: spectrum_fid: Update dummy FID index
mlxsw: pci: Return error on PCI reset timeout
mlxsw: pci: Increase PCI SW reset timeout
mlxsw: pci: Ring CQ's doorbell before RDQ's
MAINTAINERS: update email addresses of liquidio driver maintainers
...
|
|
During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.
In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.
Before patch (test_verifier's 'map in map access' dump):
# bpftool prog dump xla id 3
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:4]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking for
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+11
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!)
22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite
23: (67) r0 <<= 3 | map->unpriv_array set.
24: (0f) r0 += r1 |
25: (05) goto pc+1 |
26: (b7) r0 = 0 |
27: (b7) r0 = 0
28: (95) exit
After patch:
# bpftool prog dump xla id 1
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:2]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking due to
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+12
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) |
22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map
23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking
24: (67) r0 <<= 3 | for map->unpriv_array.
25: (0f) r0 += r1 |
26: (05) goto pc+1 |
27: (b7) r0 = 0 |
28: (b7) r0 = 0
29: (95) exit
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Tie syscall information to all CONFIG_CHANGE calls since they are all a
result of user actions.
Exclude user records from syscall context:
Since the function audit_log_common_recv_msg() is shared by a number of
AUDIT_CONFIG_CHANGE and the entire range of AUDIT_USER_* record types,
and since the AUDIT_CONFIG_CHANGE message type has been converted to a
syscall accompanied record type, special-case the AUDIT_USER_* range of
messages so they remain standalone records.
See: https://github.com/linux-audit/audit-kernel/issues/59
See: https://github.com/linux-audit/audit-kernel/issues/50
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: fix line lengths in kernel/audit.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
RDMA cgroup registration routine always returns success, so simplify
function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA
art of core_priv.h.
This reduces unwinding error path for regular registration and future net
namespace change functionality for rdma device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The perf_proc_update_handler() handles /proc/sys/kernel/perf_event_max_sample_rate
syctl variable. When the PMU IRQ handler timing monitoring is disabled, i.e,
when /proc/sys/kernel/perf_cpu_time_max_percent is equal to 0 or 100,
then no modification to sysctl_perf_event_sample_rate is allowed to prevent
possible hang from wrong values.
The problem is that the test to prevent modification is made after the
sysctl variable is modified in perf_proc_update_handler().
You get an error:
$ echo 10001 >/proc/sys/kernel/perf_event_max_sample_rate
echo: write error: invalid argument
But the value is still modified causing all sorts of inconsistencies:
$ cat /proc/sys/kernel/perf_event_max_sample_rate
10001
This patch fixes the problem by moving the parsing of the value after
the test.
Committer testing:
# echo 100 > /proc/sys/kernel/perf_cpu_time_max_percent
# echo 10001 > /proc/sys/kernel/perf_event_max_sample_rate
-bash: echo: write error: Invalid argument
# cat /proc/sys/kernel/perf_event_max_sample_rate
10001
#
Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1547169436-6266-1-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The sys_ipc() and compat_ksys_ipc() functions are meant to only
be used from the system call table, not called by another function.
Introduce ksys_*() interfaces for this purpose, as we have done
for many other system calls.
Link: https://lore.kernel.org/lkml/20190116131527.2071570-3-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko.carstens@de.ibm.com: compile fix for !CONFIG_COMPAT]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The recent rework of alloc_descs() introduced a double increment of the
loop counter. As a consequence only every second affinity mask is
validated.
Remove it.
[ tglx: Massaged changelog ]
Fixes: c410abbbacb9 ("genirq/affinity: Add is_managed to struct irq_affinity_desc")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: Huacai Chen <chenhuacai@gmail.com>
Cc: Dou Liyang <douliyangs@gmail.com>
Link: https://lkml.kernel.org/r/1547694009-16261-1-git-send-email-chenhc@lemote.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
"A tiny fix for v5.0-rc2:
This fixes an issue with GPU cards not working anymore with the DMA
mapping work Christopher did - as the SWIOTLB is initialized first and
then free'd (as IOMMU is available) but we forgot to clear our start
and end entries which are used and BOOM"
* 'stable/for-linus-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: clear io_tlb_start and io_tlb_end in swiotlb_exit
|
|
* make the reference from superblock to cgroup_root counting -
do cgroup_put() in cgroup_kill_sb() whether we'd done
percpu_ref_kill() or not; matching grab is done when we allocate
a new root. That gives the same refcounting rules for all callers
of cgroup_do_mount() - a reference to cgroup_root has been grabbed
by caller and it either is transferred to new superblock or dropped.
* have cgroup_kill_sb() treat an already killed refcount as "just
don't bother killing it, then".
* after successful cgroup_do_mount() have cgroup1_mount() recheck
if we'd raced with mount/umount from somebody else and cgroup_root
got killed. In that case we drop the superblock and bugger off
with -ERESTARTSYS, same as if we'd found it in the list already
dying.
* don't bother with delayed initialization of refcount - it's
unreliable and not needed. No need to prevent attempts to bump
the refcount if we find cgroup_root of another mount in progress -
sget will reuse an existing superblock just fine and if the
other sb manages to die before we get there, we'll catch
that immediately after cgroup_do_mount().
* don't bother with kernfs_pin_sb() - no need for doing that
either.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same story as with last May fixes in sysfs (7b745a4e4051
"unfuck sysfs_mount()"); new_sb is left uninitialized
in case of early errors in kernfs_mount_ns() and papering
over it by treating any error from kernfs_mount_ns() as
equivalent to !new_ns ends up conflating the cases when
objects had never been transferred to a superblock with
ones when that has happened and resulting new superblock
had been dropped. Easily fixed (same way as in sysfs
case). Additionally, there's a superblock leak on
kernfs_node_dentry() failure *and* a dentry leak inside
kernfs_node_dentry() itself - the latter on probably
impossible errors, but the former not impossible to trigger
(as the matter of fact, injecting allocation failures
at that point *does* trigger it).
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.
This is best explained in an example:
Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):
$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
/sys/kernel/debug/tracing/uprobe_events
When the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:
[...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"
Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.
The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() cleary states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.
Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.
Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.
Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
There is a plan to build the kernel with -Wimplicit-fallthrough
and this place in the code produced a warnings (W=1).
This commit removes the following warning:
kernel/bpf/cgroup.c:719:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Initially in commit 69b693f0aefa ("bpf: btf: Introduce BPF Type Format
(BTF)") the function 'btf_name_offset_valid' was introduced as static
function it was later on changed to a non-static one, and then finally
in commit 23127b33ec80 ("bpf: Create a new btf_name_by_offset() for
non type name use case") the function prototype was removed.
Revert back to original implementation and make the function static.
Remove warning triggered with W=1:
kernel/bpf/btf.c:470:6: warning: no previous prototype for 'btf_name_offset_valid' [-Wmissing-prototypes]
Fixes: 23127b33ec80 ("bpf: Create a new btf_name_by_offset() for non type name use case")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When returning BPF_STACK_BUILD_ID_IP from stack_map_get_build_id_offset,
make sure that build_id field is empty. Since we are using percpu
free list, there is a possibility that we might reuse some previous
bpf_stack_build_id with non-zero build_id.
Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Build-id length is not fixed to 20, it can be (`man ld` /--build-id):
* 128-bit (uuid)
* 160-bit (sha1)
* any length specified in ld --build-id=0xhexstring
To fix the issue of missing BPF_STACK_BUILD_ID_VALID for shorter build-ids,
assume that build-id is somewhere in the range of 1 .. 20.
Set the remaining bytes to zero.
v2:
* don't introduce new "len = min(BPF_BUILD_ID_SIZE, nhdr->n_descsz)",
we already know that nhdr->n_descsz <= BPF_BUILD_ID_SIZE if we enter
this 'if' condition
Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The subsystem-specific message prefix for uprobes was also
"trace_kprobe: " instead of "trace_uprobe: " as described in
the original commit message.
Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 7257634135c24 ("tracing/probe: Show subsystem name in messages")
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
An older GCC compiler complains:
kernel/bpf/verifier.c: In function 'bpf_check':
kernel/bpf/verifier.c:4***:13: error: 'prev_offset' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
} else if (krecord[i].insn_offset <= prev_offset) {
^
kernel/bpf/verifier.c:4***:38: note: 'prev_offset' was declared here
u32 i, nfuncs, urec_size, min_size, prev_offset;
Although the compiler is wrong here, the patch makes sure
that prev_offset is always initialized, just to silence the warning.
v2: fix a spelling error in the commit message.
Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Currently, btf only supports up to 64-bit integer.
On the other hand, 128bit support for gcc and clang
has existed for a long time. For example, both gcc 4.8
and llvm 3.7 supports types "__int128" and
"unsigned __int128" for virtually all 64bit architectures
including bpf.
The requirement for __int128 support comes from two areas:
. bpf program may use __int128. For example, some bcc tools
(https://github.com/iovisor/bcc/tree/master/tools),
mostly tcp v6 related, tcpstates.py, tcpaccept.py, etc.,
are using __int128 to represent the ipv6 addresses.
. linux itself is using __int128 types. Hence supporting
__int128 type in BTF is required for vmlinux BTF,
which will be used by "compile once and run everywhere"
and other projects.
For 128bit integer, instead of base-10, hex numbers are pretty
printed out as large decimal number is hard to decipher, e.g.,
for ipv6 addresses.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The fake signal is send automatically now. We can rely on it completely
and remove the sysfs attribute.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
An administrator may send a fake signal to all remaining blocking tasks
of a running transition by writing to
/sys/kernel/livepatch/<patch>/signal attribute. Let's do it
automatically after 15 seconds. The timeout is chosen deliberately. It
gives the tasks enough time to transition themselves.
Theoretically, sending it once should be more than enough. However,
every task must get outside of a patched function to be successfully
transitioned. It could prove not to be simple and resending could be
helpful in that case.
A new workqueue job could be a cleaner solution to achieve it, but it
could also introduce deadlocks and cause more headaches with
synchronization and cancelling.
[jkosina@suse.cz: removed added newline]
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Otherwise is_swiotlb_buffer will return false positives when
we first initialize a swiotlb buffer, but then free it because
we have an IOMMU available.
Fixes: 55897af63091 ("dma-direct: merge swiotlb_dma_ops into the dma_direct code")
Reported-by: Sibren Vasse <sibren@sibrenvasse.nl>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sibren Vasse <sibren@sibrenvasse.nl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
On the failure path, we do an fput() of the listener fd if the filter fails
to install (e.g. because of a TSYNC race that's lost, or if the thread is
killed, etc.). fput() doesn't actually release the fd, it just ads it to a
work queue. Then the thread proceeds to free the filter, even though the
listener struct file has a reference to it.
To fix this, on the failure path let's set the private data to null, so we
know in ->release() to ignore the filter.
Reported-by: syzbot+981c26489b2d1c6316ba@syzkaller.appspotmail.com
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
|