Age | Commit message (Collapse) | Author |
|
The ib_write_umad() is protected by taking the umad file mutex.
However, it accesses file->port->ib_dev -- which is protected only by the
port's mutex (field file_mutex).
The ib_umad_remove_one() calls ib_umad_kill_port() which sets
port->ib_dev to NULL under the port mutex (NOT the file mutex).
It then sets the mad agent to "dead" under the umad file mutex.
This is a race condition -- because there is a window where
port->ib_dev is NULL, while the agent is not "dead".
As a result, we saw stack traces like:
[16490.678059] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[16490.678246] IP: ib_umad_write+0x29c/0xa3a [ib_umad]
[16490.678333] PGD 0 P4D 0
[16490.678404] Oops: 0000 [#1] SMP PTI
[16490.678466] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx4_en(OE) ptp pps_core mlx4_ib(OE-) ib_core(OE) mlx4_core(OE) mlx_compat
(OE) memtrack(OE) devlink mst_pciconf(OE) mst_pci(OE) netconsole nfsv3 nfs_acl nfs lockd grace fscache cfg80211 rfkill esp6_offload esp6 esp4_offload esp4 sunrpc kvm_intel kvm ppdev parport_pc irqbypass
parport joydev i2c_piix4 virtio_balloon cirrus drm_kms_helper ttm drm e1000 serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg [last unloaded: mlxfw]
[16490.679202] CPU: 4 PID: 3115 Comm: sminfo Tainted: G OE 4.14.13-300.fc27.x86_64 #1
[16490.679339] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[16490.679477] task: ffff9cf753890000 task.stack: ffffaf70c26b0000
[16490.679571] RIP: 0010:ib_umad_write+0x29c/0xa3a [ib_umad]
[16490.679664] RSP: 0018:ffffaf70c26b3d90 EFLAGS: 00010202
[16490.679747] RAX: 0000000000000010 RBX: ffff9cf75610fd80 RCX: 0000000000000000
[16490.679856] RDX: 0000000000000001 RSI: 00007ffdf2bfd714 RDI: ffff9cf6bb2a9c00
In the above trace, ib_umad_write is trying to dereference the NULL
file->port->ib_dev pointer.
Fix this by using the agent's device pointer (the device field
in struct ib_mad_agent) -- which IS protected by the umad file mutex.
Cc: <stable@vger.kernel.org> # v4.11
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The script "checkpatch.pl" pointed information out like the following.
WARNING: quoted string split across lines
Thus fix the affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The variable "tx_desc" will be set to an appropriate pointer a bit later.
Thus omit the explicit initialisation at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
iser_send_data_out()
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Introduce a errno code to string facility to allow tools such as
'perf trace' to support multi-arch perf.data decoding/beautifying,
uses errno header files copied from the kernel, this allows
removing the need for audit-libs in arches generating its syscall
tables from the kernel sources, so far x86 and s/390 (Hendrik Brueckner)
- Intel-PT/BTS sample synthesizing fixes (Adrian Hunter)
- Intel vendor event JSON updates for the Broadwell, BroadwellDE,
BroadwellX, Goldmont, Haswell, HaswellX, IvyBridge, IvyBridge, IvyTown,
IvyTown, Silvermont, Skylake and SkylakeX architectures. (Andi Kleen)
- Use ui__error() to have --field error handling messages not to
disappear due to TUI exit cleanup (Arnaldo Carvalho de Melo)
- Don't warn about unavailability of builtin clang in 'perf trace', just
continue fallbacking to using the external toolchain (Arnaldo Carvalho de Melo)
- Move conditional O_CLOEXEC define to util.h to keep the build
working on older distros where that is not available now that
this define will be used in more source files (Arnaldo Carvalho de Melo)
- Using O_CLOEXEC in do_open() to avoid having too many open files when
using 'perf script' from the 'perf report' TUI scripts browser (Wang YanQing)
- Add 'perf trace --print-sample' to help in debugging the printing
of timestamp calculations, for instance (Arnaldo Carvalho de Melo)
- Do not print from time delta for interrupted syscall lines in 'perf
trace' (those ending with '...') just print the total syscall duration
at raw_syscalls:sys_exit time (Arnaldo Carvalho de Melo)
- Beautify FUTEX_BITSET_MATCH_ANY in the futex syscall in 'perf trace'
(Arnaldo Carvalho de Melo)
- Display EXTRA features for 'make VF=1' build (Jiri Olsa)
- Add support for CoreSight trace decoding by making the perf tools
use the external openCSD (Mathieu Poirier, Tor Jeremiassen)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 retpoline fixlet from Thomas Gleixner:
"Remove the ESP/RSP thunks for retpoline as they cannot ever work.
Get rid of them before they show up in a release"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Remove the esp/rsp thunk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of small fixes for 4.15:
- Fix vmapped stack synchronization on systems with 4-level paging
and a large amount of memory caused by a missing 5-level folding
which made the pgd synchronization logic to fail and causing double
faults.
- Add a missing sanity check in the vmalloc_fault() logic on 5-level
paging systems.
- Bring back protection against accessing a freed initrd in the
microcode loader which was lost by a wrong merge conflict
resolution.
- Extend the Broadwell micro code loading sanity check.
- Add a missing ENDPROC annotation in ftrace assembly code which
makes ORC unhappy.
- Prevent loading the AMD power module on !AMD platforms. The load
itself is uncritical, but an unload attempt results in a kernel
crash.
- Update Peter Anvins role in the MAINTAINERS file"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ftrace: Add one more ENDPROC annotation
x86: Mark hpa as a "Designated Reviewer" for the time being
x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
x86/microcode: Fix again accessing initrd after having been freed
x86/microcode/intel: Extend BDW late-loading further with LLC size check
perf/x86/amd/power: Do not load AMD power module on !AMD platforms
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for a ~10 years old problem which causes high resolution
timers to stop after a CPU unplug/plug cycle due to a stale flag in
the per CPU hrtimer base struct.
Paul McKenney was hunting this for about a year, but the heisenbug
nature made it resistant against debug attempts for quite some time"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Reset hrtimer cpu base proper on CPU hotplug
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single bug fix to prevent a subtle deadlock in the scheduler core
code vs cpu hotplug"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix cpu.max vs. cpuhotplug deadlock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Four patches which all address lock inversions and deadlocks in the
perf core code and the Intel debug store"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix perf,x86,cpuhp deadlock
perf/core: Fix ctx::mutex deadlock
perf/core: Fix another perf,trace,cpuhp lock inversion
perf/core: Fix lock inversion between perf,trace,cpuhp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two final locking fixes for 4.15:
- Repair the OWNER_DIED logic in the futex code which got wreckaged
with the recent fix for a subtle race condition.
- Prevent the hard lockup detector from triggering when dumping all
held locks in the system"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Avoid triggering hardlockup from debug_show_all_locks()
futex: Fix OWNER_DEAD fixup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-26
This series contains updates to ixgbe and ixgbevf.
Emil updates ixgbevf to match ixgbe functionality, starting with the
consolidating of functions that represent logical steps in the receive
process so we can later update them more easily. Updated ixgbevf to
only synchronize the length of the frame, which will typically be the
MTU or smaller. Updated the VF driver to use the length of the packet
instead of the DD status bit to determine if a new descriptor is ready
to be processed, which saves on reads and we can save time on
initialization. Added support for DMA_ATTR_SKIP_CPU_SYNC/WEAK_ORDERING
to help improve performance on some platforms. Updated the VF driver to
do bulk updates of the page reference count instead of just incrementing
it by one reference at a time. Updated the VF driver to only go through
the region of the receive ring that was designated to be cleaned up,
rather than process the entire ring.
Colin Ian King adds the use of ARRAY_SIZE() on various arrays.
Miroslav Lichvar fixes an issue where ethtool was reporting timestamping
filters unsupported for X550, which is incorrect.
Paul adds support for reporting 5G link speed for some devices.
Dan Carpenter fixes a typo where && was used when it should have been
||.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Achilles Gaikwad <achillesgaikwad@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
The "return 0" instruction follows other return instruction
and it makes it impossible to execute, hence remove it.
Fixes: 00fc0c51e35b ("rocker: Change world_ops API and implementation to be switchdev independant")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't need to call unmap_mapping_range() prior to calling
nfs_sync_mapping().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When ORC support was added for the ftrace_64.S code, an ENDPROC
for function_hook() was missed. This results in the following warning:
arch/x86/kernel/ftrace_64.o: warning: objtool: .entry.text+0x0: unreachable instruction
Fixes: e2ac83d74a4d ("x86/ftrace: Fix ORC unwinding from ftrace handlers")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180128022150.dqierscqmt3uwwsr@treble
|
|
As explained by Michal Marek at https://lkml.org/lkml/2011/8/31/189
silentoldconfig has become a misnomer. It has become an internal interface
so remove it from "make help" and Documentation/ to stop confusing people
using it as seen for instance at
https://chromium-review.googlesource.com/835632 Don't remove it from
kconfig/Makefile yet not to break any (other) tool using it.
On the other hand, correct and expand its description in the help of
the (internal) scripts/kconfig/conf.c
Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Make it all a function which does the WRMSR instead of having a hairy
inline asm.
[dwmw2: export it, fix CONFIG_RETPOLINE issues]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-4-git-send-email-dwmw@amazon.co.uk
|
|
Simplify it to call an asm-function instead of pasting 41 insn bytes at
every call site. Also, add alignment to the macro as suggested here:
https://support.google.com/faqs/answer/7625886
[dwmw2: Clean up comments, let it clobber %ebx and just tell the compiler]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-3-git-send-email-dwmw@amazon.co.uk
|
|
We want to expose the hardware features simply in /proc/cpuinfo as "ibrs",
"ibpb" and "stibp". Since AMD has separate CPUID bits for those, use them
as the user-visible bits.
When the Intel SPEC_CTRL bit is set which indicates both IBRS and IBPB
capability, set those (AMD) bits accordingly. Likewise if the Intel STIBP
bit is set, set the AMD STIBP that's used for the generic hardware
capability.
Hide the rest from /proc/cpuinfo by putting "" in the comments. Including
RETPOLINE and RETPOLINE_AMD which shouldn't be visible there. There are
patches to make the sysfs vulnerabilities information non-readable by
non-root, and the same should apply to all information about which
mitigations are actually in use. Those *shouldn't* appear in /proc/cpuinfo.
The feature bit for whether IBPB is actually used, which is needed for
ALTERNATIVEs, is renamed to X86_FEATURE_USE_IBPB.
Originally-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-2-git-send-email-dwmw@amazon.co.uk
|
|
Calling fan related SMM functions implemented by Dell BIOS firmware on Dell
Vostro 3360 freeze kernel for about 500ms.
Unfortunately, it is unlikely for Dell to fix this since the machine
is pretty old, so this commit just disables fan support to make the
system usable again.
Via "force" module param fan support can be enabled.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195751
Link: http://lkml.iu.edu/hypermail/linux/kernel/1711.2/06083.html
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Calling fan related SMM functions implemented by Dell BIOS firmware on Dell
Inspiron 7720 freeze kernel for about 500ms. Until Dell fixes it we need to
disable fan support for Dell Inspiron 7720 as it makes system unusable.
Via "force" module param fan support can be enabled.
Reported-by: vova7890@mail.ru
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195751
Cc: stable@vger.kernel.org # v4.0+, will need backport
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Some Dell machines are broken and some functionality is disabled. Show
warning into dmesg about this fact and allow user via "force" module param
to override brokenness and enable broken functionality.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
If sysfs is disabled and RETPOLINE not defined:
arch/x86/kernel/cpu/bugs.c:97:13: warning: ‘spectre_v2_bad_module’ defined but not used
[-Wunused-variable]
static bool spectre_v2_bad_module;
Hide it.
Fixes: caf7501a1b4e ("module/retpoline: Warn about missing retpoline in module")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
|
|
Pick up urgent bug fix and resolve the conflict.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.
Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.
Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
|
|
Due to some unfortunate events, I have not been directly involved in
the x86 kernel patch flow for a while now. I have also not been able
to ramp back up by now like I had hoped to, and after reviewing what I
will need to work on both internally at Intel and elsewhere in the near
term, it is clear that I am not going to be able to ramp back up until
late 2018 at the very earliest.
It is not acceptable to not recognize that this load is currently
taken by Ingo and Thomas without my direct participation, so I mark
myself as R: (designated reviewer) rather than M: (maintainer) until
further notice. This is in fact recognizing the de facto situation
for the past few years.
I have obviously no intention of going away, and I will do everything
within my power to improve Linux on x86 and x86 for Linux. This,
however, puts credit where it is due and reflects a change of focus.
This patch also removes stale entries for portions of the x86
architecture which have not been maintained separately from arch/x86
for a long time. If there is a reason to re-introduce them then that
can happen later.
Signed-off-by: H. Peter Anvin <h.peter.anvin@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Yonghong Song says:
====================
A kernel page fault which happens in lpm map trie_get_next_key is reported
by syzbot and Eric. The issue was introduced by commit b471f2f1de8b
("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map").
Patch #1 fixed the issue in the kernel and patch #2 adds a multithreaded
test case in tools/testing/selftests/bpf/test_lpm_map.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The new test will spawn four threads, doing map update, delete, lookup
and get_next_key in parallel. It is able to reproduce the issue in the
previous commit found by syzbot and Eric Dumazet.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command
for LPM_TRIE map") introduces a bug likes below:
if (!rcu_dereference(trie->root))
return -ENOENT;
if (!key || key->prefixlen > trie->max_prefixlen) {
root = &trie->root;
goto find_leftmost;
}
......
find_leftmost:
for (node = rcu_dereference(*root); node;) {
In the code after label find_leftmost, it is assumed
that *root should not be NULL, but it is not true as
it is possbile trie->root is changed to NULL by an
asynchronous delete operation.
The issue is reported by syzbot and Eric Dumazet with the
below error log:
......
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 8033 Comm: syz-executor3 Not tainted 4.15.0-rc8+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:trie_get_next_key+0x3c2/0xf10 kernel/bpf/lpm_trie.c:682
......
This patch fixed the issue by use local rcu_dereferenced
pointer instead of *(&trie->root) later on.
Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command or LPM_TRIE map")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
'clk-allwinner' into clk-next
* clk-aspeed:
clk: aspeed: Handle inverse polarity of USB port 1 clock gate
clk: aspeed: Fix return value check in aspeed_cc_init()
clk: aspeed: Add reset controller
clk: aspeed: Register gated clocks
clk: aspeed: Add platform driver and register PLLs
clk: aspeed: Register core clocks
clk: Add clock driver for ASPEED BMC SoCs
dt-bindings: clock: Add ASPEED constants
* clk-lock-UP:
clk: fix reentrancy of clk_enable() on UP systems
* clk-mediatek:
clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
clk: mediatek: Fix all warnings for missing struct clk_onecell_data
clk: mediatek: fixup test-building of MediaTek clock drivers
clk: mediatek: group drivers under indpendent menu
* clk-allwinner:
clk: sunxi-ng: a83t: Add M divider to TCON1 clock
clk: sunxi-ng: fix the A64/H5 clock description of DE2 CCU
clk: sunxi-ng: add support for Allwinner H3 DE2 CCU
dt-bindings: fix the binding of Allwinner DE2 CCU of A83T and H3
clk: sunxi-ng: sun8i: a83t: Use sigma-delta modulation for audio PLL
clk: sunxi-ng: sun8i: a83t: Add /2 fixed post divider to audio PLL
clk: sunxi-ng: Support fixed post-dividers on NM style clocks
clk: sunxi-ng: sun50i: a64: Add 2x fixed post-divider to MMC module clocks
clk: sunxi-ng: Support fixed post-dividers on MP style clocks
clk: sunxi: Use PTR_ERR_OR_ZERO()
|
|
and 'clk-meson' into clk-next
* clk-remove-asm-clkdev:
clk: Move __clk_{get,put}() into private clk.h API
clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
arch: Remove clkdev.h asm-generic from Kbuild
clk: Prepare to remove asm-generic/clkdev.h
blackfin: Use generic clkdev.h header
* clk-debugfs-fixes:
clk: Simplify debugfs registration
clk: Fix debugfs_create_*() usage
clk: Show symbolic clock flags in debugfs
clk: Improve flags doc for of_clk_detect_critical()
* clk-renesas:
clk: renesas: r8a7796: Add FDP clock
clk: renesas: cpg-mssr: Keep wakeup sources active during system suspend
clk: renesas: mstp: Keep wakeup sources active during system suspend
clk: renesas: r8a77970: Add LVDS clock
* clk-meson:
clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
clk: meson-axg: make local symbol axg_gp0_params_table static
clk: meson-axg: fix return value check in axg_clkc_probe()
clk: meson: mpll: use 64-bit maths in params_from_rate
clk: meson-axg: add clock controller drivers
clk: meson-axg: add clocks dt-bindings required header
dt-bindings: clock: add compatible variant for the Meson-AXG
clk: meson: make the spinlock naming more specific
clk: meson: gxbb: remove IGNORE_UNUSED from mmc clocks
clk: meson: gxbb: fix wrong clock for SARADC/SANA
|
|
* clk-divider-container:
clk: divider: fix incorrect usage of container_of
Plus fixup sprd/div.c to pass the width too.
|
|
Daniel Borkmann says:
====================
This set contains a small cleanup in cBPF prologue generation and
otherwise fixes an outstanding issue related to BPF to BPF calls
and exception handling. For details please see related patches.
Last but not least, BPF selftests is extended with several new
test cases.
Thanks!
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Update selftests to relfect recent changes and add various new
test cases.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from arm32 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The verifier in both cBPF and eBPF reject div/mod by 0 imm,
so this can never load. Remove emitting such test and reject
it from being JITed instead (the latter is actually also not
needed, but given practice in sparc64, ppc64 today, so
doesn't hurt to add it here either).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Daney <david.daney@cavium.com>
Reviewed-by: David Daney <david.daney@cavium.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from mips64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Daney <david.daney@cavium.com>
Reviewed-by: David Daney <david.daney@cavium.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from sparc64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from ppc64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from s390x JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from arm64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we've changed div/mod exception handling for src_reg in
eBPF verifier itself, remove the leftovers from x86_64 JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
One of the ugly leftovers from the early eBPF days is that div/mod
operations based on registers have a hard-coded src_reg == 0 test
in the interpreter as well as in JIT code generators that would
return from the BPF program with exit code 0. This was basically
adopted from cBPF interpreter for historical reasons.
There are multiple reasons why this is very suboptimal and prone
to bugs. To name one: the return code mapping for such abnormal
program exit of 0 does not always match with a suitable program
type's exit code mapping. For example, '0' in tc means action 'ok'
where the packet gets passed further up the stack, which is just
undesirable for such cases (e.g. when implementing policy) and
also does not match with other program types.
While trying to work out an exception handling scheme, I also
noticed that programs crafted like the following will currently
pass the verifier:
0: (bf) r6 = r1
1: (85) call pc+8
caller:
R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
callee:
frame1: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_1
10: (b4) (u32) r2 = (u32) 0
11: (b4) (u32) r3 = (u32) 1
12: (3c) (u32) r3 /= (u32) r2
13: (61) r0 = *(u32 *)(r1 +76)
14: (95) exit
returning from callee:
frame1: R0_w=pkt(id=0,off=0,r=0,imm=0)
R1=ctx(id=0,off=0,imm=0) R2_w=inv0
R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R10=fp0,call_1
to caller at 2:
R0_w=pkt(id=0,off=0,r=0,imm=0) R6=ctx(id=0,off=0,imm=0)
R10=fp0,call_-1
from 14 to 2: R0=pkt(id=0,off=0,r=0,imm=0)
R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
2: (bf) r1 = r6
3: (61) r1 = *(u32 *)(r1 +80)
4: (bf) r2 = r0
5: (07) r2 += 8
6: (2d) if r2 > r1 goto pc+1
R0=pkt(id=0,off=0,r=8,imm=0) R1=pkt_end(id=0,off=0,imm=0)
R2=pkt(id=0,off=8,r=8,imm=0) R6=ctx(id=0,off=0,imm=0)
R10=fp0,call_-1
7: (71) r0 = *(u8 *)(r0 +0)
8: (b7) r0 = 1
9: (95) exit
from 6 to 8: safe
processed 16 insns (limit 131072), stack depth 0+0
Basically what happens is that in the subprog we make use of a
div/mod by 0 exception and in the 'normal' subprog's exit path
we just return skb->data back to the main prog. This has the
implication that the verifier thinks we always get a pkt pointer
in R0 while we still have the implicit 'return 0' from the div
as an alternative unconditional return path earlier. Thus, R0
then contains 0, meaning back in the parent prog we get the
address range of [0x0, skb->data_end] as read and writeable.
Similar can be crafted with other pointer register types.
Since i) BPF_ABS/IND is not allowed in programs that contain
BPF to BPF calls (and generally it's also disadvised to use in
native eBPF context), ii) unknown opcodes don't return zero
anymore, iii) we don't return an exception code in dead branches,
the only last missing case affected and to fix is the div/mod
handling.
What we would really need is some infrastructure to propagate
exceptions all the way to the original prog unwinding the
current stack and returning that code to the caller of the
BPF program. In user space such exception handling for similar
runtimes is typically implemented with setjmp(3) and longjmp(3)
as one possibility which is not available in the kernel,
though (kgdb used to implement it in kernel long time ago). I
implemented a PoC exception handling mechanism into the BPF
interpreter with porting setjmp()/longjmp() into x86_64 and
adding a new internal BPF_ABRT opcode that can use a program
specific exception code for all exception cases we have (e.g.
div/mod by 0, unknown opcodes, etc). While this seems to work
in the constrained BPF environment (meaning, here, we don't
need to deal with state e.g. from memory allocations that we
would need to undo before going into exception state), it still
has various drawbacks: i) we would need to implement the
setjmp()/longjmp() for every arch supported in the kernel and
for x86_64, arm64, sparc64 JITs currently supporting calls,
ii) it has unconditional additional cost on main program
entry to store CPU register state in initial setjmp() call,
and we would need some way to pass the jmp_buf down into
___bpf_prog_run() for main prog and all subprogs, but also
storing on stack is not really nice (other option would be
per-cpu storage for this, but it also has the drawback that
we need to disable preemption for every BPF program types).
All in all this approach would add a lot of complexity.
Another poor-man's solution would be to have some sort of
additional shared register or scratch buffer to hold state
for exceptions, and test that after every call return to
chain returns and pass R0 all the way down to BPF prog caller.
This is also problematic in various ways: i) an additional
register doesn't map well into JITs, and some other scratch
space could only be on per-cpu storage, which, again has the
side-effect that this only works when we disable preemption,
or somewhere in the input context which is not available
everywhere either, and ii) this adds significant runtime
overhead by putting conditionals after each and every call,
as well as implementation complexity.
Yet another option is to teach verifier that div/mod can
return an integer, which however is also complex to implement
as verifier would need to walk such fake 'mov r0,<code>; exit;'
sequeuence and there would still be no guarantee for having
propagation of this further down to the BPF caller as proper
exception code. For parent prog, it is also is not distinguishable
from a normal return of a constant scalar value.
The approach taken here is a completely different one with
little complexity and no additional overhead involved in
that we make use of the fact that a div/mod by 0 is undefined
behavior. Instead of bailing out, we adapt the same behavior
as on some major archs like ARMv8 [0] into eBPF as well:
X div 0 results in 0, and X mod 0 results in X. aarch64 and
aarch32 ISA do not generate any traps or otherwise aborts
of program execution for unsigned divides. I verified this
also with a test program compiled by gcc and clang, and the
behavior matches with the spec. Going forward we adapt the
eBPF verifier to emit such rewrites once div/mod by register
was seen. cBPF is not touched and will keep existing 'return 0'
semantics. Given the options, it seems the most suitable from
all of them, also since major archs have similar schemes in
place. Given this is all in the realm of undefined behavior,
we still have the option to adapt if deemed necessary and
this way we would also have the option of more flexibility
from LLVM code generation side (which is then fully visible
to verifier). Thus, this patch i) fixes the panic seen in
above program and ii) doesn't bypass the verifier observations.
[0] ARM Architecture Reference Manual, ARMv8 [ARM DDI 0487B.b]
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487b.b/DDI0487B_b_armv8_arm.pdf
1) aarch64 instruction set: section C3.4.7 and C6.2.279 (UDIV)
"A division by zero results in a zero being written to
the destination register, without any indication that
the division by zero occurred."
2) aarch32 instruction set: section F1.4.8 and F5.1.263 (UDIV)
"For the SDIV and UDIV instructions, division by zero
always returns a zero result."
Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Recent findings by syzcaller fixed in 7891a87efc71 ("bpf: arsh is
not supported in 32 bit alu thus reject it") triggered a warning
in the interpreter due to unknown opcode not being rejected by
the verifier. The 'return 0' for an unknown opcode is really not
optimal, since with BPF to BPF calls, this would go untracked by
the verifier.
Do two things here to improve the situation: i) perform basic insn
sanity check early on in the verification phase and reject every
non-uapi insn right there. The bpf_opcode_in_insntable() table
reuses the same mapping as the jumptable in ___bpf_prog_run() sans
the non-public mappings. And ii) in ___bpf_prog_run() we do need
to BUG in the case where the verifier would ever create an unknown
opcode due to some rewrites.
Note that JITs do not have such issues since they would punt to
interpreter in these situations. Moreover, the BPF_JIT_ALWAYS_ON
would also help to avoid such unknown opcodes in the first place.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Given we recently had c131187db2d3 ("bpf: fix branch pruning
logic") and 95a762e2c8c9 ("bpf: fix incorrect sign extension in
check_alu_op()") in particular where before verifier skipped
verification of the wrongly assumed dead branch, we should not
just replace the dead code parts with nops (mov r0,r0). If there
is a bug such as fixed in 95a762e2c8c9 in future again, where
runtime could execute those insns, then one of the potential
issues with the current setting would be that given the nops
would be at the end of the program, we could execute out of
bounds at some point.
The best in such case would be to just exit the BPF program
altogether and return an exception code. However, given this
would require two instructions, and such a dead code gap could
just be a single insn long, we would need to place 'r0 = X; ret'
snippet at the very end after the user program or at the start
before the program (where we'd skip that region on prog entry),
and then place unconditional ja's into the dead code gap.
While more complex but possible, there's still another block
in the road that currently prevents from this, namely BPF to
BPF calls. The issue here is that such exception could be
returned from a callee, but the caller would not know that
it's an exception that needs to be propagated further down.
Alternative that has little complexity is to just use a ja-1
code for now which will trap the execution here instead of
silently doing bad things if we ever get there due to bugs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Very minor optimization; saves 1 byte per program in x86_64
JIT in cBPF prologue.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
* clk-iproc:
clk: iproc: Minor tidy up of iproc pll data structures
clk: iproc: Allow plls to do minor rate changes without reset
clk: iproc: Fix error in the pll post divider rate calculation
clk: iproc: Allow iproc pll to runtime calculate vco parameters
* clk-mvebu:
clk: mvebu: armada-37xx-periph: Use PTR_ERR_OR_ZERO()
* clk-qcom-a53:
clk: qcom: Add APCS clock controller support
clk: qcom: Add regmap mux-div clocks support
clk: qcom: Add A53 PLL support
|
|
'clk-pxa' into clk-next
* clk-at91:
clk: at91: pmc: Support backup for programmable clocks
clk: at91: pmc: Save SCSR during suspend
clk: at91: pmc: Wait for clocks when resuming
* clk-imx7ulp:
clk: Don't touch hardware when reparenting during registration
* clk-axigen:
clk: axi-clkgen: Round closest in round_rate() and recalc_rate()
clk: axi-clkgen: Correctly handle nocount bit in recalc_rate()
* clk-si5351:
clk: si5351: _si5351_clkout_reset_pll() can be static
clk: si5351: Do not enable parent clocks on probe
clk: si5351: Rename internal plls to avoid name collisions
clk: si5351: Apply PLL soft reset before enabling the outputs
clk: si5351: Add DT property to enable PLL reset
clk: si5351: implement remove handler
* clk-pxa:
clk: pxa: unbreak lookup of CLK_POUT
|
|
and 'clk-qcom-ipq8074' into clk-next
* clk-spreadtrum:
clk: sprd: add clocks support for SC9860
clk: sprd: Add dt-bindings include file for SC9860
dt-bindings: Add Spreadtrum clock binding documentation
clk: sprd: add adjustable pll support
clk: sprd: add composite clock support
clk: sprd: add divider clock support
clk: sprd: add mux clock support
clk: sprd: add gate clock support
clk: sprd: Add common infrastructure
clk: move clock common macros out from vendor directories
* clk-mvebu-dvfs:
clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks
clk: mvebu: armada-37xx-periph: prepare cpu clk to be used with DVFS
clk: mvebu: armada-37xx-periph: cosmetic changes
* clk-qoriq:
clk: qoriq: add more divider clocks support
* clk-imx:
clk: imx51: uart4, uart5 gates only exist on imx50, imx53
* clk-qcom-ipq8074:
clk: qcom: ipq8074: add misc resets for PCIE and NSS
dt-bindings: clock: qcom: add misc resets for PCIE and NSS
clk: qcom: ipq8074: add GP and Crypto clocks
clk: qcom: ipq8074: add NSS ethernet port clocks
clk: qcom: ipq8074: add NSS clocks
clk: qcom: ipq8074: add PCIE, USB and SDCC clocks
clk: qcom: ipq8074: add remaining PLL’s
dt-bindings: clock: qcom: add remaining clocks for IPQ8074
clk: qcom: ipq8074: fix missing GPLL0 divider width
clk: qcom: add parent map for regmap mux
clk: qcom: add read-only divider operations
|