|
Currently, init_listener() tries to prevent adding a filter with
SECCOMP_FILTER_FLAG_NEW_LISTENER if one of the existing filters already
has a listener. However, this check happens without holding any lock that
would prevent another thread from concurrently installing a new filter
(potentially with a listener) on top of the ones we already have.
Theoretically, this is also a data race: The plain load from
current->seccomp.filter can race with concurrent writes to the same
location.
Fix it by moving the check into the region that holds the siglock to guard
against concurrent TSYNC.
(The "Fixes" tag points to the commit that introduced the theoretical
data race; concurrent installation of another filter with TSYNC only
became possible later, in commit 51891498f2da ("seccomp: allow TSYNC and
USER_NOTIF together").)
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201005014401.490175-1-jannh@google.com
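The locking pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct fake_filter`, `struct fake_task`, and `attach_filter()` are hypothetical names, and a pthread mutex stands in for the siglock. The point is that the "does a listener already exist?" check and the installation of the new filter happen inside the same locked region.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real seccomp types. */
struct fake_filter {
	struct fake_filter *prev;   /* older filter in the chain */
	bool has_listener;
};

struct fake_task {
	pthread_mutex_t lock;       /* plays the role of the siglock */
	struct fake_filter *filter; /* newest filter in the chain */
};

/* Caller must hold t->lock. */
static bool has_duplicate_listener(const struct fake_task *t,
				   const struct fake_filter *new_filter)
{
	if (!new_filter->has_listener)
		return false;
	for (const struct fake_filter *cur = t->filter; cur; cur = cur->prev)
		if (cur->has_listener)
			return true;
	return false;
}

/* Check and install under one lock, so no other thread can slip in between. */
static int attach_filter(struct fake_task *t, struct fake_filter *new_filter)
{
	int ret = 0;

	assert(t != NULL && new_filter != NULL);
	pthread_mutex_lock(&t->lock);
	if (has_duplicate_listener(t, new_filter)) {
		ret = -EBUSY;
	} else {
		new_filter->prev = t->filter;
		t->filter = new_filter;
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

If the check ran before taking the lock, a concurrent installer could add a listener between the check and the attach, which is the window this commit closes.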
|
|
In order to make adding configurable features into seccomp easier,
it's better to have the options at one single location, considering
especially that the bulk of seccomp code is arch-independent. A quick
look also shows that many SECCOMP descriptions are outdated; they talk
about /proc rather than prctl.
As a result of moving the config option and keeping it default on,
architectures arm, arm64, csky, riscv, sh, and xtensa, which did not have
SECCOMP on by default prior to this, will have SECCOMP on by default after
this change.
Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
outdated dependency on PROC_FS, and this dependency is removed in this
change.
Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
[kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu
|
|
As the UAPI headers start to appear in distros, we need to avoid
outdated versions of struct clone_args in order to test modern
features, so use a local definition named "struct __clone_args".
Additionally update the struct size macro names to match UAPI names.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075432.u4gis3s2o5qrsb5g@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Some archs (like powerpc) only support changing the return code during
syscall exit when ptrace is used. Test the entry and exit phases to
cover which of the syscall number and return value need to be set in
which phase. For non-powerpc, all changes are made during ptrace
syscall entry, as before. For powerpc, the syscall number is changed at
ptrace syscall entry and the syscall return value is changed on ptrace
syscall exit.
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
Fixes: 58d0a862f573 ("seccomp: add tests for ptrace hole")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075300.7iylzof2w5vrutah@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In preparation for setting syscall nr and ret values separately, refactor
the helpers to take a pointer to a value, so that a NULL can indicate
"do not change this respective value". This is done to keep the regset
read/write happening once and in one code path.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075031.j4gruygeugkp2zwd@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
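A minimal sketch of the "NULL means do not change" convention, assuming a hypothetical `struct fake_regs` and helper name `update_regs()` (the real helpers operate on arch-specific register sets):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register snapshot; real layouts are architecture-specific. */
struct fake_regs {
	long nr;   /* syscall number */
	long ret;  /* syscall return value */
};

/*
 * Apply optional updates to a register snapshot.
 * A NULL pointer means "leave this value untouched".
 */
static void update_regs(struct fake_regs *regs, const long *nr, const long *ret)
{
	assert(regs != NULL);
	if (nr)
		regs->nr = *nr;
	if (ret)
		regs->ret = *ret;
}
```

Passing pointers rather than values lets one helper serve three callers (set number only, set return value only, set both) while the register read and write stay in a single code path.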
|
|
In preparation for performing actions during ptrace syscall exit, save
the syscall number during ptrace syscall entry. Some architectures do
not have the syscall number available during ptrace syscall exit.
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921074354.6shkt2e5yhzhj3sn@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
On powerpc, the errno is not inverted, and depends on ccr.so being
set. Add this to a powerpc definition of SYSCALL_RET_SET().
Co-developed-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
Fixes: 5d83c2b37d43 ("selftests/seccomp: Add powerpc support")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-13-keescook@chromium.org
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
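The powerpc convention can be sketched roughly as follows. The struct, the helper name `ppc_set_ret()`, and the `FAKE_CR0_SO` constant are assumptions made for illustration; the mask value 0x10000000 is assumed here to be the summary-overflow bit of CR0 within the condition register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of the powerpc user registers. */
struct fake_ppc_regs {
	unsigned long gpr3; /* syscall return value lives in r3 */
	unsigned long ccr;  /* condition register */
};

/* Assumed mask for the CR0 summary-overflow (SO) bit. */
#define FAKE_CR0_SO 0x10000000UL

/*
 * Store a Linux-style result (negative errno on failure) using the
 * powerpc convention: the errno is stored as a positive value and
 * failure is signalled through the SO bit instead of the sign.
 */
static void ppc_set_ret(struct fake_ppc_regs *regs, long result)
{
	assert(regs != NULL);
	if (result < 0) {
		regs->gpr3 = (unsigned long)(-result);
		regs->ccr |= FAKE_CR0_SO;
	} else {
		regs->gpr3 = (unsigned long)result;
		regs->ccr &= ~FAKE_CR0_SO;
	}
}
```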
|
|
Instead of special-casing the specific case of shared registers, create
a default SYSCALL_RET_SET() macro (mirroring SYSCALL_NUM_SET()), that
writes to the SYSCALL_RET register. For architectures that can't set the
return value (for whatever reason), they can define SYSCALL_RET_SET()
without an associated SYSCALL_RET() macro. This also paves the way for
architectures that need to do special things to set the return value
(e.g. powerpc).
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-12-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
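The "default macro unless the architecture overrides it" arrangement might look like the sketch below. `struct fake_regs` and `set_ret_checked()` are hypothetical; only the macro names come from the commit message, and their bodies here are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register snapshot. */
struct fake_regs {
	long nr;
	long ret;
};

/* Per-architecture accessors; each takes the register variable itself. */
#define SYSCALL_NUM(_regs) (_regs).nr
#define SYSCALL_RET(_regs) (_regs).ret

/* An architecture with special needs would define these before this point. */
#ifndef SYSCALL_NUM_SET
# define SYSCALL_NUM_SET(_regs, _nr) \
	do { SYSCALL_NUM(_regs) = (_nr); } while (0)
#endif

#ifndef SYSCALL_RET_SET
# define SYSCALL_RET_SET(_regs, _val) \
	do { SYSCALL_RET(_regs) = (_val); } while (0)
#endif

/* Function bodies stay free of #ifdef: they only use the *_SET() macros. */
static void set_ret_checked(struct fake_regs *regs, long val)
{
	assert(regs != NULL);
	SYSCALL_RET_SET(*regs, val);
}
```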
|
|
When none of the registers have changed, don't flush them back. This can
happen if the architecture uses a non-register way to change the syscall
(e.g. arm64), and a return value hasn't been written.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-11-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
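One way to skip the write-back is to keep a copy of the registers as first read and compare before flushing. The sketch below assumes hypothetical names throughout (`struct fake_regs`, `write_back()`, `maybe_set_ret()`), with a counter standing in for the actual ptrace write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical register snapshot. */
struct fake_regs {
	long nr;
	long ret;
};

static int write_back_calls; /* counts simulated register flushes */

/* Stand-in for writing the registers back to the tracee. */
static void write_back(const struct fake_regs *regs)
{
	(void)regs;
	write_back_calls++;
}

/* Optionally set the return value; flush only if something changed. */
static bool maybe_set_ret(struct fake_regs *live, const long *ret)
{
	struct fake_regs orig, work;

	assert(live != NULL);
	orig = *live;
	work = *live;
	if (ret)
		work.ret = *ret;

	if (memcmp(&orig, &work, sizeof(work)) == 0)
		return false; /* nothing changed, so skip the flush */

	*live = work;
	write_back(live);
	return true;
}
```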
|
|
Consolidate the REGSET logic into the new ARCH_GETREG() and
ARCH_SETREG() macros, avoiding more #ifdef code in function bodies.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-10-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Instead of special-casing the get/set-registers routines, move the
HAVE_GETREG logic into the new ARCH_GETREG() and ARCH_SETREG() macros.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-9-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
With all architectures now using the common SYSCALL_NUM_SET() macro, the
arch-specific #ifdef can be removed from change_syscall() itself.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-8-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Instead of having the mips O32 macro special-cased, pull the logic into
the SYSCALL_NUM() macro. Additionally include the ABI headers, since
these appear to have been missing, leaving __NR_O32_Linux undefined.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-7-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Remove the arm64 special-case in change_syscall().
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-6-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Remove the arm special-case in change_syscall().
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-5-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Remove the mips special-case in change_syscall().
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-4-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
In order to avoid "#ifdef"s in the main function bodies, create a new
macro, SYSCALL_NUM_SET(), where arch-specific logic can live.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-3-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
To avoid an xtensa special-case, refactor all arch register macros to
take the register variable instead of depending on the macro expanding
as a struct member name.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-2-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
The __NR_mknod syscall doesn't exist on arm64 (only __NR_mknodat).
Switch to the modern syscall.
Fixes: ad5682184a81 ("selftests/seccomp: Check for EPOLLHUP for user_notif")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/lkml/20200912110820.597135-16-keescook@chromium.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
This silences the following coccinelle warning:
"WARNING: sum of probable bitmasks, consider |"
tools/testing/selftests/seccomp/seccomp_bpf.c:3131:17-18: WARNING: sum of probable bitmasks, consider |
tools/testing/selftests/seccomp/seccomp_bpf.c:3133:18-19: WARNING: sum of probable bitmasks, consider |
tools/testing/selftests/seccomp/seccomp_bpf.c:3134:18-19: WARNING: sum of probable bitmasks, consider |
tools/testing/selftests/seccomp/seccomp_bpf.c:3135:18-19: WARNING: sum of probable bitmasks, consider |
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Link: https://lore.kernel.org/r/1586924101-65940-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Kees Cook <keescook@chromium.org>
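A small sketch of why the warning matters, using made-up flag names: `+` and `|` agree only while the operands share no bits, so `|` is the safer way to combine masks.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration. */
#define FAKE_FLAG_A   0x1u
#define FAKE_FLAG_B   0x2u
#define FAKE_FLAG_ALL (FAKE_FLAG_A | FAKE_FLAG_B)

/* Combining with '+' carries into a neighbouring bit if masks overlap. */
static unsigned int combine_add(unsigned int a, unsigned int b)
{
	assert(((a | b) & ~FAKE_FLAG_ALL) == 0);
	return a + b;
}

/* Combining with '|' is idempotent: shared bits stay set exactly once. */
static unsigned int combine_or(unsigned int a, unsigned int b)
{
	assert(((a | b) & ~FAKE_FLAG_ALL) == 0);
	return a | b;
}
```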
|
|
As described in commit a3460a59747c ("new helper: current_pt_regs()"):
- arch versions are "optimized versions".
- some architectures have task_pt_regs() working only for traced tasks
blocked on signal delivery. current_pt_regs() needs to work for *all*
processes.
In preparation for adding a coccinelle rule for using current_*(), instead
of raw accesses to current members, modify seccomp_do_user_notification(),
__seccomp_filter(), __secure_computing() to use current_pt_regs().
Signed-off-by: Denis Efremov <efremov@linux.com>
Link: https://lore.kernel.org/r/20200824125921.488311-1-efremov@linux.com
[kees: Reworded commit log, add comment to populate_seccomp_data()]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
While we were testing for the behavior of unknown seccomp filter return
values, there was no test for how it acted in a thread group. Add a test
in the thread group tests for this.
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Asynchronous termination of a thread outside of the userspace thread
library's knowledge is an unsafe operation that leaves the process in
an inconsistent, corrupt, and possibly unrecoverable state. In order
to make new actions that may be added in the future safe on kernels
not aware of them, change the default action from
SECCOMP_RET_KILL_THREAD to SECCOMP_RET_KILL_PROCESS.
Signed-off-by: Rich Felker <dalias@libc.org>
Link: https://lore.kernel.org/r/20200829015609.GA32566@brightrain.aerifal.cx
[kees: Fixed up coredump selection logic to match]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Christian and Kees both pointed out that this is a bit sloppy to open-code
both places, and Christian points out that we leave a dangling pointer to
->notif if file allocation fails. Since we check ->notif for null in order
to determine if it's ok to install a filter, this means people won't be
able to install a filter if the file allocation fails for some reason, even
if they subsequently should be able to.
To fix this, let's hoist this free+null into its own little helper and use
it.
Reported-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
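The free+null helper idea, sketched in userspace with hypothetical names (`struct fake_filter`, `notify_free()`, `may_install()`): clearing the pointer in the same helper that frees it means a later NULL check cannot be fooled by a stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the notification state and its filter. */
struct fake_notif {
	int next_id;
};

struct fake_filter {
	struct fake_notif *notif;
};

/* Free the notification state and clear the pointer in one place. */
static void notify_free(struct fake_filter *filter)
{
	assert(filter != NULL);
	free(filter->notif);
	filter->notif = NULL;
}

/* Installation is refused while a notifier is (apparently) attached. */
static bool may_install(const struct fake_filter *filter)
{
	return filter->notif == NULL;
}
```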
|
|
I've changed my e-mail address to tycho.pizza, so let's reflect that in
these files.
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902014017.934315-2-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize
the listener fd, then check to see if we can actually use it later in
seccomp_may_assign_mode(), which can fail if anyone else in our thread
group has installed a filter and caused some divergence. If we can't, we
partially clean up the newly allocated file: we put the fd, put the file,
but don't actually clean up the *memory* that was allocated at
filter->notif. Let's clean that up too.
To accomplish this, let's hoist the actual "detach a notifier from a
filter" code to its own helper out of seccomp_notify_release(), so that in
case anyone adds stuff to init_listener(), they only have to add the
cleanup code in one spot. This does a bit of extra locking and such on the
failure path when the filter is not attached, but it's a slow failure path
anyway.
Fixes: 51891498f2da ("seccomp: allow TSYNC and USER_NOTIF together")
Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Add perf support for emitting extended registers for power10.
- A fix for CPU hotplug on pseries, where on large/loaded systems we
may not wait long enough for the CPU to be offlined, leading to
crashes.
- Addition of a raw cputable entry for Power10, which is not required
to boot, but is required to make our PMU setup work correctly in
guests.
- Three fixes for the recent changes on 32-bit Book3S to move modules
into their own segment for strict RWX.
- A fix for a recent change in our powernv PCI code that could lead to
crashes.
- A change to our perf interrupt accounting to avoid soft lockups when
using some events, found by syzkaller.
- A change in the way we handle power loss events from the hypervisor
on pseries. We no longer immediately shut down if we're told we're
running on a UPS.
- A few other minor fixes.
Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T
Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz,
Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth,
Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann,
Vaidyanathan Srinivasan, Vasant Hegde.
* tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
powerpc/pseries: Do not initiate shutdown when system is running on UPS
powerpc/perf: Fix soft lockups due to missed interrupt accounting
powerpc/powernv/pci: Fix possible crash when releasing DMA resources
powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
powerpc/fixmap: Fix the size of the early debug area
powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
powerpc/kernel: Cleanup machine check function declarations
powerpc: Add POWER10 raw mode cputable entry
powerpc/perf: Add extended regs support for power10 platform
powerpc/perf: Add support for outputting extended regs in perf intr_regs
powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for x86 which removes the RDPID usage from the paranoid
entry path and unconditionally uses LSL to retrieve the CPU number.
RDPID depends on MSR_TSC_AUX. KVM has an optimization to avoid
expensive MSR reads/writes on VMENTER/EXIT. It caches the MSR values
and restores them either when leaving the run loop, on preemption or
when going out to user space. MSR_TSC_AUX is part of that lazy MSR
set, so after writing the guest value and before the lazy restore any
exception using the paranoid entry will read the guest value and use
it as CPU number to retrieve the GSBASE value for the current CPU when
FSGSBASE is enabled. As RDPID is only used in that particular entry
path, there is no reason to burden VMENTER/EXIT with two extra MSR
writes. Remove the RDPID optimization, which is not even backed by
numbers, from the paranoid entry path instead"
* tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Thomas Gleixner:
"A single update for perf on x86 which has support for the broken down
bandwidth counters"
* tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:
- Enforce NX on RO data in mixed EFI mode
- Destroy workqueue in an error handling path to prevent UAF
- Stop argument parser at '--' which is the delimiter for init
- Treat a NULL command line pointer as empty instead of dereferencing it
unconditionally.
- Handle an unterminated command line correctly
- Cleanup the 32bit code leftovers and remove obsolete documentation
* tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: efi: remove description of efi=old_map
efi/x86: Move 32-bit code into efi_32.c
efi/libstub: Handle unterminated cmdline
efi/libstub: Handle NULL cmdline
efi/libstub: Stop parsing arguments at "--"
efi: add missed destroy_workqueue when efisubsys_init fails
efi/x86: Mark kernel rodata non-executable for mixed mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull entry fix from Thomas Gleixner:
"A single bug fix for the common entry code.
The transcription of the x86 version messed up the reload of the
syscall number from pt_regs after ptrace and seccomp which breaks
syscall number rewriting"
* tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
core/entry: Respect syscall number rewrites
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"A single fix correcting a reversed error severity determination check
which led to a recoverable error getting marked as fatal, by Tony
Luck"
* tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/{i7core,sb,pnd2,skx}: Fix error event severity
|
|
Pull networking fixes from David Miller:
"Nothing earth shattering here, lots of small fixes (f.e. missing RCU
protection, bad ref counting, missing memset(), etc.) all over the
place:
1) Use get_file_rcu() in task_file iterator, from Yonghong Song.
2) There are two ways to set remote source MAC addresses in macvlan
driver, but only one of which validates things properly. Fix this.
From Alvin Šipraga.
3) Missing of_node_put() in gianfar probing, from Sumera
Priyadarsini.
4) Preserve device wanted feature bits across multiple netlink
ethtool requests, from Maxim Mikityanskiy.
5) Fix rcu_sched stall in task and task_file bpf iterators, from
Yonghong Song.
6) Avoid reset after device destroy in ena driver, from Shay
Agroskin.
7) Missing memset() in netlink policy export reallocation path, from
Johannes Berg.
8) Fix info leak in __smc_diag_dump(), from Peilin Ye.
9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark
Tomlinson.
10) Fix number of data stream negotiation in SCTP, from David Laight.
11) Fix double free in connection tracker action module, from Alaa
Hleihel.
12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
net: nexthop: don't allow empty NHA_GROUP
bpf: Fix two typos in uapi/linux/bpf.h
net: dsa: b53: check for timeout
tipc: call rcu_read_lock() in tipc_aead_encrypt_done()
net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow
net: sctp: Fix negotiation of the number of data streams.
dt-bindings: net: renesas, ether: Improve schema validation
gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
hv_netvsc: Remove "unlikely" from netvsc_select_queue
bpf: selftests: global_funcs: Check err_str before strstr
bpf: xdp: Fix XDP mode when no mode flags specified
selftests/bpf: Remove test_align leftovers
tools/resolve_btfids: Fix sections with wrong alignment
net/smc: Prevent kernel-infoleak in __smc_diag_dump()
sfc: fix build warnings on 32-bit
net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"
libbpf: Fix map index used in error message
net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
net: atlantic: Use readx_poll_timeout() for large timeout
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
"Fix reference counting and clean up exit paths"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_epoll_ctl(): clean the failure exits up a bit
epoll: Keep a reference on files added to the check list
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When adding a new fd to an epoll instance, and this new fd is itself
an epoll fd, we recursively scan the fds attached to it
to detect cycles, and add non-epoll files to a "check list"
that gets subsequently parsed.
However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disappearing from under our feet.
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
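The idea of pinning files while they sit on the check list can be sketched as below. Everything here is a simplified assumption: `struct fake_file`, the singly linked list, and the plain integer count (the kernel uses an atomic reference count and its own list helpers).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified file object with a non-atomic refcount. */
struct fake_file {
	int f_count;
	struct fake_file *next_check;
};

static struct fake_file *check_list; /* head of the check list */

/* Take a reference before publishing the file on the list. */
static void check_list_add(struct fake_file *file)
{
	assert(file != NULL && file->f_count > 0);
	file->f_count++;
	file->next_check = check_list;
	check_list = file;
}

/* Walk the list once, dropping the reference taken at insertion time. */
static void check_list_clear(void)
{
	while (check_list) {
		struct fake_file *file = check_list;

		check_list = file->next_check;
		file->next_check = NULL;
		file->f_count--;
	}
}
```

While an entry is on the list its count is at least two in this sketch, so a concurrent close that drops one reference cannot free it before the list walker is done.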
|
|
Currently the nexthop code will use an empty NHA_GROUP attribute, but it
requires at least 1 entry in order to function properly. Otherwise we
end up dereferencing null or random pointers all over the place due to not
having any nh_grp_entry members allocated; the nexthop code relies on
having at least the first member present. An empty NHA_GROUP doesn't make
any sense, so just disallow it.
Also add a WARN_ON for any future users of nexthop_create_group().
BUG: kernel NULL pointer dereference, address: 0000000000000080
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:fib_check_nexthop+0x4a/0xaa
Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
FS: 00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
Call Trace:
fib_create_info+0x64d/0xaf7
fib_table_insert+0xf6/0x581
? __vma_adjust+0x3b6/0x4d4
inet_rtm_newroute+0x56/0x70
rtnetlink_rcv_msg+0x1e3/0x20d
? rtnl_calcit.isra.0+0xb8/0xb8
netlink_rcv_skb+0x5b/0xac
netlink_unicast+0xfa/0x17b
netlink_sendmsg+0x334/0x353
sock_sendmsg_nosec+0xf/0x3f
____sys_sendmsg+0x1a0/0x1fc
? copy_msghdr_from_user+0x4c/0x61
___sys_sendmsg+0x63/0x84
? handle_mm_fault+0xa39/0x11b5
? sockfd_lookup_light+0x72/0x9a
__sys_sendmsg+0x50/0x6e
do_syscall_64+0x54/0xbe
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f10dacc0bb7
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
Modules linked in:
CR2: 0000000000000080
CC: David Ahern <dsahern@gmail.com>
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
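A rough sketch of the kind of length validation that rejects an empty group. The struct layout and the function name `validate_group_len()` are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group entry, 8 bytes on common ABIs. */
struct fake_grp_entry {
	uint32_t id;
	uint8_t weight;
	uint8_t resvd1;
	uint16_t resvd2;
};

/*
 * Validate the payload length of a group attribute: it must hold at
 * least one entry and be an exact multiple of the entry size.
 */
static int validate_group_len(size_t len)
{
	assert(sizeof(struct fake_grp_entry) == 8);
	if (len == 0 || len % sizeof(struct fake_grp_entry) != 0)
		return -EINVAL;
	return 0;
}
```

Rejecting the zero-length case up front means later code can assume the first entry exists, which is the assumption the oops above shows being violated.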
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- move -Wsign-compare warning from W=2 to W=3
- fix the keyword _restrict to __restrict in genksyms
- fix more bugs in qconf
* tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: qconf: replace deprecated QString::sprintf() with QTextStream
kconfig: qconf: remove redundant help in the info view
kconfig: qconf: remove qInfo() to get back Qt4 support
kconfig: qconf: remove unused colNr
kconfig: qconf: fix the popup menu in the ConfigInfoView window
kconfig: qconf: fix signal connection to invalid slots
genksyms: keywords: Use __restrict not _restrict
kbuild: remove redundant patterns in filter/filter-out
extract-cert: add static to local data
Makefile.extrawarn: Move sign-compare from W=2 to W=3
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Allow booting of late secondary CPUs affected by erratum 1418040
(currently they are parked if none of the early CPUs are affected by
this erratum).
- Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
vdso_install' installs the 32-bit compat vdso when it is compiled.
- Print a warning that untrusted guests without a CPU erratum
workaround (Cortex-A57 832075) may deadlock the affected system.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ARM64: vdso32: Install vdso32 from vdso_install
KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
arm64: Allow booting of late CPUs affected by erratum 1418040
arm64: Move handling of erratum 1418040 into C code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- a couple of fixes for storage key handling relevant for debugging
- add cond_resched into potentially slow subchannels scanning loop
- fixes for PF/VF linking and to ignore stale PCI configuration request
events
* tag 's390-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: fix PF/VF linking on hot plug
s390/pci: re-introduce zpci_remove_device()
s390/pci: fix zpci_bus_link_virtfn()
s390/ptrace: fix storage key handling
s390/runtime_instrumentation: fix storage key handling
s390/pci: ignore stale configuration request event
s390/cio: add cond_resched() in the slow_eval_known_fn() loop
|
|
Pull kvm fixes from Paolo Bonzini:
- PAE and PKU bugfixes for x86
- selftests fix for new binutils
- MMU notifier fix for arm64
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
KVM: x86: fix access code passed to gva_to_gpa
selftests: kvm: Use a shorter encoding to clear RAX
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"23 fixes in 5 drivers (qla2xxx, ufs, scsi_debug, fcoe, zfcp). The bulk
of the changes are in qla2xxx and ufs and all are mostly small and
definitely don't impact the core"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe"
Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
scsi: qla2xxx: Check if FW supports MQ before enabling
scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba
scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime
scsi: qla2xxx: Reduce noisy debug message
scsi: qla2xxx: Fix login timeout
scsi: qla2xxx: Indicate correct supported speeds for Mezz card
scsi: qla2xxx: Flush I/O on zone disable
scsi: qla2xxx: Flush all sessions on zone disable
scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values
scsi: scsi_debug: Fix scp is NULL errors
scsi: zfcp: Fix use-after-free in request timeout handlers
scsi: ufs: No need to send Abort Task if the task in DB was cleared
scsi: ufs: Clean up completed request without interrupt notification
scsi: ufs: Improve interrupt handling for shared interrupts
scsi: ufs: Fix interrupt error message for shared interrupts
scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL
scsi: ufs-mediatek: Fix incorrect time to wait link status
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
"Another set of DT fixes:
- restore range parsing error check
- workaround PCI range parsing with missing 'device_type' now
required
- correct description of 'phy-connection-type'
- fix erroneous matching on 'snps,dw-pcie' by 'intel,lgm-pcie' schema
- a couple of grammar and whitespace fixes
- update Shawn Guo's email"
* tag 'devicetree-fixes-for-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: vendor-prefixes: Remove trailing whitespace
dt-bindings: net: correct description of phy-connection-type
dt-bindings: PCI: intel,lgm-pcie: Fix matching on all snps,dw-pcie instances
of: address: Work around missing device_type property in pcie nodes
dt: writing-schema: Miscellaneous grammar fixes
dt-bindings: Use Shawn Guo's preferred e-mail for i.MX bindings
of/address: check for invalid range.cpu_addr
|
|
Fixes: f516fb704d02fff2 ("dt-bindings: Whitespace clean-ups in schema files")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20200819092058.1526-1-geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
When an MMU notifier call results in unmapping a range that spans multiple
PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
since this avoids running into RCU stalls during VM teardown. Unfortunately,
if the VM is destroyed as a result of OOM, then blocking is not permitted
and the call to the scheduler triggers the following BUG():
| BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
| in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
| INFO: lockdep is turned off.
| CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Call trace:
| dump_backtrace+0x0/0x284
| show_stack+0x1c/0x28
| dump_stack+0xf0/0x1a4
| ___might_sleep+0x2bc/0x2cc
| unmap_stage2_range+0x160/0x1ac
| kvm_unmap_hva_range+0x1a0/0x1c8
| kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
| __mmu_notifier_invalidate_range_start+0x218/0x31c
| mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
| __oom_reap_task_mm+0x128/0x268
| oom_reap_task+0xac/0x298
| oom_reaper+0x178/0x17c
| kthread+0x1e4/0x1fc
| ret_from_fork+0x10/0x30
Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
flags.
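
The intended behaviour can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions, not the arm64 code itself: RANGE_BLOCKABLE, PGD_SPAN, model_cond_resched() and model_unmap_range() are hypothetical stand-ins for MMU_NOTIFIER_RANGE_BLOCKABLE, the PGD size, cond_resched_lock() and unmap_stage2_range(). The walk yields at each PGD boundary only when the caller has said that blocking is permitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's MMU_NOTIFIER_RANGE_BLOCKABLE bit. */
#define RANGE_BLOCKABLE (1u << 0)

/* Hypothetical PGD span, chosen only for this model. */
#define PGD_SPAN 0x1000ul

/* Counts simulated cond_resched_lock() calls. */
static unsigned int resched_calls;

/* Stand-in for cond_resched_lock(): must not run in a non-blocking context. */
static void model_cond_resched(void)
{
	resched_calls++;
}

/*
 * Walk [start, end) one PGD span at a time. Yield when crossing a PGD
 * boundary only if the notifier flags say that blocking is allowed.
 */
static void model_unmap_range(unsigned long start, unsigned long end,
			      unsigned int flags)
{
	bool may_block = (flags & RANGE_BLOCKABLE) != 0;
	unsigned long addr = start;

	assert(start <= end);

	while (addr < end) {
		/* Next PGD boundary, clamped to 'end' (also on wrap-around). */
		unsigned long next = (addr + PGD_SPAN) & ~(PGD_SPAN - 1);

		if (next < addr || next > end)
			next = end;

		/* The real code would unmap [addr, next) here. */

		if (may_block && next != end)
			model_cond_resched();

		addr = next;
	}
}
```

With this model, a caller such as the OOM reaper path would pass flags without the blockable bit and never reach the reschedule point, while ordinary teardown keeps yielding between PGDs.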
Cc: <stable@vger.kernel.org>
Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.
Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.
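
As a rough illustration of the plumbing (a userspace sketch, not the KVM code): struct model_range, MODEL_BLOCKABLE, model_arch_unmap_hva_range() and model_invalidate_range_start() are hypothetical names standing in for struct mmu_notifier_range, MMU_NOTIFIER_RANGE_BLOCKABLE, the architecture's kvm_unmap_hva_range() and the generic notifier callback. The point is only that the generic layer hands the range flags through unchanged, so the backend can make the decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MMU_NOTIFIER_RANGE_BLOCKABLE. */
#define MODEL_BLOCKABLE (1u << 0)

/* Hypothetical, trimmed-down model of struct mmu_notifier_range. */
struct model_range {
	unsigned long start;
	unsigned long end;
	unsigned int flags;
};

/* Records what the architecture backend was told, for inspection. */
static bool arch_saw_blockable;

/* Model of the arch hook: it now receives 'flags' and can act on them. */
static int model_arch_unmap_hva_range(unsigned long start, unsigned long end,
				      unsigned int flags)
{
	(void)start;
	(void)end;
	arch_saw_blockable = (flags & MODEL_BLOCKABLE) != 0;
	return 0;
}

/* Model of the generic notifier: forward range->flags to the backend. */
static int model_invalidate_range_start(const struct model_range *range)
{
	assert(range);
	return model_arch_unmap_hva_range(range->start, range->end,
					  range->flags);
}
```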
Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The phy-connection-type parameter is described in ePAPR 1.1:
Specifies interface type between the Ethernet device and a physical
layer (PHY) device. The value of this property is specific to the
implementation.
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://lore.kernel.org/r/1597917724-11127-1-git-send-email-madalin.bucur@oss.nxp.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Pull io_uring fixes from Jens Axboe:
- Make sure the head link cancelation includes async work
- Get rid of kiocb_wait_page_queue_init(), makes no sense to have it as
a separate function since you moved it into io_uring itself
- io_import_iovec cleanups (Pavel, me)
- Use system_unbound_wq for ring exit work, to avoid spawning tons of
these if we have tons of rings exiting at the same time
- Fix req->flags overflow flag manipulation (Pavel)
* tag 'io_uring-5.9-2020-08-21' of git://git.kernel.dk/linux-block:
io_uring: kill extra iovec=NULL in import_iovec()
io_uring: comment on kfree(iovec) checks
io_uring: fix racy req->flags modification
io_uring: use system_unbound_wq for ring exit work
io_uring: cleanup io_import_iovec() of pre-mapped request
io_uring: get rid of kiocb_wait_page_queue_init()
io_uring: find and cancel head link async work on files exit
|
|
The intel,lgm-pcie binding is matching on all snps,dw-pcie instances
which is wrong. Add a custom 'select' entry to fix this.
Fixes: e54ea45a4955 ("dt-bindings: PCI: intel: Add YAML schemas for the PCIe RC controller")
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Reviewed-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|