Age | Commit message (Collapse) | Author |
|
[ Note, this is a Git cherry-pick of the following commit:
d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow")
... for easier x86 PTI code testing and back-porting. ]
The KASAN shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt. However,
since that no longer zeroes the mapped pages, it is not suitable for
KASAN, which requires zeroed shadow memory.
Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate(). Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.
Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
[ Note, this is a Git cherry-pick of the following commit:
506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")
... for easier x86 PTI code testing and back-porting. ]
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
can be used instead of lockless_dereference() without any change in
semantics.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
[ Note, this is a Git cherry-pick of the following commit:
a23f06f06dbe ("bpf: fix build issues on um due to mising bpf_perf_event.h")
... for easier x86 PTI code testing and back-porting. ]
Since c895f6f703ad ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:
[...]
CC init/main.o
In file included from ../include/linux/perf_event.h:18:0,
from ../include/linux/trace_events.h:10,
from ../include/trace/syscall.h:7,
from ../include/linux/syscalls.h:82,
from ../init/main.c:20:
../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
asm/bpf_perf_event.h: No such file or directory #include
<asm/bpf_perf_event.h>
[...]
Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.
Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
[ Note, this is a Git cherry-pick of the following commit:
a47ba4d77e12 ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
... for easier x86 PTI code testing and back-porting. ]
Currently free running PEBS is disabled when user or interrupt
registers are requested. Most of the registers are actually
available in the PEBS record and can be supported.
So we just need to check for the supported registers and then
allow it: it is all except for the segment register.
For user registers this only works when the counter is limited
to ring 3 only, so this also needs to be checked.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
[ Note, this is a Git cherry-pick of the following commit:
2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD")
... for easier x86 PTI code testing and back-porting. ]
The latest AMD AMD64 Architecture Programmer's Manual
adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).
If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
/ FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.
Signed-Off-By: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
[ Note, this is a Git cherry-pick of the following commit: (limited to the cpufeatures.h file)
3522c2a6a4f3 ("x86/cpufeature: Add User-Mode Instruction Prevention definitions")
... for easier x86 PTI code testing and back-porting. ]
User-Mode Instruction Prevention is a security feature present in new
Intel processors that, when set, prevents the execution of a subset of
instructions if such instructions are executed in user mode (CPL > 0).
Attempting to execute such instructions causes a general protection
exception.
The subset of instructions comprises:
* SGDT - Store Global Descriptor Table
* SIDT - Store Interrupt Descriptor Table
* SLDT - Store Local Descriptor Table
* SMSW - Store Machine Status Word
* STR - Store Task Register
This feature is also added to the list of disabled-features to allow
a cleaner handling of build-time configuration.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Merge a minimal set of virt cleanups, for a base for the MM isolation patches.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Conflicts:
arch/x86/kernel/cpu/Makefile
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull in a minimal set of v4.15 entry code changes, for a base for the MM isolation patches.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This reverts commits 5c9d2d5c269c, c7da82b894e9, and e7fe7b5cae90.
We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.
Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was. Dave
Hansen says:
"So, the underlying bug here is that we now a get_user_pages_remote()
and then go ahead and do the p*_access_permitted() checks against the
current PKRU. This was introduced recently with the addition of the
new p??_access_permitted() calls.
We have checks in the VMA path for the "remote" gups and we avoid
consulting PKRU for them. This got missed in the pkeys selftests
because I did a ptrace read, but not a *write*. I also didn't
explicitly test it against something where a COW needed to be done"
It's also not entirely clear that it makes sense to check the protection
key bits at this level at all. But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.
We'll see.
Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back. Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76f7 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot.
2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik
Brueckner.
3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes
Berg.
4) Add missing barriers to ptr_ring, from Michael S. Tsirkin.
5) Don't advertise gigabit in sh_eth when not available, from Thomas
Petazzoni.
6) Check network namespace when delivering to netlink taps, from Kevin
Cernekee.
7) Kill a race in raw_sendmsg(), from Mohamed Ghannam.
8) Use correct address in TCP md5 lookups when replying to an incoming
segment, from Christoph Paasch.
9) Add schedule points to BPF map alloc/free, from Eric Dumazet.
10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also
from Eric Dumazet.
11) Fix SKB leak in tipc, from Jon Maloy.
12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz.
13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn.
14) Add some new qmi_wwan device IDs, from Daniele Palmas.
15) Fix static key imbalance in ingress qdisc, from Jiri Pirko.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: qcom/emac: Reduce timeout for mdio read/write
net: sched: fix static key imbalance in case of ingress/clsact_init error
net: sched: fix clsact init error path
ip_gre: fix wrong return value of erspan_rcv
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
pkt_sched: Remove TC_RED_OFFLOADED from uapi
net: sched: Move to new offload indication in RED
net: sched: Add TCA_HW_OFFLOAD
net: aquantia: Increment driver version
net: aquantia: Fix typo in ethtool statistics names
net: aquantia: Update hw counters on hw init
net: aquantia: Improve link state and statistics check interval callback
net: aquantia: Fill in multicast counter in ndev stats from hardware
net: aquantia: Fill ndev stat couters from hardware
net: aquantia: Extend stat counters to 64bit values
net: aquantia: Fix hardware DMA stream overload on large MRRS
net: aquantia: Fix actual speed capabilities reporting
sock: free skb in skb_complete_tx_timestamp on error
s390/qeth: update takeover IPs after configuration change
s390/qeth: lock IP table while applying takeover changes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There are some significant fixes in here for FP state corruption,
hardware access/dirty PTE corruption and an erratum workaround for the
Falkor CPU.
I'm hoping that things finally settle down now, but never say never...
Summary:
- Fix FPSIMD context switch regression introduced in -rc2
- Fix ABI break with SVE CPUID register reporting
- Fix use of uninitialised variable
- Fixes to hardware access/dirty management and sanity checking
- CPU erratum workaround for Falkor CPUs
- Fix reporting of writeable+executable mappings
- Fix signal reporting for RAS errors"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fpsimd: Fix copying of FP state from signal frame into task struct
arm64/sve: Report SVE to userspace via CPUID only if supported
arm64: fix CONFIG_DEBUG_WX address reporting
arm64: fault: avoid send SIGBUS two times
arm64: hw_breakpoint: Use linux/uaccess.h instead of asm/uaccess.h
arm64: Add software workaround for Falkor erratum 1041
arm64: Define cputype macros for Falkor CPU
arm64: mm: Fix false positives in set_pte_at access/dirty race detection
arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
arm64: Initialise high_memory global variable earlier
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- fix the s2ram regression related to confusion around segment
register restoration, plus related cleanups that make the code more
robust
- a guess-unwinder Kconfig dependency fix
- an isoimage build target fix for certain tool chain combinations
- instruction decoder opcode map fixes+updates, and the syncing of
the kernel decoder headers to the objtool headers
- a kmmio tracing fix
- two 5-level paging related fixes
- a topology enumeration fix on certain SMP systems"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
x86/decoder: Fix and update the opcodes map
x86/power: Make restore_processor_context() sane
x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
x86/build: Don't verify mtools configuration file for isoimage
x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
x86/boot/compressed/64: Print error if 5-level paging is not supported
x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two minor fixes for running as Xen dom0:
- when built as 32 bit kernel on large machines the Xen LAPIC
emulation should report a rather modern LAPIC in order to support
enough APIC-Ids
- The Xen LAPIC emulation is needed for dom0 only, so build it only
for kernels supporting to run as Xen dom0"
* tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: XEN_ACPI_PROCESSOR is Dom0-only
x86/Xen: don't report ancient LAPIC version
|
|
When LD_ABS/IND is used in the program, and we have a BPF helper
call that changes packet data (bpf_helper_changes_pkt_data() returns
true), then in case of sparc JIT, we try to reload cached skb data
from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or
assumption that skb sits in R6 at this point, all helpers changing
skb data only have a guarantee that skb sits in R1. Therefore,
store BPF R1 in L7 temporarily and after procedure call use L7 to
reload cached skb data. skb sitting in R6 is only true at the time
when LD_ABS/IND is executed.
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the helper
would enforce a reload such as in case of XDP. Here, we do
have a struct xdp_buff instead of struct sk_buff as context,
thus this will access garbage.
JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB.
Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the
BPF helper would enforce a reload such as in case of XDP.
Here, we do have a struct xdp_buff instead of struct sk_buff
as context, thus this will access garbage.
JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB only. Tested on s390x.
Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
state after signals") fixed an issue reported in our FPSIMD signal
restore code but inadvertently introduced another issue which tends to
manifest as random SEGVs in userspace.
The problem is that when we copy the struct fpsimd_state from the kernel
stack (populated from the signal frame) into the struct held in the
current thread_struct, we blindly copy uninitialised stack into the
"cpu" field, which means that context-switching of the FP registers is
no longer reliable.
This patch fixes the problem by copying only the user_fpsimd member of
struct fpsimd_state. We should really rework the function prototypes
to take struct user_fpsimd_state * instead, but let's just get this
fixed for now.
Cc: Dave Martin <Dave.Martin@arm.com>
Fixes: 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD state after signals")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Update x86-opcode-map.txt based on the October 2017 Intel SDM publication.
Fix INVPID to INVVPID.
Add UD0 and UD1 instruction opcodes.
Also sync the objtool and perf tooling copies of this file.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aac062d7-c0f6-96e3-5c92-ed299e2bd3da@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
My previous attempt to fix a couple of bugs in __restore_processor_context():
5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
... introduced yet another bug, breaking suspend-resume.
Rather than trying to come up with a minimal fix, let's try to clean it up
for real. This patch fixes quite a few things:
- The old code saved a nonsensical subset of segment registers.
The only registers that need to be saved are those that contain
userspace state or those that can't be trivially restored without
percpu access working. (On x86_32, we can restore percpu access
by writing __KERNEL_PERCPU to %fs. On x86_64, it's easier to
save and restore the kernel's GSBASE.) With this patch, we
restore hardcoded values to the kernel state where applicable and
explicitly restore the user state after fixing all the descriptor
tables.
- We used to use an unholy mix of inline asm and C helpers for
segment register access. Let's get rid of the inline asm.
This fixes the reported s2ram hangs and make the code all around
more logical.
Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Fixes: 5b06bbcfc2c6 ("x86/power: Fix some ordering bugs in __restore_processor_context()")
Link: http://lkml.kernel.org/r/398ee68e5c0f766425a7b746becfc810840770ff.1513286253.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
x86_64 restores system call MSRs in fix_processor_context(), and
x86_32 restored them along with segment registers. The 64-bit
variant makes more sense, so move the 32-bit code to match the
64-bit code.
No side effects are expected to runtime behavior.
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/65158f8d7ee64dd6bbc6c1c83b3b34aaa854e3ae.1513286253.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
x86_64's saved_context nonsensically used separate idt_limit and
idt_base fields and then cast &idt_limit to struct desc_ptr *.
This was correct (with -fno-strict-aliasing), but it's confusing,
served no purpose, and required #ifdeffery. Simplify this by
using struct desc_ptr directly.
No change in functionality.
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/967909ce38d341b01d45eff53e278e2728a3a93a.1513286253.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Reported by syzkaller:
WARNING: CPU: 0 PID: 27962 at arch/x86/kvm/emulate.c:5631 x86_emulate_insn+0x557/0x15f0 [kvm]
Modules linked in: kvm_intel kvm [last unloaded: kvm]
CPU: 0 PID: 27962 Comm: syz-executor Tainted: G B W 4.15.0-rc2-next-20171208+ #32
Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.01.03.0006.040720161253 04/07/2016
RIP: 0010:x86_emulate_insn+0x557/0x15f0 [kvm]
RSP: 0018:ffff8807234476d0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88072d0237a0 RCX: ffffffffa0065c4d
RDX: 1ffff100e5a046f9 RSI: 0000000000000003 RDI: ffff88072d0237c8
RBP: ffff880723447728 R08: ffff88072d020000 R09: ffffffffa008d240
R10: 0000000000000002 R11: ffffed00e7d87db3 R12: ffff88072d0237c8
R13: ffff88072d023870 R14: ffff88072d0238c2 R15: ffffffffa008d080
FS: 00007f8a68666700(0000) GS:ffff880802200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002009506c CR3: 000000071fec4005 CR4: 00000000003626f0
Call Trace:
x86_emulate_instruction+0x3bc/0xb70 [kvm]
? reexecute_instruction.part.162+0x130/0x130 [kvm]
vmx_handle_exit+0x46d/0x14f0 [kvm_intel]
? trace_event_raw_event_kvm_entry+0xe7/0x150 [kvm]
? handle_vmfunc+0x2f0/0x2f0 [kvm_intel]
? wait_lapic_expire+0x25/0x270 [kvm]
vcpu_enter_guest+0x720/0x1ef0 [kvm]
...
When CS.L is set, vcpu should run in the 64 bit paging mode.
Current kvm set_sregs function doesn't have such check when
userspace inputs sreg values. This will lead unexpected behavior.
This patch is to add checks for CS.L, EFER.LME, EFER.LMA and
CR4.PAE when get SREG inputs from userspace in order to avoid
unexpected behavior.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Tianyu Lan <tianyu.lan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- add a pci_get_domain_bus_and_slot() stub for the CONFIG_PCI=n case to
avoid build breakage in the v4.16 merge window if a
pci_get_bus_and_slot() -> pci_get_domain_bus_and_slot() patch gets
merged before the PCI tree (Randy Dunlap)
- fix an AMD boot regression in the 64bit BAR support added in v4.15
(Christian König)
- fix an R-Car use-after-free that causes a crash if no PCIe card is
present (Geert Uytterhoeven)
* tag 'pci-v4.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: rcar: Fix use-after-free in probe error path
x86/PCI: Only enable a 64bit BAR on single-socket AMD Family 15h
x86/PCI: Fix infinite loop in search for 64bit BAR placement
PCI: Add pci_get_domain_bus_and_slot() stub
|
|
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, the SVE field in ID_AA64PFR0_EL1 is visible
unconditionally to userspace via the CPU ID register emulation,
irrespective of the kernel config. This means that if a kernel
configured with CONFIG_ARM64_SVE=n is run on SVE-capable hardware,
userspace will see SVE reported as present in the ID regs even
though the kernel forbids execution of SVE instructions.
This patch makes the exposure of the SVE field in ID_AA64PFR0_EL1
conditional on CONFIG_ARM64_SVE=y.
Since future architecture features are likely to encounter a
similar requirement, this patch adds a suitable helper macros for
use when declaring config-conditional ID register fields.
Fixes: 43994d824e84 ("arm64/sve: Detect SVE and activate runtime support")
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather
than VA_START) for the init_mm. This means that any reported W&X
addresses are offset by VA_START, which is clearly wrong and can make
them appear like userspace addresses.
Fix this by telling the ptdump code that we're walking init_mm starting
at VA_START. We don't need to update the addr_markers, since these are
still valid bounds regardless.
Cc: <stable@vger.kernel.org>
Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
------------[ cut here ]------------
Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
RIP: 0010:ex_handler_fprestore+0x88/0x90
Call Trace:
fixup_exception+0x4e/0x60
do_general_protection+0xff/0x270
general_protection+0x22/0x30
RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
kvm_apic_accept_events+0x1c0/0x240 [kvm]
kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
kvm_vcpu_ioctl+0x479/0x880 [kvm]
do_vfs_ioctl+0x142/0x9a0
SyS_ioctl+0x74/0x80
do_syscall_64+0x15f/0x600
where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu.
To fix it, move kvm_load_guest_fpu to the very beginning of
kvm_arch_vcpu_ioctl_run.
Cc: stable@vger.kernel.org
Fixes: f775b13eedee2f7f3c6fdd4e90fb79090ce5d339
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
*** Guest State ***
CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
CR3 = 0x00000000fffbc000
RSP = 0x0000000000000000 RIP = 0x0000000000000000
RFLAGS=0x00000000 DR7 = 0x0000000000000400
^^^^^^^^^^
The failed vmentry is triggered by the following testcase when ept=Y:
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdint.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <sys/ioctl.h>
long r[5];
int main()
{
r[2] = open("/dev/kvm", O_RDONLY);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
struct kvm_regs regs = {
.rflags = 0,
};
ioctl(r[4], KVM_SET_REGS, ®s);
ioctl(r[4], KVM_RUN, 0);
}
X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1
of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails.
This patch fixes it by oring X86_EFLAGS_FIXED during ioctl.
Cc: stable@vger.kernel.org
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Quan Xu <quan.xu0@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The below test case can cause infinite loop in kvm when ept=0.
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <stdint.h>
#include <linux/kvm.h>
#include <fcntl.h>
#include <sys/ioctl.h>
long r[5];
int main()
{
r[2] = open("/dev/kvm", O_RDONLY);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
ioctl(r[4], KVM_RUN, 0);
}
It doesn't setup the memory regions, mmu_alloc_shadow/direct_roots() in
kvm return 1 when kvm fails to allocate root page table which can result
in beblow infinite loop:
vcpu_run() {
for (;;) {
r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
if (r <= 0)
break;
if (need_resched())
cond_resched();
}
}
This patch fixes it by returning -ENOSPC when there is no available kvm mmu
page for root page table.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 26eeb53cf0f (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains three small fixes:
- A fix to a typo in sys_riscv_flush_icache. This only effects error
handling, but I think it's a small and obvious enough change that
it's sane outside the merge window.
- The addition of smp_mb__after_spinlock(), which was recently
removed due to an incorrect comment. This is largly a comment
change (as there's a big one now), and while it's necessary for
complience with the RISC-V memory model the lack of this fence
shouldn't manifest as a bug on current implementations.
Nonetheless, it still seems saner to have the fence in 4.15.
- The removal of some of the HVC_RISCV_SBI driver that snuck into the
arch port. This is compile-time dead code in 4.15 (as the driver
isn't in yet), and during the review process we found a better way
to implement early printk on RISC-V. While this change doesn't do
anything, it will make staging our HVC driver easier: without this
change the HVC driver we hope to upstream won't build on 4.15
(because the 4.15 arch code would reference a function that no
longer exists).
I don't think this is the last patch set we'll want for 4.15: I think
I'll want to remove some of the first-level irqchip driver that snuck
in as well, which will look a lot like the HVC patch here. This is
pending some asm-generic cleanup I'm doing that I haven't quite gotten
clean enough to send out yet, though, but hopefully it'll be ready by
next week (and still OK for that late)"
* tag 'riscv-for-linus-4.15-rc4-riscv_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
RISC-V: Remove unused CONFIG_HVC_RISCV_SBI code
RISC-V: Resurrect smp_mb__after_spinlock()
RISC-V: Logical vs Bitwise typo
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-13
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Addition of explicit scheduling points to map alloc/free
in order to avoid having to hold the CPU for too long,
from Eric.
2) Fixing of a corruption in overlapping perf_event_output
calls from different BPF prog types on the same CPU out
of different contexts, from Daniel.
3) Fallout fixes for recent correction of broken uapi for
BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
that needed to be pulled in from asm-generic and for
BPF selftests the asm-generic include did not work,
so similar asm include scheme was adapted for that
problematic header that perf is having with other
header files under tools, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use XAUI rather than XGMII for DSA link ports, as this is the interface
mode that the switches actually use. XAUI is the 4 lane bus with clock
per direction, whereas XGMII is a 32 bit bus with clock.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
do_sea() calls arm64_notify_die() which will always signal
user-space. It also returns whether APEI claimed the external
abort as a RAS notification. If it returns failure do_mem_abort()
will signal user-space too.
do_mem_abort() wants to know if we handled the error, we always
call arm64_notify_die() so can always return success.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
imc_common_cpuhp_mem_free() is the common function for all
IMC (In-memory Collection counters) domains to unregister cpuhotplug
callback and free memory. Since kfree of memory allocated for
nest-imc (per_nest_pmu_arr) is in the common code, all
domains (core/nest/thread) can do the kfree in the failure case.
This could potentially create a call trace as shown below, where
core(/thread/nest) imc pmu initialization fails and in the failure
path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr),
which is allocated by successfully registered nest units.
The call trace is generated in a scenario where core-imc
initialization is made to fail and a cpuhotplug is performed in a p9
system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access
per_nest_pmu_arr, which is already freed by core-imc.
NIP [c000000000cb6a94] mutex_lock+0x34/0x90
LR [c000000000cb6a88] mutex_lock+0x28/0x90
Call Trace:
mutex_lock+0x28/0x90 (unreliable)
perf_pmu_migrate_context+0x90/0x3a0
ppc_nest_imc_cpu_offline+0x190/0x1f0
cpuhp_invoke_callback+0x160/0x820
cpuhp_thread_fun+0x1bc/0x270
smpboot_thread_fn+0x250/0x290
kthread+0x1a8/0x1b0
ret_from_kernel_thread+0x5c/0x74
To address this scenario do the kfree(per_nest_pmu_arr) only in case
of nest-imc initialization failure, and when there is no other nest
units registered.
Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Oops is observed during boot:
Faulting instruction address: 0xc000000000248340
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850]
pc: c000000000248340: event_function_call+0x50/0x1f0
lr: c00000000024878c: perf_remove_from_context+0x3c/0x100
sp: c000000ff66fbad0
msr: 9000000000009033
dar: 7d20e2a6f92d03c0
pid = 14, comm = cpuhp/0
While registering the cpuhotplug callbacks for nest-imc, if we fail in
the cpuhotplug online path for any random node in a multi node
system (because the opal call to stop nest-imc counters fails for that
node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who
successfully returned from cpuhotplug online path.
This call trace is generated since in the ppc_nest_imc_cpu_offline()
path we are trying to migrate the event context, when nest-imc
counters are not even initialized.
Patch to add a check to ensure that nest-imc is registered before
migrating the event context.
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
It's theoretically possible that branch instructions recorded in
BHRB (Branch History Rolling Buffer) entries have already been
unmapped before they are processed by the kernel. Hence, trying to
dereference such memory location will result in a crash. eg:
Unable to handle kernel paging request for data at address 0xd000000019c41764
Faulting instruction address: 0xc000000000084a14
NIP [c000000000084a14] branch_target+0x4/0x70
LR [c0000000000eb828] record_and_restart+0x568/0x5c0
Call Trace:
[c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
[c0000000000ec378] perf_event_interrupt+0x298/0x460
[c000000000027964] performance_monitor_exception+0x54/0x70
[c000000000009ba4] performance_monitor_common+0x114/0x120
Fix it by deferefencing the addresses safely.
Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB")
Cc: stable@vger.kernel.org # v3.10+
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use probe_kernel_read() which is clearer, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Since c895f6f703ad ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:
[...]
CC init/main.o
In file included from ../include/linux/perf_event.h:18:0,
from ../include/linux/trace_events.h:10,
from ../include/trace/syscall.h:7,
from ../include/linux/syscalls.h:82,
from ../init/main.c:20:
../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
asm/bpf_perf_event.h: No such file or directory #include
<asm/bpf_perf_event.h>
[...]
Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.
Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Unconditionally reporting a value seen on the P4 or older invokes
functionality like io_apic_get_unique_id() on 32-bit builds, resulting
in a panic() with sufficiently many CPUs and/or IO-APICs. Doing what
that function does would be the hypervisor's responsibility anyway, so
makes no sense to be used when running on Xen. Uniformly report a more
modern version; this shouldn't matter much as both LAPIC and IO-APIC are
being managed entirely / mostly by the hypervisor.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
The only inclusion of asm/uaccess.h should be by linux/uaccess.h. All
other headers should use the latter.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The ARM architecture defines the memory locations that are permitted
to be accessed as the result of a speculative instruction fetch from
an exception level for which all stages of translation are disabled.
Specifically, the core is permitted to speculatively fetch from the
4KB region containing the current program counter 4K and next 4K.
When translation is changed from enabled to disabled for the running
exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
Falkor core may errantly speculatively access memory locations outside
of the 4KB region permitted by the architecture. The errant memory
access may lead to one of the following unexpected behaviors.
1) A System Error Interrupt (SEI) being raised by the Falkor core due
to the errant memory access attempting to access a region of memory
that is protected by a slave-side memory protection unit.
2) Unpredictable device behavior due to a speculative read from device
memory. This behavior may only occur if the instruction cache is
disabled prior to or coincident with translation being changed from
enabled to disabled.
The conditions leading to this erratum will not occur when either of the
following occur:
1) A higher exception level disables translation of a lower exception level
(e.g. EL2 changing SCTLR_EL1[M] from a value of 1 to 0).
2) An exception level disabling its stage-1 translation if its stage-2
translation is enabled (e.g. EL1 changing SCTLR_EL1[M] from a value of 1
to 0 when HCR_EL2[VM] has a value of 1).
To avoid the errant behavior, software must execute an ISB immediately
prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Add cputype definition macros for Qualcomm Datacenter Technologies
Falkor CPU in cputype.h. It's unfortunate that the first revision
of the Falkor CPU used the wrong part number 0x800, got fixed in v2
chip with part number 0xC00, and would be used the same value for
future revisions.
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Jiankang reports that our race detection in set_pte_at is firing when
copying the page tables in dup_mmap as a result of a fork(). In this
situation, the page table isn't actually live and so there is no way
that we can race with a concurrent update from the hardware page table
walker.
This patch reworks the race detection so that we require either the
mm to match the current active_mm (i.e. currently installed in our TTBR0)
or the mm_users count to be greater than 1, implying that the page table
could be live in another CPU. The mm_users check might still be racy,
but we'll avoid false positives and it's not realistic to validate that
all the necessary locks are held as part of this assertion.
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Jiankang Chen <chenjiankang1@huawei.com>
Tested-by: Jiankang Chen <chenjiankang1@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This push fixes the following issues:
- buffer overread in RSA
- potential use after free in algif_aead.
- error path null pointer dereference in af_alg
- forbid combinations such as hmac(hmac(sha3)) which may crash
- crash in salsa20 due to incorrect API usage"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: salsa20 - fix blkcipher_walk API usage
crypto: hmac - require that the underlying hash algorithm is unkeyed
crypto: af_alg - fix NULL pointer dereference in
crypto: algif_aead - fix reference counting of null skcipher
crypto: rsa - fix buffer overread when stripping leading zeroes
|
|
CONFIG_STACKDEPOT=y
Stackdepot doesn't work well with CONFIG_UNWINDER_GUESS=y.
The 'guess' unwinder generate awfully large and inaccurate stacktraces,
thus stackdepot can't deduplicate stacktraces because they all look like
unique. Eventually stackdepot reaches its capacity limit:
WARNING: CPU: 0 PID: 545 at lib/stackdepot.c:119 depot_save_stack+0x28e/0x550
Call Trace:
? kasan_kmalloc+0x144/0x160
? depot_save_stack+0x1f5/0x550
? do_raw_spin_unlock+0xda/0xf0
? preempt_count_sub+0x13/0xc0
<...90 lines...>
? do_raw_spin_unlock+0xda/0xf0
Add a STACKDEPOT=n dependency to UNWINDER_GUESS to avoid the problem.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171130123554.4330-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If mtools.conf is not generated before, 'make isoimage' could complain:
Kernel: arch/x86/boot/bzImage is ready (#597)
GENIMAGE arch/x86/boot/image.iso
*** Missing file: arch/x86/boot/mtools.conf
arch/x86/boot/Makefile:144: recipe for target 'isoimage' failed
mtools.conf is not used for isoimage generation, so do not check it.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4366d57af1 ("x86/build: Factor out fdimage/isoimage generation commands to standalone script")
Link: http://lkml.kernel.org/r/1512053480-8083-1-git-send-email-changbin.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On systems with hardware dirty bit management, the ltp madvise09 unit
test fails due to dirty bit information being lost and pages being
incorrectly freed.
This was bisected to:
arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
Reverting this commit leads to a separate problem, that the unit test
retains pages that should have been dropped due to the function
madvise_free_pte_range(.) not cleaning pte's properly.
Currently pte_mkclean only clears the software dirty bit, thus the
following code sequence can appear:
pte = pte_mkclean(pte);
if (pte_dirty(pte))
// this condition can return true with HW DBM!
This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
clearing both the SW and HW dirty information.
In order for this to function on systems without HW DBM, we need to
also adjust pte_mkdirty to remove the read only bit from writable pte's
to avoid infinite fault loops.
Cc: <stable@vger.kernel.org>
Fixes: 64c26841b349 ("arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()")
Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Tested-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The high_memory global variable is used by
cma_declare_contiguous(.) before it is defined.
We don't notice this as we compute __pa(high_memory - 1), and it looks
like we're processing a VA from the direct linear map.
This problem becomes apparent when we flip the kernel virtual address
space and the linear map is moved to the bottom of the kernel VA space.
This patch moves the initialisation of high_memory before it used.
Cc: <stable@vger.kernel.org>
Fixes: f7426b983a6a ("mm: cma: adjust address limit to avoid hitting low/high memory boundary")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This is code that probably should never have made it into the kernel in
the first place: it depends on a driver that hadn't been reviewed yet.
During the HVC_SBI_RISCV review process a better way of doing this was
suggested, but that means this code is defunct. It's compile-time
disabled in 4.15 because the driver isn't in, so I think it's safe to
just remove this for now.
CC: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|