Age | Commit message (Collapse) | Author |
|
The current way we deal with PtrAuth is a bit heavy handed:
- We forcefully save the host's keys on each vcpu_load()
- Handling the PtrAuth trap forces us to go all the way back
to the exit handling code to just set the HCR bits
Overall, this is pretty cumbersome. A better approach would be
to handle it the same way we deal with the FPSIMD registers:
- On vcpu_load() disable PtrAuth for the guest
- On first use, save the host's keys, enable PtrAuth in the
guest
Crucially, this can happen as a fixup, which is done very early
on exit. We can then reenter the guest immediately without
leaving the hypervisor role.
Another thing is that it simplify the rest of the host handling:
exiting all the way to the host means that the only possible
outcome for this trap is to inject an UNDEF.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Make x86_fpu_cache static now that FPU allocation and destruction is
handled entirely by common x86 code.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200608180218.20946-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Explicitly set the VA width to 48 bits for the x86_64-only PXXV48_4K VM
mode instead of asserting the guest VA width is 48 bits. The fact that
KVM supports 5-level paging is irrelevant unless the selftests opt-in to
5-level paging by setting CR4.LA57 for the guest. The overzealous
assert prevents running the selftests on a kernel with 5-level paging
enabled.
Incorporate LA57 into the assert instead of removing the assert entirely
as a sanity check of KVM's CPUID output.
Fixes: 567a9f1e9deb ("KVM: selftests: Introduce VM_MODE_PXXV48_4K")
Reported-by: Sergio Perez Gonzalez <sergio.perez.gonzalez@intel.com>
Cc: Adriana Cervantes Jimenez <adriana.cervantes.jimenez@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200528021530.28091-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When using the PtrAuth feature in a guest, we need to save the host's
keys before allowing the guest to program them. For that, we dump
them in a per-CPU data structure (the so called host context).
But both call sites that do this are in preemptible context,
which may end up in disaster should the vcpu thread get preempted
before reentering the guest.
Instead, save the keys eagerly on each vcpu_load(). This has an
increased overhead, but is at least safe.
Cc: stable@vger.kernel.org
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
'jiffies' and 'jiffies_64' are meant to alias (two different symbols that
share the same address). Most architectures make the symbols alias to the
same address via a linker script assignment in their
arch/<arch>/kernel/vmlinux.lds.S:
jiffies = jiffies_64;
which is effectively a definition of jiffies.
jiffies and jiffies_64 are both forward declared for all architectures in
include/linux/jiffies.h. jiffies_64 is defined in kernel/time/timer.c.
x86_64 was peculiar in that it wasn't doing the above linker script
assignment, but rather was:
1. defining jiffies in arch/x86/kernel/time.c instead via the linker script.
2. overriding the symbol jiffies_64 from kernel/time/timer.c in
arch/x86/kernel/vmlinux.lds.s via 'jiffies_64 = jiffies;'.
As Fangrui notes:
In LLD, symbol assignments in linker scripts override definitions in
object files. GNU ld appears to have the same behavior. It would
probably make sense for LLD to error "duplicate symbol" but GNU ld
is unlikely to adopt for compatibility reasons.
This results in an ODR violation (UB), which seems to have survived
thus far. Where it becomes harmful is when;
1. -fno-semantic-interposition is used:
As Fangrui notes:
Clang after LLVM commit 5b22bcc2b70d
("[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local")
defaults to -fno-semantic-interposition similar semantics which help
-fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined
within the same translation unit. Unlike GCC
-fno-semantic-interposition, Clang emits such relocations referencing
local symbols for non-pic code as well.
This causes references to jiffies to refer to '.Ljiffies$local' when
jiffies is defined in the same translation unit. Likewise, references to
jiffies_64 become references to '.Ljiffies_64$local' in translation units
that define jiffies_64. Because these differ from the names used in the
linker script, they will not be rewritten to alias one another.
2. Full LTO
Full LTO effectively treats all source files as one translation
unit, causing these local references to be produced everywhere. When
the linker processes the linker script, there are no longer any
references to jiffies_64' anywhere to replace with 'jiffies'. And
thus '.Ljiffies$local' and '.Ljiffies_64$local' no longer alias
at all.
In the process of porting patches enabling Full LTO from arm64 to x86_64,
spooky bugs have been observed where the kernel appeared to boot, but init
doesn't get scheduled.
Avoid the ODR violation by matching other architectures and define jiffies
only by linker script. For -fno-semantic-interposition + Full LTO, there
is no longer a global definition of jiffies for the compiler to produce a
local symbol which the linker script won't ensure aliases to jiffies_64.
Fixes: 40747ffa5aa8 ("asmlinkage: Make jiffies visible")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Alistair Delva <adelva@google.com>
Debugged-by: Nick Desaulniers <ndesaulniers@google.com>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Bob Haarman <inglorion@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot on
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/852
Link: https://lkml.kernel.org/r/20200602193100.229287-1-inglorion@google.com
|
|
Currently, it is possible to enable indirect branch speculation even after
it was force-disabled using the PR_SPEC_FORCE_DISABLE option. Moreover, the
PR_GET_SPECULATION_CTRL command gives afterwards an incorrect result
(force-disabled when it is in fact enabled). This also is inconsistent
vs. STIBP and the documention which cleary states that
PR_SPEC_FORCE_DISABLE cannot be undone.
Fix this by actually enforcing force-disabled indirect branch
speculation. PR_SPEC_ENABLE called after PR_SPEC_FORCE_DISABLE now fails
with -EPERM as described in the documentation.
Fixes: 9137bb27e60e ("x86/speculation: Add prctl() control for indirect branch speculation")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
|
|
On context switch the change of TIF_SSBD and TIF_SPEC_IB are evaluated
to adjust the mitigations accordingly. This is optimized to avoid the
expensive MSR write if not needed.
This optimization is buggy and allows an attacker to shutdown the SSBD
protection of a victim process.
The update logic reads the cached base value for the speculation control
MSR which has neither the SSBD nor the STIBP bit set. It then OR's the
SSBD bit only when TIF_SSBD is different and requests the MSR update.
That means if TIF_SSBD of the previous and next task are the same, then
the base value is not updated, even if TIF_SSBD is set. The MSR write is
not requested.
Subsequently if the TIF_STIBP bit differs then the STIBP bit is updated
in the base value and the MSR is written with a wrong SSBD value.
This was introduced when the per task/process conditional STIPB
switching was added on top of the existing SSBD switching.
It is exploitable if the attacker creates a process which enforces SSBD
and has the contrary value of STIBP than the victim process (i.e. if the
victim process enforces STIBP, the attacker process must not enforce it;
if the victim process does not enforce STIBP, the attacker process must
enforce it) and schedule it on the same core as the victim process. If
the victim runs after the attacker the victim becomes vulnerable to
Spectre V4.
To fix this, update the MSR value independent of the TIF_SSBD difference
and dependent on the SSBD mitigation method available. This ensures that
a subsequent STIPB initiated MSR write has the correct state of SSBD.
[ tglx: Handle X86_FEATURE_VIRT_SSBD & X86_FEATURE_VIRT_SSBD correctly
and massaged changelog ]
Fixes: 5bfbe3ad5840 ("x86/speculation: Prepare for per task indirect branch speculation control")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
|
|
When STIBP is unavailable or enhanced IBRS is available, Linux
force-disables the IBPB mitigation of Spectre-BTB even when simultaneous
multithreading is disabled. While attempts to enable IBPB using
prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, ...) fail with
EPERM, the seccomp syscall (or its prctl(PR_SET_SECCOMP, ...) equivalent)
which are used e.g. by Chromium or OpenSSH succeed with no errors but the
application remains silently vulnerable to cross-process Spectre v2 attacks
(classical BTB poisoning). At the same time the SYSFS reporting
(/sys/devices/system/cpu/vulnerabilities/spectre_v2) displays that IBPB is
conditionally enabled when in fact it is unconditionally disabled.
STIBP is useful only when SMT is enabled. When SMT is disabled and STIBP is
unavailable, it makes no sense to force-disable also IBPB, because IBPB
protects against cross-process Spectre-BTB attacks regardless of the SMT
state. At the same time since missing STIBP was only observed on AMD CPUs,
AMD does not recommend using STIBP, but recommends using IBPB, so disabling
IBPB because of missing STIBP goes directly against AMD's advice:
https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf
Similarly, enhanced IBRS is designed to protect cross-core BTB poisoning
and BTB-poisoning attacks from user space against kernel (and
BTB-poisoning attacks from guest against hypervisor), it is not designed
to prevent cross-process (or cross-VM) BTB poisoning between processes (or
VMs) running on the same core. Therefore, even with enhanced IBRS it is
necessary to flush the BTB during context-switches, so there is no reason
to force disable IBPB when enhanced IBRS is available.
Enable the prctl control of IBPB even when STIBP is unavailable or enhanced
IBRS is available.
Fixes: 7cc765a67d8e ("x86/speculation: Enable prctl mode for spectre_v2_user")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
|
|
KVM sets HCR_EL2.TACR via HCR_GUEST_FLAGS. This means ACTLR* accesses
from the guest are always trapped, and always return the value in the
sys_regs array.
The guest can't change the value of these registers, so we are
save restoring the reset value, which came from the host.
Stop save/restoring this register. Keep the storage for this register
in sys_regs[] as this is how the value is exposed to user-space,
removing it would break migration.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200529150656.7339-4-james.morse@arm.com
|
|
ACTLR_EL1 is a 64bit register while the 32bit ACTLR is obviously 32bit.
For 32bit software, the extra bits are accessible via ACTLR2... which
KVM doesn't emulate.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200529150656.7339-3-james.morse@arm.com
|
|
aarch32 has pairs of registers to access the high and low parts of 64bit
registers. KVM has a union of 64bit sys_regs[] and 32bit copro[]. The
32bit accessors read the high or low part of the 64bit sys_reg[] value
through the union.
Both sys_reg_descs[] and cp15_regs[] list access_csselr() as the accessor
for CSSELR{,_EL1}. access_csselr() is only aware of the 64bit sys_regs[],
and expects r->reg to be 'CSSELR_EL1' in the enum, index 2 of the 64bit
array.
cp15_regs[] uses the 32bit copro[] alias of sys_regs[]. Here CSSELR is
c0_CSSELR which is the same location in sys_reg[]. r->reg is 'c0_CSSELR',
index 4 in the 32bit array.
access_csselr() uses the 32bit r->reg value to access the 64bit array,
so reads and write the wrong value. sys_regs[4], is ACTLR_EL1, which
is subsequently save/restored when we enter the guest.
ACTLR_EL1 is supposed to be read-only for the guest. This register
only affects execution at EL1, and the host's value is restored before
we return to host EL1.
Convert the 32bit register index back to the 64bit version.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200529150656.7339-2-james.morse@arm.com
|
|
This code calls brelse(bh) and then dereferences "bh" on the next line
resulting in a possible use after free. The brelse() should just be
moved down a line.
Fixes: b676fdbcf4c8 ("exfat: standardize checksum calculation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
There is check error in range condition that can never be entered
even with invalid input.
Replace incorrent checking code with already existing valid checker.
Signed-off-by: hyeongseok.kim <hyeongseok@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
At truncate, there is a problem of incorrect updating in the file entry
pointer instead of stream entry. This will cause the problem of
overwriting the time field of the file entry to new_size. Fix it to
update stream entry.
Fixes: 98d917047e8b ("exfat: add file operations")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
butt3rflyh4ck reported memory leak found by syzkaller.
A param->string held by exfat_mount_options.
BUG: memory leak
unreferenced object 0xffff88801972e090 (size 8):
comm "syz-executor.2", pid 16298, jiffies 4295172466 (age 14.060s)
hex dump (first 8 bytes):
6b 6f 69 38 2d 75 00 00 koi8-u..
backtrace:
[<000000005bfe35d6>] kstrdup+0x36/0x70 mm/util.c:60
[<0000000018ed3277>] exfat_parse_param+0x160/0x5e0
fs/exfat/super.c:276
[<000000007680462b>] vfs_parse_fs_param+0x2b4/0x610
fs/fs_context.c:147
[<0000000097c027f2>] vfs_parse_fs_string+0xe6/0x150
fs/fs_context.c:191
[<00000000371bf78f>] generic_parse_monolithic+0x16f/0x1f0
fs/fs_context.c:231
[<000000005ce5eb1b>] do_new_mount fs/namespace.c:2812 [inline]
[<000000005ce5eb1b>] do_mount+0x12bb/0x1b30 fs/namespace.c:3141
[<00000000b642040c>] __do_sys_mount fs/namespace.c:3350 [inline]
[<00000000b642040c>] __se_sys_mount fs/namespace.c:3327 [inline]
[<00000000b642040c>] __x64_sys_mount+0x18f/0x230 fs/namespace.c:3327
[<000000003b024e98>] do_syscall_64+0xf6/0x7d0
arch/x86/entry/common.c:295
[<00000000ce2b698c>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
exfat_free() should call exfat_free_iocharset(), to prevent a leak
in case we fail after parsing iocharset= but before calling
get_tree_bdev().
Additionally, there's no point copying param->string in
exfat_parse_param() - just steal it, leaving NULL in param->string.
That's independent from the leak or fix thereof - it's simply
avoiding an extra copy.
Fixes: 719c1e182916 ("exfat: add super block operations")
Cc: stable@vger.kernel.org # v5.7
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
kbuild test robot reported :
fs/exfat/nls.c:531:22: warning: Variable 'p_uniname->name_len'
is reassigned a value before the old one has been used.
The reassignment of p_uniname->name_len is not needed and remove it.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
To clarify that it is a 16-bit checksum, the parts related to the 16-bit
checksum are renamed and change type to u16.
Furthermore, replace checksum calculation in exfat_load_upcase_table()
with exfat_calc_checksum32().
Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Add Boot-Regions verification specified in exFAT specification.
Note that the checksum type is strongly related to the raw structure,
so the'u32 'type is used to clarify the number of bits.
Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Separate the boot sector analysis to read_boot_sector().
And add a check for the fs_name field.
Furthermore, add a strict consistency check, because overlapping areas
can cause serious corruption.
Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Aggregate PBR related definitions and redefine as "boot_sector" to comply
with the exFAT specification.
And, rename variable names including 'pbr'.
Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Optimize directory access based on exfat_entry_set_cache.
- Hold bh instead of copied d-entry.
- Modify bh->data directly instead of the copied d-entry.
- Write back the retained bh instead of rescanning the d-entry-set.
And
- Remove unused cache related definitions.
Signed-off-by: Tetsuhiro Kohada <kohada.tetsuhiro@dc.mitsubishielectric.co.jp>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Replace time_ms with time_cs in the file directory entry structure
and related functions.
The unit of create_time_ms/modify_time_ms in File Directory Entry are not
'milli-second', but 'centi-second'.
The exfat specification uses the term '10ms', but instead use 'cs' as in
msdos_fs.h.
Signed-off-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
There is no need to init 'sync' in exfat_set_vol_flags().
This also fixes the following coccicheck warning:
fs/exfat/super.c:104:6-10: WARNING: Assignment of 0/1 to bool variable
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
After applying previous two patches, these functions are not used anymore.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Function partial_name_hash() takes long type value into which can be stored
one Unicode code point. Therefore conversion from UTF-32 to UTF-16 is not
needed.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
- Use consistent capitalization for "exFAT".
- Fix grammar,
- Split long sentence.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
Remove the direct use of KERN_<LEVEL> in functions by creating
separate exfat_<level> macros.
Miscellanea:
o Remove several unnecessary terminating newlines in formats
o Realign arguments and fit to 80 columns where appropriate
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
If two Unicode code points represented in UTF-16 are different then also
their UTF-32 representation must be different. Therefore conversion from
UTF-32 to UTF-16 is not needed.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
This adds more IOs to attach flags.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds another way to attach bio flags to node writes.
Description: Give a way to attach REQ_META|FUA to node writes
given temperature-based bits. Now the bits indicate:
* REQ_META | REQ_FUA |
* 5 | 4 | 3 | 2 | 1 | 0 |
* Cold | Warm | Hot | Cold | Warm | Hot |
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just cleanup, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If mountpoint is readonly, we should allow shutdowning filesystem
successfully, this fixes issue found by generic/599 testcase of
xfstest.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
it may be concurrently modified by a rename. This can cause undefined
behavior (possibly out-of-bounds memory accesses or crashes) in
utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
that may be concurrently modified.
Fix this by first copying the filename to a stack buffer if needed.
This way we get a stable snapshot of the filename.
Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
Cc: <stable@vger.kernel.org> # v5.4+
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Rosenberg <drosen@google.com>
Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
and f2fs_kvmalloc(), both return both kinds of memory.
It's redundant to have two functions that do the same thing, and also
breaking the standard naming convention is causing bugs since people
assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
e.g. the various allocations in fs/f2fs/compress.c.
Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
re-introducing the allocation failures that the vmalloc fallback was
intended to fix, convert the largest allocations to use f2fs_kvmalloc().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"This introduces device managed versions of functions used to register
remoteproc devices, add support for remoteproc driver specific
resource control, enables remoteproc drivers to specify ELF class and
machine for coredumps. It integrates pm_runtime in the core for
keeping resources active while the remote is booted and holds a wake
source while recoverying a remote processor after a firmware crash.
It refactors the remoteproc device's allocation path to simplify the
logic, fix a few cleanup bugs and to not clone const strings onto the
heap. Debugfs code is simplifies using the DEFINE_SHOW_ATTRIBUTE and a
zero-length array is replaced with flexible-array.
A new remoteproc driver for the JZ47xx VPU is introduced, the Qualcomm
SM8250 gains support for audio, compute and sensor remoteprocs and the
Qualcomm SC7180 modem support is cleaned up and improved.
The Qualcomm glink subsystem-restart driver is merged into the main
glink driver, the Qualcomm sysmon driver is extended to properly
notify remote processors about all other remote processors' state
transitions"
* tag 'rproc-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc: (43 commits)
remoteproc: Fix an error code in devm_rproc_alloc()
MAINTAINERS: Add myself as reviewer for Ingenic rproc driver
remoteproc: ingenic: Added remoteproc driver
remoteproc: Add support for runtime PM
dt-bindings: Document JZ47xx VPU auxiliary processor
remoteproc: wcss: Fix arguments passed to qcom_add_glink_subdev()
remoteproc: Fix and restore the parenting hierarchy for vdev
remoteproc: Fall back to using parent memory pool if no dedicated available
remoteproc: Replace zero-length array with flexible-array
remoteproc: wcss: add support for rpmsg communication
remoteproc: core: Prevent system suspend during remoteproc recovery
remoteproc: qcom_q6v5_mss: Remove unused q6v5_da_to_va function
remoteproc: qcom_q6v5_mss: map/unmap mpss segments before/after use
remoteproc: qcom_q6v5_mss: Drop accesses to MPSS PERPH register space
dt-bindings: remoteproc: qcom: Replace halt-nav with spare-regs
remoteproc: qcom: pas: Add SM8250 PAS remoteprocs
dt-bindings: remoteproc: qcom: pas: Add SM8250 remoteprocs
remoteproc: qcom_q6v5_mss: Extract mba/mpss from memory-region
dt-bindings: remoteproc: qcom: Use memory-region to reference memory
remoteproc: qcom: pas: Add SC7180 Modem support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This replaces a zero-length array with flexible-array and fixes a typo
in a comment in the rpmsg core"
* tag 'rpmsg-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
rpmsg: Replace zero-length array with flexible-array
rpmsg: fix a comment typo for rpmsg_device_match()
|
|
Pull ceph updates from Ilya Dryomov:
"The highlights are:
- OSD/MDS latency and caps cache metrics infrastructure for the
filesytem (Xiubo Li). Currently available through debugfs and will
be periodically sent to the MDS in the future.
- support for replica reads (balanced and localized reads) for rbd
and the filesystem (myself). The default remains to always read
from primary, users can opt-in with the new crush_location and
read_from_replica options. Note that reading from replica is safe
for general use only since Octopus.
- support for RADOS allocation hint flags (myself). Currently used by
rbd to propagate the compressible/incompressible hint given with
the new compression_hint map option and ready for passing on more
advanced hints, e.g. based on fadvise() from the filesystem.
- support for efficient cross-quota-realm renames (Luis Henriques)
- assorted cap handling improvements and cleanups, particularly
untangling some of the locking (Jeff Layton)"
* tag 'ceph-for-5.8-rc1' of git://github.com/ceph/ceph-client: (29 commits)
rbd: compression_hint option
libceph: support for alloc hint flags
libceph: read_from_replica option
libceph: support for balanced and localized reads
libceph: crush_location infrastructure
libceph: decode CRUSH device/bucket types and names
libceph: add non-asserting rbtree insertion helper
ceph: skip checking caps when session reconnecting and releasing reqs
ceph: make sure mdsc->mutex is nested in s->s_mutex to fix dead lock
ceph: don't return -ESTALE if there's still an open file
libceph, rbd: replace zero-length array with flexible-array
ceph: allow rename operation under different quota realms
ceph: normalize 'delta' parameter usage in check_quota_exceeded
ceph: ceph_kick_flushing_caps needs the s_mutex
ceph: request expedited service on session's last cap flush
ceph: convert mdsc->cap_dirty to a per-session list
ceph: reset i_requested_max_size if file write is not wanted
ceph: throw a warning if we destroy session with mutex still locked
ceph: fix potential race in ceph_check_caps
ceph: document what protects i_dirty_item and i_flushing_item
...
|
|
io_uring is the only user of io-wq, and now it uses only io-wq callback
for all its requests, namely io_wq_submit_work(). Instead of storing
work->runner callback in each instance of io_wq_work, keep it in io-wq
itself.
pros:
- reduces io_wq_work size
- more robust -- ->func won't be invalidated with mem{cpy,set}(req)
- helps other work
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove io_link_work_cb() -- the last custom work.func.
Not the prettiest thing, but works. Instead of queueing a linked timeout
in io_link_work_cb() mark a request with REQ_F_QUEUE_TIMEOUT and do
enqueueing based on the flag in io_wq_submit_work().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation of getting rid of work.func, this removes almost all
custom instances of it, leaving only io_wq_submit_work() and
io_link_work_cb(). And the last one will be dealt later.
Nothing fancy, just routinely remove *_finish() function and inline
what's left. E.g. remove io_fsync_finish() + inline __io_fsync() into
io_fsync().
As no users of io_req_cancelled() are left, delete it as well. The patch
adds extra switch lookup on cold-ish path, but that's overweighted by
nice diffstat and other benefits of the following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Relying on having a specific work.func is dangerous, even if an opcode
handler set it itself. E.g. io_wq_assign_next() can modify it.
io_close() sets a custom work.func to indicate that
__close_fd_get_file() was already called. Fortunately, there is no bugs
with io_wq_assign_next() and close yet.
Still, do it safe and always be prepared to be called through
io_wq_submit_work(). Zero req->close.put_file in prep, and call
__close_fd_get_file() IFF it's NULL.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- An iopen glock locking scheme rework that speeds up deletes of inodes
accessed from multiple nodes
- Various bug fixes and debugging improvements
- Convert gfs2-glocks.txt to ReST
* tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: fix use-after-free on transaction ail lists
gfs2: new slab for transactions
gfs2: initialize transaction tr_ailX_lists earlier
gfs2: Smarter iopen glock waiting
gfs2: Wake up when setting GLF_DEMOTE
gfs2: Check inode generation number in delete_work_func
gfs2: Move inode generation number check into gfs2_inode_lookup
gfs2: Minor gfs2_lookup_by_inum cleanup
gfs2: Try harder to delete inodes locally
gfs2: Give up the iopen glock on contention
gfs2: Turn gl_delete into a delayed work
gfs2: Keep track of deleted inode generations in LVBs
gfs2: Allow ASPACE glocks to also have an lvb
gfs2: instrumentation wrt log_flush stuck
gfs2: introduce new gfs2_glock_assert_withdraw
gfs2: print mapping->nrpages in glock dump for address space glocks
gfs2: Only do glock put in gfs2_create_inode for free inodes
gfs2: Allow lock_nolock mount to specify jid=X
gfs2: Don't ignore inode write errors during inode_go_sync
docs: filesystems: convert gfs2-glocks.txt to ReST
|
|
Commit 067c650c456e ("dtc: Use pkg-config to locate libyaml") added
'pkg-config --libs' to link libyaml installed in a non-standard
location.
yamltree.c includes <yaml.h>, but that commit did not add the search
path for <yaml.h>. If /usr/include/yaml.h does not exist, it fails to
build. A user can explicitly pass HOSTCFLAGS to work around it, but
the policy is not consistent.
There are two ways to deal with libraries in a non-default location.
[1] Use HOSTCFLAGS and HOSTLDFLAGS for additional search paths for
headers and libraries.
They are documented in Documentation/kbuild/kbuild.rst
$ make HOSTCFLAGS='-I <prefix>/include' HOSTLDFLAGS='-L <prefix>/lib'
[2] Use pkg-config
'pkg-config --cflags' for querying the header search path
'pkg-config --libs' for querying the lib and its path
If we go with pkg-config, use [2] consistently. Do not mix up
[1] and [2].
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for multi-function devices in pci code.
- Enable PF-VF linking for architectures using the pdev->no_vf_scan
flag (currently just s390).
- Add reipl from NVMe support.
- Get rid of critical section cleanup in entry.S.
- Refactor PNSO CHSC (perform network subchannel operation) in cio and
qeth.
- QDIO interrupts and error handling fixes and improvements, more
refactoring changes.
- Align ioremap() with generic code.
- Accept requests without the prefetch bit set in vfio-ccw.
- Enable path handling via two new regions in vfio-ccw.
- Other small fixes and improvements all over the code.
* tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
vfio-ccw: make vfio_ccw_regops variables declarations static
vfio-ccw: Add trace for CRW event
vfio-ccw: Wire up the CRW irq and CRW region
vfio-ccw: Introduce a new CRW region
vfio-ccw: Refactor IRQ handlers
vfio-ccw: Introduce a new schib region
vfio-ccw: Refactor the unregister of the async regions
vfio-ccw: Register a chp_event callback for vfio-ccw
vfio-ccw: Introduce new helper functions to free/destroy regions
vfio-ccw: document possible errors
vfio-ccw: Enable transparent CCW IPL from DASD
s390/pci: Log new handle in clp_disable_fh()
s390/cio, s390/qeth: cleanup PNSO CHSC
s390/qdio: remove q->first_to_kick
s390/qdio: fix up qdio_start_irq() kerneldoc
s390: remove critical section cleanup from entry.S
s390: add machine check SIGP
s390/pci: ioremap() align with generic code
s390/ap: introduce new ap function ap_get_qdev()
Documentation/s390: Update / remove developerWorks web links
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"A big part of this is a change in how devices get connected to IOMMUs
in the core code. It contains the change from the old add_device() /
remove_device() to the new probe_device() / release_device()
call-backs.
As a result functionality that was previously in the IOMMU drivers has
been moved to the IOMMU core code, including IOMMU group allocation
for each device. The reason for this change was to get more robust
allocation of default domains for the iommu groups.
A couple of fixes were necessary after this was merged into the IOMMU
tree, but there are no known bugs left. The last fix is applied on-top
of the merge commit for the topic branches.
Other than that change, we have:
- Removal of the driver private domain handling in the Intel VT-d
driver. This was fragile code and I am glad it is gone now.
- More Intel VT-d updates from Lu Baolu:
- Nested Shared Virtual Addressing (SVA) support to the Intel VT-d
driver
- Replacement of the Intel SVM interfaces to the common IOMMU SVA
API
- SVA Page Request draining support
- ARM-SMMU Updates from Will:
- Avoid mapping reserved MMIO space on SMMUv3, so that it can be
claimed by the PMU driver
- Use xarray to manage ASIDs on SMMUv3
- Reword confusing shutdown message
- DT compatible string updates
- Allow implementations to override the default domain type
- A new IOMMU driver for the Allwinner Sun50i platform
- Support for ATS gets disabled for untrusted devices (like
Thunderbolt devices). This includes a PCI patch, acked by Bjorn.
- Some cleanups to the AMD IOMMU driver to make more use of IOMMU
core features.
- Unification of some printk formats in the Intel and AMD IOMMU
drivers and in the IOVA code.
- Updates for DT bindings
- A number of smaller fixes and cleanups.
* tag 'iommu-updates-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (109 commits)
iommu: Check for deferred attach in iommu_group_do_dma_attach()
iommu/amd: Remove redundant devid checks
iommu/amd: Store dev_data as device iommu private data
iommu/amd: Merge private header files
iommu/amd: Remove PD_DMA_OPS_MASK
iommu/amd: Consolidate domain allocation/freeing
iommu/amd: Free page-table in protection_domain_free()
iommu/amd: Allocate page-table in protection_domain_init()
iommu/amd: Let free_pagetable() not rely on domain->pt_root
iommu/amd: Unexport get_dev_data()
iommu/vt-d: Fix compile warning
iommu/vt-d: Remove real DMA lookup in find_domain
iommu/vt-d: Allocate domain info for real DMA sub-devices
iommu/vt-d: Only clear real DMA device's context entries
iommu: Remove iommu_sva_ops::mm_exit()
uacce: Remove mm_exit() op
iommu/sun50i: Constify sun50i_iommu_ops
iommu/hyper-v: Constify hyperv_ir_domain_ops
iommu/vt-d: Use pci_ats_supported()
iommu/arm-smmu-v3: Use pci_ats_supported()
...
|
|
Pull drm msm updates from Dave Airlie:
"This tree has been in next for a couple of weeks, but Rob missed an
arm32 build issue, so I was awaiting the tree with a patch reverted.
- new gpu support: a405, a640, a650
- dpu: color processing support
- mdp5: support for msm8x36 (the thing with a405)
- some prep work for per-context pagetables (ie the part that does
not depend on in-flight iommu patches)
- last but not least, UABI update for submit ioctl to support syncobj
(from Bas)"
* tag 'drm-next-msm-5.8-2020-06-08' of git://anongit.freedesktop.org/drm/drm: (30 commits)
Revert "drm/msm/dpu: add support for clk and bw scaling for display"
drm/msm/a6xx: skip HFI set freq if GMU is powered down
drm/msm: Update the MMU helper function APIs
drm/msm: Refactor address space initialization
drm/msm: Attach the IOMMU device during initialization
drm/msm/dpu: dpu_setup_dspp_pcc() can be static
drm/msm/a6xx: a6xx_hfi_send_start() can be static
drm/msm/a4xx: add a405_registers for a405 device
drm/msm/a4xx: add adreno a405 support
drm/msm/a6xx: update a6xx_hw_init for A640 and A650
drm/msm/a6xx: enable GMU log
drm/msm/a6xx: update pdc/rscc GMU registers for A640/A650
drm/msm/a6xx: A640/A650 GMU firmware path
drm/msm/a6xx: HFI v2 for A640 and A650
drm/msm/a6xx: add A640/A650 to gpulist
drm/msm/a6xx: use msm_gem for GMU memory objects
drm/msm: add internal MSM_BO_MAP_PRIV flag
drm/msm: add msm_gem_get_and_pin_iova_range
drm/msm: Check for powered down HW in the devfreq callbacks
drm/msm/dpu: update bandwidth threshold check
...
|
|
Pull drm fixes from Dave Airlie:
"These are the fixes from last week for the stuff merged in the merge
window. It got a bunch of nouveau fixes for HDA audio on some new
GPUs, some i915 and some amdpgu fixes.
i915:
- gvt: Fix one clang warning on debug only function
- Use ARRAY_SIZE for coccicheck warning
- Use after free fix for display global state.
- Whitelisting context-local timestamp on Gen9 and two scheduler
fixes with deps (Cc: stable)
- Removal of write flag from sysfs files where ineffective
nouveau:
- HDMI/DP audio HDA fixes
- display hang fix for Volta/Turing
- GK20A regression fix.
amdgpu:
- Prevent hwmon accesses while GPU is in reset
- CTF interrupt fix
- Backlight fix for renoir
- Fix for display sync groups
- Display bandwidth validation workaround"
* tag 'drm-next-2020-06-08' of git://anongit.freedesktop.org/drm/drm: (28 commits)
drm/nouveau/kms/nv50-: clear SW state of disabled windows harder
drm/nouveau: gr/gk20a: Use firmware version 0
drm/nouveau/disp/gm200-: detect and potentially disable HDA support on some SORs
drm/nouveau/disp/gp100: split SOR implementation from gm200
drm/nouveau/disp: modify OR allocation policy to account for HDA requirements
drm/nouveau/disp: split part of OR allocation logic into a function
drm/nouveau/disp: provide hint to OR allocation about HDA requirements
drm/amd/display: Revalidate bandwidth before commiting DC updates
drm/amdgpu/display: use blanked rather than plane state for sync groups
drm/i915/params: fix i915.fake_lmem_start module param sysfs permissions
drm/i915/params: don't expose inject_probe_failure in debugfs
drm/i915: Whitelist context-local timestamp in the gen9 cmdparser
drm/i915: Fix global state use-after-frees with a refcount
drm/i915: Check for awaits on still currently executing requests
drm/i915/gt: Do not schedule normal requests immediately along virtual
drm/i915: Reorder await_execution before await_request
drm/nouveau/kms/gt215-: fix race with audio driver runpm
drm/nouveau/disp/gm200-: fix NV_PDISP_SOR_HDMI2_CTRL(n) selection
Revert "drm/amd/display: disable dcn20 abm feature for bring up"
drm/amd/powerplay: ack the SMUToHost interrupt on receive V2
...
|
|
Merge still more updates from Andrew Morton:
"Various trees. Mainly those parts of MM whose linux-next dependents
are now merged. I'm still sitting on ~160 patches which await merges
from -next.
Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug,
panic, lib, sysctl, mm/gup, mm/pagemap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (52 commits)
doc: cgroup: update note about conditions when oom killer is invoked
module: move the set_fs hack for flush_icache_range to m68k
nommu: use flush_icache_user_range in brk and mmap
binfmt_flat: use flush_icache_user_range
exec: use flush_icache_user_range in read_code
exec: only build read_code when needed
m68k: implement flush_icache_user_range
arm: rename flush_cache_user_range to flush_icache_user_range
xtensa: implement flush_icache_user_range
sh: implement flush_icache_user_range
asm-generic: add a flush_icache_user_range stub
mm: rename flush_icache_user_range to flush_icache_user_page
arm,sparc,unicore32: remove flush_icache_user_range
riscv: use asm-generic/cacheflush.h
powerpc: use asm-generic/cacheflush.h
openrisc: use asm-generic/cacheflush.h
m68knommu: use asm-generic/cacheflush.h
microblaze: use asm-generic/cacheflush.h
ia64: use asm-generic/cacheflush.h
hexagon: use asm-generic/cacheflush.h
...
|
|
Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
back to the charge path") cgroup oom killer is no longer invoked only
from page faults. Now it implements the same semantics as global OOM
killer: allocation context invokes OOM killer and keeps retrying until
success.
[akpm@linux-foundation.org: fixes per Randy]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/158894738928.208854.5244393925922074518.stgit@buzz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
flush_icache_range generally operates on kernel addresses, but for some
reason m68k needed a set_fs override. Move that into the m68k code
insted of keeping it in the module loader.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/20200515143646.3857579-30-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|