Age | Commit message (Collapse) | Author |
|
Remove compat vdso code, since there is hardly any compat user space
left. Still existing compat user space will have to use system calls
instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The livepatch consistency model requires reliable stack tracing
architecture support in order to work properly. In order to achieve
this, two main issues have to be solved. First, reliable and consistent
call chain backtracing has to be ensured. Second, the unwinder needs to
be able to detect stack corruptions and return errors.
The "zSeries ELF Application Binary Interface Supplement" says:
"The stack pointer points to the first word of the lowest allocated
stack frame. If the "back chain" is implemented this word will point to
the previously allocated stack frame (towards higher addresses), except
for the first stack frame, which shall have a back chain of zero (NULL).
The stack shall grow downwards, in other words towards lower addresses."
"back chain" is optional. GCC option -mbackchain enables it. Quoting
Martin Schwidefsky [1]:
"The compiler is called with the -mbackchain option, all normal C
function will store the backchain in the function prologue. All
functions written in assembler code should do the same, if you find one
that does not we should fix that. The end result is that a task that
*voluntarily* called schedule() should have a proper backchain at all
times.
Dependent on the use case this may or may not be enough. Asynchronous
interrupts may stop the CPU at the beginning of a function, if kernel
preemption is enabled we can end up with a broken backchain. The
production kernels for IBM Z are all compiled *without* kernel
preemption. So yes, we might get away without the objtool support.
On a side-note, we do have a line item to implement the ORC unwinder for
the kernel, that includes the objtool support. Once we have that we can
drop the -mbackchain option for the kernel build. That gives us a nice
little performance benefit. I hope that the change from backchain to the
ORC unwinder will not be too hard to implement in the livepatch tools."
Since -mbackchain is enabled by default when the kernel is compiled, the
call chain backtracing should be currently ensured and objtool should
not be necessary for livepatch purposes.
Regarding the second issue, stack corruptions and non-reliable states
have to be recognized by the unwinder. Mainly it means to detect
preemption or page faults, the end of the task stack must be reached,
return addresses must be valid text addresses and hacks like function
graph tracing and kretprobes must be properly detected.
Unwinding a running task's stack is not a problem, because there is a
livepatch requirement that every checked task is blocked, except for the
current task. Due to that, the implementation can be much simpler
compared to the existing non-reliable infrastructure. We can consider a
task's kernel/thread stack only and skip the other stacks.
[1] 20180912121106.31ffa97c@mschwideX1 [not archived on lore.kernel.org]
Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
ABI requires SP to be aligned 8 bytes, report unwinding error otherwise.
Link: https://lkml.kernel.org/r/20191106095601.29986-5-mbenes@suse.cz
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently unwinder unconditionally returns %r14 from the first frame
pointed by %r15 from pt_regs. A task could be interrupted when a function
already allocated this frame (if it needs it) for its callees or to
store local variables. In that case this frame would contain random
values from stack or values stored there by a callee. As we are only
interested in %r14 to get potential return address, skip bogus return
addresses which doesn't belong to kernel text.
This helps to avoid duplicating filtering logic in unwider users, most
of which use unwind_get_return_address() and would choke on bogus 0
address returned by it otherwise.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
A comment in arch/s390/include/asm/unwind.h says:
> If 'first_frame' is not zero unwind_start skips unwind frames until it
> reaches the specified stack pointer.
> The end of the unwinding is indicated with unwind_done, this can be true
> right after unwind_start, e.g. with first_frame!=0 that can not be found.
> unwind_next_frame skips to the next frame.
> Once the unwind is completed unwind_error() can be used to check if there
> has been a situation where the unwinder could not correctly understand
> the tasks call chain.
With this change backchain unwinder now comply with behaviour
described. As well as matches orc unwinder implementation. Now unwinder
starts from reliable state, i.e. __unwind_start own stack frame is
taken or stack frame generated by __switch_to (ksp) - both known to be
valid. In case of pt_regs %r15 is better match for pt_regs psw, than
sometimes random "sp" caller passed.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add unwinding from program check handler tests. Unwinder should be able
to unwind through pt_regs stored by program check handler on task stack.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add unwinding from irq context tests. Unwinder should be able to unwind
through irq stack to task stack up to task pt_regs.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add stack name, sp and reliable information into test unwinding
results. Also consider ip outside of kernel text as failure if the
state is reported reliable.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add CALL_ON_STACK helper testing. Tests make sure that we can unwind from
switched stack to original one up to task pt_regs (nodat -> task stack).
UWM_SWITCH_STACK could not be used together with UWM_THREAD because
get_stack_info explicitly restricts unwinding to task stack if
task != current.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
CALL_ON_STACK defines and initializes register variables. Inline
assembly which follows might trigger compiler to generate memory access
for "stack" argument (e.g. in case of S390_lowcore.nodat_stack). This
memory access produces a function call under kasan with outline
instrumentation which clobbers registers.
Switch "stack" argument in CALL_ON_STACK helper to use memory reference
constraint and perform load instead.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently unwinder test passes if unwinding results contain unwindme_func2
and unwindme_func1 functions.
Now that unwinder reports success upon reaching task pt_regs, check
that unwinding ended successfully in every test.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
unwind_for_each_frame can take at least 8 different sets of parameters.
Add a test to make sure they all are handled in a sane way.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Co-developed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Always inline get_stack_pointer() to avoid potential problems
due to compiler inlining decisions, i.e. getting stack pointer of
get_stack_pointer() itself which is later reused.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The config option CONFIG_PCI_NR_FUNCTIONS sets a limit on the number of
PCI functions we can support. Previously on reaching this limit there
was no indication why newly attached devices are not recognized by Linux
which could be quite confusing. Thus this patch adds a pr_err() for this
case.
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When UID checking was turned off during runtime in the underlying
hypervisor, a PCI device may be attached with the same UID. This is
already detected but happens silently. Add an error message so it can
more easily be understood why a device was not added.
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Each SBDT is located at a 4KB page and contains 512 entries.
Each entry of a SDBT points to a SDB, a 4KB page containing
sampled data. The last entry is a link to another SDBT page.
When an event is created the function sequence executed is:
__hw_perf_event_init()
+--> allocate_buffers()
+--> realloc_sampling_buffers()
+---> alloc_sample_data_block()
Both functions realloc_sampling_buffers() and
alloc_sample_data_block() allocate pages and the allocation
can fail. This is handled correctly and all allocated
pages are freed and error -ENOMEM is returned to the
top calling function. Finally the event is not created.
Once the event has been created, the amount of initially
allocated SDBT and SDB can be too low. This is detected
during measurement interrupt handling, where the amount
of lost samples is calculated. If the number of lost samples
is too high considering sampling frequency and already allocated
SBDs, the number of SDBs is enlarged during the next execution
of cpumsf_pmu_enable().
If more SBDs need to be allocated, functions
realloc_sampling_buffers()
+---> alloc-sample_data_block()
are called to allocate more pages. Page allocation may fail
and the returned error is ignored. A SDBT and SDB setup
already exists.
However the modified SDBTs and SDBs might end up in a situation
where the first entry of an SDBT does not point to an SDB,
but another SDBT, basicly an SBDT without payload.
This can not be handled by the interrupt handler, where an SDBT
must have at least one entry pointing to an SBD.
Add a check to avoid SDBTs with out payload (SDBs) when enlarging
the buffer setup.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The macro TEAR_REG() saves the last used SDBT address
in the perf_hw_event structure. This is also done
by function hw_reset_registers() which is a one-liner
and simply uses macro TEAR_REG(). Remove function
hw_reset_registers(), which is only used one time and use
macro TEAR_REG() instead. This macro is used throughout
the code anyway.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In interrupt handling the function extend_sampling_buffer()
is called after checking for a possibly extension.
This check is not necessary as the called function itself
performs this check again.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Replace hard coded function names in debug statements
by the "%s ...", __func__ construct suggested by checkpatch.pl
script. Use consistent debug print format of the form variable
blank value. Also add leading 0x for all hex values.
Print allocated page addresses consistantly as hex numbers
with leading 0x.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The KASLR offset is added to vmcoreinfo in arch_crash_save_vmcoreinfo(),
so that it can be found by crash when processing kernel dumps.
However, arch_crash_save_vmcoreinfo() is called during a subsys_initcall,
so if the kernel crashes before that, we have no vmcoreinfo and no KASLR
offset.
Fix this by storing the KASLR offset in the lowcore, where the vmcore_info
pointer will be stored, and where it can be found by crash. In order to
make it distinguishable from a real vmcore_info pointer, mark it as uneven
(KASLR offset itself is aligned to THREAD_SIZE).
When arch_crash_save_vmcoreinfo() stores the real vmcore_info pointer in
the lowcore, it overwrites the KASLR offset. At that point, the KASLR
offset is not yet added to vmcoreinfo, so we also need to move the
mem_assign_absolute() behind the vmcoreinfo_append_str().
Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
Cc: <stable@vger.kernel.org> # v5.2+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Consider reaching task pt_regs graceful unwinder termination. Task
pt_regs itself never contains a valid state to which a task might return
within the kernel context (user task pt_regs is a special case). Since
we already avoid printing user task pt_regs and in most cases we don't
even bother filling task pt_regs psw and r15 with something reasonable
simply skip task pt_regs altogether. With this change unwind_error() now
accurately represent whether unwinder reached task pt_regs successfully
or failed along the way.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add missing allocation of pt_regs at the bottom of the stack. This
makes it consistent with other stack setup cases and also what stack
unwinder expects.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently unwinder yields 2 entries when pt_regs are met:
sp="address of pt_regs itself" ip=pt_regs->psw
sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]"
And neither of those 2 states (combination of sp and ip) ever happened.
reuse_sp has been introduced by commit a1d863ac3e10 ("s390/unwind: fix
mixing regs and sp"). reuse_sp=true makes unwinder keen to produce the
following result, when pt_regs are given (as an arg to unwind_start):
sp=pt_regs->gprs[15] ip=pt_regs->psw
sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]"
The first state is an actual state in which a task was when pt_regs were
collected. The second state is marked unreliable and is for debugging
purposes to cover the case when a task has been interrupted in between
stack frame allocation and writing back_chain - in this case r14 might
show an actual caller.
Make unwinder behaviour enabled via reuse_sp=true default and drop the
special case handling.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
If unwinder is looking at pt_regs which is not on stack then something
went wrong and an error has to be reported rather than successful
unwinding termination.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
CALL_ON_STACK is intended to be used for temporary stack switching with
potential return to the caller.
When CALL_ON_STACK is misused to switch from nodat stack to task stack
back_chain information would later lead stack unwinder from task stack into
(per cpu) nodat stack which is reused for other purposes. This would
yield confusing unwinding result or errors.
To avoid that introduce CALL_ON_STACK_NORETURN to be used instead. It
makes sure that back_chain is zeroed and unwinder finishes gracefully
ending up at task pt_regs.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently CALL_ON_STACK saves r15 as back_chain in the first stack frame of
the stack we about to switch to. But if a function which uses CALL_ON_STACK
calls other function it allocates a stack frame for a callee. In this
case r15 is pointing to a callee stack frame and not a stack frame of
function itself. This results in dummy unwinding entry with random
sp and ip values.
Introduce and utilize current_frame_address macro to get an address of
actual function stack frame.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Avoid mixture of task == NULL and task == current meaning the same
thing and simply always initialize task with current in unwind_start.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make sure preemption is disabled when temporary switching to nodat
stack with CALL_ON_STACK helper, because nodat stack is per cpu.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
disabled_wait uses _THIS_IP_ and assumes that compiler would inline it.
Make sure this assumption is always correct by utilizing __always_inline.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
getcpu reads the required values for cpu and node with two
instructions. This might lead to an inconsistent result if user space
gets preempted and migrated to a different CPU between the two
instructions.
Fix this by using just a single instruction to read both values at
once.
This is currently rather a theoretical bug, since there is no real
NUMA support available (except for NUMA emulation).
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When a secondary CPU is brought up it must initialize its control
registers. CPU A which triggers that a secondary CPU B is brought up
stores its control register contents into the lowcore of new CPU B,
which then loads these values on startup.
This is problematic in various ways: the control register which
contains the home space ASCE will correctly contain the kernel ASCE;
however control registers for primary and secondary ASCEs are
initialized with whatever values were present in CPU A.
Typically:
- the primary ASCE will contain the user process ASCE of the process
that triggered onlining of CPU B.
- the secondary ASCE will contain the percpu VDSO ASCE of CPU A.
Due to lazy ASCE handling we may also end up with other combinations.
When then CPU B switches to a different process (!= idle) it will
fixup the primary ASCE. However the problem is that the (wrong) ASCE
from CPU A was loaded into control register 1: as soon as an ASCE is
attached (aka loaded) a CPU is free to generate TLB entries using that
address space.
Even though it is very unlikey that CPU B will actually generate such
entries, this could result in TLB entries of the address space of the
process that ran on CPU A. These entries shouldn't exist at all and
could cause problems later on.
Furthermore the secondary ASCE of CPU B will not be updated correctly.
This means that processes may see wrong results or even crash if they
access VDSO data on CPU B. The correct VDSO ASCE will eventually be
loaded on return to user space as soon as the kernel executed a call
to strnlen_user or an atomic futex operation on CPU B.
Fix both issues by intializing the to be loaded control register
contents with the correct ASCEs and also enforce (re-)loading of the
ASCEs upon first context switch and return to user space.
Fixes: 0aaba41b58bc ("s390: remove all code using the access register mode")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This patch introduces support for a new architectured reply
code 0x8B indicating that a hypervisor layer (if any) has
rejected an ap message.
Linux may run as a guest on top of a hypervisor like zVM
or KVM. So the crypto hardware seen by the ap bus may be
restricted by the hypervisor for example only a subset like
only clear key crypto requests may be supported. Other
requests will be filtered out - rejected by the hypervisor.
The new reply code 0x8B will appear in such cases and needs
to get recognized by the ap bus and zcrypt device driver zoo.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
On s390 bpf_get_stack_raw_tp() returns 0 entries for both kernel and
user stacks. While there is no practical unwinding solution for userspace
on s390 at this moment, there certainly is a kernel unwinder. However,
it is not properly integrated with BPF.
In order to start unwinding, bpf_get_stack_raw_tp() obtains the current
kernel register values using perf_fetch_caller_regs(), which is not
implemented for s390. The actual unwinding then happens by passing those
registers to perf_callchain_kernel().
Implement perf_arch_fetch_caller_regs() for s390, where
__builtin_frame_address(0) points to back_chain.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Adjust PMU device drivers registration to avoid WARN_ON and few other
perf improvements.
- Enhance tracing in vfio-ccw.
- Few stack unwinder fixes and improvements, convert get_wchan custom
stack unwinding to generic api usage.
- Fixes for mm helpers issues uncovered with tests validating
architecture page table helpers.
- Fix noexec bit handling when hardware doesn't support it.
- Fix memleak and unsigned value compared with zero bugs in crypto
code. Minor code simplification.
- Fix crash during kdump with kasan enabled kernel.
- Switch bug and alternatives from asm to asm_inline to improve
inlining decisions.
- Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add
z13s and z14 ZR1 to TUNE descriptions.
- Minor head64.S simplification.
- Fix physical to logical CPU map for SMT.
- Several cleanups in qdio code.
- Other minor cleanups and fixes all over the code.
* tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390/cpumf: Adjust registration of s390 PMU device drivers
s390/smp: fix physical to logical CPU map for SMT
s390/early: move access registers setup in C code
s390/head64: remove unnecessary vdso_per_cpu_data setup
s390/early: move control registers setup in C code
s390/kasan: support memcpy_real with TRACE_IRQFLAGS
s390/crypto: Fix unsigned variable compared with zero
s390/pkey: use memdup_user() to simplify code
s390/pkey: fix memory leak within _copy_apqns_from_user()
s390/disassembler: don't hide instruction addresses
s390/cpum_sf: Assign error value to err variable
s390/cpum_sf: Replace function name in debug statements
s390/cpum_sf: Use consistant debug print format for sampling
s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
s390: add error handling to perf_callchain_kernel
s390: always inline current_stack_pointer()
s390/mm: add mm_pxd_folded() checks to pxd_free()
s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
s390/mm: simplify page table helpers for large entries
s390/mm: make pmd/pud_bad() report large entries as bad
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the arm64-specific bits (core arch and perf, new arm64
selftests), it touches the generic cow_user_page() (reviewed by
Kirill) together with a macro for x86 to preserve the existing
behaviour on this architecture.
Summary:
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The
patches introduce an arch_faults_on_old_pte() macro, defined as
false on x86. When true, cow_user_page() makes the pte young before
attempting __copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a
MAINTAINERS update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with
the wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in
the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
stale macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: Kconfig: add a choice for endianness
kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
arm64: kaslr: Check command line before looking for a seed
arm64: kaslr: Announce KASLR status on boot
kselftest: arm64: fake_sigreturn_misaligned_sp
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: add helper get_current_context
kselftest: arm64: extend test_init functionalities
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: extend toplevel skeleton Makefile
drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest KUnit support gtom Shuah Khan:
"This adds KUnit, a lightweight unit testing and mocking framework for
the Linux kernel from Brendan Higgins.
KUnit is not an end-to-end testing framework. It is currently
supported on UML and sub-systems can write unit tests and run them in
UML env. KUnit documentation is included in this update.
In addition, this Kunit update adds 3 new kunit tests:
- proc sysctl test from Iurii Zaikin
- the 'list' doubly linked list test from David Gow
- ext4 tests for decoding extended timestamps from Iurii Zaikin
In the future KUnit will be linked to Kselftest framework to provide a
way to trigger KUnit tests from user-space"
* tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits)
lib/list-test: add a test for the 'list' doubly linked list
ext4: add kunit test for decoding extended timestamps
Documentation: kunit: Fix verification command
kunit: Fix '--build_dir' option
kunit: fix failure to build without printk
MAINTAINERS: add proc sysctl KUnit test to PROC SYSCTL section
kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()
MAINTAINERS: add entry for KUnit the unit testing framework
Documentation: kunit: add documentation for KUnit
kunit: defconfig: add defconfigs for building KUnit tests
kunit: tool: add Python wrappers for running KUnit tests
kunit: test: add tests for KUnit managed resources
kunit: test: add the concept of assertions
kunit: test: add tests for kunit test abort
kunit: test: add support for test abort
objtool: add kunit_try_catch_throw to the noreturn list
kunit: test: add initial tests
lib: enable building KUnit in lib/
kunit: test: add the concept of expectations
kunit: test: add assertion printing library
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This consists of several fixes to tests and framework.
Masami Hiramatsu fixed several tests to build and run correctly on arm
and other 32bit architectures"
* tag 'linux-kselftest-5.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: sync: Fix cast warnings on arm
selftests: net: Fix printf format warnings on arm
selftests: net: Use size_t and ssize_t for counting file size
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: proc: Make va_max 1MB
kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
selftests: Move kselftest_module.sh into kselftest/
selftests: gen_kselftest_tar.sh: Do not clobber kselftest/
selftests: breakpoints: Fix a typo of function name
selftests: Fix O= and KBUILD_OUTPUT handling for relative paths
|
|
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
"Expose the fs-verity bit through statx()"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
docs: fs-verity: mention statx() support
f2fs: support STATX_ATTR_VERITY
ext4: support STATX_ATTR_VERITY
statx: define STATX_ATTR_VERITY
docs: fs-verity: document first supported kernel version
|
|
Pull fscrypt updates from Eric Biggers:
- Add the IV_INO_LBLK_64 encryption policy flag which modifies the
encryption to be optimized for UFS inline encryption hardware.
- For AES-128-CBC, use the crypto API's implementation of ESSIV (which
was added in 5.4) rather than doing ESSIV manually.
- A few other cleanups.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
f2fs: add support for IV_INO_LBLK_64 encryption policies
ext4: add support for IV_INO_LBLK_64 encryption policies
fscrypt: add support for IV_INO_LBLK_64 policies
fscrypt: avoid data race on fscrypt_mode::logged_impl_name
docs: ioctl-number: document fscrypt ioctl numbers
fscrypt: zeroize fscrypt_info before freeing
fscrypt: remove struct fscrypt_ctx
fscrypt: invoke crypto API for ESSIV handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull AFFS updates from David Sterba:
"A minor bugfix and cleanup for AFFS"
* tag 'affs-for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: fix a memory leak in affs_remount
affs: Replace binary semaphores with mutexes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes:
- new block group profiles: RAID1 with 3- and 4- copies
- RAID1 in btrfs has always 2 copies, now add support for 3 and 4
- this is an incompat feature (named RAID1C34)
- recommended use of RAID1C3 is replacement of RAID6 profile on
metadata, this brings a more reliable resiliency against 2
device loss/damage
- support for new checksums
- per-filesystem, set at mkfs time
- fast hash (crc32c successor): xxhash, 64bit digest
- strong hashes (both 256bit): sha256 (slower, FIPS), blake2b
(faster)
- the blake2b module goes via the crypto tree, btrfs.ko has a
soft dependency
- speed up lseek, don't take inode locks unnecessarily, this can
speed up parallel SEEK_CUR/SEEK_SET/SEEK_END by 80%
- send:
- allow clone operations within the same file
- limit maximum number of sent clone references to avoid slow
backref walking
- error message improvements: device scan prints process name and PID
Core changes:
- cleanups
- remove unique workqueue helpers, used to provide a way to avoid
deadlocks in the workqueue code, now done in a simpler way
- remove lots of indirect function calls in compression code
- extent IO tree code moved out of extent_io.c
- cleanup backup superblock handling at mount time
- transaction life cycle documentation and cleanups
- locking code cleanups, annotations and documentation
- add more cold, const, pure function attributes
- removal of unused or redundant struct members or variables
- new tree-checker sanity tests
- try to detect missing INODE_ITEM, cross-reference checks of
DIR_ITEM, DIR_INDEX, INODE_REF, and XATTR_* items
- remove own bio scheduling code (used to avoid checksum submissions
being stuck behind other IO), replaced by cgroup controller-based
code to allow better control and avoid priority inversions in cases
where the custom and cgroup scheduling disagreed
Fixes:
- avoid getting stuck during cyclic writebacks
- fix trimming of ranges crossing block group boundaries
- fix rename exchange on subvolumes, all involved subvolumes need to
be recorded in the transaction"
* tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits)
btrfs: drop bdev argument from submit_extent_page
btrfs: remove extent_map::bdev
btrfs: drop bio_set_dev where not needed
btrfs: get bdev directly from fs_devices in submit_extent_page
btrfs: record all roots for rename exchange on a subvol
Btrfs: fix block group remaining RO forever after error during device replace
btrfs: scrub: Don't check free space before marking a block group RO
btrfs: change btrfs_fs_devices::rotating to bool
btrfs: change btrfs_fs_devices::seeding to bool
btrfs: rename btrfs_block_group_cache
btrfs: block-group: Reuse the item key from caller of read_one_block_group()
btrfs: block-group: Refactor btrfs_read_block_groups()
btrfs: document extent buffer locking
btrfs: access eb::blocking_writers according to ACCESS_ONCE policies
btrfs: set blocking_writers directly, no increment or decrement
btrfs: merge blocking_writers branches in btrfs_tree_read_lock
btrfs: drop incompat bit for raid1c34 after last block group is gone
btrfs: add incompat for raid1 with 3, 4 copies
btrfs: add support for 4-copy replication (raid1c4)
btrfs: add support for 3-copy replication (raid1c3)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"MTD core:
- drop inactive maintainers, update the repositories and add IRC
channel
- debugfs functions improvements
- initialize more structure parameters
- misc fixes reported by robots
MTD devices:
- spear_smi: Fixed Write Burst mode
- new Intel IXP4xx flash probing hook
Raw NAND core:
- useless extra checks dropped
- update the detection of the bad block markers position
Raw NAND controller drivers:
- Cadence: new driver
- Brcmnand: support for flash-dma v0 + fixes
- Denali: drop support for the legacy controller/chip DT representation
- superfluous dev_err() calls removed
SPI NOR core changes:
- introduce 'struct spi_nor_controller_ops'
- clean the Register Operations methods
- use dev_dbg insted of dev_err for low level info
- fix retlen handling in sst_write()
- fix silent truncations in spi_nor_read and spi_nor_read_raw()
- fix the clearing of QE bit on lock()/unlock()
- rework the disabling of the block write protection
- rework the Quad Enable methods
- make sure nor->spimem and nor->controller_ops are mutually exclusive
- set default Quad Enable method for ISSI flashes
- add support for few flashes
SPI NOR controller drivers changes:
- intel-spi:
- support chips without software sequencer
- add support for Intel Cannon Lake and Intel Comet Lake-H flashes
CFI core changes:
- code cleanups related useless initializers and coding style issues
- fix for a possible double free problem in cfi_cmdset_0002
- improved HyperFlash error reporting and handling in cfi_cmdset_0002 core"
* tag 'mtd/for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (73 commits)
mtd: devices: fix mchp23k256 read and write
mtd: no need to check return value of debugfs_create functions
mtd: spi-nor: Set default Quad Enable method for ISSI flashes
mtd: spi-nor: Add support for is25wp256
mtd: spi-nor: Add support for w25q256jw
mtd: spi-nor: Move condition to avoid a NULL check
mtd: spi-nor: Make sure nor->spimem and nor->controller_ops are mutually exclusive
mtd: spi-nor: Rename Quad Enable methods
mtd: spi-nor: Merge spansion Quad Enable methods
mtd: spi-nor: Rename CR_QUAD_EN_SPAN to SR2_QUAD_EN_BIT1
mtd: spi-nor: Extend the SR Read Back test
mtd: spi-nor: Rework the disabling of block write protection
mtd: spi-nor: Fix clearing of QE bit on lock()/unlock()
mtd: cfi_cmdset_0002: fix delayed error detection on HyperFlash
mtd: cfi_cmdset_0002: only check errors when ready in cfi_check_err_status()
mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
mtd: cfi_cmdset_*: kill useless 'ret' variable initializers
mtd: cfi_util: use DIV_ROUND_UP() in cfi_udelay()
mtd: spi-nor: Print debug message when the read back test fails
mtd: spi-nor: Check all the bits written, not just the BP ones
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix DM core to disallow stacking request-based DM on partitions.
- Fix DM raid target to properly resync raidset even if bitmap needed
additional pages.
- Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
IO and crypt workqueues.
- Fix DM integrity metadata layout that was aligned on 128K boundary
rather than the intended 4K boundary (removes 124K of wasted space
for each metadata block).
- Improve the DM thin, cache and clone targets to use spin_lock_irq
rather than spin_lock_irqsave where possible.
- Fix DM thin single thread performance that was lost due to needless
workqueue wakeups.
- Fix DM zoned target performance that was lost due to excessive
backing device checks.
- Add ability to trigger write failure with the DM dust test target.
- Fix whitespace indentation in drivers/md/Kconfig.
- Various smalls fixes and cleanups (e.g. use struct_size, fix
uninitialized variable, variable renames, etc).
* tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
dm: Fix Kconfig indentation
dm thin: wakeup worker only when deferred bios exist
dm integrity: fix excessive alignment of metadata runs
dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
dm zoned: reduce overhead of backing device checks
dm dust: add limited write failure mode
dm dust: change ret to r in dust_map_read and dust_map
dm dust: change result vars to r
dm cache: replace spin_lock_irqsave with spin_lock_irq
dm bio prison: replace spin_lock_irqsave with spin_lock_irq
dm thin: replace spin_lock_irqsave with spin_lock_irq
dm clone: add bucket_lock_irq/bucket_unlock_irq helpers
dm clone: replace spin_lock_irqsave with spin_lock_irq
dm writecache: handle REQ_FUA
dm writecache: fix uninitialized variable warning
dm stripe: use struct_size() in kmalloc()
dm raid: streamline rs_get_progress() and its raid_status() caller side
dm raid: simplify rs_setup_recovery call chain
dm raid: to ensure resynchronization, perform raid set grow in preresume
...
|
|
Pull disk revalidation updates from Jens Axboe:
"This continues the work that Jan Kara started to thoroughly cleanup
and consolidate how we handle rescans and revalidations"
* tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block:
block: move clearing bd_invalidated into check_disk_size_change
block: remove (__)blkdev_reread_part as an exported API
block: fix bdev_disk_changed for non-partitioned devices
block: move rescan_partitions to fs/block_dev.c
block: merge invalidate_partitions into rescan_partitions
block: refactor rescan_partitions
|
|
Pull zoned block device update from Jens Axboe:
"Enhancements and improvements to the zoned device support"
* tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block:
scsi: sd_zbc: Remove set but not used variable 'buflen'
block: rework zone reporting
scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
null_blk: Add zone_nr_conv to features
null_blk: clean up report zones
null_blk: clean up the block device operations
block: Remove partition support for zoned block devices
block: Simplify report zones execution
block: cleanup the !zoned case in blk_revalidate_disk_zones
block: Enhance blk_revalidate_disk_zones()
|
|
Pull additional block driver updates from Jens Axboe:
"Here's another block driver update, done to avoid conflicts with the
zoned changes coming next.
This contains:
- Prepare SCSI sd for zone open/close/finish support
- Small NVMe pull request
- hwmon support (Akinobu)
- add new co-maintainer (Christoph)
- work-around for a discard issue on non-conformant drives
(Eduard)
- Small nbd leak fix"
* tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
nbd: prevent memory leak
nvme: hwmon: add quirk to avoid changing temperature threshold
nvme: hwmon: provide temperature min and max values for each sensor
nvmet: add another maintainer
nvme: Discard workaround for non-conformant devices
nvme: Add hardware monitoring support
scsi: sd_zbc: add zone open, close, and finish support
|
|
Pull block driver updates from Jens Axboe:
"Here are the main block driver updates for 5.5. Nothing major in here,
mostly just fixes. This contains:
- a set of bcache changes via Coly
- MD changes from Song
- loop unmap write-zeroes fix (Darrick)
- spelling fixes (Geert)
- zoned additions cleanups to null_blk/dm (Ajay)
- allow null_blk online submit queue changes (Bart)
- NVMe changes via Keith, nothing major here either"
* tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits)
Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()"
drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
bcache: don't export symbols
bcache: remove the extra cflags for request.o
bcache: at least try to shrink 1 node in bch_mca_scan()
bcache: add idle_max_writeback_rate sysfs interface
bcache: add code comments in bch_btree_leaf_dirty()
bcache: fix deadlock in bcache_allocator
bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
bcache: deleted code comments for dead code in bch_data_insert_keys()
bcache: add more accurate error messages in read_super()
bcache: fix static checker warning in bcache_device_free()
bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
bcache: fix fifo index swapping condition in journal_pin_cmp()
md/raid10: prevent access of uninitialized resync_pages offset
md: avoid invalid memory access for array sb->dev_roles
md/raid1: avoid soft lockup under high load
null_blk: add zone open, close, and finish support
dm: add zone open, close and finish support
...
|
|
Pull core block updates from Jens Axboe:
"Due to more granular branches, this one is small and will be followed
with other core branches that add specific features. I meant to just
have a core and drivers branch, but external dependencies we ended up
adding a few more that are also core.
The changes are:
- Fixes and improvements for the zoned device support (Ajay, Damien)
- sed-opal table writing and datastore UID (Revanth)
- blk-cgroup (and bfq) blk-cgroup stat fixes (Tejun)
- Improvements to the block stats tracking (Pavel)
- Fix for overruning sysfs buffer for large number of CPUs (Ming)
- Optimization for small IO (Ming, Christoph)
- Fix typo in RWH lifetime hint (Eugene)
- Dead code removal and documentation (Bart)
- Reduction in memory usage for queue and tag set (Bart)
- Kerneldoc header documentation (André)
- Device/partition revalidation fixes (Jan)
- Stats tracking for flush requests (Konstantin)
- Various other little fixes here and there (et al)"
* tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits)
Revert "block: split bio if the only bvec's length is > SZ_4K"
block: add iostat counters for flush requests
block,bfq: Skip tracing hooks if possible
block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'
blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1
block: Don't disable interrupts in trigger_softirq()
sbitmap: Delete sbitmap_any_bit_clear()
blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()
block: split bio if the only bvec's length is > SZ_4K
block: still try to split bio if the bvec crosses pages
blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT
blk-cgroup: reimplement basic IO stats using cgroup rstat
blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()
blk-throtl: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: relocate bfqg_*rwstat*() helpers
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
...
|
|
Pull libata updates from Jens Axboe:
"Just a few fixes all over the place, support for the Annapurna SATA
controller, and a patchset that cleans up the error defines and
ultimately fixes anissue with sata_mv"
* tag 'for-5.5/libata-20191121' of git://git.kernel.dk/linux-block:
ata: pata_artop: make arrays static const, makes object smaller
ata_piix: remove open-coded dmi_match(DMI_OEM_STRING)
ata: sata_mv, avoid trigerrable BUG_ON
ata: make qc_prep return ata_completion_errors
ata: define AC_ERR_OK
ata: Documentation, fix function names
libata: Ensure ata_port probe has completed before detach
ahci: tegra: use regulator_bulk_set_supply_names()
ahci: Add support for Amazon's Annapurna Labs SATA controller
|
|
Pull io_uring updates from Jens Axboe:
"A lot of stuff has been going on this cycle, with improving the
support for networked IO (and hence unbounded request completion
times) being one of the major themes. There's been a set of fixes done
this week, I'll send those out as well once we're certain we're fully
happy with them.
This contains:
- Unification of the "normal" submit path and the SQPOLL path (Pavel)
- Support for sparse (and bigger) file sets, and updating of those
file sets without needing to unregister/register again.
- Independently sized CQ ring, instead of just making it always 2x
the SQ ring size. This makes it more flexible for networked
applications.
- Support for overflowed CQ ring, never dropping events but providing
backpressure on submits.
- Add support for absolute timeouts, not just relative ones.
- Support for generic cancellations. This divorces io_uring from
workqueues as well, which additionally gets us one step closer to
generic async system call support.
- With cancellations, we can support grabbing the process file table
as well, just like we do mm context. This allows support for system
calls that create file descriptors, like accept4() support that's
built on top of that.
- Support for io_uring tracing (Dmitrii)
- Support for linked timeouts. These abort an operation if it isn't
completed by the time noted in the linke timeout.
- Speedup tracking of poll requests
- Various cleanups making the coder easier to follow (Jackie, Pavel,
Bob, YueHaibing, me)
- Update MAINTAINERS with new io_uring list"
* tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block: (64 commits)
io_uring: make POLL_ADD/POLL_REMOVE scale better
io-wq: remove now redundant struct io_wq_nulls_list
io_uring: Fix getting file for non-fd opcodes
io_uring: introduce req_need_defer()
io_uring: clean up io_uring_cancel_files()
io-wq: ensure free/busy list browsing see all items
io-wq: ensure we have a stable view of ->cur_work for cancellations
io_wq: add get/put_work handlers to io_wq_create()
io_uring: check for validity of ->rings in teardown
io_uring: fix potential deadlock in io_poll_wake()
io_uring: use correct "is IO worker" helper
io_uring: fix -ENOENT issue with linked timer with short timeout
io_uring: don't do flush cancel under inflight_lock
io_uring: flag SQPOLL busy condition to userspace
io_uring: make ASYNC_CANCEL work with poll and timeout
io_uring: provide fallback request for OOM situations
io_uring: convert accept4() -ERESTARTSYS into -EINTR
io_uring: fix error clear of ->file_table in io_sqe_files_register()
io_uring: separate the io_free_req and io_free_req_find_next interface
io_uring: keep io_put_req only responsible for release and put req
...
|