Age | Commit message (Collapse) | Author |
|
guoren@kernel.org <guoren@kernel.org> says:
From: Guo Ren <guoren@linux.alibaba.com>
The patches convert riscv to use the generic entry infrastructure from
kernel/entry/*. Some optimization for entry.S with new .macro and merge
ret_from_kernel_thread into ret_from_fork.
* b4-shazam-merge:
riscv: entry: Consolidate general regs saving/restoring
riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
riscv: entry: Convert to generic entry
riscv: entry: Add noinstr to prevent instrumentation inserted
riscv: ptrace: Remove duplicate operation
Link: https://lore.kernel.org/r/20230222033021.983168-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Consolidate the saving/restoring GPs (except zero, ra, sp, gp,
tp and t0) into save_from_x6_to_x31/restore_from_x6_to_x31 macros.
No functional change intended.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-8-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The ret_from_kernel_thread() behaves similarly with ret_from_fork(),
the only difference is whether call the fn(arg) or not, this can be
achieved by testing fn is NULL or not, I.E s0 is 0 or not. Many
architectures have done the same thing, it makes entry.S more clean.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-7-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Since riscv is converted to generic entry, there's no need for the
extra wrappers of trace_hardirqs_{on,off}.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-6-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch converts riscv to use the generic entry infrastructure from
kernel/entry/*. The generic entry makes maintainers' work easier and
codes more elegant. Here are the changes:
- More clear entry.S with handle_exception and ret_from_exception
- Get rid of complex custom signal implementation
- Move syscall procedure from assembly to C, which is much more
readable.
- Connect ret_from_fork & ret_from_kernel_thread to generic entry.
- Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
- Use the standard preemption code instead of custom
Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Without noinstr the compiler is free to insert instrumentation (think
all the k*SAN, KCov, GCov, ftrace etc..) which can call code we're not
yet ready to run this early in the entry path, for instance it could
rely on RCU which isn't on yet, or expect lockdep state. (by peterz)
Link: https://lore.kernel.org/linux-riscv/YxcQ6NoPf3AH0EXe@hirez.programming.kicks-ass.net/
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-4-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The TIF_SYSCALL_TRACE is controlled by a common code, see
kernel/ptrace.c and include/linux/thread_info.h.
clear_task_syscall_work(child, SYSCALL_TRACE);
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230222033021.983168-3-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Andrew Jones <ajones@ventanamicro.com> says:
When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch6's commit message for more details).
For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).
* b4-shazam-merge:
RISC-V: KVM: Expose Zicboz to the guest
RISC-V: KVM: Provide UAPI for Zicboz block size
RISC-V: Use Zicboz in clear_page when available
RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
RISC-V: Add Zicboz detection and block size parsing
dt-bindings: riscv: Document cboz-block-size
RISC-V: Factor out body of riscv_init_cbom_blocksize loop
RISC-V: alternatives: Support patching multiple insns in assembly
Link: https://lore.kernel.org/r/20230224162631.405473-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Guests may use the cbo.zero instruction when the CPU has the Zicboz
extension and the hypervisor sets henvcfg.CBZE.
Add Zicboz support for KVM guests which may be enabled and
disabled from KVM userspace using the ISA extension ONE_REG API.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230224162631.405473-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
We're about to allow guests to use the Zicboz extension. KVM
userspace needs to know the cache block size in order to
properly advertise it to the guest. Provide a virtual config
register for userspace to get it with the GET_ONE_REG API, but
setting it cannot be supported, so disallow SET_ONE_REG.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230224162631.405473-8-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by using the alternatives' cpufeature
value (the upper 16 bits of patch_id). The value used is the maximum
zicboz block size order accepted at the patch site. This enables us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size would be larger)
For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
cpufeature IDs are consecutive integers starting at 26, so a 32-bit
patch ID allows an aircraft carrier load of feature IDs. Repurposing
the upper 16 bits still leaves a boat load of feature IDs and gains
16 bits which may be used to control patching on a per patch-site
basis.
This will be initially used in Zicboz's application to clear_page(),
as Zicboz's block size must also be considered. In that case, the
upper 16-bit value's role will be to convey the maximum block size
which the Zicboz clear_page() implementation supports.
cpufeature patch sites which need to check for the existence or
absence of other cpufeatures may also be able to make use of this.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Parse "riscv,cboz-block-size" from the DT by piggybacking on Zicbom's
riscv_init_cbom_blocksize(). Additionally check the DT for the presence
of the "zicboz" extension and, when it's present, validate the parsed
cboz block size as we do Zicbom's cbom block size with
riscv_isa_extension_check().
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Refactor riscv_init_cbom_blocksize() to prepare for it to be used
for both cbom block size and cboz block size.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
As pointed out in commit d374a16539b1 ("RISC-V: fix compile error
from deduplicated __ALTERNATIVE_CFG_2"), we need quotes around
parameters passed to macros within macros to avoid spaces being
interpreted as separators. ALT_NEW_CONTENT was trying to handle
this by defining new_c has a vararg, but this isn't sufficient
for calling ALTERNATIVE() from assembly with multiple instructions
in the new/old sequences. Remove the vararg "hack" and use quotes.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Andrew Jones <ajones@ventanamicro.com> says:
This series has no intended functional change. These cleanups were
found while renaming errata_id to patch_id in order to better
convey that its purpose is larger than errata (it's also for
cpufeatures).
* b4-shazam-merge:
riscv: cpufeature: Drop errata_list.h and other unused includes
riscv: lib: Include hwcap.h directly
riscv: alternatives: Rename errata_id to patch_id
riscv: alternatives: Remove unnecessary define and unused struct
riscv: Rename Kconfig.erratas to Kconfig.errata
riscv: Clarify RISCV_ALTERNATIVE help text
Link: https://lore.kernel.org/r/20230224154601.88163-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Drop errata_list.h, since cpufeature.c includes hwcap.h directly to
get cpufeature IDs. And, while there, prune the rest of the unused
includes too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
When using alternatives for cpufeatures we should include hwcap.h
directly, rather than through errata_list.h. Opportunistically drop
an unused include too.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Alternatives are used for both errata and cpufeatures. Use a more
generic name, 'patch_id', as in "ID of code patching site", to
avoid confusion when alternatives are used for cpufeatures.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
A define and a struct were introduced with commit 6f4eea90465a
("riscv: Introduce alternative mechanism to apply errata solution"),
which introduced alternatives to RISC-V. The define is used for
an arbitrary string length, specific to sifive errata, so just use
the number directly there instead. The struct has never been used,
so remove it.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-4-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Errata is already plural for erratum. Rename it to make the
grammar gooder.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Clarify RISCV_ALTERNATIVE's help text by pointing out that code
patching is not only done at boot time, but also module load time.
Also point out that this is the minimal possible overhead.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230224154601.88163-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Qinglin Pan <panqinglin00@gmail.com> says:
Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K
page. This patch set is for using Svnapot in hugetlb fs and huge vmap.
This patchset adds a Kconfig item for using Svnapot in
"Platform type"->"SVNAPOT extension support". Its default value is on,
and people can set it off if they don't allow kernel to detect Svnapot
hardware support and leverage it.
Tested on:
- qemu rv64 with "Svnapot support" off and svnapot=true.
- qemu rv64 with "Svnapot support" on and svnapot=true.
- qemu rv64 with "Svnapot support" off and svnapot=false.
- qemu rv64 with "Svnapot support" on and svnapot=false.
* b4-shazam-merge:
riscv: mm: support Svnapot in huge vmap
riscv: mm: support Svnapot in hugetlb page
riscv: mm: modify pte format for Svnapot
Link: https://lore.kernel.org/r/20230209131647.17245-1-panqinglin00@gmail.com
[Palmer: fix up the feature ordering in the merge]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC is supported, we can
implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
for Svnapot to support huge vmap about napot size.
It can be tested by huge vmap used in pci driver. Huge vmalloc with svnapot
can be tested by test_vmalloc with [1] applied, and probe this
module to run fix_size_alloc_test with use_huge true.
[1]https://lore.kernel.org/all/20221212055657.698420-1-panqinglin2020@iscas.ac.cn/
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230209131647.17245-4-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Svnapot can be used to support 64KB hugetlb page, so it can become a new
option when using hugetlbfs. Add a basic implementation of hugetlb page,
and support 64KB as a size in it by using Svnapot.
For test, boot kernel with command line contains "default_hugepagesz=64K
hugepagesz=64K hugepages=20" and run a simple test like this:
tools/testing/selftests/vm/map_hugetlb 1 16
And it should be passed.
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-3-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add one alternative to enable/disable svnapot support, enable this static
key when "svnapot" is in the "riscv,isa" field of fdt and SVNAPOT compile
option is set. It will influence the behavior of has_svnapot. All code
dependent on svnapot should make sure that has_svnapot return true firstly.
Modify PTE definition for Svnapot, and creates some functions in pgtable.h
to mark a PTE as napot and check if it is a Svnapot PTE. Until now, only
64KB napot size is supported in spec, so some macros has only 64KB version.
Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-2-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.
The optimization was then later added back in a limited form by commit
6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.
Instead, just re-introduce the optimization, with some changes.
Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":
- the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
This is used for situations where we should use the exact size.
- the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
fits in a single word and the bitmap operations thus end up able
to trigger the "small_const_nbits()" optimizations.
This is used for the operations that have optimized single-word
cases that get inlined, notably the bit find and scanning functions.
- the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
is an sufficiently small constant that makes simple "copy" and
"clear" operations more efficient.
This is arbitrarily set at four words or less.
As a an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like
movl nr_cpu_ids(%rip), %edx
addq $63, %rdx
shrq $3, %rdx
andl $-8, %edx
callq memset@PLT
on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.
In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
reasonable value to use), the above becomes a single
movq $0,cpumask
instruction instead, because instead of caring to figure out exactly how
many CPU's the system has, it just knows that the cpumask will be a
single word and can just clear it all.
Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.
But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.
In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'. Better remove them now than have somebody introduce use
of them later.
Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless. Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates from Thomas Gleixner:
"A small set of updates for x86:
- Return -EIO instead of success when the certificate buffer for SEV
guests is not large enough
- Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared
on return to userspace for performance reasons, but the leaves user
space vulnerable to cross-thread attacks which STIBP prevents.
Update the documentation accordingly"
* tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt/sev-guest: Return -EIO if certificate buffer is not large enough
Documentation/hw-vuln: Document the interaction between IBRS and STIBP
x86/speculation: Allow enabling STIBP with legacy IBRS
|
|
Pull VM_FAULT_RETRY fixes from Al Viro:
"Some of the page fault handlers do not deal with the following case
correctly:
- handle_mm_fault() has returned VM_FAULT_RETRY
- there is a pending fatal signal
- fault had happened in kernel mode
Correct action in such case is not "return unconditionally" - fatal
signals are handled only upon return to userland and something like
copy_to_user() would end up retrying the faulting instruction and
triggering the same fault again and again.
What we need to do in such case is to make the caller to treat that as
failed uaccess attempt - handle exception if there is an exception
handler for faulting instruction or oops if there isn't one.
Over the years some architectures had been fixed and now are handling
that case properly; some still do not. This series should fix the
remaining ones.
Status:
- m68k, riscv, hexagon, parisc: tested/acked by maintainers.
- alpha, sparc32, sparc64: tested locally - bug has been reproduced
on the unpatched kernel and verified to be fixed by this series.
- ia64, microblaze, nios2, openrisc: build, but otherwise completely
untested"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
openrisc: fix livelock in uaccess
nios2: fix livelock in uaccess
microblaze: fix livelock in uaccess
ia64: fix livelock in uaccess
sparc: fix livelock in uaccess
alpha: fix livelock in uaccess
parisc: fix livelock in uaccess
hexagon: fix livelock in uaccess
riscv: fix livelock in uaccess
m68k: fix livelock in uaccess
|
|
include/linux/compiler-intel.h had no update in the past 3 years.
We often forget about the third C compiler to build the kernel.
For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
only mentioned GCC and Clang.
init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
and nobody has reported any issue.
I guess the Intel Compiler support is broken, and nobody is caring
about it.
Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is
deprecated:
$ icc -v
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
deprecated and will be removed from product release in the second half
of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
compiler moving forward. Please transition to use this compiler. Use
'-diag-disable=10441' to disable this message.
icc version 2021.7.0 (gcc version 12.1.0 compatibility)
Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
complete adoption of LLVM".
lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
untouched for better sync with https://github.com/facebook/zstd
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes.
Eight are for MM and seven are for other parts of the kernel. Seven
are cc:stable and eight address post-6.3 issues or were judged
unsuitable for -stable backporting"
* tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: map Dikshita Agarwal's old address to his current one
mailmap: map Vikash Garodia's old address to his current one
fs/cramfs/inode.c: initialize file_ra_state
fs: hfsplus: fix UAF issue in hfsplus_put_super
panic: fix the panic_print NMI backtrace setting
lib: parser: update documentation for match_NUMBER functions
kasan, x86: don't rename memintrinsics in uninstrumented files
kasan: test: fix test for new meminstrinsic instrumentation
kasan: treat meminstrinsic as builtins in uninstrumented files
kasan: emit different calls for instrumentable memintrinsics
ocfs2: fix non-auto defrag path not working issue
ocfs2: fix defrag path triggering jbd2 ASSERT
mailmap: map Georgi Djakov's old Linaro address to his current one
mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH
mm/damon/paddr: fix missing folio_put()
mm/mremap: fix dup_anon_vma() in vma_merge() case 4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Drop orphaned VAS MAINTAINERS entry
- Fix build errors with clang and KCSAN
- Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together
with recordmcount
Thanks to Nathan Chancellor.
* tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Avoid dead code/data elimination when using recordmcount
powerpc/vmlinux.lds: Add .text.asan/tsan sections
powerpc: Drop orphaned VAS MAINTAINERS entry
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Add empty command line parameter handling stubs to kernel for all
command line parameters which are handled in the decompressor. This
avoids invalid "Unknown kernel command line parameters" messages from
the kernel, and also avoids that these will be incorrectly passed to
user space. This caused already confusion, therefore add the empty
stubs
- Add missing phys_to_virt() handling to machine check handler
- Introduce and use a union to be used for zcrypt inline assemblies.
This makes sure that only a register wide member of the union is
passed as input and output parameter to inline assemblies, while
usual C code uses other members of the union to access bit fields of
it
- Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
atomically read a 128-bit value from memory. This replaces the
(mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf
code. Currently gcc does not generate the used lpq instruction if
__READ_ONCE() is used for aligned 128-bit accesses, therefore use
this s390 specific helper
- Simplify machine check handler code if a task needs to be killed
because of e.g. register corruption due to a machine malfunction
- Perform CPU reset to clear pending interrupts and TLB entries on an
already stopped target CPU before delegating work to it
- Generate arch/s390/boot/vmlinux.map link map for the decompressor,
when CONFIG_VMLINUX_MAP is enabled for debugging purposes
- Fix segment type handling for dcssblk devices. It incorrectly always
returned type "READ/WRITE" even for read-only segements, which can
result in a kernel panic if somebody tries to write to a read-only
device
- Sort config S390 select list again
- Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
test
* tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kprobes: fix current_kprobe never cleared after kprobes reenter
s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
s390/Kconfig: sort config S390 select list again
s390/extmem: return correct segment type in __segment_load()
s390/decompressor: add link map saving
s390/smp: perform cpu reset before delegating work to target cpu
s390/mcck: cleanup user process termination path
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
s390/nmi: fix virtual-physical address confusion
s390/setup: do not complain about parameters handled in decompressor
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Some cleanups and fixes for the Zbb-optimized string routines
- Support for custom (vendor or implementation defined) perf events
- COMMAND_LINE_SIZE has been increased to 1024
* tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Bump COMMAND_LINE_SIZE value to 1024
drivers/perf: RISC-V: Allow programming custom firmware events
riscv, lib: Fix Zbb strncmp
RISC-V: improve string-function assembly
|
|
Now that memcpy/memset/memmove are no longer overridden by KASAN, we can
just use the normal symbol names in uninstrumented files.
Drop the preprocessor redefinitions.
Link: https://lkml.kernel.org/r/20230224085942.1791837-4-elver@google.com
Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linux Kernel Functional Testing <lkft@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- In copy_highpage(), only reset the tag of the destination pointer if
KASAN_HW_TAGS is enabled so that user-space MTE does not interfere
with KASAN_SW_TAGS (which relies on top-byte-ignore).
- Remove warning if SME is detected without SVE, the kernel can cope
with such configuration (though none in the field currently).
- In cfi_handler(), pass the ESR_EL1 value to die() for consistency
with other die() callers.
- Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte
manipulation from the generic vmemmap_remap_pte() does not follow the
required ARM break-before-make sequence (clear the pte, flush the
TLBs, set the new pte). It may be re-enabled once this sequence is
sorted.
- Fix possible memory leak in the arm64 ACPI code if the SMCCC version
and conduit checks fail.
- Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores
-falign-functions=N with -Os.
- Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no
randomisation would actually take place.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE
arm64: acpi: Fix possible memory leak of ffh_ctxt
arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP
arm64: pass ESR_ELx to die() of cfi_handler
arm64/fpsimd: Remove warning for SME without SVE
arm64: Reset KASAN tag in copy_highpage with HW tags only
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull more MIPS updates from Thomas Bogendoerfer:
"A few more cleanups and fixes"
* tag 'mips_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Workaround clang inline compat branch issue
mips: dts: ralink: mt7621: add phandle to system controller node for watchdog
mips: dts: ralink: mt7621: rename watchdog node from 'wdt' into 'watchdog'
mips: ralink: make SOC_MT7621 select PINCTRL
mips: remove SYS_HAS_CPU_MIPS32_R1 from RALINK
MIPS: cevt-r4k: Offset the value used to clear compare interrupt
MIPS: smp-cps: Don't rely on CP0_CMGCRBASE
MIPS: Remove DMA_PERDEV_COHERENT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Shrink 'struct instruction', to improve objtool performance & memory
footprint
- Other maximum memory usage reductions - this makes the build both
faster, and fixes kernel build OOM failures on allyesconfig and
similar configs when they try to build the final (large) vmlinux.o
- Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
single-byte instruction (PUSH/POP or LEAVE). This requires the
extension of the ORC metadata structure with a 'signal' field
- Misc fixes & cleanups
* tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
objtool: Fix ORC 'signal' propagation
objtool: Remove instruction::list
x86: Fix FILL_RETURN_BUFFER
objtool: Fix overlapping alternatives
objtool: Union instruction::{call_dest,jump_table}
objtool: Remove instruction::reloc
objtool: Shrink instruction::{type,visited}
objtool: Make instruction::alts a single-linked list
objtool: Make instruction::stack_ops a single-linked list
objtool: Change arch_decode_instruction() signature
x86/entry: Fix unwinding from kprobe on PUSH/POP instruction
x86/unwind/orc: Add 'signal' field to ORC metadata
objtool: Optimize layout of struct special_alt
objtool: Optimize layout of struct symbol
objtool: Allocate multiple structures with calloc()
objtool: Make struct check_options static
objtool: Make struct entries[] static and const
objtool: Fix HOSTCC flag usage
objtool: Properly support make V=1
objtool: Install libsubcmd in build
...
|
|
openrisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
nios2 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
microblaze equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ia64 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
sparc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Acked-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Björn Töpel <bjorn@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Finn Thain <fthain@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Recent test_kprobe_missed kprobes kunit test uncovers the following
problem. Once kprobe is triggered from another kprobe (kprobe reenter),
all future kprobes on this cpu are considered as kprobe reenter, thus
pre_handler and post_handler are not being called and kprobes are counted
as "missed".
Commit b9599798f953 ("[S390] kprobes: activation and deactivation")
introduced a simpler scheme for kprobes (de)activation and status
tracking by using push_kprobe/pop_kprobe, which supposed to work for
both initial kprobe entry as well as kprobe reentry and helps to avoid
handling those two cases differently. The problem is that a sequence of
calls in case of kprobes reenter:
push_kprobe() <- NULL (current_kprobe)
push_kprobe() <- kprobe1 (current_kprobe)
pop_kprobe() -> kprobe1 (current_kprobe)
pop_kprobe() -> kprobe1 (current_kprobe)
leaves "kprobe1" as "current_kprobe" on this cpu, instead of setting it
to NULL. In fact push_kprobe/pop_kprobe can only store a single state
(there is just one prev_kprobe in kprobe_ctlblk). Which is a hack but
sufficient, there is no need to have another prev_kprobe just to store
NULL. To make a simple and backportable fix simply reset "prev_kprobe"
when kprobe is poped from this "stack". No need to worry about
"kprobe_status" in this case, because its value is only checked when
current_kprobe != NULL.
Cc: stable@vger.kernel.org
Fixes: b9599798f953 ("[S390] kprobes: activation and deactivation")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Recent test_kprobe_missed kprobes kunit test uncovers the following error
(reported when CONFIG_DEBUG_ATOMIC_SLEEP is enabled):
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 662, name: kunit_try_catch
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
no locks held by kunit_try_catch/662.
irq event stamp: 280
hardirqs last enabled at (279): [<00000003e60a3d42>] __do_pgm_check+0x17a/0x1c0
hardirqs last disabled at (280): [<00000003e3bd774a>] kprobe_exceptions_notify+0x27a/0x318
softirqs last enabled at (0): [<00000003e3c5c890>] copy_process+0x14a8/0x4c80
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 46 PID: 662 Comm: kunit_try_catch Tainted: G N 6.2.0-173644-g44c18d77f0c0 #2
Hardware name: IBM 3931 A01 704 (LPAR)
Call Trace:
[<00000003e60a3a00>] dump_stack_lvl+0x120/0x198
[<00000003e3d02e82>] __might_resched+0x60a/0x668
[<00000003e60b9908>] __mutex_lock+0xc0/0x14e0
[<00000003e60bad5a>] mutex_lock_nested+0x32/0x40
[<00000003e3f7b460>] unregister_kprobe+0x30/0xd8
[<00000003e51b2602>] test_kprobe_missed+0xf2/0x268
[<00000003e51b5406>] kunit_try_run_case+0x10e/0x290
[<00000003e51b7dfa>] kunit_generic_run_threadfn_adapter+0x62/0xb8
[<00000003e3ce30f8>] kthread+0x2d0/0x398
[<00000003e3b96afa>] __ret_from_fork+0x8a/0xe8
[<00000003e60ccada>] ret_from_fork+0xa/0x40
The reason for this error report is that kprobes handling code failed
to restore irqs.
The problem is that when kprobe is triggered from another kprobe
post_handler current sequence of enable_singlestep / disable_singlestep
is the following:
enable_singlestep <- original kprobe (saves kprobe_saved_imask)
enable_singlestep <- kprobe triggered from post_handler (clobbers kprobe_saved_imask)
disable_singlestep <- kprobe triggered from post_handler (restores kprobe_saved_imask)
disable_singlestep <- original kprobe (restores wrong clobbered kprobe_saved_imask)
There is just one kprobe_ctlblk per cpu and both calls saves and
loads irq mask to kprobe_saved_imask. To fix the problem simply move
resume_execution (which calls disable_singlestep) before calling
post_handler. This also fixes the problem that post_handler is called
with pt_regs which were not yet adjusted after single-stepping.
Cc: stable@vger.kernel.org
Fixes: 4ba069b802c2 ("[S390] add kprobes support.")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|