Age | Commit message (Collapse) | Author |
|
When performing an exec*(), there's a transient period before the return
to userspace where any stacktrace will result in a warning triggered by
kunwind_next_frame_record_meta() encountering a struct frame_record_meta
with an unknown type. This can be seen fairly reliably by enabling KASAN
or KFENCE, e.g.
| WARNING: CPU: 3 PID: 143 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x264/0x3b0
| Modules linked in:
| CPU: 3 UID: 0 PID: 143 Comm: login Not tainted 6.12.0-rc2-00010-g0f0b9a3f6a50 #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : arch_stack_walk+0x264/0x3b0
| lr : arch_stack_walk+0x1ec/0x3b0
| sp : ffff80008060b970
| x29: ffff80008060ba10 x28: fff00000051133c0 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: fff000007fe84000
| x23: ffff9d1b3c940af0 x22: 0000000000000000 x21: fff00000051133c0
| x20: ffff80008060ba50 x19: ffff9d1b3c9408e0 x18: 0000000000000014
| x17: 000000006d50da47 x16: 000000008e3f265e x15: fff0000004e8bf40
| x14: 0000ffffc5e50e48 x13: 000000000000000f x12: 0000ffffc5e50fed
| x11: 000000000000001f x10: 000018007f8bffff x9 : 0000000000000000
| x8 : ffff80008060b9c0 x7 : ffff80008060bfd8 x6 : ffff80008060ba80
| x5 : ffff80008060ba00 x4 : ffff80008060c000 x3 : ffff80008060bff0
| x2 : 0000000000000018 x1 : ffff80008060bfd8 x0 : 0000000000000000
| Call trace:
| arch_stack_walk+0x264/0x3b0 (P)
| arch_stack_walk+0x1ec/0x3b0 (L)
| stack_trace_save+0x50/0x80
| metadata_update_state+0x98/0xa0
| kfence_guarded_free+0xec/0x2c4
| __kfence_free+0x50/0x100
| kmem_cache_free+0x1a4/0x37c
| putname+0x9c/0xc0
| do_execveat_common.isra.0+0xf0/0x1e4
| __arm64_sys_execve+0x40/0x60
| invoke_syscall+0x48/0x104
| el0_svc_common.constprop.0+0x40/0xe0
| do_el0_svc+0x1c/0x28
| el0_svc+0x34/0xe0
| el0t_64_sync_handler+0x120/0x12c
| el0t_64_sync+0x198/0x19c
This happens because start_thread_common() zeroes the entirety of
current_pt_regs(), including pt_regs::stackframe::type, changing this
from FRAME_META_TYPE_FINAL to 0 and making the final record invalid.
The stacktrace code will reject this until the next return to userspace,
where a subsequent exception entry will reinitialize the type to
FRAME_META_TYPE_FINAL.
This zeroing wasn't a problem prior to commit:
c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries")
... as before that commit the stacktrace code only expected the final
pt_regs::stackframe to contain zeroes, which was unchanged by
start_thread_common().
A stacktrace could occur at any time, either due to instrumentation or
an exception, and so start_thread_common() must ensure that
pt_regs::stackframe is always valid.
Fix this by changing the way start_thread_common() zeroes and
reinitializes the pt_regs fields:
* The '{regs,pc,pstate}' fields are initialized in one go via a struct
assignment to the user_regs, with start_thread() and
compat_start_thread() modified to pass 'pstate' into
start_thread_common().
* The 'sp' and 'compat_sp' fields are zeroed by the struct assignment in
start_thread_common(), and subsequently overwritten in start_thread()
and compat_start_thread respectively, matching existing behaviour.
* The 'syscallno' field is implicitly preserved while the 'orig_x0'
field is explicitly zeroed, maintaining existing ABI.
* The 'pmr' field is explicitly initialized, as necessary for an exec*()
from a kernel thread, matching existing behaviour.
* The 'stackframe' field is implicitly preserved, with a new comment and
some assertions to ensure we don't accidentally break this in future.
* All other fields are implicitly preserved, and should have no
functional impact:
- 'sdei_ttbr1' is only used for SDEI exception entry/exit, and we
never exec*() inside an SDEI handler.
- 'lockdep_hardirqs' and 'exit_rcu' are only used for EL1 exception
entry/exit, and we never exec*() inside an EL1 exception handler.
While updating compat_start_thread() to pass 'pstate' into
start_thread_common(), I've also updated the logic to remove the
ifdeffery, replacing:
| #ifdef __AARCH64EB__
| regs->pstate |= PSR_AA32_E_BIT;
| #endif
... with:
| if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
| pstate |= PSR_AA32_E_BIT;
... which should be functionally equivalent, and matches our preferred
code style.
Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Tested-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Link: https://lore.kernel.org/r/20241021164456.2275285-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When arm64's stack unwinder encounters an exception boundary, it uses
the pt_regs::stackframe created by the entry code, which has a copy of
the PC and FP at the time the exception was taken. The unwinder doesn't
know anything about pt_regs, and reports the PC from the stackframe, but
does not report the LR.
The LR is only guaranteed to contain the return address at function call
boundaries, and can be used as a scratch register at other times, so the
LR at an exception boundary may or may not be a legitimate return
address. It would be useful to report the LR value regardless, as it can
be helpful when debugging, and in future it will be helpful for reliable
stacktrace support.
This patch changes the way we unwind across exception boundaries,
allowing both the PC and LR to be reported. The entry code creates a
frame_record_meta structure embedded within pt_regs, which the unwinder
uses to find the pt_regs. The unwinder can then extract pt_regs::pc and
pt_regs::lr as two separate unwind steps before continuing with a
regular walk of frame records.
When a PC is unwound from pt_regs::lr, dump_backtrace() will log this
with an "L" marker so that it can be identified easily. For example,
an unwind across an exception boundary will appear as follows:
| el1h_64_irq+0x6c/0x70
| _raw_spin_unlock_irqrestore+0x10/0x60 (P)
| __aarch64_insn_write+0x6c/0x90 (L)
| aarch64_insn_patch_text_nosync+0x28/0x80
... with a (P) entry for pt_regs::pc, and an (L) entry for pt_regs:lr.
Note that the LR may be stale at the point of the exception, for example,
shortly after a return:
| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268
... where the LR points a few instructions before the current PC.
This plays nicely with all the other unwind metadata tracking. With the
ftrace_graph profiler enabled globally, and kretprobes installed on
generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered
by magic-sysrq + L reports:
| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268
| cpu_startup_entry+0x3c/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x88/0x98
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When unwinding stacks, we use unwind_consume_stack() to both find
whether an object (e.g. a frame record) is on an accessible stack *and*
to update the stack boundaries. This works fine today since we only care
about one type of object which does not overlap other objects.
In subsequent patches we'll want to check whether an object (e.g a frame
record) is on the stack and follow this up by accessing a larger object
containing the first (e.g. a pt_regs with an embedded frame record).
To make that pattern easier to implement, this patch reworks
unwind_find_next_stack() and unwind_consume_stack() so that the former
can be used to check if an object is on any accessible stack, and the
latter is purely used to update the stack boundaries.
As unwind_find_next_stack() is modified to also check the stack
currently being unwound, it is renamed to unwind_find_stack().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When analysing a stacktrace it can be useful to know whether an unwound
PC has been rewritten by fgraph or kretprobes, as in some situations
these may be suspect or be known to be unreliable.
This patch adds flags to track when an unwind entry has recovered the PC
from fgraph and/or kretprobes, and updates dump_backtrace() to log when
this is the case.
The flags recorded are:
"F" - the PC was recovered from fgraph
"K" - the PC was recovered from kretprobes
These flags are recorded and logged in addition to the original source
of the unwound PC.
For example, with the ftrace_graph profiler enabled globally, and
kretprobes installed on generic_handle_domain_irq() and
do_interrupt_handler(), a backtrace triggered by magic-sysrq + L
reports:
| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x64/0x68
| default_idle_call+0x34/0x180
| do_idle+0x204/0x268
| cpu_startup_entry+0x40/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x80/0x90
Note that as these flags are reported next to the recovered PC value,
they appear on the callers of instrumented functions. For example
gic_handle_irq() has a "K" marker because generic_handle_domain_irq()
was instrumented with kretprobes and had its return address rewritten.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When analysing a stacktrace it can be useful to know where an unwound PC
came from, as in some situations certain sources may be suspect or known
to be unreliable. In future it would also be useful to track this so
that certain unwind steps can be performed in a stateful manner. For
example when unwinding across an exception boundary, we'd ideally unwind
pt_regs::pc, then pt_regs::lr, then the next frame record.
This patch adds an enumerated set of unwind sources, tracks this during
the unwind, and updates dump_backtrace() to log these for interesting
unwind steps.
The interesting sources recorded are:
"C" - the PC came from the caller of an unwind function.
"T" - the PC came from thread_saved_pc() for a blocked task.
"P" - the PC came from a pt_regs::pc.
"U" - the PC came from an unknown source (indicates an unwinder error).
... with nothing recorded when the PC came from a frame_record::pc as
this is the vastly common case and logging this would make it difficult
to spot the more interesting cases.
For example, when triggering a backtrace via magic-sysrq + L, the CPU
handling the sysrq will have a backtrace whose first element is the
caller (C) of dump_backtrace():
| Call trace:
| show_stack+0x18/0x30 (C)
| dump_stack_lvl+0x60/0x80
| dump_stack+0x18/0x24
| nmi_cpu_backtrace+0xfc/0x140
| ...
... and other CPUs will have a backtrace whose first element is their
pt_regs::pc (P) at the instant the backtrace IPI was taken:
| Call trace:
| _raw_spin_unlock_irqrestore+0x8/0x50 (P)
| wake_up_process+0x18/0x24
| process_timeout+0x14/0x20
| call_timer_fn.isra.0+0x24/0x80
| ...
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently dump_backtrace() can only print the PC value at each step of
the unwind, as this is all the information that arch_stack_walk()
passes to the dump_backtrace_entry() callback.
In future we'd like to print some additional information, such as the
origin of entries (e.g. PC, LR, FP) and/or the reliability thereof.
In preparation for doing so, this patch moves dump_backtrace() over to
kunwind_stack_walk(), which passes the full kunwind_state to the
callback.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently the signal handling code has its own struct frame_record,
the definition of struct pt_regs open-codes a frame record as an array,
and the kernel unwinder hard-codes frame record offsets.
Move to a common struct frame_record that can be used throughout the
kernel.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
In subsequent patches we'll want to add an additional u64 to struct
pt_regs. To make space, this patch swaps the 'unused' and 'pmr' fields,
as the 'pmr' value only requires bits[7:0] and can safely fit into a
u32, which frees up a 64-bit unused field.
The 'lockdep_hardirqs' and 'exit_rcu' fields should eventually be moved
out of pt_regs and managed locally within entry-common.c, so I've left
those as-is for the moment.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The pt_regs::pmr_save field is weirdly named relative to all other
pt_regs fields, with a '_save' suffix that doesn't make anything clearer
and only leads to more typing to access the field.
Remove the '_save' suffix.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
For historical reasons the layout of struct pt_regs depends on the
configured endianness, with the order of the 'syscallno' and 'unused2'
fields varying dependent upon whether __AARCH64EB__ is defined. We no
longer depend on the order of these two fields and can remove the
ifdeffery.
The current conditional layout was introduced in commit:
35d0e6fb4d219d64 ("arm64: syscallno is secretly an int, make it official")
At the time, this was necessary so that the entry assembly could use a
single STP instruction to save the pt_regs::{orig_x0,syscallno} fields,
without logic that was conditional on the endianness of the kernel:
| el0_svc_naked:
| stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
This logic was converted to C in commit:
f37099b6992a0b81 ("arm64: convert syscall trace logic to C")
Since that commit, we no longer manipulate pt_regs::orig_x0 from
assembly, and only manipulate pt_regs::syscallno as a 32-bit quantity
early in the kernel_entry assembly:
| /* Not in a syscall by default (el0_svc overwrites for real syscall) */
| .if \el == 0
| mov w21, #NO_SYSCALL
| str w21, [sp, #S_SYSCALLNO]
| .endif
Given the above, there's no longer a need for the layout of
pt_regs::{syscallno,unused2} to depend on the endianness of the kernel.
This patch removes the ifdeffery and places 'syscallno' before 'unused2'
regardless of the endianess of the kernel. At the same time, 'unused2'
is renamed to 'unused', as it is the only unused field within pt_regs.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To ensure that the stack is correctly aligned when branching to C code,
we require that struct pt_regs is a multiple of 16 bytes, as noted in a
comment.
Add an explicit assertion for this, so that any accidental violation of
this requirement will be caught by the compiler.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The cpu_emergency_register_virt_callback() function is used
unconditionally by the x86 kvm code, but it is declared (and defined)
conditionally:
#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
...
leading to a build error when neither KVM_INTEL nor KVM_AMD support is
enabled:
arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix the build by defining empty helper functions the same way the old
cpu_emergency_disable_virtualization() function was dealt with for the
same situation.
Maybe we could instead have made the call sites conditional, since the
callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak
fallback. I'll leave that to the kvm people to argue about, this at
least gets the build going for that particular config.
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Chao Gao <chao.gao@intel.com>
Cc: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix TDX MMIO #VE fault handling, and add two new Intel model numbers
for 'Pantherlake' and 'Diamond Rapids'"
* tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add two Intel CPU model numbers
x86/tdx: Fix "in-kernel MMIO" check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"lockdep:
- Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
- Use str_plural() to address Coccinelle warning (Thorsten Blum)
- Add debuggability enhancement (Luis Claudio R. Goncalves)
static keys & calls:
- Fix static_key_slow_dec() yet again (Peter Zijlstra)
- Handle module init failure correctly in static_call_del_module()
(Thomas Gleixner)
- Replace pointless WARN_ON() in static_call_module_notify() (Thomas
Gleixner)
<linux/cleanup.h>:
- Add usage and style documentation (Dan Williams)
rwsems:
- Move is_rwsem_reader_owned() and rwsem_owner() under
CONFIG_DEBUG_RWSEMS (Waiman Long)
atomic ops, x86:
- Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
- Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros
Bizjak)"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS
jump_label: Fix static_key_slow_dec() yet again
static_call: Replace pointless WARN_ON() in static_call_module_notify()
static_call: Handle module init failure correctly in static_call_del_module()
locking/lockdep: Simplify character output in seq_line()
lockdep: fix deadlock issue between lockdep and rcu
lockdep: Use str_plural() to fix Coccinelle warning
cleanup: Add usage and style documentation
lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug
locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void
locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
|
|
Merge all pending locking commits into a single branch.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Clean up and improve vdso code: use SYM_* macros for function and
data annotations, add CFI annotations to fix GDB unwinding, optimize
the chacha20 implementation
- Add vfio-ap driver feature advertisement for use by libvirt and
mdevctl
* tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Driver feature advertisement
s390/vdso: Use one large alternative instead of an alternative branch
s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects
tools: Add additional SYM_*() stubs to linkage.h
s390/vdso: Use macros for annotation of asm functions
s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack()
s390/vdso: Fix comment within __arch_chacha20_blocks_nostack()
s390/vdso: Get rid of permutation constants
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Removal of dead code (TT mode leftovers, etc)
- Fixes for the network vector driver
- Fixes for time-travel mode
* tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: fix time-travel syscall scheduling hack
um: Remove outdated asm/sysrq.h header
um: Remove the declaration of user_thread function
um: Remove the call to SUBARCH_EXECVE1 macro
um: Remove unused mm_fd field from mm_id
um: Remove unused fields from thread_struct
um: Remove the redundant newpage check in update_pte_range
um: Remove unused kpte_clear_flush macro
um: Remove obsoleted declaration for execute_syscall_skas
user_mode_linux_howto_v2: add VDE vector support in doc
vector_user: add VDE support
um: remove ARCH_NO_PREEMPT_DYNAMIC
um: vector: Fix NAPI budget handling
um: vector: Replace locks guarding queue depth with atomics
um: remove variable stack array in os_rcv_fd_msg()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
- Christophe realized that the LoongArch64 instructions could be
scheduled more similar to how GCC generates code, which Ruoyao
implemented, for a 5% speedup from basically some rearrangements
- An update to MAINTAINERS to match the right files
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
LoongArch: vDSO: Tune chacha implementation
MAINTAINERS: make vDSO getrandom matches more generic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Fix objtool about do_syscall() and Clang
- Enable generic CPU vulnerabilites support
- Enable ACPI BGRT handling
- Rework CPU feature probe from CPUCFG/IOCSR
- Add ARCH_HAS_SET_MEMORY support
- Add ARCH_HAS_SET_DIRECT_MAP support
- Improve hardware page table walker
- Simplify _percpu_read() and _percpu_write()
- Add advanced extended IRQ model documentions
- Some bug fixes and other small changes
* tag 'loongarch-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
Docs/LoongArch: Add advanced extended IRQ model description
LoongArch: Remove posix_types.h include from sigcontext.h
LoongArch: Fix memleak in pci_acpi_scan_root()
LoongArch: Simplify _percpu_read() and _percpu_write()
LoongArch: Improve hardware page table walker
LoongArch: Add ARCH_HAS_SET_DIRECT_MAP support
LoongArch: Add ARCH_HAS_SET_MEMORY support
LoongArch: Rework CPU feature probe from CPUCFG/IOCSR
LoongArch: Enable ACPI BGRT handling
LoongArch: Enable generic CPU vulnerabilites support
LoongArch: Remove STACK_FRAME_NON_STANDARD(do_syscall)
LoongArch: Set AS_HAS_THIN_ADD_SUB as y if AS_IS_LLVM
LoongArch: Enable objtool for Clang
objtool: Handle frame pointer related instructions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"The first change by Gaosheng Cui removes unused declarations which
have been obsoleted since commit 5a4053b23262 ("sh: Kill off dead
boards.") and the second by his colleague Hongbo Li replaces the use
of the unsafe simple_strtoul() with the safer kstrtoul() function in
the sh interrupt controller driver code"
* tag 'sh-for-v6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: intc: Replace simple_strtoul() with kstrtoul()
sh: Remove unused declarations for make_maskreg_irq() and irq_mask_register
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:
"A second round of Xen related changes and features:
- a small fix of the xen-pciback driver for a warning issued by
sparse
- support PCI passthrough when using a PVH dom0
- enable loading the kernel in PVH mode at arbitrary addresses,
avoiding conflicts with the memory map when running as a Xen dom0
using the host memory layout"
* tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/pvh: Add 64bit relocation page tables
x86/kernel: Move page table macros to header
x86/pvh: Set phys_base when calling xen_prepare_pvh()
x86/pvh: Make PVH entrypoint PIC for x86-64
xen: sync elfnote.h from xen tree
xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
xen/privcmd: Add new syscall to get gsi from dev
xen/pvh: Setup gsi for passthrough device
xen/pci: Add a function to reset device for xen
|
|
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC update from Arnd Bergmann:
"Convert ep93xx to devicetree
This concludes a long journey towards replacing the old board files
with devictree description on the Cirrus Logic EP93xx platform.
Nikita Shubin has been working on this for a long time, for details
see the last post on
https://lore.kernel.org/lkml/20240909-ep93xx-v12-0-e86ab2423d4b@maquefel.me/"
* tag 'soc-ep93xx-dt-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
dt-bindings: gpio: ep9301: Add missing "#interrupt-cells" to examples
MAINTAINERS: Update EP93XX ARM ARCHITECTURE maintainer
soc: ep93xx: drop reference to removed EP93XX_SOC_COMMON config
net: cirrus: use u8 for addr to calm down sparse
dmaengine: cirrus: use snprintf() to calm down gcc 13.3.0
dmaengine: ep93xx: Fix a NULL vs IS_ERR() check in probe()
pinctrl: ep93xx: Fix raster pins typo
spi: ep93xx: update kerneldoc comments for ep93xx_spi
clk: ep93xx: Fix off by one in ep93xx_div_recalc_rate()
clk: ep93xx: add module license
dmaengine: cirrus: remove platform code
ASoC: cirrus: edb93xx: Delete driver
ARM: ep93xx: soc: drop defines
ARM: ep93xx: delete all boardfiles
ata: pata_ep93xx: remove legacy pinctrl use
pwm: ep93xx: drop legacy pinctrl
ARM: ep93xx: DT for the Cirrus ep93xx SoC platforms
ARM: dts: ep93xx: Add EDB9302 DT
ARM: dts: ep93xx: add ts7250 board
ARM: dts: add Cirrus EP93XX SoC .dtsi
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are only two small patches, one cleanup for arch/alpha and a
preparation patch cleaning up the handling of runtime constants in the
linker scripts"
* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
runtime constants: move list of constants to vmlinux.lds.h
alpha: no need to include asm/xchg.h twice
|
|
Pantherlake is a mobile CPU. Diamond Rapids next generation Xeon.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240923173750.16874-1-tony.luck%40intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the "big" set of char/misc and other driver subsystem changes
for 6.12-rc1.
Lots of changes in here, primarily dominated by the usual IIO driver
updates and additions, but there are also small driver subsystem
updates all over the place. Included in here are:
- lots and lots of new IIO drivers and updates to existing ones
- interconnect subsystem updates and new drivers
- nvmem subsystem updates and new drivers
- mhi driver updates
- power supply subsystem updates
- kobj_type const work for many different small subsystems
- comedi driver fix
- coresight subsystem and driver updates
- fpga subsystem improvements
- slimbus fixups
- binder new feature addition for "frozen" notifications
- lots and lots of other small driver updates and cleanups
All of these have been in linux-next for a long time with no reported
problems"
* tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (354 commits)
greybus: gb-beagleplay: Add firmware upload API
arm64: dts: ti: k3-am625-beagleplay: Add bootloader-backdoor-gpios to cc1352p7
dt-bindings: net: ti,cc1352p7: Add bootloader-backdoor-gpios
MAINTAINERS: Update path for U-Boot environment variables YAML
nvmem: layouts: add U-Boot env layout
comedi: ni_routing: tools: Check when the file could not be opened
ocxl: Remove the unused declarations in headr file
hpet: Fix the wrong format specifier
uio: Constify struct kobj_type
cxl: Constify struct kobj_type
binder: modify the comment for binder_proc_unlock
iio: adc: axp20x_adc: add support for AXP717 ADC
dt-bindings: iio: adc: Add AXP717 compatible
iio: adc: axp20x_adc: Add adc_en1 and adc_en2 to axp_data
w1: ds2482: Drop explicit initialization of struct i2c_device_id::driver_data to 0
tools: iio: rm .*.cmd when make clean
iio: adc: standardize on formatting for id match tables
iio: proximity: aw96103: Add support for aw96103/aw96105 proximity sensor
bus: mhi: host: pci_generic: Enable EDL trigger for Foxconn modems
bus: mhi: host: pci_generic: Update EDL firmware path for Foxconn modems
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the "big" set of tty/serial driver updates for 6.12-rc1.
Nothing major in here, just nice forward progress in the slow cleanup
of the serial apis, and lots of other driver updates and fixes.
Included in here are:
- serial api updates from Jiri to make things more uniform and sane
- 8250_platform driver cleanups
- samsung serial driver fixes and updates
- qcom-geni serial driver fixes from Johan for the bizarre UART
engine that that chip seems to have. Hopefully it's in a better
state now, but hardware designers still seem to come up with more
ways to make broken UARTS 40+ years after this all should have
finished.
- sc16is7xx driver updates
- omap 8250 driver updates
- 8250_bcm2835aux driver updates
- a few new serial driver bindings added
- other serial minor driver updates
All of these have been in linux-next for a long time with no reported
problems"
* tag 'tty-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits)
tty: serial: samsung: Fix serial rx on Apple A7-A9
tty: serial: samsung: Fix A7-A11 serial earlycon SError
tty: serial: samsung: Use bit manipulation macros for APPLE_S5L_*
tty: rp2: Fix reset with non forgiving PCIe host bridges
serial: 8250_aspeed_vuart: Enable module autoloading
serial: qcom-geni: fix polled console corruption
serial: qcom-geni: disable interrupts during console writes
serial: qcom-geni: fix console corruption
serial: qcom-geni: introduce qcom_geni_serial_poll_bitfield()
serial: qcom-geni: fix arg types for qcom_geni_serial_poll_bit()
soc: qcom: geni-se: add GP_LENGTH/IRQ_EN_SET/IRQ_EN_CLEAR registers
serial: qcom-geni: fix false console tx restart
serial: qcom-geni: fix fifo polling timeout
tty: hvc: convert comma to semicolon
mxser: convert comma to semicolon
serial: 8250_bcm2835aux: Fix clock imbalance in PM resume
serial: sc16is7xx: convert bitmask definitions to use BIT() macro
serial: sc16is7xx: fix copy-paste errors in EFR_SWFLOWx_BIT constants
serial: sc16is7xx: remove SC16IS7XX_MSR_DELTA_MASK
serial: xilinx_uartps: Make cdns_rs485_supported static
...
|
|
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.1726237595.git.legion%40kernel.org
|
|
make_maskreg_irq() and irq_mask_register have been removed since
commit 5a4053b23262 ("sh: Kill off dead boards."), so remove the
unused declarations.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
- new memblock_estimated_nr_free_pages() helper to replace
totalram_pages() which is less accurate when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
- fixes for memblock tests
* tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
s390/mm: get estimated free pages by memblock api
kernel/fork.c: get estimated free pages by memblock api
mm/memblock: introduce a new helper memblock_estimated_nr_free_pages()
memblock test: fix implicit declaration of function 'strscpy'
memblock test: fix implicit declaration of function 'isspace'
memblock test: fix implicit declaration of function 'memparse'
memblock test: add the definition of __setup()
memblock test: fix implicit declaration of function 'virt_to_phys'
tools/testing: abstract two init.h into common include directory
memblock tests: include export.h in linkage.h as kernel dose
memblock tests: include memory_hotplug.h in mmzone.h as kernel dose
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc
Pull sparc32 update from Andreas Larsson:
- Remove an unused variable for sparc32
* tag 'sparc-for-6.12-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/alarsson/linux-sparc:
arch/sparc: remove unused varible paddrbase in function leon_swprobe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix build error in vdso32 when building 64-bit with COMPAT=y and -Os
- Fix build error in pseries EEH when CONFIG_DEBUG_FS is not set
Thanks to Christophe Leroy, Narayana Murty N, Christian Zigotzky, and
Ritesh Harjani.
* tag 'powerpc-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/eeh: move pseries_eeh_err_inject() outside CONFIG_DEBUG_FS block
powerpc/vdso32: Fix use of crtsavres for PPC64
|
|
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up
objtool warnings), teach objtool about 'noreturn' Rust symbols and
mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we
should be objtool-warning-free, so enable it to run for all Rust
object files.
- KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support.
- Support 'RUSTC_VERSION', including re-config and re-build on
change.
- Split helpers file into several files in a folder, to avoid
conflicts in it. Eventually those files will be moved to the right
places with the new build system. In addition, remove the need to
manually export the symbols defined there, reusing existing
machinery for that.
- Relax restriction on configurations with Rust + GCC plugins to just
the RANDSTRUCT plugin.
'kernel' crate:
- New 'list' module: doubly-linked linked list for use with reference
counted values, which is heavily used by the upcoming Rust Binder.
This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed
unique for the given ID), 'AtomicTracker' (tracks whether a
'ListArc' exists using an atomic), 'ListLinks' (the prev/next
pointers for an item in a linked list), 'List' (the linked list
itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor
into a 'List' that allows to remove elements), 'ListArcField' (a
field exclusively owned by a 'ListArc'), as well as support for
heterogeneous lists.
- New 'rbtree' module: red-black tree abstractions used by the
upcoming Rust Binder.
This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a
node), 'RBTreeNodeReservation' (a memory reservation for a node),
'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor'
(bidirectional cursor that allows to remove elements), as well as
an entry API similar to the Rust standard library one.
- 'init' module: add 'write_[pin_]init' methods and the
'InPlaceWrite' trait. Add the 'assert_pinned!' macro.
- 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by
introducing an associated type in the trait.
- 'alloc' module: add 'drop_contents' method to 'BoxExt'.
- 'types' module: implement the 'ForeignOwnable' trait for
'Pin<Box<T>>' and improve the trait's documentation. In addition,
add the 'into_raw' method to the 'ARef' type.
- 'error' module: in preparation for the upcoming Rust support for
32-bit architectures, like arm, locally allow Clippy lint for
those.
Documentation:
- https://rust.docs.kernel.org has been announced, so link to it.
- Enable rustdoc's "jump to definition" feature, making its output a
bit closer to the experience in a cross-referencer.
- Debian Testing now also provides recent Rust releases (outside of
the freeze period), so add it to the list.
MAINTAINERS:
- Trevor is joining as reviewer of the "RUST" entry.
And a few other small bits"
* tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits)
kasan: rust: Add KASAN smoke test via UAF
kbuild: rust: Enable KASAN support
rust: kasan: Rust does not support KHWASAN
kbuild: rust: Define probing macros for rustc
kasan: simplify and clarify Makefile
rust: cfi: add support for CFI_CLANG with Rust
cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS
rust: support for shadow call stack sanitizer
docs: rust: include other expressions in conditional compilation section
kbuild: rust: replace proc macros dependency on `core.o` with the version text
kbuild: rust: rebuild if the version text changes
kbuild: rust: re-run Kconfig if the version text changes
kbuild: rust: add `CONFIG_RUSTC_VERSION`
rust: avoid `box_uninit_write` feature
MAINTAINERS: add Trevor Gross as Rust reviewer
rust: rbtree: add `RBTree::entry`
rust: rbtree: add cursor
rust: rbtree: add mutable iterator
rust: rbtree: add iterator
rust: rbtree: add red-black tree implementation backed by the C version
...
|
|
The PVH entry point is 32bit. For a 64bit kernel, the entry point must
switch to 64bit mode, which requires a set of page tables. In the past,
PVH used init_top_pgt.
This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
page tables are prebuilt for this address. If the kernel is loaded at a
different address, they need to be adjusted.
__startup_64() adjusts the prebuilt page tables for the physical load
address, but it is 64bit code. The 32bit PVH entry code can't call it
to adjust the page tables, so it can't readily be re-used.
64bit PVH entry needs page tables set up for identity map, the kernel
high map and the direct map. pvh_start_xen() enters identity mapped.
Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
into the highmap. The direct map is used for __va() on the initramfs
and other guest physical addresses.
Add a dedicated set of prebuild page tables for PVH entry. They are
adjusted in assembly before loading.
Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
along with the kernel's loading constraints. The maximum load address,
KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
page. It could be larger with more pages.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-6-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
The PVH entry point will need an additional set of prebuild page tables.
Move the macros and defines to pgtable_64.h, so they can be re-used.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Message-ID: <20240823193630.2583107-5-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
phys_base needs to be set for __pa() to work in xen_pvh_init() when
finding the hypercall page. Set it before calling into
xen_prepare_pvh(), which calls xen_pvh_init(). Clear it afterward to
avoid __startup_64() adding to it and creating an incorrect value.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-4-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
The PVH entrypoint is 32bit non-PIC code running the uncompressed
vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
(16MB). The kernel is loaded at that physical address inside the VM by
the VMM software (Xen/QEMU).
When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
into the PVH container. There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB. This creates a conflict where the PVH
kernel cannot be loaded at that address.
Modify the PVH entrypoint to be position-indepedent to allow flexibility
in load address. Only the 64bit entry path is converted. A 32bit
kernel is not PIC, so calling into other parts of the kernel, like
xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
relocated.
This makes the code PIC, but the page tables need to be updated as well
to handle running from the kernel high map.
The UNWIND_HINT_END_OF_STACK is to silence:
vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
after the lret into 64bit code.
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240823193630.2583107-3-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.
When assigning a device to passthrough, proactively setup the gsi
of the device during that process.
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-ID: <20240924061437.2636766-3-Jiqian.Chen@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support cross-compiling linux-headers Debian package and kernel-devel
RPM package
- Add support for the linux-debug Pacman package
- Improve module rebuilding speed by factoring out the common code to
scripts/module-common.c
- Separate device tree build rules into scripts/Makefile.dtbs
- Add a new script to generate modules.builtin.ranges, which is useful
for tracing tools to find symbols in built-in modules
- Refactor Kconfig and misc tools
- Update Kbuild and Kconfig documentation
* tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
kbuild: doc: replace "gcc" in external module description
kbuild: doc: describe the -C option precisely for external module builds
kbuild: doc: remove the description about shipped files
kbuild: doc: drop section numbering, use references in modules.rst
kbuild: doc: throw out the local table of contents in modules.rst
kbuild: doc: remove outdated description of the limitation on -I usage
kbuild: doc: remove description about grepping CONFIG options
kbuild: doc: update the description about Kbuild/Makefile split
kbuild: remove unnecessary export of RUST_LIB_SRC
kbuild: remove append operation on cmd_ld_ko_o
kconfig: cache expression values
kconfig: use hash table to reuse expressions
kconfig: refactor expr_eliminate_dups()
kconfig: add comments to expression transformations
kconfig: change some expr_*() functions to bool
scripts: move hash function from scripts/kconfig/ to scripts/include/
kallsyms: change overflow variable to bool type
kallsyms: squash output_address()
kbuild: add install target for modules.builtin.ranges
scripts: add verifier script for builtin module range data
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- support for PixArt PS/2 touchpad
- updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
- support for GPIO-only mode for ADP55888 controller
- support for touch keys in Zinitix driver
- support for querying density of Synaptics sensors
- sysfs interface for Goodex "Berlin" devices to read and write touch
IC registers
- more quirks to i8042 to handle various Tuxedo laptops
- a number of drivers have been converted to using "guard" notation
when acquiring various locks, as well as using other cleanup
functions to simplify releasing of resources (with more drivers to
follow)
- evdev will limit amount of data that can be written into an evdev
instance at a given time to 4096 bytes (170 input events) to avoid
holding evdev->mutex for too long and starving other users
- Spitz has been converted to use software nodes/properties to describe
its matrix keypad and GPIO-connected LEDs
- msc5000_ts, msc_touchkey and keypad-nomadik-ske drivers have been
removed since noone in mainline have been using them
- other assorted cleanups and fixes
* tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (98 commits)
ARM: spitz: fix compile error when matrix keypad driver is enabled
Input: hynitron_cstxxx - drop explicit initialization of struct i2c_device_id::driver_data to 0
Input: adp5588-keys - fix check on return code
Input: Convert comma to semicolon
Input: i8042 - add TUXEDO Stellaris 15 Slim Gen6 AMD to i8042 quirk table
Input: i8042 - add another board name for TUXEDO Stellaris Gen5 AMD line
Input: tegra-kbc - use of_property_read_variable_u32_array() and of_property_present()
Input: ps2-gpio - use IRQF_NO_AUTOEN flag in request_irq()
Input: ims-pcu - fix calling interruptible mutex
Input: zforce_ts - switch to using asynchronous probing
Input: zforce_ts - remove assert/deassert wrappers
Input: zforce_ts - do not hardcode interrupt level
Input: zforce_ts - switch to using devm_regulator_get_enable()
Input: zforce_ts - stop treating VDD regulator as optional
Input: zforce_ts - make zforce_idtable constant
Input: zforce_ts - use dev_err_probe() where appropriate
Input: zforce_ts - do not ignore errors when acquiring regulator
Input: zforce_ts - make parsing of contacts less confusing
Input: zforce_ts - switch to using get_unaligned_le16
Input: zforce_ts - use guard notation when acquiring mutexes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support using Zkr to seed KASLR
- Support IPI-triggered CPU backtracing
- Support for generic CPU vulnerabilities reporting to userspace
- A few cleanups for missing licenses
- The size limit on the XIP kernel has been removed
- Support for tracing userspace stacks
- Support for the Svvptc extension
- Various cleanups and fixes throughout the tree
* tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits)
crash: Fix riscv64 crash memory reserve dead loop
perf/riscv-sbi: Add platform specific firmware event handling
tools: Optimize ring buffer for riscv
tools: Add riscv barrier implementation
RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t
ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: Enable bitops instrumentation
riscv: Omit optimized string routines when using KASAN
ACPI: RISCV: Make acpi_numa_get_nid() to be static
riscv: Randomize lower bits of stack address
selftests: riscv: Allow mmap test to compile on 32-bit
riscv: Make riscv_isa_vendor_ext_andes array static
riscv: Use LIST_HEAD() to simplify code
riscv: defconfig: Disable RZ/Five peripheral support
RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup
riscv: avoid Imbalance in RAS
riscv: cacheinfo: Add back init_cache_level() function
riscv: Remove unused _TIF_WORK_MASK
drivers/perf: riscv: Remove redundant macro check
riscv: define ILLEGAL_POINTER_VALUE for 64bit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu fixlet from Greg Ungerer:
"Only a single change, cleaning up white space in debug message"
* tag 'm68knommu-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: remove trailing space after \n newline
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Disable buggy p10 aes-gcm code on powerpc
- Fix module aliases in paes_s390
- Fix buffer overread in caam
* tag 'v6.12-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: powerpc/p10-aes-gcm - Disable CRYPTO_AES_GCM_P10
crypto: s390/paes - Fix module aliases
crypto: caam - Pad SG length when allocating hash edesc
|
|
As Christophe pointed out, tuning the chacha implementation by
scheduling the instructions like what GCC does can improve the
performance.
The tuning does not introduce too much complexity (basically it's just
reordering some instructions). And the tuning does not hurt readibility
too much: actually the tuned code looks even more similar to a
textbook-style implementation based on 128-bit vectors. So overall it's
a good deal to me.
Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
On a LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
with a lower issue rate.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Nothing in sigcontext.h seems to require anything from
linux/posix_types.h. This include seems a MIPS relic originated from
an error in Linux 2.6.11-rc2 (in 2005).
The unneeded include was found debugging some vDSO self test build
failure (it's not the root cause though).
Link: https://lore.kernel.org/linux-mips/20240828030413.143930-2-xry111@xry111.site/
Link: https://lore.kernel.org/loongarch/0b540679ec8cfccec75aeb3463810924f6ff71e6.camel@xry111.site/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add kfree(root_ops) in this case to avoid memleak of root_ops,
leaks when pci_find_bus() != 0.
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Now _percpu_read() and _percpu_write() macros call __percpu_read()
and __percpu_write() static inline functions that result in a single
assembly instruction. However, percpu infrastructure expects its leaf
definitions to encode the size of their percpu variable, so the patch
merges all the asm clauses from the static inline function into the
corresponding leaf macros.
The secondary effect of this change is to avoid explicit __percpu
annotations for function arguments. Currently, __percpu macro is defined
in include/linux/compiler_types.h, but with proposed patch [1], __percpu
definition will need macros from include/asm-generic/percpu.h, creating
forward dependency loop.
The proposed solution is the same as x86 architecture uses.
[1] https://lore.kernel.org/lkml/20240812115945.484051-4-ubizjak@gmail.com/
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
LoongArch has similar problems explained in commit 7f0b1bf04511348995d6
("arm64: Fix barriers used for page table modifications"), when hardware
page table walker (PTW) enabled, speculative accesses may cause spurious
page fault in kernel space. Theoretically, in order to completely avoid
spurious page fault we need a "dbar + ibar" pair between the page table
modifications and the subsequent memory accesses using the corresponding
virtual address. But "ibar" is too heavy for performace, so we only use
a "dbar 0b11000" in set_pte(). And let spurious_fault() filter the rest
rare spurious page faults which should be avoided by "ibar".
Besides, we replace the llsc loop with amo in set_pte() which has better
performace, and refactor mmu_context.h to 1) avoid any load/store/branch
instructions between the writing of CSR.ASID & CSR.PGDL, 2) ensure flush
tlb operation is after updating ASID.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add set_direct_map_*() functions for setting the direct map alias for
the page to its default permissions and to an invalid state that cannot
be cached in a TLB. (See d253ca0c3 ("x86/mm/cpa: Add set_direct_map_*()
functions")) Add a similar implementation for LoongArch.
This fixes the KFENCE warnings during hibernation:
==================================================================
BUG: KFENCE: invalid read in swsusp_save+0x368/0x4d8
Invalid read at 0x00000000f7b89a3c:
swsusp_save+0x368/0x4d8
hibernation_snapshot+0x3f0/0x4e0
hibernate+0x20c/0x440
state_store+0x128/0x140
kernfs_fop_write_iter+0x160/0x260
vfs_write+0x2c0/0x520
ksys_write+0x74/0x160
do_syscall+0xb0/0x160
CPU: 0 UID: 0 PID: 812 Comm: bash Tainted: G B 6.11.0-rc1+ #1566
Tainted: [B]=BAD_PAGE
Hardware name: Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0 10/21/2022
==================================================================
Note: We can only set permissions for KVRANGE/XKVRANGE kernel addresses.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|