Age | Commit message (Collapse) | Author |
|
Convert FAR_ELx to automatic register generation as per DDI0487H.a. In the
architecture these registers have a single field "named" as "Faulting
Virtual Address for synchronous exceptions taken to ELx" occupying the
entire register, in order to fit in with the requirement to describe the
contents of the register I have created a single field named ADDR.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220520161639.324236-7-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Convert DACR32_EL2 to automatic register generation as per DDI0487H.a, no
functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220520161639.324236-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Convert CSSELR_EL1 to automatic generation as per DDI0487H.a, no functional
change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220520161639.324236-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
With shadow paging enabled, the INVPCID instruction results in a call
to kvm_mmu_invpcid_gva. If INVPCID is executed with CR0.PG=0, the
invlpg callback is not set and the result is a NULL pointer dereference.
Fix it trivially by checking for mmu->invlpg before every call.
There are other possibilities:
- check for CR0.PG, because KVM (like all Intel processors after P5)
flushes guest TLB on CR0.PG changes so that INVPCID/INVLPG are a
nop with paging disabled
- check for EFER.LMA, because KVM syncs and flushes when switching
MMU contexts outside of 64-bit mode
All of these are tricky, go for the simple solution. This is CVE-2022-1789.
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Convert the CPACR system register definitions to be automatically generated
using the definitions in DDI0487H.a. The kernel does have some additional
definitions for subfields of SMEN, FPEN and ZEN which are not identified as
distinct subfields in the architecture so the definitions are not updated
as part of this patch.
No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220520161639.324236-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Convert the various CONTEXTIDR_ELx register definitions to be automatically
generated following the definitions in DDI0487H.a. No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220520161639.324236-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Convert CLIDR_EL1 to be automatically generated with definition as per
DDI0487H.a. No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220520161639.324236-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
asm-generic: New generic ticket-based spinlock
This contains a new ticket-based spinlock that uses only generic
atomics and doesn't require as much from the memory system as qspinlock
does in order to be fair. It also includes a bit of documentation about
the qspinlock and qrwlock fairness requirements.
This will soon be used by a handful of architectures that don't meet the
qspinlock requirements.
* tag 'generic-ticket-spinlocks-v6':
csky: Move to generic ticket-spinlock
RISC-V: Move to queued RW locks
RISC-V: Move to generic spinlocks
openrisc: Move to ticket-spinlock
asm-generic: qrwlock: Document the spinlock fairness requirements
asm-generic: qspinlock: Indicate the use of mixed-size atomics
asm-generic: ticket-lock: New generic ticket-based spinlock
|
|
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long,
but is used as u64, which is wrong for i386, and has been spotted by
LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with
hweight64()"
https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/
But it's wrong even without that patch because now bitmap_weight()
dereferences a word after valid_bank_mask on i386.
>> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type
+[-Wshift-count-overflow]
21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
| ^~
include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8'
10 | ((!!((w) & (1ULL << 0))) + \
| ^
include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16'
20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16))
| ^~~~~~~~~~~~~~~~~
include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32'
21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32))
| ^~~~~~~~~~~~~~~~~
include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64'
29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w))
| ^~~~~~~~~~~~~~~~~
arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64'
1983 | if (hc->var_cnt != hweight64(valid_bank_mask))
| ^~~~~~~~~
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Jim Mattson <jmattson@google.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Sean Christopherson <seanjc@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Wanpeng Li <wanpengli@tencent.com>
CC: kvm@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: x86@kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If user space uses a memop to emulate an instruction and that
memop fails, the execution of the instruction ends.
Instruction execution can end in different ways, one of which is
suppression, which requires that the instruction execute like a no-op.
A writing memop that spans multiple pages and fails due to key
protection may have modified guest memory, as a result, the likely
correct ending is termination. Therefore, do not indicate a
suppressing instruction ending in this case.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220512131019.2594948-2-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
This patch adds a new miscdevice to expose some Ultravisor functions
to userspace. Userspace can send IOCTLs to the uvdevice that will then
emit a corresponding Ultravisor Call and hands the result over to
userspace. The uvdevice is available if the Ultravisor Call facility is
present.
Userspace can call the Retrieve Attestation Measurement
Ultravisor Call using IOCTLs on the uvdevice.
The uvdevice will do some sanity checks first.
Then, copy the request data to kernel space, build the UVCB,
perform the UV call, and copy the result back to userspace.
Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/kvm/20220516113335.338212-1-seiden@linux.ibm.com/
Message-Id: <20220516113335.338212-1-seiden@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com> (whitespace and tristate fixes, pick)
|
|
For EABI stack unwinding, when loading .ko module
the EXIDX sections will be added to a unwind_table list.
However not all EXIDX sections are added because EXIDX
sections are searched by hardcoded section names.
For functions in other sections such as .ref.text
or .kprobes.text, gcc generates seprated EXIDX sections
(such as .ARM.exidx.ref.text or .ARM.exidx.kprobes.text).
These extra EXIDX sections are not loaded, so when unwinding
functions in these sections, we will failed with:
unwind: Index not found xxx
To fix that, I refactor the code for searching and adding
EXIDX sections:
- Check section type to search EXIDX tables (0x70000001)
instead of strcmp() the hardcoded names. Then find the
corresponding text sections by their section names.
- Add a unwind_table list in module->arch to save their own
unwind_table instead of the fixed-lenth array.
- Save .ARM.exidx.init.text section ptr, because it should
be cleaned after module init.
Now all EXIDX sections of .ko can be added correctly.
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Enable the workaround for the 764319 Cortex A-9 erratum.
CP14 read accesses to the DBGPRSR and DBGOSLSR registers generate an
unexpected Undefined Instruction exception when the DBGSWENABLE external
pin is set to 0, even when the CP14 accesses are performed from a
privileged mode. The work around catches the exception in a way
the kernel does not stop execution with the use of undef_hook. This
has been found to effect the HPE GXP SoC.
Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
The assembler does not permit 'LDR PC, <sym>' when the symbol lives in a
different section, which is why we have been relying on rather fragile
open-coded arithmetic to load the address of the vector_swi routine into
the program counter using a single LDR instruction in the SWI slot in
the vector table. The literal was moved to a different section to in
commit 19accfd373847 ("ARM: move vector stubs") to ensure that the
vector stubs page does not need to be mapped readable for user space,
which is the case for the vector page itself, as it carries the kuser
helpers as well.
So the cross-section literal load is open-coded, and this relies on the
address of vector_swi to be at the very start of the vector stubs page,
and we won't notice if we got it wrong until booting the kernel and see
it break. Fortunately, it was guaranteed to break, so this was fragile
but not problematic.
Now that we have added two other variants of the vector table, we have 3
occurrences of the same trick, and so the size of our ISA/compiler/CPU
validation space has tripled, in a way that may cause regressions to only
be observed once booting the image in question on a CPU that exercises a
particular vector table.
So let's switch to true cross section references, and let the linker fix
them up like it fixes up all the other cross section references in the
vector page.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
In order to minimize potential confusion regarding numbered labels
appearing in a different order in the assembler output due to the use of
subsections, use a named local label to jump back into the vector
handler code from the associated loop8 mitigation sequence.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
The loop8 mitigation for Spectre-BHB only requires a CPU local DSB
rather than a systemwide one, which is much more costly. And by the same
reasoning as why it is justified to omit the ISB after BPIALL, we can
also elide the ISB and rely on the exception return for the context
synchronization.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
The BPIALL mitigation for Spectre-BHB adds a single instruction to the
handler sequence that doesn't clobber any registers. Given that these
sequences are 10 instructions long, they don't fit neatly into a
cacheline anyway, so we can simply move that single instruction to the
start of the unmitigated one, and rearrange the symbol names accordingly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
ARMv7 has MOVW/MOVT instruction pairs to load symbol addresses into
registers without having to rely on literal loads that go via the
D-cache. For older cores, we now support a similar arrangement, based
on PC-relative group relocations.
This means we can elide most literal loads entirely from the entry path,
by switching to the ldr_va macro to emit the appropriate sequence
depending on the target architecture revision.
While at it, switch to the bl_r macro for invoking the right PABT/DABT
helpers instead of setting the LR register explicitly, which does not
play well with cores that speculate across function returns.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
When CONFIG_SMP is not defined, the CPU offset is always zero, and so
we can simplify the sequence to load a per-CPU variable.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
When returning from the compare function the u64 is truncated to an
int. This results in a loss of the high nybble[1] in the event select
and its sign if that nybble is in use. Switch from using a result that
can end up being truncated to a result that can only be: 1, 0, -1.
[1] bits 35:32 in the event select register and bits 11:8 in the event
select.
Fixes: 7ff775aca48ad ("KVM: x86/pmu: Use binary search to check filtered events")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220517051238.2566934-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Because build-testing is over-rated, fix a few trivial objtool complaints:
vmlinux.o: warning: objtool: __tdx_module_call+0x3e: missing int3 after ret
vmlinux.o: warning: objtool: __tdx_hypercall+0x6e: missing int3 after ret
Fixes: eb94f1b6a70a ("x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220520083839.GR2578@worktop.programming.kicks-ass.net
|
|
Remove empty files which were supposed to get removed with the
respective commits removing the functionality in them:
$ find arch/x86/ -empty
arch/x86/lib/mmx_32.c
arch/x86/include/asm/fpu/internal.h
arch/x86/include/asm/mmx.h
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220520101723.12006-1-bp@alien8.de
|
|
Commit
47f33de4aafb ("x86/sev: Mark the code returning to user space as syscall gap")
added a bunch of text references without annotating them, resulting in a
spree of objtool complaints:
vmlinux.o: warning: objtool: vc_switch_off_ist+0x77: relocation to !ENDBR: entry_SYSCALL_64+0x15c
vmlinux.o: warning: objtool: vc_switch_off_ist+0x8f: relocation to !ENDBR: entry_SYSCALL_compat+0xa5
vmlinux.o: warning: objtool: vc_switch_off_ist+0x97: relocation to !ENDBR: .entry.text+0x21ea
vmlinux.o: warning: objtool: vc_switch_off_ist+0xef: relocation to !ENDBR: .entry.text+0x162
vmlinux.o: warning: objtool: __sev_es_ist_enter+0x60: relocation to !ENDBR: entry_SYSCALL_64+0x15c
vmlinux.o: warning: objtool: __sev_es_ist_enter+0x6c: relocation to !ENDBR: .entry.text+0x162
vmlinux.o: warning: objtool: __sev_es_ist_enter+0x8a: relocation to !ENDBR: entry_SYSCALL_compat+0xa5
vmlinux.o: warning: objtool: __sev_es_ist_enter+0xc1: relocation to !ENDBR: .entry.text+0x21ea
Since these text references are used to compare against IP, and are not
an indirect call target, they don't need ENDBR so annotate them away.
Fixes: 47f33de4aafb ("x86/sev: Mark the code returning to user space as syscall gap")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220520082604.GQ2578@worktop.programming.kicks-ass.net
|
|
Add an explicit dependency to the respective CPU vendor so that the
respective microcode support for it gets built only when that support is
enabled.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/8ead0da9-9545-b10d-e3db-7df1a1f219e4@infradead.org
|
|
Currently, there is no provision for vmm (qemu-kvm or kvmtool) to
query about multiple-letter ISA extensions. The config register
is only used for base single letter ISA extensions.
A new ISA extension register is added that will allow the vmm
to query about any ISA extension one at a time. It is enabled for
both single letter or multi-letter ISA extensions. The ISA extension
register is useful to if the vmm requires to retrieve/set single
extension while the config register should be used if all the base
ISA extension required to retrieve or set.
For any multi-letter ISA extensions, the new register interface
must be used.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
On RISC-V platforms with hardware VMID support, we share same
VMID for all VCPUs of a particular Guest/VM. This means we might
have stale G-stage TLB entries on the current Host CPU due to
some other VCPU of the same Guest which ran previously on the
current Host CPU.
To cleanup stale TLB entries, we simply flush all G-stage TLB
entries by VMID whenever underlying Host CPU changes for a VCPU.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The generic KVM has support for VCPU requests which can be used
to do arch-specific work in the run-loop. We introduce remote
HFENCE functions which will internally use VCPU requests instead
of host SBI calls.
Advantages of doing remote HFENCEs as VCPU requests are:
1) Multiple VCPUs of a Guest may be running on different Host CPUs
so it is not always possible to determine the Host CPU mask for
doing Host SBI call. For example, when VCPU X wants to do HFENCE
on VCPU Y, it is possible that VCPU Y is blocked or in user-space
(i.e. vcpu->cpu < 0).
2) To support nested virtualization, we will be having a separate
shadow G-stage for each VCPU and a common host G-stage for the
entire Guest/VM. The VCPU requests based remote HFENCEs helps
us easily synchronize the common host G-stage and shadow G-stage
of each VCPU without any additional IPI calls.
This is also a preparatory patch for upcoming nested virtualization
support where we will be having a shadow G-stage page table for
each Guest VCPU.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Currently, the KVM_MAX_VCPUS value is 16384 for RV64 and 128
for RV32.
The KVM_MAX_VCPUS value is too high for RV64 and too low for
RV32 compared to other architectures (e.g. x86 sets it to 1024
and ARM64 sets it to 512). The too high value of KVM_MAX_VCPUS
on RV64 also leads to VCPU mask on stack consuming 2KB.
We set KVM_MAX_VCPUS to 1024 for both RV64 and RV32 to be
aligned other architectures.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Various __kvm_riscv_hfence_xyz() functions implemented in the
kvm/tlb.S are equivalent to corresponding HFENCE.GVMA instructions
and we don't have range based local HFENCE functions.
This patch provides complete set of local HFENCE functions which
supports range based TLB invalidation and supports HFENCE.VVMA
based functions. This is also a preparatory patch for upcoming
Svinval support in KVM RISC-V.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
We should treat SBI HFENCE calls as NOPs until nested virtualization
is supported by KVM RISC-V. This will help us test booting a hypervisor
under KVM RISC-V.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Latest QEMU supports G-stage Sv57x4 mode so this patch extends KVM
RISC-V G-stage handling to detect and use Sv57x4 mode when available.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The two-stage address translation defined by the RISC-V privileged
specification defines: VS-stage (guest virtual address to guest
physical address) programmed by the Guest OS and G-stage (guest
physical addree to host physical address) programmed by the
hypervisor.
To align with above terminology, we replace "stage2" with "gstage"
and "Stage2" with "G-stage" name everywhere in KVM RISC-V sources.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Fix reg address typo in the gpio1 stanza.
Signed-off-by: Conor Paxton <conor.paxton@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Link: https://lore.kernel.org/r/20220517104058.2004734-1-conor.paxton@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch set implements kexec_file_load() for RISC-V, which is
currently only allowed on rv64 due to some minor build issues on 32-bit
platforms in the generic code. This allows users to kexec() using an FD
as opposed to a buffer.
Link: https://lore.kernel.org/all/20220408100914.150110-1-lizhengyu3@huawei.com/
* palmer/riscv-kexec_file:
RISC-V: Load purgatory in kexec_file
RISC-V: Add purgatory
RISC-V: Support for kexec_file on panic
RISC-V: Add kexec_file support
RISC-V: use memcpy for kexec_file mode
kexec_file: Fix kexec_file.c build error for riscv platform
|
|
Add a device tree for the n6000 instantiation of Agilex
Hard Processor System (HPS).
Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
Some firmware includes unusable space (host bridge registers, hidden PCI
device BARs, etc) in PCI host bridge _CRS. As far as we know, there's
nothing in the ACPI, UEFI, or PCI Firmware spec that requires the OS to
remove E820 reserved regions from _CRS, so this seems like a firmware
defect.
As a workaround, 4dc2287c1805 ("x86: avoid E820 regions when allocating
address space") has clipped out the unusable space in the past. This is
required for machines like the following:
- Dell Precision T3500 (the original motivator for 4dc2287c1805); see
https://bugzilla.kernel.org/show_bug.cgi?id=16228
- Asus C523NA (Coral) Chromebook; see
https://lore.kernel.org/all/4e9fca2f-0af1-3684-6c97-4c35befd5019@redhat.com/
- Lenovo ThinkPad X1 Gen 2; see:
https://bugzilla.redhat.com/show_bug.cgi?id=2029207
But other firmware supplies E820 reserved regions that cover entire _CRS
windows, and clipping throws away the entire window, leaving none for
hot-added or uninitialized devices. This clipping breaks a whole range of
Lenovo IdeaPads, Yogas, Yoga Slims, and notebooks, as well as Acer Spin 5
and Clevo X170KM-G Barebone machines.
E820 reserved entries that cover a memory-mapped PCI host bridge, including
its registers and memory/IO windows, are probably *not* a firmware defect.
Per ACPI v5.4, sec 15.2, the E820 memory map may include:
Address ranges defined for baseboard memory-mapped I/O devices, such as
APICs, are returned as reserved.
Disable the E820 clipping by default for all post-2022 machines. We
already have quirks to disable clipping for pre-2023 machines, and we'll
likely need quirks to *enable* clipping for post-2022 machines that
incorrectly include unusable space in _CRS, including Chromebooks and
Lenovo ThinkPads.
Here's the rationale for doing this. If we do nothing, and continue
clipping by default:
- Future systems like the Lenovo IdeaPads, Yogas, etc, Acer Spin, and
Clevo Barebones will require new quirks to disable clipping.
- The problem here is E820 entries that cover entire _CRS windows that
should not be clipped out.
- I think these E820 entries are legal per spec, and it would be hard to
get BIOS vendors to change them.
- We will discover new systems that need clipping disabled piecemeal as
they are released.
- Future systems like Lenovo X1 Carbon and the Chromebooks (probably
anything using coreboot) will just work, even though their _CRS is
incorrect, so we will not notice new ones that rely on the clipping.
- BIOS updates will not require new quirks unless they change the DMI
model string.
If we add the date check in this commit that disables clipping, e.g., "no
clipping when date >= 2023":
- Future systems like Lenovo *IIL*, Acer Spin, and Clevo Barebones will
just work without new quirks.
- Future systems like Lenovo X1 Carbon and the Chromebooks will require
new quirks to *enable* clipping.
- The problem here is that _CRS contains regions that are not usable by
PCI devices, and we rely on the E820 kludge to clip them out.
- I think this use of E820 is clearly a firmware bug, so we have a
fighting chance of getting it changed eventually.
- BIOS updates after the cutoff date *will* require quirks, but only for
systems like Lenovo X1 Carbon and Chromebooks that we already think
have broken firmware.
It seems to me like it's better to add quirks for firmware that we think is
broken than for firmware that seems unusual but correct.
[bhelgaas: comment and commit log]
Link: https://lore.kernel.org/linux-pci/20220518220754.GA7911@bhelgaas/
Link: https://lore.kernel.org/r/20220519152150.6135-4-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
|
|
To avoid unusable space that some firmware includes in PCI host bridge
_CRS, Linux currently excludes E820 reserved regions from _CRS windows; see
4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
However, some systems supply E820 reserved regions that cover the entire
memory window from _CRS, so clipping them out leaves no space for hot-added
or uninitialized PCI devices.
For example, from a Lenovo IdeaPad 3 15IIL 81WE:
BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit]
pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]
Add quirks to disable the E820 clipping for machines known to do this.
A single DMI_PRODUCT_VERSION "IIL" quirk matches all the below:
Lenovo IdeaPad 3 14IIL05
Lenovo IdeaPad 3 15IIL05
Lenovo IdeaPad 3 17IIL05
Lenovo IdeaPad 5 14IIL05
Lenovo IdeaPad 5 15IIL05
Lenovo IdeaPad Slim 7 14IIL05
Lenovo IdeaPad Slim 7 15IIL05
Lenovo IdeaPad S145-15IIL
Lenovo IdeaPad S340-14IIL
Lenovo IdeaPad S340-15IIL
Lenovo IdeaPad C340-15IIL
Lenovo BS145-15IIL
Lenovo V14-IIL
Lenovo V15-IIL
Lenovo V17-IIL
Lenovo Yoga C940-14IIL
Lenovo Yoga S740-14IIL
Lenovo Yoga Slim 7 14IIL05
Lenovo Yoga Slim 7 15IIL05
in addition to the following that don't actually need it because they have
no E820 reserved regions that overlap _CRS windows:
Lenovo IdeaPad Flex 5 14IIL05
Lenovo IdeaPad Flex 5 15IIL05
Lenovo ThinkBook 14-IIL
Lenovo ThinkBook 15-IIL
Lenovo Yoga S940-14IIL
Other quirks match these:
Acer Spin 5 (SP513-54N)
Clevo X170KM-G Barebone
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459 Lenovo Yoga C940-14IIL
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214259 Clevo X170KM Barebone
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3 15IIL05
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1871793 Lenovo IdeaPad 5 14IIL05
Link: https://bugs.launchpad.net/bugs/1878279 Lenovo IdeaPad 5 14IIL05
Link: https://bugs.launchpad.net/bugs/1880172 Lenovo IdeaPad 3 14IIL05
Link: https://bugs.launchpad.net/bugs/1884232 Acer Spin SP513-54N
Link: https://bugs.launchpad.net/bugs/1921649 Lenovo IdeaPad S145
Link: https://bugs.launchpad.net/bugs/1931715 Lenovo IdeaPad S145
Link: https://bugs.launchpad.net/bugs/1932069 Lenovo BS145-15IIL
Link: https://lore.kernel.org/r/20220519152150.6135-3-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
|
|
Commit 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only
counting with SVM disabled") addresses an issue in which the
Host-Only bit in the counter control registers needs to be
masked off when SVM is not enabled.
The events need to be reloaded whenever SVM is enabled or
disabled for a CPU and this requires the PERF_CTL registers
to be reprogrammed using {enable,disable}_all(). However,
PerfMonV2 variants of these functions do not reprogram the
PERF_CTL registers. Hence, the legacy enable_all() function
should also be called.
Fixes: 9622e67e3980 ("perf/x86/amd/core: Add PerfMonV2 counter control")
Reported-by: Like Xu <likexu@tencent.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220518084327.464005-1-sandipan.das@amd.com
|
|
With CONFIG_GENERIC_BUG_RELATIVE_POINTERS, the addr/file relative
pointers are calculated weirdly: based on the beginning of the bug_entry
struct address, rather than their respective pointer addresses.
Make the relative pointers less surprising to both humans and tools by
calculating them the normal way.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/f0e05be797a16f4fc2401eeb88c8450dcbe61df6.1652362951.git.jpoimboe@kernel.org
|
|
A panic was reported in the init process on AMD:
Run /sbin/init as init process
init[1]: segfault at f7fd5ca0 ip 00000000f7f5bbc7 sp 00000000ffa06aa0 error 7 in libc.so[f7f51000+4e000]
Code: 8a 44 24 10 88 41 ff 8b 44 24 10 83 c4 2c 5b 5e 5f 5d c3 53 83 ec 08 8b 5c 24 10 81 fb 00 f0 ff ff 76 0c e8 ba dc ff ff f7 db <89> 18 83 cb ff 83 c4 08 89 d8 5b c3 e8 81 60 ff ff 05 28 84 07 00
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU: 1 PID: 1 Comm: init Tainted: G W 5.18.0-rc7-next-20220519 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
panic+0x10f/0x28d
do_exit.cold+0x18/0x48
do_group_exit+0x2e/0xb0
get_signal+0xb6d/0xb80
arch_do_signal_or_restart+0x31/0x760
? show_opcodes.cold+0x1c/0x21
? force_sig_fault+0x49/0x70
exit_to_user_mode_prepare+0x131/0x1a0
irqentry_exit_to_user_mode+0x5/0x30
asm_exc_page_fault+0x27/0x30
RIP: 0023:0xf7f5bbc7
Code: 8a 44 24 10 88 41 ff 8b 44 24 10 83 c4 2c 5b 5e 5f 5d c3 53 83 ec 08 8b 5c 24 10 81 fb 00 f0 ff ff 76 0c e8 ba dc ff ff f7 db <89> 18 83 cb ff 83 c4 08 89 d8 5b c3 e8 81 60 ff ff 05 28 84 07 00
RSP: 002b:00000000ffa06aa0 EFLAGS: 00000217
RAX: 00000000f7fd5ca0 RBX: 000000000000000c RCX: 0000000000001000
RDX: 0000000000000001 RSI: 00000000f7fd5b60 RDI: 00000000f7fd5b60
RBP: 00000000f7fd1c1c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
The task's CX register got corrupted by commit 8c42819b61b8 ("x86/entry:
Use PUSH_AND_CLEAR_REGS for compat"), which overlooked the fact that
compat SYSCALL apparently stores the user's CX value in BP.
Before that commit, CX was saved from its stashed value in BP:
pushq %rbp /* pt_regs->cx (stashed in bp) */
But then it got changed to:
pushq %rcx /* pt_regs->cx */
So the wrong value got saved and later restored back to the user. Fix
it by pushing the correct value again (BP) for regs->cx.
Fixes: 8c42819b61b8 ("x86/entry: Use PUSH_AND_CLEAR_REGS for compat")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/b5a26592c9dd60bbacdf97974a7433fd802a5593.1652985970.git.jpoimboe@kernel.org
|
|
If CONFIG_PGTABLE_LEVELS=2 and CONFIG_ARCH_SUPPORTS_PAGE_TABLE_CHECK=y,
then we trigger a compile error:
error: implicit declaration of function 'pte_user_accessible_page'
Move the definition of page table check helper out of branch
CONFIG_PGTABLE_LEVELS > 2
Link: https://lkml.kernel.org/r/20220517074548.2227779-3-tongtiangen@huawei.com
Fixes: daf214c14dbe ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Guohanjun <guohanjun@huawei.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Two page table check related issues have been fixed here.
1. Open CONFIG_PAGE_TABLE_CHECK in riscv32, we got a compile error[1]:
error: implicit declaration of function 'pud_leaf'
Add pud_leaf() definition to incluce/asm-generic/pgtable-nopmd.h to fix
this issue.
2. Keep consistent with other pud_xxx() helpers, move pud_user() to
pgtable-64.h and add pud_user() to pgtable-nopmd.h.
[1]https://lore.kernel.org/linux-mm/202205161811.2nLxmN2O-lkp@intel.com/T/
Link: https://lkml.kernel.org/r/20220517074548.2227779-2-tongtiangen@huawei.com
Fixes: 856eed79f8d3 ("riscv/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Guohanjun <guohanjun@huawei.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Some firmware supplies PCI host bridge _CRS that includes address space
unusable by PCI devices, e.g., space occupied by host bridge registers or
used by hidden PCI devices.
To avoid this unusable space, Linux currently excludes E820 reserved
regions from _CRS windows; see 4dc2287c1805 ("x86: avoid E820 regions when
allocating address space").
However, this use of E820 reserved regions to clip things out of _CRS is
not supported by ACPI, UEFI, or PCI Firmware specs, and some systems have
E820 reserved regions that cover the entire memory window from _CRS.
4dc2287c1805 clips the entire window, leaving no space for hot-added or
uninitialized PCI devices.
For example, from a Lenovo IdeaPad 3 15IIL 81WE:
BIOS-e820: [mem 0x4bc50000-0xcfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
pci 0000:00:15.0: BAR 0: [mem 0x00000000-0x00000fff 64bit]
pci 0000:00:15.0: BAR 0: no space for [mem size 0x00001000 64bit]
Future patches will add quirks to enable/disable E820 clipping
automatically.
Add a "pci=no_e820" kernel command line option to disable clipping with
E820 reserved regions. Also add a matching "pci=use_e820" option to enable
clipping with E820 reserved regions if that has been disabled by default by
further patches in this patch-set.
Both options taint the kernel because they are intended for debugging and
workaround purposes until a quirk can set them automatically.
[bhelgaas: commit log, add printk]
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Lenovo IdeaPad 3
Link: https://lore.kernel.org/r/20220519152150.6135-2-hdegoede@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Benoit Grégoire <benoitg@coeus.ca>
Cc: Hui Wang <hui.wang@canonical.com>
|
|
This patch supports kexec_file to load and relocate purgatory.
It works well on riscv64 QEMU, being tested with devmem.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-7-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch adds purgatory, the name and concept have been taken
from kexec-tools. Purgatory runs between two kernels, and do
verify sha256 hash to ensure the kernel to jump to is fine and
has not been corrupted after loading. Makefile is modified based
on x86 platform.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-6-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch adds support for loading a kexec on panic (kdump) kernel.
It has been tested with vmcore-dmesg on riscv64 QEMU on both an smp
and a non-smp system.
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-5-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch adds support for kexec_file on RISC-V. I tested it on riscv64
QEMU with busybear-linux and single core along with the OpenSBI firmware
fw_jump.bin for generic platform.
On SMP system, it depends on CONFIG_{HOTPLUG_CPU, RISCV_SBI} to
resume/stop hart through OpenSBI firmware, it also needs a OpenSBI that
support the HSM extension.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Li Zhengyu <lizhengyu3@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-4-lizhengyu3@huawei.com
[Palmer: Make 64-bit only]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The pointer to buffer loading kernel binaries is in kernel space for
kexec_fil mode, When copy_from_user copies data from pointer to a block
of memory, it checkes that the pointer is in the user space range, on
RISCV-V that is:
static inline bool __access_ok(unsigned long addr, unsigned long size)
{
return size <= TASK_SIZE && addr <= TASK_SIZE - size;
}
and TASK_SIZE is 0x4000000000 for 64-bits, which now causes
copy_from_user to reject the access of the field 'buf' of struct
kexec_segment that is in range [CONFIG_PAGE_OFFSET - VMALLOC_SIZE,
CONFIG_PAGE_OFFSET), is invalid user space pointer.
This patch fixes this issue by skipping access_ok(), use mempcy() instead.
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Link: https://lore.kernel.org/r/20220408100914.150110-3-lizhengyu3@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Fixes dtbs_check warnings like:
dma@3000000: $nodename:0: 'dma@3000000' does not match '^dma-controller(@.*)?$'
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20220407193856.18223-1-krzysztof.kozlowski@linaro.org
Fixes: c5ab54e9945b ("riscv: dts: add support for PDMA device of HiFive Unleashed Rev A00")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|