Age | Commit message (Collapse) | Author |
|
In order to better accommodate the upcoming Family 19h, given
the 80-char line limit, move the existing code into a new
l3_thread_slice_mask() function.
No functional changes.
[ bp: Touchups. ]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.com
|
|
... to find whether there are northbridges present on the
system. Convert the last forgotten user and therefore, unexport
amd_nb_misc_ids[] too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20200316150725.925-1-bp@alien8.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Map EFI runtime service data as encrypted when SEV is enabled.
Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode.
- Remove the warning in the vector management code which triggered
when a managed interrupt affinity changed outside of a CPU hotplug
operation.
The warning was correct until the recent core code change that
introduced a CPU isolation feature which needs to migrate managed
interrupts away from online CPUs under certain conditions to
achieve the isolation"
* tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vector: Remove warning on managed interrupt migration
x86/ioremap: Map EFI runtime services data as encrypted for SEV
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A pile of perf fixes:
Kernel side:
- AMD uncore driver: Replace the open coded sanity check with the
core variant, which provides the correct error code and also leaves
a hint in dmesg
Tooling:
- Fix the stdio input handling with glibc versions >= 2.28
- Unbreak the futex-wake benchmark which was reduced to 0 test
threads due to the conversion to cpumaps
- Initialize sigaction structs before invoking sys_sigactio()
- Plug the mapfile memory leak in perf jevents
- Fix off by one relative directory includes
- Fix an undefined string comparison in perf diff"
* tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
tools: Fix off-by 1 relative directory includes
perf jevents: Fix leak of mapfile memory
perf bench: Clear struct sigaction before sigaction() syscall
perf bench futex-wake: Restore thread count default to online CPU count
perf top: Fix stdio interface input handling with glibc 2.28+
perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
perf symbols: Don't try to find a vmlinux file when looking for kernel modules
perf bench: Share some global variables to fix build with gcc 10
perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
perf env: Do not return pointers to local variables
perf tests bp_account: Make global variable static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two RAS related fixes:
- Shut down the per CPU thermal throttling poll work properly when a
CPU goes offline.
The missing shutdown caused the poll work to be migrated to a
unbound worker which triggered warnings about the usage of
smp_processor_id() in preemptible context
- Fix the PPIN feature initialization which missed to enable the
functionality when PPIN_CTL was enabled but the MSR locked against
updates"
* tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix logic and comments around MSR_PPIN_CTL
x86/mce/therm_throt: Undo thermal polling properly on CPU offline
|
|
Pull kvm fixes from Paolo Bonzini:
"Bugfixes for x86 and s390"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs
KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect
KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1
KVM: s390: Also reset registers in sync regs for initial cpu reset
KVM: fix Kconfig menu text for -Werror
KVM: x86: remove stale comment from struct x86_emulate_ctxt
KVM: x86: clear stale x86_emulate_ctxt->intercept value
KVM: SVM: Fix the svm vmexit code for WRMSR
KVM: X86: Fix dereference null cpufreq policy
|
|
|
|
When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter
with EVMCS GPA = 0 the host crashes because the
evmcs_gpa != vmx->nested.hv_evmcs_vmptr
condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to
false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will
happen on vmx->nested.hv_evmcs pointer dereference.
Another problematic EVMCS ptr value is '-1' but it only causes host crash
after nested_release_evmcs() invocation. The problem is exactly the same as
with '0', we mistakenly think that the EVMCS pointer hasn't changed and
thus nested.hv_evmcs_vmptr is valid.
Resolve the issue by adding an additional !vmx->nested.hv_evmcs
check to nested_vmx_handle_enlightened_vmptrld(), this way we will
always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL
and this is supposed to catch all invalid EVMCS GPAs.
Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs()
to be consistent with initialization where we don't currently
set hv_evmcs_vmptr to '-1'.
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Previously all fields of structure kvm_lapic_irq were not initialized
before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause
an issue when any of those fields are used for processing a request.
For example not initializing the msi_redir_hint field before passing
to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of
kvm_apic_map_get_dest_lapic(). This will specifically happen when the
kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage
value of msi_redir_hint, which should not happen as the request belongs
to APIC fixed delivery mode and we do not want to deliver the
interrupt only to the lowest priority candidate.
This patch initializes all the fields of kvm_lapic_irq based on the
values of ioapic redirect_entry object before passing it on to
kvm_bitmap_or_dest_vcpus().
Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Set level to false since the value doesn't really matter. Suggested
by Vitaly Kuznetsov. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The value in "new" is constructed from "old" such that all bits defined
as reserved by the ACPI spec[1] are left untouched. But if those bits
do not happen to be all zero, "new < 3" will not evaluate to true.
The firmware of the laptop(s) Medion MD63490 / Akoya P15648 comes with
garbage inside the "FACS" ACPI table. The starting value is
old=0x4944454d, therefore new=0x4944454e, which is >= 3. Mask off
the reserved bits.
[1] https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206553
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
BGRT is for displaying seamless OEM logo from booting to login screen;
however, this mechanism does not always work well on all configurations
and the OEM logo can be displayed multiple times. This looks worse than
without BGRT enabled.
This patch adds a kernel parameter to disable BGRT in boot time. This is
easier than re-compiling a kernel with CONFIG_ACPI_BGRT disabled.
Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1
is not supported[*], i.e. intercepting ENCLS to inject a #UD is
unnecessary.
Avoiding ENCLS-exiting even when it is reported as supported by the CPU
works around a reported issue where SGX is "hard" disabled after an S3
suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
are enumerated as unsupported. While the root cause of the S3 issue is
unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
workaround for what is most likely a hardware or firmware issue. As a
bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
vmcs02.
Note, SGX must be disabled in BIOS to take advantage of this workaround
[*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
cleared. Soft disabled meaning disabling SGX without clearing the
primary CPUID bit (in leaf 0x7) and without poking into non-SGX
CPU paths, e.g. for the VMCS controls.
Fixes: 0b665d304028 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
Reported-by: Toni Spets <toni.spets@iki.fi>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
g++ insists that function declaration must start with extern "C"
(which asmlinkage expands to).
gcc doesn't care.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2020-03-12
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 8 day(s) which contain
a total of 12 files changed, 161 insertions(+), 15 deletions(-).
The main changes are:
1) Andrii fixed two bugs in cgroup-bpf.
2) John fixed sockmap.
3) Luke fixed x32 jit.
4) Martin fixed two issues in struct_ops.
5) Yonghong fixed bpf_send_signal.
6) Yoshiki fixed BTF enum.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The vector management code assumes that managed interrupts cannot be
migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE()
which triggers when a managed interrupt vector association on a online CPU
is cleared. The CPU offline code uses a different mechanism which cannot
trigger this.
This assumption is not longer correct because the new CPU isolation feature
which affects the placement of managed interrupts must be able to move a
managed interrupt away from an online CPU.
There are two reasons why this can happen:
1) When the interrupt is activated the affinity mask which was
established in irq_create_affinity_masks() is handed in to
the vector allocation code. This mask contains all CPUs to which
the interrupt can be made affine to, but this does not take the
CPU isolation 'managed_irq' mask into account.
When the interrupt is finally requested by the device driver then the
affinity is checked again and the CPU isolation 'managed_irq' mask is
taken into account, which moves the interrupt to a non-isolated CPU if
possible.
2) The interrupt can be affine to an isolated CPU because the
non-isolated CPUs in the calculated affinity mask are not online.
Once a non-isolated CPU which is in the mask comes online the
interrupt is migrated to this non-isolated CPU
In both cases the regular online migration mechanism is used which triggers
the WARN_ON_ONCE() in free_moved_vector().
Case #1 could have been addressed by taking the isolation mask into
account, but that would require a massive code change in the activation
logic and the eventual migration event was accepted as a reasonable
tradeoff when the isolation feature was developed. But even if #1 would be
addressed, #2 would still trigger it.
Of course the warning in free_moved_vector() was overlooked at that time
and the above two cases which have been discussed during patch review have
obviously never been tested before the final submission.
So keep it simple and remove the warning.
[ tglx: Rewrote changelog and added a comment to free_moved_vector() ]
Fixes: 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed interrupts")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a build problem with x86/curve25519"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/curve25519 - support assemblers with no adx support
|
|
Enable the sampling check in kernel/events/core.c::perf_event_open(),
which returns the more appropriate -EOPNOTSUPP.
BEFORE:
$ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses).
/bin/dmesg | grep -i perf may provide additional information.
With nothing relevant in dmesg.
AFTER:
$ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
Error:
l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200311191323.13124-1-kim.phillips@amd.com
|
|
Family 19h CPUs are Zen-based and still share most architectural
features with Family 17h CPUs, and therefore still need to call
init_amd_zn() e.g., to set the RECLAIM_DISTANCE override.
init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used
in amd_set_core_ssb_state(), which isn't called on some late
model Family 17h CPUs, nor on any Family 19h CPUs:
X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those
later model CPUs, where the SSBD mitigation is done via the
SPEC_CTRL MSR instead of the LS_CFG MSR.
Family 19h CPUs also don't have the erratum where the CPB feature
bit isn't set, but that code can stay unchanged and run safely
on Family 19h.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
|
|
Modern x86 laptops are starting to use GPIO pins as interrupts more
and more, e.g. touchpads and touchscreens have almost all moved away
from PS/2 and USB to using I2C with a GPIO pin as interrupt.
Modern x86 laptops also have almost all moved to using s2idle instead
of using the system S3 ACPI power state to suspend.
The Intel and AMD pinctrl drivers do not define irq_retrigger handlers
for the irqchips they register, this is causing edge triggered interrupts
which happen while suspended using s2idle to get lost.
One specific example of this is the lid switch on some devices, lid
switches used to be handled by the embedded-controller, but now the
lid open/closed sensor is sometimes directly connected to a GPIO pin.
On most devices the ACPI code for this looks like this:
Method (_E00, ...) {
Notify (LID0, 0x80) // Status Change
}
Where _E00 is an ACPI event handler for changes on both edges of the GPIO
connected to the lid sensor, this event handler is then combined with an
_LID method which directly reads the pin. When the device is resumed by
opening the lid, the GPIO interrupt will wake the system, but because the
pinctrl irqchip doesn't have an irq_retrigger handler, the Notify will not
happen. This is not a problem in the case the _LID method directly reads
the GPIO, because the drivers/acpi/button.c code will call _LID on resume
anyways.
But some devices have an event handler for the GPIO connected to the
lid sensor which looks like this:
Method (_E00, ...) {
if (LID_GPIO == One)
LIDS = One
else
LIDS = Zero
Notify (LID0, 0x80) // Status Change
}
And the _LID method returns the cached LIDS value, since on open we
do not re-run the edge-interrupt handler when we re-enable IRQS on resume
(because of the missing irq_retrigger handler), _LID now will keep
reporting closed, as LIDS was never changed to reflect the open status,
this causes userspace to re-resume the laptop again shortly after opening
the lid.
The Intel GPIO controllers do not allow implementing irq_retrigger without
emulating it in software, at which point we are better of just using the
generic HARDIRQS_SW_RESEND mechanism rather then re-implementing software
emulation for this separately in aprox. 14 different pinctrl drivers.
Select HARDIRQS_SW_RESEND to solve the problem of edge-triggered GPIO
interrupts not being re-triggered on resume when they were triggered during
suspend (s2idle) and/or when they were the cause of the wakeup.
This requires
008f1d60fe25 ("x86/apic/vector: Force interupt handler invocation to irq context")
c16816acd086 ("genirq: Add protection against unsafe usage of generic_handle_irq()")
to protect the APIC based interrupts from being wreckaged by a software
resend.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200123210242.53367-1-hdegoede@redhat.com
|
|
The dmidecode program fails to properly decode the SMBIOS data supplied
by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is
encrypted and resides in reserved memory that is marked as EFI runtime
services data.
As a result, when memremap() is attempted for the SMBIOS data, it
can't be mapped as regular RAM (through try_ram_remap()) and, since
the address isn't part of the iomem resources list, it isn't mapped
encrypted through the fallback ioremap().
Add a new __ioremap_check_other() to deal with memory types like
EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges.
This allows any runtime services data which has been created encrypted,
to be mapped encrypted too.
[ bp: Move functionality to a separate function. ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Cc: <stable@vger.kernel.org> # 5.3
Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com
|
|
We have had a hard coded limit of 32 machine check records since the
dawn of time. But as numbers of cores increase, it is possible for
more than 32 errors to be reported before a user process reads from
/dev/mcelog. In this case the additional errors are lost.
Keep 32 as the minimum. But tune the maximum value up based on the
number of processors.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
|
|
Some Centaur family 7 CPUs and Zhaoxin family 7 CPUs support the UMIP
feature too. The text size growth which UMIP adds is ~1K and distro
kernels enable it anyway so remove the vendor dependency.
[ bp: Rewrite commit message. ]
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1583733990-2587-1-git-send-email-TonyWWang-oc@zhaoxin.com
|
|
Sathyanarayanan reported that the PCI-E AER error injection mechanism
can result in a NULL pointer dereference in apic_ack_edge():
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
RIP: 0010:apic_ack_edge+0x1e/0x40
Call Trace:
handle_edge_irq+0x7d/0x1e0
generic_handle_irq+0x27/0x30
aer_inject_write+0x53a/0x720
It crashes in irq_complete_move() which dereferences get_irq_regs() which
is obviously NULL when this is called from non interrupt context.
Of course the pointer could be checked, but that just papers over the real
issue. Invoking the low level interrupt handling mechanism from random code
can wreckage the fragile interrupt affinity mechanism of x86 as interrupts
can only be moved in interrupt context or with special care when a CPU goes
offline and the move has to be enforced.
In the best case this triggers the warning in the MSI affinity setter, but
if the call happens on the correct CPU it just corrupts state and might
prevent further interrupt delivery for the affected device.
Mark the APIC interrupts as unsuitable for being invoked in random contexts.
This prevents the AER injection from proliferating the wreckage, but that's
less broken than the current state of affairs and more correct than just
papering over the problem by sprinkling random checks all over the place
and silently corrupting state.
Reported-by: sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200306130623.684591280@linutronix.de
|
|
Commit:
59f2a619a2db8611 ("efi: Add 'runtime' pointer to struct efi")
modified the assembler routine called by efi_set_virtual_address_map(),
to grab the 'runtime' EFI service pointer while running with paging
disabled (which is tricky to do in C code)
After the change, register %ebx is not restored correctly, resulting
in all kinds of weird behavior, so fix that.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200304133515.15035-1-ardb@kernel.org
Link: https://lore.kernel.org/r/20200308080859.21568-22-ardb@kernel.org
|
|
Add alignment slack to the PE image size, so that we can realign the
decompression buffer within the space allocated for the image.
Only relocate the kernel if it has been loaded at an unsuitable address:
- Below LOAD_PHYSICAL_ADDR, or
- Above 64T for 64-bit and 512MiB for 32-bit
For 32-bit, the upper limit is conservative, but the exact limit can be
difficult to calculate.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-20-ardb@kernel.org
|
|
The following commit:
223e3ee56f77 ("efi/x86: add headroom to decompressor BSS to account for setup block")
added headroom to the PE image to account for the setup block, which
wasn't used for the decompression buffer.
Now that the decompression buffer is located at the start of the image,
and includes the setup block, this is no longer required.
Add a check to make sure that the head section of the compressed kernel
won't overwrite itself while relocating. This is only for
future-proofing as with current limits on the setup and the actual size
of the head section, this can never happen.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-19-ardb@kernel.org
|
|
Store the kernel's link address as ImageBase in the PE header. Note that
the PE specification requires the ImageBase to be 64k aligned. The
preferred address should almost always satisfy that, except for 32-bit
kernel if the configuration has been customized.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-4-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-18-ardb@kernel.org
|
|
When booted via PE loader, define image_offset to hold the offset of
startup_32() from the start of the PE image, and use it as the start of
the decompression buffer.
[ mingo: Fixed the grammar in the comments. ]
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-3-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-17-ardb@kernel.org
|
|
In preparation for being able to decompress into a buffer starting at a
different address than startup_32, save the calculated output address
instead of recalculating it later.
We now keep track of three addresses:
%edx: startup_32 as we were loaded by bootloader
%ebx: new location of compressed kernel
%ebp: start of decompression buffer
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200303221205.4048668-2-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-16-ardb@kernel.org
|
|
The load address is compared with LOAD_PHYSICAL_ADDR using a signed
comparison currently (using jge instruction).
When loading a 64-bit kernel using the new efi32_pe_entry() point added by:
97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section")
using Qemu with -m 3072, the firmware actually loads us above 2Gb,
resulting in a very early crash.
Use the JAE instruction to perform a unsigned comparison instead, as physical
addresses should be considered unsigned.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-6-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-14-ardb@kernel.org
|
|
code32_start is meant for 16-bit real-mode bootloaders to inform the
kernel where the 32-bit protected mode code starts. Nothing in the
protected mode kernel except the EFI stub uses it.
efi_main() currently returns boot_params, with code32_start set inside it
to tell efi_stub_entry() where startup_32 is located. Since it was invoked
by efi_stub_entry() in the first place, boot_params is already known.
Return the address of startup_32 instead.
This will allow a 64-bit kernel to live above 4Gb, for example, and it's
cleaner as well.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-5-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-13-ardb@kernel.org
|
|
Set up a proper frame pointer in efi32_pe_entry() so that it's easier to
calculate offsets for arguments.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-4-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-12-ardb@kernel.org
|
|
verify_cpu() clobbers BX and DI. In case we have to return error, we need
to preserve them to respect the 32-bit calling convention.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-3-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-11-ardb@kernel.org
|
|
Use SYM_DATA*() macros to annotate this constant, and explicitly align it
to 4-byte boundary. Use lower-case for hexadecimal data.
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200301230436.2246909-2-nivedita@alum.mit.edu
Link: https://lore.kernel.org/r/20200308080859.21568-10-ardb@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
More EFI updates for v5.7
- Incorporate a stable branch with the EFI pieces of Hans's work on
loading device firmware from EFI boot service memory regions
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The current x32 BPF JIT is incorrect for JMP32 JSET BPF_X when the upper
32 bits of operand registers are non-zero in certain situations.
The problem is in the following code:
case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X:
...
/* and dreg_lo,sreg_lo */
EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
/* and dreg_hi,sreg_hi */
EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
/* or dreg_lo,dreg_hi */
EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
This code checks the upper bits of the operand registers regardless if
the BPF instruction is BPF_JMP32 or BPF_JMP64. Registers dreg_hi and
dreg_lo are not loaded from the stack for BPF_JMP32, however, they can
still be polluted with values from previous instructions.
The following BPF program demonstrates the bug. The jset64 instruction
loads the temporary registers and performs the jump, since ((u64)r7 &
(u64)r8) is non-zero. The jset32 should _not_ be taken, as the lower
32 bits are all zero, however, the current JIT will take the branch due
the pollution of temporary registers from the earlier jset64.
mov64 r0, 0
ld64 r7, 0x8000000000000000
ld64 r8, 0x8000000000000000
jset64 r7, r8, 1
exit
jset32 r7, r8, 1
mov64 r0, 2
exit
The expected return value of this program is 2; under the buggy x32 JIT
it returns 0. The fix is to skip using the upper 32 bits for jset32 and
compare the upper 32 bits for jset64 only.
All tests in test_bpf.ko and selftests/bpf/test_verifier continue to
pass with this change.
We found this bug using our automated verification tool, Serval.
Fixes: 69f827eb6e14 ("x32: bpf: implement jitting of JMP32")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200305234416.31597-1-luke.r.nels@gmail.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This was evidently copy and pasted from the i915 driver, but the text
wasn't updated.
Fixes: 4f337faf1c55 ("KVM: allow disabling -Werror")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Some older version of GAS do not support the ADX instructions, similarly
to how they also don't support AVX and such. This commit adds the same
build-time detection mechanisms we use for AVX and others for ADX, and
then makes sure that the curve25519 library dispatcher calls the right
functions.
Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Commit c44b4c6ab80e ("KVM: emulate: clean up initializations in
init_decode_cache") did some field shuffling and instead of
[opcode_len, _regs) started clearing [has_seg_override, modrm).
The comment about clearing fields altogether is not true anymore.
Fixes: c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After commit 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest
mode") Hyper-V guests on KVM stopped booting with:
kvm_nested_vmexit: rip fffff802987d6169 reason EPT_VIOLATION info1 181
info2 0 int_info 0 int_info_err 0
kvm_page_fault: address febd0000 error_code 181
kvm_emulate_insn: 0:fffff802987d6169: f3 a5
kvm_emulate_insn: 0:fffff802987d6169: f3 a5 FAIL
kvm_inj_exception: #UD (0x0)
"f3 a5" is a "rep movsw" instruction, which should not be intercepted
at all. Commit c44b4c6ab80e ("KVM: emulate: clean up initializations in
init_decode_cache") reduced the number of fields cleared by
init_decode_cache() claiming that they are being cleared elsewhere,
'intercept', however, is left uncleared if the instruction does not have
any of the "slow path" flags (NotImpl, Stack, Op3264, Sse, Mmx, CheckPerm,
NearBranch, No16 and of course Intercept itself).
Fixes: c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache")
Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Just like with PCI options ROMs, which we save in the setup_efi_pci*
functions from arch/x86/boot/compressed/eboot.c, the EFI code / ROM itself
sometimes may contain data which is useful/necessary for peripheral drivers
to have access to.
Specifically the EFI code may contain an embedded copy of firmware which
needs to be (re)loaded into the peripheral. Normally such firmware would be
part of linux-firmware, but in some cases this is not feasible, for 2
reasons:
1) The firmware is customized for a specific use-case of the chipset / use
with a specific hardware model, so we cannot have a single firmware file
for the chipset. E.g. touchscreen controller firmwares are compiled
specifically for the hardware model they are used with, as they are
calibrated for a specific model digitizer.
2) Despite repeated attempts we have failed to get permission to
redistribute the firmware. This is especially a problem with customized
firmwares, these get created by the chip vendor for a specific ODM and the
copyright may partially belong with the ODM, so the chip vendor cannot
give a blanket permission to distribute these.
This commit adds support for finding peripheral firmware embedded in the
EFI code and makes the found firmware available through the new
efi_get_embedded_fw() function.
Support for loading these firmwares through the standard firmware loading
mechanism is added in a follow-up commit in this patch-series.
Note we check the EFI_BOOT_SERVICES_CODE for embedded firmware near the end
of start_kernel(), just before calling rest_init(), this is on purpose
because the typical EFI_BOOT_SERVICES_CODE memory-segment is too large for
early_memremap(), so the check must be done after mm_init(). This relies
on EFI_BOOT_SERVICES_CODE not being free-ed until efi_free_boot_services()
is called, which means that this will only work on x86 for now.
Reported-by: Dave Olsthoorn <dave@bewaar.me>
Suggested-by: Peter Jones <pjones@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-3-hdegoede@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Sometimes it is useful to be able to dump the efi boot-services code and
data. This commit adds these as debugfs-blobs to /sys/kernel/debug/efi,
but only if efi=debug is passed on the kernel-commandline as this requires
not freeing those memory-regions, which costs 20+ MB of RAM.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200115163554.101315-2-hdegoede@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In svm, exit_code for MSR writes is not EXIT_REASON_MSR_WRITE which
belongs to vmx.
According to amd manual, SVM_EXIT_MSR(7ch) is the exit_code of VMEXIT_MSR
due to RDMSR or WRMSR access to protected MSR. Additionally, the processor
indicates in the VMCB's EXITINFO1 whether a RDMSR(EXITINFO1=0) or
WRMSR(EXITINFO1=1) was intercepted.
Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Fixes: 1e9e2622a149 ("KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath", 2019-11-21)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Naresh Kamboju reported:
Linux version 5.6.0-rc4 (oe-user@oe-host) (gcc version
(GCC)) #1 SMP Sun Mar 1 22:59:08 UTC 2020
kvm: no hardware support
BUG: kernel NULL pointer dereference, address: 000000000000028c
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
04/01/2014
RIP: 0010:kobject_put+0x12/0x1c0
Call Trace:
cpufreq_cpu_put+0x15/0x20
kvm_arch_init+0x1f6/0x2b0
kvm_init+0x31/0x290
? svm_check_processor_compat+0xd/0xd
? svm_check_processor_compat+0xd/0xd
svm_init+0x21/0x23
do_one_initcall+0x61/0x2f0
? rdinit_setup+0x30/0x30
? rcu_read_lock_sched_held+0x4f/0x80
kernel_init_freeable+0x219/0x279
? rest_init+0x250/0x250
kernel_init+0xe/0x110
ret_from_fork+0x27/0x50
Modules linked in:
CR2: 000000000000028c
---[ end trace 239abf40c55c409b ]---
RIP: 0010:kobject_put+0x12/0x1c0
cpufreq policy which is get by cpufreq_cpu_get() can be NULL if it is failure,
this patch takes care of it.
Fixes: aaec7c03de (KVM: x86: avoid useless copy of cpufreq policy)
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a pkeys fix for a bug that triggers with weird BIOS
settings, and two Xen PV fixes: a paravirt interface fix, and
pagetable dumping fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix dump_pagetables with Xen PV
x86/ioperm: Add new paravirt function update_io_bitmap()
x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Three fixes to EFI mixed boot mode, mostly related to x86-64 vmap
stacks activated years ago, bug-fixed recently for EFI, which had
knock-on effects of various 1:1 mapping assumptions in mixed mode.
There's also a READ_ONCE() fix for reading an mmap-ed EFI firmware
data field only once, out of caution"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: READ_ONCE rng seed size before munmap
efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
efi/x86: Remove support for EFI time and counter services in mixed mode
efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
|
|
Pull KVM fixes from Paolo Bonzini:
"More bugfixes, including a few remaining "make W=1" issues such as too
large frame sizes on some configurations.
On the ARM side, the compiler was messing up shadow stacks between EL1
and EL2 code, which is easily fixed with __always_inline"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: check descriptor table exits on instruction emulation
kvm: x86: Limit the number of "kvm: disabled by bios" messages
KVM: x86: avoid useless copy of cpufreq policy
KVM: allow disabling -Werror
KVM: x86: allow compiling as non-module with W=1
KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
KVM: Introduce pv check helpers
KVM: let declaration of kvm_get_running_vcpus match implementation
KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
arm64: Ask the compiler to __always_inline functions used by KVM at HYP
KVM: arm64: Define our own swab32() to avoid a uapi static inline
KVM: arm64: Ask the compiler to __always_inline functions used at HYP
kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
KVM: arm/arm64: Fix up includes for trace.h
|