|
The update-VMSA ioctl touches data stored in struct kvm_vcpu, and
therefore should not be performed concurrently with any VCPU ioctl
that might cause KVM or the processor to use the same data.
Add a vCPU mutex guard to the VMSA updating code, and refactor the
per-vCPU parts of sev_launch_update_vmsa() out into a new
__sev_launch_update_vmsa() helper.
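Roughly, the refactored shape looks like this (a sketch; the SEV-ES checks
and the per-vCPU work are abbreviated, so this is illustrative rather than
the literal patch):

  static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
  {
          struct kvm_vcpu *vcpu;
          int i, ret;

          if (!sev_es_guest(kvm))
                  return -ENOTTY;

          kvm_for_each_vcpu(i, vcpu, kvm) {
                  /* Serialize against vCPU ioctls touching the same data. */
                  ret = mutex_lock_killable(&vcpu->mutex);
                  if (ret)
                          return ret;

                  ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);

                  mutex_unlock(&vcpu->mutex);
                  if (ret)
                          return ret;
          }

          return 0;
  }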
Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20210915171755.3773766-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
"VMXON pointer" is saved in vmx->nested.vmxon_ptr since
commit 3573e22cfeca ("KVM: nVMX: additional checks on
vmxon region"). Also, handle_vmptrld() & handle_vmclear()
now have logic to check the VMCS pointer against the VMXON
pointer.
So just remove the obsolete comments in handle_vmon().
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Check the return of init_srcu_struct(), which can fail due to OOM, when
initializing the page track mechanism. Lack of checking leads to a NULL
pointer deref found by a modified syzkaller.
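A minimal sketch of the resulting shape (assuming the SRCU struct lives in
the page-track notifier head; illustrative, not the exact diff):

  int kvm_page_track_init(struct kvm *kvm)
  {
          struct kvm_page_track_notifier_head *head;

          head = &kvm->arch.track_notifier_head;
          INIT_HLIST_HEAD(&head->track_notifier_list);
          /* Propagate -ENOMEM to kvm_arch_init_vm() instead of ignoring it. */
          return init_srcu_struct(&head->track_srcu);
  }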
Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com>
[Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are
both defunct now that KVM keeps the list constant and instead explicitly
tracks which entries need to be loaded into hardware.
No functional change intended.
Fixes: ee9d22e08d13 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210908002401.1947049-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT.
Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT. For
RESET, this is a nop as the vCPU is zero-allocated. For INIT, the bug has
likely escaped notice because no firmware/kernel puts its page table root
at PA=0, let alone relies on INIT to get the desired CR3 for such page
tables.
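Conceptually, the RESET/INIT path gains something like the following (a
sketch; the register-cache helper name is an assumption):

          /* Per the SDM/APM, CR3 is zeroed on both RESET and INIT. */
          vcpu->arch.cr3 = 0;
          kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);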
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210921000303.400537-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mark all registers as available and dirty at vCPU creation, as the vCPU has
obviously not been loaded into hardware, let alone been given the chance to
be modified in hardware. On SVM, reading from "uninitialized" hardware is
a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and
hardware does not allow for arbitrary field encoding schemes.
On VMX, backing memory for VMCSes is also zero allocated, but true
initialization of the VMCS _technically_ requires VMWRITEs, as the VMX
architectural specification technically allows CPU implementations to
encode fields with arbitrary schemes. E.g. a CPU could theoretically store
the inverted value of every field, which would result in a VMREAD of a
zero-allocated field returning all ones.
In practice, only the AR_BYTES fields are known to be manipulated by
hardware during VMREAD/VMWRITE; no known hardware or VMM (for nested VMX)
does fancy encoding of cacheable field values (CR0, CR3, CR4, etc...). In
other words, this is technically a bug fix, but practically speaking it's
a glorified nop.
Failure to mark registers as available has been a lurking bug for quite
some time. The original register caching supported only GPRs (+RIP, which
is kinda sorta a GPR), with the masks initialized at ->vcpu_reset(). That
worked because the two cacheable registers, RIP and RSP, are generally
speaking not read as side effects in other flows.
Arguably, commit aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on
demand") was the first instance of failure to mark regs available. While
_just_ marking CR3 available during vCPU creation wouldn't have fixed the
VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0()
unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not
reading GUEST_CR3 when it's available would have avoided VMREAD to a
technically-uninitialized VMCS.
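The change itself is tiny; at vCPU creation it amounts to roughly (sketch):

          /*
           * Nothing has been loaded into hardware yet, so the cached value
           * of every register is the authoritative one.
           */
          vcpu->arch.regs_avail = ~0;
          vcpu->arch.regs_dirty = ~0;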
Fixes: aff48baa34c0 ("KVM: Fetch guest cr3 from hardware on demand")
Fixes: 6de4f3ada40b ("KVM: Cache pdptrs")
Fixes: 6de12732c42c ("KVM: VMX: Optimize vmx_get_rflags()")
Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields")
Fixes: bd31fe495d0d ("KVM: VMX: Add proper cache tracking for CR0")
Fixes: f98c1e77127d ("KVM: VMX: Add proper cache tracking for CR4")
Fixes: 5addc235199f ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags")
Fixes: 8791585837f6 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210921000303.400537-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The general convention for pcibios_* hooks is that they're named after the
corresponding pci_* function they provide a hook for. The exception is
pcibios_add_device() which provides a hook for pci_device_add().
Rename pcibios_add_device() to pcibios_device_add() so it matches
pci_device_add().
Also, remove the export of the microblaze version. The only caller must be
compiled as a built-in so there's no reason for the export.
Link: https://lore.kernel.org/r/20210913152709.48013-1-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390
|
|
It turns out that a single page of stack is trivial to overflow with
all the tracing gunk enabled. Raise the exception stacks to 2 pages,
which is still half the interrupt stacks, which are at 4 pages.
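In terms of the stack-size constants this amounts to roughly the following
(sketched from arch/x86/include/asm/page_64_types.h; surrounding context
may differ):

  /* Exception stacks: now order 1, i.e. 2 pages (KASAN aside). */
  #define EXCEPTION_STACK_ORDER   (1 + KASAN_STACK_ORDER)
  #define EXCEPTION_STKSZ         (PAGE_SIZE << EXCEPTION_STACK_ORDER)

  /* Interrupt stacks stay at order 2, i.e. 4 pages. */
  #define IRQ_STACK_ORDER         (2 + KASAN_STACK_ORDER)
  #define IRQ_STACK_SIZE          (PAGE_SIZE << IRQ_STACK_ORDER)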
Reported-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YUIO9Ye98S5Eb68w@hirez.programming.kicks-ass.net
|
|
Current code has an explicit check for hitting the task stack guard;
but overflowing any of the other stacks will get you a non-descript
general #DF warning.
Improve matters by using get_stack_info_noinstr() to determine if and
which stack guard page got hit, enabling a better stack warning.
Specifically, Michael Wang reported what turned out to be an NMI
exception stack overflow, which is now clearly reported as such:
[] BUG: NMI stack guard page was hit at 0000000085fd977b (stack is 000000003a55b09e..00000000d8cce1a5)
Reported-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Wang <yun.wang@linux.alibaba.com>
Link: https://lkml.kernel.org/r/YUTE/NuqnaWbST8n@hirez.programming.kicks-ass.net
|
|
Since commit c8137ace5638 ("x86/iopl: Restrict iopl() permission
scope") it's possible to emulate iopl(3) using ioperm(), except for
the CLI/STI usage.
Userspace CLI/STI usage is very dubious (read broken), since any
exception taken during that window can lead to rescheduling anyway (or
worse). The IOPL(2) manpage even states that usage of CLI/STI is highly
discouraged and might even crash the system.
Of course, that won't stop people and HP has the dubious honour of
being the first vendor to be found using this in their hp-health
package.
In order to enable this 'software' to still 'work', have the #GP handler
treat the CLI/STI instructions as NOPs when iopl(3) is in effect. Warn the
user that their program is doing dubious things.
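A heavily simplified sketch of what such a #GP fixup can look like (helper
and thread-struct field names are assumptions, not the literal patch):

  static bool fixup_iopl_exception(struct pt_regs *regs)
  {
          struct thread_struct *t = &current->thread;
          unsigned char byte;
          unsigned long ip;

          if (t->iopl_emul != 3)          /* only if the task asked for iopl(3) */
                  return false;
          if (insn_get_effective_ip(regs, &ip) ||
              get_user(byte, (unsigned char __user *)ip))
                  return false;
          if (byte != 0xfa && byte != 0xfb)       /* CLI / STI opcodes */
                  return false;

          pr_warn_once("%s[%d] uses CLI/STI with iopl(3), emulating as NOP\n",
                       current->comm, task_pid_nr(current));
          regs->ip += 1;                  /* skip the one-byte instruction */
          return true;
  }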
Fixes: a24ca9976843 ("x86/iopl: Remove legacy IOPL option")
Reported-by: Ondrej Zary <linux@zary.sk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org # v5.5+
Link: https://lkml.kernel.org/r/20210918090641.GD5106@worktop.programming.kicks-ass.net
|
|
Commit in Fixes introduced early_reserve_memory() to do all needed
initial memblock_reserve() calls in one function. Unfortunately, the call
of early_reserve_memory() is done too late for Xen dom0, as in some
cases a Xen hook called by e820__memory_setup() will need those memory
reservations to have happened already.
Move the call of early_reserve_memory() before the call of
e820__memory_setup() in order to avoid such problems.
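The resulting ordering in setup_arch() is then roughly (sketch):

          /*
           * Reserve the obvious regions (kernel image, initrd, lowmem
           * quirks) before anything may consume or hand out that memory ...
           */
          early_reserve_memory();

          /* ... because the Xen hook behind this may already rely on them. */
          e820__memory_setup();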
Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210920120421.29276-1-jgross@suse.com
|
|
The initial observation was that, in PV mode under Xen, 32-bit user space
didn't work anymore. Attempts at system calls ended in #GP(0x402). All of
a sudden the vector 0x80 handler was not in place anymore. As it turns
out, up to 5.13 redundant initialization did occur: once from
cpu_initialize_context() (through its VCPUOP_initialise hypercall) and a
2nd time while each CPU was brought fully up. This 2nd initialization is
now gone, uncovering that the 1st one was flawed: Unlike for the
set_trap_table hypercall, a full virtual IDT needs to be specified here;
the "vector" fields of the individual entries are of no interest. With
many (kernel) IDT entries still(?) (i.e. at that point at least) empty,
the syscall vector 0x80 ended up in slot 0x20 of the virtual IDT, thus
becoming the domain's handler for vector 0x20.
Make xen_convert_trap_info() fit for either purpose, leveraging the fact
that on the xen_copy_trap_info() path the table starts out zero-filled.
This includes moving out the writing of the sentinel, which would also
have led to a buffer overrun in the xen_copy_trap_info() case if all
(kernel) IDT entries were populated. Convert the writing of the sentinel
to clearing of the entire table entry rather than just the address
field.
(I didn't bother trying to identify the commit which uncovered the issue
in 5.14; the commit named below is the one which actually introduced the
bad code.)
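Reconstructed from the description above, the reworked helper ends up
roughly like this (signature and helper names may differ from the actual
patch):

  static unsigned xen_convert_trap_info(const struct desc_ptr *desc,
                                        struct trap_info *traps, bool full)
  {
          unsigned in, out, count;

          count = (desc->size + 1) / sizeof(gate_desc);
          BUG_ON(count > 256);

          for (in = out = 0; in < count; in++) {
                  const gate_desc *entry = (gate_desc *)(desc->address) + in;

                  /*
                   * For VCPUOP_initialise ("full") keep every slot, so the
                   * position in the table, not the number of populated
                   * gates, determines which vector an entry describes.
                   */
                  if (cvt_gate_to_trap(in, entry, &traps[out]) || full)
                          out++;
          }

          /*
           * The xen_load_idt() caller clears traps[out] entirely as the
           * sentinel; the copy path must not, or a fully populated IDT
           * would overrun the 256-entry buffer.
           */
          return out;
  }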
Fixes: f87e4cac4f4e ("xen: SMP guest support")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/7a266932-092e-b68f-f2bb-1473b61adc6e@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
The function __bad_area_nosemaphore() calls kernelmode_fixup_or_oops()
with the @pkey value passed as the @signal parameter, which makes the
kernel send a signal whose number is whatever happens to be in @pkey.
This bug can be triggered when the kernel fails to access user-given
memory pages that are protected by a pkey, so it can go down the
do_user_addr_fault() path and pass the !user_mode() check in
__bad_area_nosemaphore().
Most cases simply run the kernel fixup code and return -EFAULT. But when
the additional condition current->thread.sig_on_uaccess_err is set, which
is only used for vsyscall emulation, the kernel generates the wrong
signal.
Add a new parameter @pkey to kernelmode_fixup_or_oops() to fix this.
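In terms of the __bad_area_nosemaphore() call site the change is roughly
this (sketch; the exact parameter order is an assumption):

          /* before: the pkey value landed in the 'signal' slot */
          kernelmode_fixup_or_oops(regs, error_code, address, pkey, si_code);

          /* after: signal and pkey are passed explicitly and separately */
          kernelmode_fixup_or_oops(regs, error_code, address, SIGSEGV,
                                   si_code, pkey);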
[ bp: Massage commit message, fix build error as reported by the 0day
bot: https://lkml.kernel.org/r/202109202245.APvuT8BX-lkp@intel.com ]
Fixes: 5042d40a264c ("x86/fault: Bypass no_context() for implicit kernel faults from usermode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jiashuo Liang <liangjs@pku.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20210730030152.249106-1-liangjs@pku.edu.cn
|
|
Fixes to the iterator code to handle faults that are not on page
boundaries mean that the special case for machine check during copy from
user is no longer needed.
For a full list of those fixes, see the output of:
git log --oneline v5.14 ^v5.13 -- lib/iov_iter.c
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210818002942.1607544-4-tony.luck@intel.com
|
|
The code is unreachable for HVM or PVH, and it also makes little sense
in auto-translated environments. On Arm, with
xen_{create,destroy}_contiguous_region() both being stubs, I have a hard
time seeing what good the Xen specific variant does - the generic one
ought to be fine for all purposes there. Still Arm code explicitly
references symbols here, so the code will continue to be included there.
Instead of making PCI_XEN's "select" conditional, simply drop it -
SWIOTLB_XEN will be available unconditionally in the PV case anyway, and
is - as explained above - dead code in non-PV environments.
This in turn allows dropping the stubs for
xen_{create,destroy}_contiguous_region(), the former of which was broken
anyway - it failed to set the DMA handle output.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/5947b8ae-fdc7-225c-4838-84712265fc1e@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
xen_swiotlb and pci_xen_swiotlb_init() are only used within the file
defining them, so make them static and remove the stubs. On the other
hand, pci_xen_swiotlb_detect() is used (as a function pointer) from the
main pci-swiotlb.c file, so convert its stub to a #define to NULL.
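The header then ends up looking roughly like this (sketch):

  #ifdef CONFIG_SWIOTLB_XEN
  int pci_xen_swiotlb_detect(void);
  #else
  /* Only ever used as a detect-function pointer in pci-swiotlb.c. */
  #define pci_xen_swiotlb_detect NULL
  #endif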
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/aef5fc33-9c02-4df0-906a-5c813142e13c@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Just after having obtained the pointer from kzalloc() there's no reason
at all to set part of the area to all zero yet another time. Similarly
there's no point explicitly clearing "ldt_ents".
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/14881835-a48e-29fa-0870-e177b10fcf65@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Sending a SIGBUS for a copy from user is not the correct semantic.
System calls should return -EFAULT (or a short count for write(2)).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210818002942.1607544-3-tony.luck@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prevent an infinite loop in the MCE recovery on return to user space,
which was caused by a second MCE queueing work for the same page and
thereby creating a circular work list.
- Make kern_addr_valid() handle existing PMD entries, which are marked
not present in the higher level page table, correctly instead of
blindly dereferencing them.
- Pass a valid address to sanitize_phys(). This was caused by the
mixture of inclusive and exclusive ranges. memtype_reserve() expects
'end' to be exclusive, but sanitize_phys() wants it inclusive. This
worked so far, but with 'end' being the end of the physical address
space the failure is exposed.
- Increase the maximum supported GPIO number for 64-bit. Newer SoCs
exceed the previous maximum.
* tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid infinite loop for copy from user recovery
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
x86/platform: Increase maximum GPIO number for X86_64
x86/pat: Pass valid address to sanitize_phys()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix bugs in checkkconfigsymbols.py
- Fix missing sys import in gen_compile_commands.py
- Fix missing FORCE warning for ARCH=sh builds
- Fix -Wignored-optimization-argument warnings for Clang builds
- Turn -Wignored-optimization-argument into an error in order to stop
building instead of sprinkling warnings
* tag 'kbuild-fixes-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Add -Werror=ignored-optimization-argument to CLANG_FLAGS
x86/build: Do not add -falign flags unconditionally for clang
kbuild: Fix comment typo in scripts/Makefile.modpost
sh: Add missing FORCE prerequisites in Makefile
gen_compile_commands: fix missing 'sys' package
checkkconfigsymbols.py: Remove skipping of help lines in parse_kconfig_file
checkkconfigsymbols.py: Forbid passing 'HEAD' to --commit
|
|
clang does not support -falign-jumps and only recently gained support
for -falign-loops. When one of the configuration options that adds these
flags is enabled, clang warns and all cc-{disable-warning,option} calls
that follow fail because -Werror gets added to test for the presence of this
warning:
clang-14: warning: optimization flag '-falign-jumps=0' is not supported
[-Wignored-optimization-argument]
To resolve this, add a couple of cc-option calls when building with
clang; gcc has supported these options since 3.2 so there is no point in
testing for their support. -falign-functions was implemented in clang-7,
-falign-loops was implemented in clang-14, and -falign-jumps has not
been implemented yet.
Link: https://lore.kernel.org/r/YSQE2f5teuvKLkON@Ryzen-9-3900X.localdomain/
Link: https://lore.kernel.org/r/20210824022640.2170859-2-nathan@kernel.org/
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-09-17
We've added 63 non-merge commits during the last 12 day(s) which contain
a total of 65 files changed, 2653 insertions(+), 751 deletions(-).
The main changes are:
1) Streamline internal BPF program sections handling and
bpf_program__set_attach_target() in libbpf, from Andrii.
2) Add support for new btf kind BTF_KIND_TAG, from Yonghong.
3) Introduce bpf_get_branch_snapshot() to capture LBR, from Song.
4) IMUL optimization for x86-64 JIT, from Jie.
5) xsk selftest improvements, from Magnus.
6) Introduce legacy kprobe events support in libbpf, from Rafael.
7) Access hw timestamp through BPF's __sk_buff, from Vadim.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits)
selftests/bpf: Fix a few compiler warnings
libbpf: Constify all high-level program attach APIs
libbpf: Schedule open_opts.attach_prog_fd deprecation since v0.7
selftests/bpf: Switch fexit_bpf2bpf selftest to set_attach_target() API
libbpf: Allow skipping attach_func_name in bpf_program__set_attach_target()
libbpf: Deprecated bpf_object_open_opts.relaxed_core_relocs
selftests/bpf: Stop using relaxed_core_relocs which has no effect
libbpf: Use pre-setup sec_def in libbpf_find_attach_btf_id()
bpf: Update bpf_get_smp_processor_id() documentation
libbpf: Add sphinx code documentation comments
selftests/bpf: Skip btf_tag test if btf_tag attribute not supported
docs/bpf: Add documentation for BTF_KIND_TAG
selftests/bpf: Add a test with a bpf program with btf_tag attributes
selftests/bpf: Test BTF_KIND_TAG for deduplication
selftests/bpf: Add BTF_KIND_TAG unit tests
selftests/bpf: Change NAME_NTH/IS_NAME_NTH for BTF_KIND_TAG format
selftests/bpf: Test libbpf API function btf__add_tag()
bpftool: Add support for BTF_KIND_TAG
libbpf: Add support for BTF_KIND_TAG
libbpf: Rename btf_{hash,equal}_int to btf_{hash,equal}_int_tag
...
====================
Link: https://lore.kernel.org/r/20210917173738.3397064-1-ast@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Coverity warns of an unused value in arch_scale_freq_tick():
CID 100778 (#1 of 1): Unused value (UNUSED_VALUE)
assigned_value: Assigning value 1024ULL to freq_scale here, but that stored
value is overwritten before it can be used.
It was introduced by commit:
e2b0d619b400a ("x86, sched: check for counters overflow in frequency invariant accounting")
Remove the variable initializer.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Link: https://lkml.kernel.org/r/20210910184405.24422-1-tim.gardner@canonical.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- The first hunk of a Xen swiotlb fixup series fixing multiple minor
issues and doing some small cleanups
- Some further Xen related fixes avoiding WARN() splats when running as
Xen guests or dom0
- A Kconfig fix allowing the pvcalls frontend to be built as a module
* tag 'for-linus-5.15b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
swiotlb-xen: drop DEFAULT_NSLABS
swiotlb-xen: arrange to have buffer info logged
swiotlb-xen: drop leftover __ref
swiotlb-xen: limit init retries
swiotlb-xen: suppress certain init retries
swiotlb-xen: maintain slab count properly
swiotlb-xen: fix late init retry
swiotlb-xen: avoid double free
xen/pvcalls: backend can be a module
xen: fix usage of pmd_populate in mremap for pv guests
xen: reset legacy rtc flag for PV domU
PM: base: power: don't try to use non-existing RTC for storing data
xen/balloon: use a kernel thread instead a workqueue
|
|
Since BTS is coherent, simply add a compiler barrier to separate the BTS
update and aux_head store.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210809111407.596077-5-leo.yan@linaro.org
|
|
In order to allow objtool to make sense of all the various paravirt
functions, it needs to either parse whole pv_ops[] tables, or observe
individual assignments in the form:
bf87: 48 c7 05 00 00 00 00 00 00 00 00 movq $0x0,0x0(%rip)
bf92 <xen_init_spinlocks+0x5f>
bf8a: R_X86_64_PC32 pv_ops+0x268
As is, xen_cpu_ops[] is at offset +0 in pv_ops[] and could thus be
parsed as a 'normal' pv_ops[] table, however xen_irq_ops[] and
xen_mmu_ops[] are not.
Worse, both of the latter two are compiled into the individual-assignment
form by current GCC, but that's not something one can rely on.
Therefore, convert all three into full pv_ops[] tables. This has the
benefit of not needing to teach objtool about the offsets and
resulting in more conservative code-gen.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095149.057262522@infradead.org
|
|
vmlinux.o: warning: objtool: check_events()+0xd: call to xen_force_evtchn_callback() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.996055323@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[31]: native_irq_disable
vmlinux.o: warning: objtool: pv_ops[31]: __raw_callee_save_xen_irq_disable
vmlinux.o: warning: objtool: pv_ops[31]: xen_irq_disable_direct
vmlinux.o: warning: objtool: lock_is_held_type()+0x5b: call to pv_ops[31]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.933869441@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[32]: native_irq_enable
vmlinux.o: warning: objtool: pv_ops[32]: __raw_callee_save_xen_irq_enable
vmlinux.o: warning: objtool: pv_ops[32]: xen_irq_enable_direct
vmlinux.o: warning: objtool: lock_is_held_type()+0xfe: call to pv_ops[32]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.872254932@infradead.org
|
|
vmlinux.o: warning: objtool: xen_set_debugreg()+0x3: call to hypercall_page() leaves .noinstr.text section
vmlinux.o: warning: objtool: xen_get_debugreg()+0x3: call to hypercall_page() leaves .noinstr.text section
vmlinux.o: warning: objtool: xen_irq_enable()+0x24: call to hypercall_page() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.810950584@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[30]: native_save_fl
vmlinux.o: warning: objtool: pv_ops[30]: __raw_callee_save_xen_save_fl
vmlinux.o: warning: objtool: pv_ops[30]: xen_save_fl_direct
vmlinux.o: warning: objtool: lockdep_hardirqs_off()+0x73: call to pv_ops[30]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.749712274@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[2]: xen_set_debugreg
vmlinux.o: warning: objtool: pv_ops[2]: native_set_debugreg
vmlinux.o: warning: objtool: exc_debug()+0x3b: call to pv_ops[2]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.687755639@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[1]: xen_get_debugreg
vmlinux.o: warning: objtool: pv_ops[1]: native_get_debugreg
vmlinux.o: warning: objtool: exc_debug()+0x25: call to pv_ops[1]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.625523645@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[42]: native_write_cr2
vmlinux.o: warning: objtool: pv_ops[42]: xen_write_cr2
vmlinux.o: warning: objtool: exc_nmi()+0x127: call to pv_ops[42]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.563524913@infradead.org
|
|
vmlinux.o: warning: objtool: pv_ops[41]: native_read_cr2
vmlinux.o: warning: objtool: pv_ops[41]: xen_read_cr2
vmlinux.o: warning: objtool: pv_ops[41]: xen_read_cr2_direct
vmlinux.o: warning: objtool: exc_double_fault()+0x15: call to pv_ops[41]() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.500331616@infradead.org
|
|
In the code for xts_crypt(), we check for the err value returned by
skcipher_walk_virt() and return from the function if it is non-zero.
However, skcipher_walk_virt() can set walk.nbytes to 0, which would cause
us to call kernel_fpu_begin(), and then skip the kernel_fpu_end() call.
This patch checks for the walk.nbytes value instead, and returns if
walk.nbytes is 0. This prevents us from calling kernel_fpu_begin() in
the first place and also covers the case of having a non-zero err value
returned from skcipher_walk_virt().
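Sketched (not the literal diff), the early-return condition becomes:

          err = skcipher_walk_virt(&walk, req, false);
          if (!walk.nbytes)
                  return err;     /* nothing to process; also covers err != 0 */

          kernel_fpu_begin();
          /* ... per-block processing, walk advanced via skcipher_walk_done() ... */
          kernel_fpu_end();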
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix kernel crash caused by uio driver (Vitaly Kuznetsov)
- Remove on-stack cpumask from HV APIC code (Wei Liu)
* tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself
asm-generic/hyperv: provide cpumask_to_vpset_noself
Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver
|
|
Doing unconditional indirect calls through the pv_ops vector is weird.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.437720419@infradead.org
|
|
vmlinux.o: warning: objtool: lockdep_hardirqs_on()+0x72: call to arch_local_save_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: lockdep_hardirqs_off()+0x73: call to arch_local_save_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: match_held_lock()+0x11f: call to arch_local_save_flags() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0x4e: call to arch_local_irq_save() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0x65: call to arch_local_irq_disable() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0xfe: call to arch_local_irq_enable() leaves .noinstr.text section
It makes no sense to not inline these things.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210624095148.373073648@infradead.org
|
|
vmlinux.o: warning: objtool: __sev_put_ghcb()+0x88: call to __memset() leaves .noinstr.text section
vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0x39: call to __memset() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.250770465@infradead.org
|
|
vmlinux.o: warning: objtool: vc_switch_off_ist()+0x20: call to ip_within_syscall_gap.isra.0() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.188166492@infradead.org
|
|
vmlinux.o: warning: objtool: vmx_update_host_rsp()+0x64: call to evmcs_write64() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.126956644@infradead.org
|
|
vmlinux.o: warning: objtool: svm_vcpu_enter_exit()+0x13: call to to_svm() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095148.066347165@infradead.org
|
|
vmlinux.o: warning: objtool: svm_vcpu_enter_exit()+0xea: call to vmload() leaves .noinstr.text section
vmlinux.o: warning: objtool: svm_vcpu_enter_exit()+0x133: call to vmsave() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095147.942250748@infradead.org
|
|
vmlinux.o: warning: objtool: svm_vcpu_enter_exit()+0x4d: call to sev_es_guest() leaves .noinstr.text section
vmlinux.o: warning: objtool: svm_vcpu_enter_exit()+0x50: call to sev_guest() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095147.880513802@infradead.org
|
|
Because hypercall_page is page-aligned, the assembler inexplicably adds
an unreachable jump from after the end of the previous code to the
beginning of hypercall_page.
That confuses objtool, understandably. It also creates significant text
fragmentation. As a result, much of the object file is wasted text
(nops).
Move hypercall_page to the beginning of the file to both prevent the
text fragmentation and avoid the dead jump instruction.
$ size /tmp/head_64.before.o /tmp/head_64.after.o
text data bss dec hex filename
10924 307252 4096 322272 4eae0 /tmp/head_64.before.o
6823 307252 4096 318171 4dadb /tmp/head_64.after.o
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20210820193107.omvshmsqbpxufzkc@treble
|
|
Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page
table entries") introduced a regression when running as Xen PV guest.
Today pmd_populate() for Xen PV assumes that the PFN inserted is
referencing a not yet used page table. In case of move_normal_pmd()
this is not true, resulting in WARN splats like:
[34321.304270] ------------[ cut here ]------------
[34321.304277] WARNING: CPU: 0 PID: 23628 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x176/0x1a0
[34321.304288] Modules linked in:
[34321.304291] CPU: 0 PID: 23628 Comm: apt-get Not tainted 5.14.1-20210906-doflr-mac80211debug+ #1
[34321.304294] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[34321.304296] RIP: e030:xen_mc_flush+0x176/0x1a0
[34321.304300] Code: 89 45 18 48 c1 e9 3f 48 89 ce e9 20 ff ff ff e8 60 03 00 00 66 90 5b 5d 41 5c 41 5d c3 48 c7 45 18 ea ff ff ff be 01 00 00 00 <0f> 0b 8b 55 00 48 c7 c7 10 97 aa 82 31 db 49 c7 c5 38 97 aa 82 65
[34321.304303] RSP: e02b:ffffc90000a97c90 EFLAGS: 00010002
[34321.304305] RAX: ffff88807d416398 RBX: ffff88807d416350 RCX: ffff88807d416398
[34321.304306] RDX: 0000000000000001 RSI: 0000000000000001 RDI: deadbeefdeadf00d
[34321.304308] RBP: ffff88807d416300 R08: aaaaaaaaaaaaaaaa R09: ffff888006160cc0
[34321.304309] R10: deadbeefdeadf00d R11: ffffea000026a600 R12: 0000000000000000
[34321.304310] R13: ffff888012f6b000 R14: 0000000012f6b000 R15: 0000000000000001
[34321.304320] FS: 00007f5071177800(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[34321.304322] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[34321.304323] CR2: 00007f506f542000 CR3: 00000000160cc000 CR4: 0000000000000660
[34321.304326] Call Trace:
[34321.304331] xen_alloc_pte+0x294/0x320
[34321.304334] move_pgt_entry+0x165/0x4b0
[34321.304339] move_page_tables+0x6fa/0x8d0
[34321.304342] move_vma.isra.44+0x138/0x500
[34321.304345] __x64_sys_mremap+0x296/0x410
[34321.304348] do_syscall_64+0x3a/0x80
[34321.304352] entry_SYSCALL_64_after_hwframe+0x44/0xae
[34321.304355] RIP: 0033:0x7f507196301a
[34321.304358] Code: 73 01 c3 48 8b 0d 76 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 0e 0c 00 f7 d8 64 89 01 48
[34321.304360] RSP: 002b:00007ffda1eecd38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[34321.304362] RAX: ffffffffffffffda RBX: 000056205f950f30 RCX: 00007f507196301a
[34321.304363] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f506dc56000
[34321.304364] RBP: 0000000001a00000 R08: 0000000000000010 R09: 0000000000000004
[34321.304365] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f506dc56060
[34321.304367] R13: 00007f506dc56000 R14: 00007f506dc56060 R15: 000056205f950f30
[34321.304368] ---[ end trace a19885b78fe8f33e ]---
[34321.304370] 1 of 2 multicall(s) failed: cpu 0
[34321.304371] call 2: op=12297829382473034410 arg=[aaaaaaaaaaaaaaaa] result=-22
Fix that by modifying xen_alloc_ptpage() to only pin the page table in
case it wasn't pinned already.
Fixes: 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries")
Cc: <stable@vger.kernel.org>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210908073640.11299-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
A Xen PV guest doesn't have a legacy RTC device, so reset the legacy
RTC flag. Otherwise the following WARN splat will occur at boot:
[ 1.333404] WARNING: CPU: 1 PID: 1 at /home/gross/linux/head/drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1be/0x210
[ 1.333404] Modules linked in:
[ 1.333404] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-default+ #282
[ 1.333404] RIP: e030:mc146818_get_time+0x1be/0x210
[ 1.333404] Code: c0 64 01 c5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ec 01 b8 02 00 00 00 44 89 63 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 c7 c7 30 0e ef 82 4c 89 e6 e8 71 2a 24 00 48 c7 c0 ff ff
[ 1.333404] RSP: e02b:ffffc90040093df8 EFLAGS: 00010002
[ 1.333404] RAX: 00000000000000ff RBX: ffffc90040093e34 RCX: 0000000000000000
[ 1.333404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000d
[ 1.333404] RBP: ffffffff82ef0e30 R08: ffff888005013e60 R09: 0000000000000000
[ 1.333404] R10: ffffffff82373e9b R11: 0000000000033080 R12: 0000000000000200
[ 1.333404] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff82cdc6d4
[ 1.333404] FS: 0000000000000000(0000) GS:ffff88807d440000(0000) knlGS:0000000000000000
[ 1.333404] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.333404] CR2: 0000000000000000 CR3: 000000000260a000 CR4: 0000000000050660
[ 1.333404] Call Trace:
[ 1.333404] ? wakeup_sources_sysfs_init+0x30/0x30
[ 1.333404] ? rdinit_setup+0x2b/0x2b
[ 1.333404] early_resume_init+0x23/0xa4
[ 1.333404] ? cn_proc_init+0x36/0x36
[ 1.333404] do_one_initcall+0x3e/0x200
[ 1.333404] kernel_init_freeable+0x232/0x28e
[ 1.333404] ? rest_init+0xd0/0xd0
[ 1.333404] kernel_init+0x16/0x120
[ 1.333404] ret_from_fork+0x1f/0x30
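The fix itself is tiny; conceptually, early in the PV domU boot path (a
sketch, exact placement assumed):

          /* A PV domU has no CMOS/RTC hardware to poke at. */
          if (!xen_initial_domain())
                  x86_platform.legacy.rtc = 0;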
Cc: <stable@vger.kernel.org>
Fixes: 8d152e7a5c7537 ("x86/rtc: Replace paravirt rtc check with platform legacy quirk")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20210903084937.19392-3-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
IMUL allows for multiple operands, so saving and restoring rax/rdx is no
longer needed. Signedness of the operands doesn't matter here because
we only keep the lower 32/64 bits of the product for 32/64-bit
multiplications.
Signed-off-by: Jie Meng <jmeng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210913211337.1564014-1-jmeng@fb.com
|
|
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix of virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
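The new interface is deliberately a thin wrapper so conversions stay
mechanical; roughly (sketch):

  /*
   * Free a boot-time allocation made with memblock_alloc(), taking the
   * virtual pointer that memblock_alloc() returned rather than a physical
   * address.
   */
  void __init_memblock memblock_free_ptr(void *ptr, size_t size)
  {
          if (ptr)
                  memblock_free(__pa(ptr), size);
  }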
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|