path: root/arch/x86
Age | Commit message | Author
2020-01-08 | KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING | Xiaoyao Li
The misspelling was found by checkpatch.pl, so fix it. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: VMX: Rename NMI_PENDING to NMI_WINDOW | Xiaoyao Li
Rename the NMI-window exiting related definitions to match the latest Intel SDM. No functional changes. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW | Xiaoyao Li
Rename the interrupt-window exiting related definitions to match the latest Intel SDM. No functional changes. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: x86: Fix some comment typos | Miaohe Lin
Fix some typos in comments. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: X86: Convert the last users of "shorthand = 0" to use macros | Peter Xu
Change the last users of "shorthand = 0" to use APIC_DEST_NOSHORT. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: X86: Fix callers of kvm_apic_match_dest() to use correct macros | Peter Xu
Callers of kvm_apic_match_dest() should always pass in APIC_DEST_* macros for both the dest_mode and short_hand parameters. Fix up all the callers of kvm_apic_match_dest() that are not following the rule. While at it, rename the parameter from short_hand to shorthand in kvm_apic_match_dest(), as suggested by Vitaly. Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: X86: Drop KVM_APIC_SHORT_MASK and KVM_APIC_DEST_MASK | Peter Xu
We have both APIC_SHORT_MASK and KVM_APIC_SHORT_MASK defined for the shorthand mask. Similarly, we have both APIC_DEST_MASK and KVM_APIC_DEST_MASK defined for the destination mode mask. Drop the KVM_APIC_* macros and convert their only user to the APIC_DEST_* macros instead. While at it, move APIC_SHORT_MASK and APIC_DEST_MASK from lapic.c to lapic.h. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: X86: Use APIC_DEST_* macros properly in kvm_lapic_irq.dest_mode | Peter Xu
We were using either APIC_DEST_PHYSICAL|APIC_DEST_LOGICAL or 0|1 to fill in kvm_lapic_irq.dest_mode. That is fine only because in most cases, when we check against dest_mode, it is against APIC_DEST_PHYSICAL (which equals 0). However, that is not consistent, and we will have a problem when we want to start checking against APIC_DEST_LOGICAL, which does not equal 1. This patch first introduces the kvm_lapic_irq_dest_mode() helper, which takes a boolean destination mode and returns the corresponding APIC_DEST_* macro, and then replaces the 0|1 settings of irq.dest_mode with the helper. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
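A minimal sketch of the helper this entry describes, assuming only the existing APIC_DEST_PHYSICAL/APIC_DEST_LOGICAL macros (the exact name and placement follow the commit text, not a verified tree):

  static inline int kvm_lapic_irq_dest_mode(bool dest_mode_logical)
  {
          /* Map the boolean destination mode onto the APIC_DEST_* encoding. */
          return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
  }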
2020-01-08 | KVM: X86: Move irrelevant declarations out of ioapic.h | Peter Xu
kvm_apic_match_dest() is declared in both ioapic.h and lapic.h. Remove the declaration in ioapic.h. kvm_apic_compare_prio() is declared in ioapic.h but defined in lapic.c. Move the declaration to lapic.h. kvm_irq_delivery_to_apic() is declared in ioapic.h but defined in irq_comm.c. Move the declaration to irq.h. hyperv.c needs to use kvm_irq_delivery_to_apic(). Include irq.h in hyperv.c. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: X86: Fix kvm_bitmap_or_dest_vcpus() to use irq shorthand | Peter Xu
The 3rd parameter of kvm_apic_match_dest() is the irq shorthand, rather than the irq delivery mode. Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: explicitly set rmap_head->val to 0 in pte_list_desc_remove_entry() | Miaohe Lin
When we reach here, we have desc->sptes[j] = NULL with j = 0, so we can replace desc->sptes[0] with 0 to make it clearer. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: vmx: remove unreachable statement in vmx_get_msr_feature() | Miaohe Lin
There is no way to reach the final statement, so remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 | KVM: x86: use CPUID to locate host page table reserved bits | Paolo Bonzini
The comment in kvm_get_shadow_phys_bits() refers to MKTME, but the same is actually true of SME and SEV. Just use CPUID[0x8000_0008].EAX[7:0] unconditionally if available; it is the simplest approach and works even if memory is not encrypted. Cc: stable@vger.kernel.org Reported-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
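A hedged sketch of the described lookup, using the kernel's cpuid_eax() helper; the fallback mirrors what CPU detection already computed:

  static u8 kvm_get_shadow_phys_bits(void)
  {
          /*
           * CPUID[0x8000_0008].EAX[7:0] reports the physical address width
           * and is correct whether or not MKTME/SME/SEV stole key-ID bits.
           */
          if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
                  return cpuid_eax(0x80000008) & 0xff;

          /* No leaf 0x8000_0008: fall back to the detected value. */
          return boot_cpu_data.x86_phys_bits;
  }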
2020-01-08 | x86/cpufeatures: Add support for fast short REP; MOVSB | Tony Luck
From the Intel Optimization Reference Manual, 3.7.6.1 Fast Short REP MOVSB:

  Beginning with processors based on Ice Lake Client microarchitecture, REP MOVSB performance of short operations is enhanced. The enhancement applies to string lengths between 1 and 128 bytes long. Support for fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID.(EAX=7H, ECX=0H).EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1. There is no change in the REP STOS performance.

Add an X86_FEATURE_FSRM flag for this. memmove() avoids REP MOVSB for short (< 32 byte) copies. Check FSRM and use REP MOVSB for short copies on systems that support it. [ bp: Massage and add comment. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191216214254.26492-1-tony.luck@intel.com
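The real memmove() change lives in assembly (an ALTERNATIVE keyed on X86_FEATURE_FSRM); as a C-level illustration of the dispatch only, with byte_copy()/rep_movsb_copy() as hypothetical helpers:

  /* Illustration: short copies take REP MOVSB only when FSRM is present. */
  static void *memmove_dispatch(void *dst, const void *src, size_t n)
  {
          if (n < 32 && !static_cpu_has(X86_FEATURE_FSRM))
                  return byte_copy(dst, src, n);          /* hypothetical helper */
          return rep_movsb_copy(dst, src, n);             /* hypothetical helper */
  }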
2020-01-07 | x86/fpu: Deactivate FPU state after failure during state load | Sebastian Andrzej Siewior
In __fpu__restore_sig(), fpu_fpregs_owner_ctx needs to be reset if the FPU state was not fully restored. Otherwise the following may happen (on the same CPU):

  Task A                           Task B                fpu_fpregs_owner_ctx
                                                         *active* A.fpu
  __fpu__restore_sig()
                                   ctx switch            load B.fpu
                                                         *active* B.fpu
  fpregs_lock()
  copy_user_to_fpregs_zeroing()
    copy_kernel_to_xregs() *modify*
    copy_user_to_xregs() *fails*
  fpregs_unlock()
                                   ctx switch            skip loading B.fpu,
                                                         *active* B.fpu

In the success case, fpu_fpregs_owner_ctx is set to the current task. In the failure case, the FPU state might have been modified by loading the init state. In this case, fpu_fpregs_owner_ctx needs to be reset in order to ensure that the FPU state of the following task is loaded from saved state (and not skipped because it was the previous state). Reset fpu_fpregs_owner_ctx after a failure during restore occurred, to ensure that the FPU state for the next task is always loaded. The problem was debugged by Yu-cheng Yu <yu-cheng.yu@intel.com>. [ bp: Massage commit message. ] Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Reported-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191220195906.plk6kpmsrikvbcfn@linutronix.de
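A sketch of the failure path as described; __cpu_invalidate_fpregs_state() and fpregs_lock()/fpregs_unlock() are existing FPU-core helpers, but the exact error path in __fpu__restore_sig() may differ:

  /* Tail of the restore path: on a failed restore, drop ownership of the
   * registers so the next context switch cannot wrongly skip loading the
   * incoming task's FPU state.
   */
  if (ret) {
          fpregs_lock();
          __cpu_invalidate_fpregs_state();
          fpregs_unlock();
          return ret;
  }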
2020-01-07 | um: Implement copy_thread_tls | Amanieu d'Antras
This is required for clone3 which passes the TLS value through a struct rather than a register. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: linux-um@lists.infradead.org Cc: <stable@vger.kernel.org> # 5.3.x Link: https://lore.kernel.org/r/20200104123928.1048822-1-amanieu@gmail.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-07 | x86/unwind/orc: Fix !CONFIG_MODULES build warning | Shile Zhang
Fix the following warnings, triggered because the ORC sort was moved to build time:

  arch/x86/kernel/unwind_orc.c:210:12: warning: ‘orc_sort_cmp’ defined but not used [-Wunused-function]
  arch/x86/kernel/unwind_orc.c:190:13: warning: ‘orc_sort_swap’ defined but not used [-Wunused-function]

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/c9c81536-2afc-c8aa-c5f8-c7618ecd4f54@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-07 | x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault | Frederic Weisbecker
This is a leftover. Page faults, just like most other exceptions, are protected inside user_exit() / user_enter() calls in x86 entry code when we fault from userspace. So this pair of calls is now superfluous. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191227163612.10039-3-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-07 | x86/context-tracking: Remove exception_enter/exit() from do_page_fault() | Frederic Weisbecker
do_page_fault(), like other exceptions, is already covered by user_enter() and user_exit() when the exception triggers in userspace. As explained in 8c84014f3bbb11 ("x86/entry: Remove exception_enter() from most trap handlers"), exception_enter/exit() only remained to handle a possible page fault from kernel mode while context tracking is in CONTEXT_USER mode, i.e. on kernel entry before we manage to call user_exit(). The only known offender was do_fast_syscall_32() fetching the EBP register from where the vDSO stashed it. Meanwhile, this got fixed in 9999c8c01f34c9 ("x86/entry: Call enter_from_user_mode() with IRQs off"), which moved enter_from_user_mode() before the call to get_user(). So we can safely remove it now. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191227163612.10039-2-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-06 | x86/fpu/xstate: Make xfeature_is_supervisor()/xfeature_is_user() return bool | Yu-cheng Yu
Have both xfeature_is_supervisor()/xfeature_is_user() return bool because they are used only in boolean context. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191212210855.19260-3-yu-cheng.yu@intel.com
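A sketch of the boolean form, following the SDM convention the xstate code relies on (ECX[0] of CPUID.(EAX=0xD, ECX=n) distinguishes supervisor from user states):

  static bool xfeature_is_supervisor(int xfeature_nr)
  {
          u32 eax, ebx, ecx, edx;

          /* ECX[0] is 1 for a supervisor state component, 0 for user. */
          cpuid_count(XSTATE_CPUID, xfeature_nr, &eax, &ebx, &ecx, &edx);
          return ecx & 1;
  }

  static bool xfeature_is_user(int xfeature_nr)
  {
          return !xfeature_is_supervisor(xfeature_nr);
  }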
2020-01-06 | x86/fpu/xstate: Fix small issues | Yu-cheng Yu
In response to earlier comments, fix small issues before introducing XSAVES supervisor states:

- Fix comments of xfeature_is_supervisor().
- Replace ((u64)1 << 63) with XCOMP_BV_COMPACTED_FORMAT.

No functional changes. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191212210855.19260-2-yu-cheng.yu@intel.com
2020-01-06 | remove ioremap_nocache and devm_ioremap_nocache | Christoph Hellwig
ioremap has provided non-cached semantics by default since the Linux 2.6 days, so remove the additional ioremap_nocache interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de>
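The conversion itself is mechanical; a small illustrative helper (the function name here is made up):

  #include <linux/io.h>

  static void __iomem *map_device_regs(phys_addr_t base, size_t size)
  {
          /* was: ioremap_nocache(base, size) -- identical semantics */
          return ioremap(base, size);
  }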
2020-01-04 | mm/memory_hotplug: shrink zones when offlining memory | David Hildenbrand
We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer):

  BUG: unable to handle page fault for address: 000000000000353d
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP PTI
  CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  RIP: 0010:clear_zone_contiguous+0x5/0x10
  Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
  RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
  RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
  RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
  R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
  FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   __remove_pages+0x4b/0x640
   arch_remove_memory+0x63/0x8d
   try_remove_memory+0xdb/0x130
   __remove_memory+0xa/0x11
   acpi_memory_device_remove+0x70/0x100
   acpi_bus_trim+0x55/0x90
   acpi_device_hotplug+0x227/0x3a0
   acpi_hotplug_work_fn+0x1a/0x30
   process_one_work+0x221/0x550
   worker_thread+0x50/0x3b0
   kthread+0x105/0x140
   ret_from_fork+0x3a/0x50
  Modules linked in:
  CR2: 000000000000353d

Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone() for that. We now properly shrink the zones, even if we have DIMMs whereby

- some memory blocks fall into no zone (never onlined),
- some memory blocks fall into multiple zones (offlined+re-onlined),
- multiple memory blocks fall into different zones.

Drop the zone parameter (with a potentially dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-03 | compat: provide compat_ptr() on all architectures | Arnd Bergmann
In order to avoid needless #ifdef CONFIG_COMPAT checks, move the compat_ptr() definition to linux/compat.h where it can be seen by any file regardless of the architecture. Only s390 needs a special definition, this can use the self-#define trick we have elsewhere. Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
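The generic definition is tiny; roughly, per the commit description (s390 keeps its own version and self-#defines it, so the generic one is skipped there):

  /* linux/compat.h, roughly */
  #ifndef compat_ptr
  static inline void __user *compat_ptr(compat_uptr_t uptr)
  {
          return (void __user *)(unsigned long)uptr;
  }
  #endif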
2020-01-02 | x86/resctrl: Fix potential memory leak | Shakeel Butt
set_cache_qos_cfg() is leaking memory when the given level is not RDT_RESOURCE_L3 or RDT_RESOURCE_L2. At the moment, this function is called only with valid levels, but move the allocation after the valid-level checks in order to make it more robust and future-proof. [ bp: Massage commit message. ] Fixes: 99adde9b370de ("x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG") Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20200102165844.133133-1-shakeelb@google.com
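The shape of the fix, sketched; the update callback names and the domain walk are assumptions here, not taken from the patch:

  static int set_cache_qos_cfg(int level, bool enable)
  {
          void (*update)(void *arg);
          cpumask_var_t cpu_mask;

          /* Validate 'level' first: the error path then owns no memory. */
          if (level == RDT_RESOURCE_L3)
                  update = l3_qos_cfg_update;     /* assumed callback name */
          else if (level == RDT_RESOURCE_L2)
                  update = l2_qos_cfg_update;     /* assumed callback name */
          else
                  return -EINVAL;

          if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
                  return -ENOMEM;

          /* ... run 'update' on one CPU per domain ... */

          free_cpumask_var(cpu_mask);
          return 0;
  }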
2020-01-02 | x86/nospec: Remove unused RSB_FILL_LOOPS | Anthony Steinhauser
It was never really used; see 117cc7a908c8 ("x86/retpoline: Fill return stack buffer on vmexit"). [ bp: Massage. ] Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191226204512.24524-1-asteinhauser@google.com
2019-12-31 | x86/traps: Cleanup do_general_protection() | Borislav Petkov
Hoist the user_mode() case up because it is less code and can be dealt with up front, like the other special cases UMIP and vm86. This saves an indentation level for the kernel-mode #GP case and allows the code to be "unfolded" more, so that it is more readable. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jann Horn <jannh@google.com> Cc: x86@kernel.org
2019-12-31 | x86/kasan: Print original address on #GP | Jann Horn
Make #GP exceptions caused by out-of-bounds KASAN shadow accesses easier to understand by computing the address of the original access and printing that. More details are in the comments in the patch. This turns an error like this:

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault, probably for non-canonical address 0xe017577ddf75b7dd: 0000 [#1] PREEMPT SMP KASAN PTI

into this:

  general protection fault, probably for non-canonical address 0xe017577ddf75b7dd: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: maybe wild-memory-access in range [0x00badbeefbadbee8-0x00badbeefbadbeef]

The hook is placed in architecture-independent code, but is currently only wired up to the X86 exception handler because I'm not sufficiently familiar with the address space layout and exception handling mechanisms on other architectures. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kasan-dev@googlegroups.com Cc: linux-mm <linux-mm@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191218231150.12139-4-jannh@google.com
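Recovering the original address is the inverse of the usual KASAN shadow mapping; roughly, assuming the standard KASAN_SHADOW_OFFSET/KASAN_SHADOW_SCALE_SHIFT constants:

  /* shadow = (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET,
   * so invert it to find the granule the bogus access aimed at.
   */
  static unsigned long kasan_shadow_to_addr(unsigned long shadow_addr)
  {
          return (shadow_addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
  }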
2019-12-31 | x86/dumpstack: Introduce die_addr() for die() with #GP fault address | Jann Horn
Split __die() into __die_header() and __die_body(). This allows inserting extra information below the header line that initiates the bug report. Introduce a new function die_addr() that behaves like die(), but is for faults only and uses __die_header() and __die_body() so that a future commit can print extra information after the header line. [ bp: Comment the KASAN-specific usage of gp_addr. ] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kasan-dev@googlegroups.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191218231150.12139-3-jannh@google.com
2019-12-31 | x86/traps: Print address on #GP | Jann Horn
A frequent cause of #GP exceptions is memory accesses to non-canonical addresses. Unlike #PF, #GP doesn't report a fault address in CR2, so the kernel doesn't currently print the fault address for a #GP. Luckily, the necessary infrastructure for decoding x86 instructions and computing the memory address being accessed is already present. Hook it up to the #GP handler so that the address operand of the faulting instruction can be figured out and printed. Distinguish two cases:

a) (Part of) the memory range being accessed lies in the non-canonical address range; in this case, it is likely that the decoded address is actually the one that caused the #GP.

b) The entire memory range of the decoded operand lies in canonical address space; the #GP may or may not be related in some way to the computed address. Print it, but with hedging language in the message.

While it is already possible to compute the faulting address manually by disassembling the opcode dump and evaluating the instruction against the register dump, this should make it slightly easier to identify crashes at a glance. Note that the operand length, which comes from the instruction decoder and is used to determine whether the access straddles into non-canonical address space, is currently somewhat unreliable; but it should be good enough, considering that Linux on x86-64 never maps the page directly before the start of the non-canonical range anyway, and therefore the case where a memory range begins in that page and potentially straddles into the non-canonical range should be fairly uncommon. In case the address is still computed wrongly, it only influences whether the error message claims that the access is canonical. [ bp: Remove ambiguous "we", massage, reflow comments and spacing. ] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Tested-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kasan-dev@googlegroups.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191218231150.12139-2-jannh@google.com
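A sketch of the canonicality test behind the two cases, assuming 4-level paging (48-bit virtual addresses); a real implementation would also have to consider LA57:

  /* Canonical iff bits 63:47 replicate bit 47, i.e. the address
   * sign-extends from the low 48 bits.
   */
  static bool addr_is_canonical(u64 vaddr)
  {
          return ((s64)(vaddr << 16) >> 16) == (s64)vaddr;
  }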
2019-12-30 | x86/insn-eval: Add support for 64-bit kernel mode | Jann Horn
To support evaluating 64-bit kernel mode instructions:

* Replace existing checks for user_64bit_mode() with a new helper that checks whether code is being executed in either 64-bit kernel mode or 64-bit user mode.

* Select the GS base depending on whether the instruction is being evaluated in kernel mode.

Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: kasan-dev@googlegroups.com Cc: Oleg Nesterov <oleg@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191218231150.12139-1-jannh@google.com
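The described helper plausibly looks like this (shown with the any_64bit_mode() name used by later kernels; treat the details as a sketch):

  static inline bool any_64bit_mode(struct pt_regs *regs)
  {
  #ifdef CONFIG_X86_64
          /* Kernel code is always 64-bit; user code may be compat. */
          return !user_mode(regs) || user_64bit_mode(regs);
  #else
          return false;
  #endif
  }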
2019-12-30 | x86/resctrl: Fix an imbalance in domain_remove_cpu() | Qian Cai
A system that supports resource monitoring may have multiple resources, while not all of these resources are capable of monitoring. Monitoring-related state is initialized only for resources that are capable of monitoring, and correspondingly this state should subsequently be removed only from those resources. domain_add_cpu() calls domain_setup_mon_state() only when r->mon_capable is true, where it will initialize d->mbm_over. However, domain_remove_cpu() calls cancel_delayed_work(&d->mbm_over) without checking r->mon_capable, resulting in an attempt to cancel d->mbm_over on all resources, even those that never initialized d->mbm_over because they are not capable of monitoring. Hence, it triggers a debugobjects warning when offlining CPUs, because those timer debugobjects are never initialized:

  ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
  WARNING: CPU: 143 PID: 789 at lib/debugobjects.c:484 debug_print_object
  Hardware name: HP Synergy 680 Gen9/Synergy 680 Gen9 Compute Module, BIOS I40 05/23/2018
  RIP: 0010:debug_print_object
  Call Trace:
   debug_object_assert_init
   del_timer
   try_to_grab_pending
   cancel_delayed_work
   resctrl_offline_cpu
   cpuhp_invoke_callback
   cpuhp_thread_fun
   smpboot_thread_fn
   kthread
   ret_from_fork

Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: tj@kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191211033042.2188-1-cai@lca.pw
2019-12-29 | x86/vdso: Provide missing include file | Valdis Klētnieks
When building with C=1, sparse issues a warning:

  CHECK   arch/x86/entry/vdso/vdso32-setup.c
  arch/x86/entry/vdso/vdso32-setup.c:28:28: warning: symbol 'vdso32_enabled' was not declared. Should it be static?

Provide the missing header file. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/36224.1575599767@turing-police
2019-12-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-12-27

The following pull-request contains BPF updates for your *net-next* tree. We've added 127 non-merge commits during the last 17 day(s) which contain a total of 110 files changed, 6901 insertions(+), 2721 deletions(-).

There are three merge conflicts. Conflicts and resolution looks as follows:

1) Merge conflict in net/bpf/test_run.c:

There was a tree-wide cleanup c593642c8be0 ("treewide: Use sizeof_field() macro") which gets in the way with b590cb5f802d ("bpf: Switch to offsetofend in BPF_PROG_TEST_RUN"):

  <<<<<<< HEAD
          if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
                             sizeof_field(struct __sk_buff, priority),
  =======
          if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority),
  >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

There are a few occasions that look similar to this. Always take the chunk with offsetofend(). Note that there is one where the fields differ in here:

  <<<<<<< HEAD
          if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
                             sizeof_field(struct __sk_buff, tstamp),
  =======
          if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
  >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN").

2) Merge conflict in arch/riscv/net/bpf_jit_comp.c:

(I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.)

  <<<<<<< HEAD
          if (is_13b_check(off, insn))
                  return -1;
          emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx);
  =======
          emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
  >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

Result should look like:

          emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);

3) Merge conflict in arch/riscv/include/asm/pgtable.h:

  <<<<<<< HEAD
  =======
  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END      (PAGE_OFFSET - 1)
  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE   (SZ_128M)
  #define BPF_JIT_REGION_START  (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
  #define BPF_JIT_REGION_END    (VMALLOC_END)

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
   * struct pages to map half the virtual address space. Then
   * position vmemmap directly below the VMALLOC region.
   */
  #define VMEMMAP_SHIFT \
          (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
  #define VMEMMAP_SIZE     BIT(VMEMMAP_SHIFT)
  #define VMEMMAP_END      (VMALLOC_START - 1)
  #define VMEMMAP_START    (VMALLOC_START - VMEMMAP_SIZE)

  #define vmemmap          ((struct page *)VMEMMAP_START)
  >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c

Only take the BPF_* defines from there and move them higher up in the same file. Remove the rest from the chunk. The VMALLOC_* etc defines got moved via 01f52e16b868 ("riscv: define vmemmap before pfn_to_page calls"). Result:

  [...]
  #define __S101  PAGE_READ_EXEC
  #define __S110  PAGE_SHARED_EXEC
  #define __S111  PAGE_SHARED_EXEC

  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
  #define VMALLOC_END      (PAGE_OFFSET - 1)
  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)

  #define BPF_JIT_REGION_SIZE   (SZ_128M)
  #define BPF_JIT_REGION_START  (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
  #define BPF_JIT_REGION_END    (VMALLOC_END)

  /*
   * Roughly size the vmemmap space to be large enough to fit enough
   * struct pages to map half the virtual address space. Then
   * position vmemmap directly below the VMALLOC region.
   */
  #define VMEMMAP_SHIFT \
          (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
  #define VMEMMAP_SIZE     BIT(VMEMMAP_SHIFT)
  #define VMEMMAP_END      (VMALLOC_START - 1)
  #define VMEMMAP_START    (VMALLOC_START - VMEMMAP_SIZE)
  [...]

Let me know if there are any other issues. Anyway, the main changes are:

1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific to a provided BPF object file. This provides an alternative, simplified API compared to standard libbpf interaction. Also, add libbpf extern variable resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko.

2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also, add various BPF riscv JIT improvements, from Björn Töpel.

3) Extend bpftool to allow matching BPF programs and maps by name, from Paul Chaignon.

4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI flag for allowing updates without service interruption, from Andrey Ignatov.

5) Cleanup and simplification of ring access functions for AF_XDP with a bonus of 0-5% performance improvement, from Magnus Karlsson.

6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa.

7) Move and extend test_select_reuseport into BPF program tests under BPF selftests, from Jakub Sitnicki.

8) Various BPF sample improvements for xdpsock for customizing parameters to set up and benchmark AF_XDP, from Jay Jayatheerthan.

9) Improve libbpf to provide a ulimit hint on permission denied errors. Also change XDP sample programs to attach in driver mode by default, from Toke Høiland-Jørgensen.

10) Extend BPF test infrastructure to allow changing skb mark from tc BPF programs, from Nikita V. Shirokov.

11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King.

12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after libbpf conversion, from Jesper Dangaard Brouer.

13) Minor misc improvements from various others.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25 | efi/libstub/x86: Avoid globals to store context during mixed mode calls | Ard Biesheuvel
Instead of storing the return address in a global variable when calling a 32-bit EFI service from the 64-bit stub, avoid the indirection via efi_exit32, and take the return address from the stack. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-26-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Rename efi_call_early/_runtime macros to be more intuitive | Ard Biesheuvel
The macros efi_call_early and efi_call_runtime are used to call EFI boot services and runtime services, respectively. However, the naming is confusing, given that the early vs runtime distinction may suggest that these are used for calling the same set of services either early or late (== at runtime), while in reality, the sets of services they can be used with are completely disjoint, and efi_call_runtime is also only usable in 'early' code. So do a global sweep to replace all occurrences with efi_bs_call or efi_rt_call, respectively, where BS and RT match the idiom used by the UEFI spec to refer to boot time or runtime services. While at it, use 'func' as the macro parameter name for the function pointers, which is less likely to collide and cause weird build errors. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-24-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
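On native builds the new macros can expand to little more than a member call through the system table; a hedged sketch of that expansion:

  #define efi_bs_call(func, ...) \
          efi_system_table()->boottime->func(__VA_ARGS__)
  #define efi_rt_call(func, ...) \
          efi_system_table()->runtime->func(__VA_ARGS__)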
2019-12-25 | efi/libstub: Drop 'table' argument from efi_table_attr() macro | Ard Biesheuvel
None of the definitions of efi_table_attr() still refer to the 'table' argument, so let's get rid of it entirely. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-23-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Drop protocol argument from efi_call_proto() macro | Ard Biesheuvel
After refactoring the mixed mode support code, efi_call_proto() no longer uses its protocol argument in any of its implementations, so let's remove it altogether. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-22-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub/x86: Work around page freeing issue in mixed mode | Ard Biesheuvel
Mixed mode translates calls from the 64-bit kernel into the 32-bit firmware by wrapping them in a call to a thunking routine that pushes a 32-bit word onto the stack for each argument passed to the function, regardless of the argument type. This works surprisingly well for most services and protocols, with the exception of ones that take explicit 64-bit arguments. efi_free() invokes the FreePages() EFI boot service, which takes an efi_physical_addr_t as its address argument, and this is one of those 64-bit types. This means that the 32-bit firmware will interpret the (addr, size) pair as a single 64-bit quantity, and since it is guaranteed to have the high word set (as size > 0), it will always fail due to the fact that EFI memory allocations are always < 4 GB on 32-bit firmware. So let's fix this by giving the thunking code a little hand, passing two values for the address and a third one for the size. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-21-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Remove 'sys_table_arg' from all function prototypes | Ard Biesheuvel
We have a helper, efi_system_table(), that gives us the address of the EFI system table in memory, so there is no longer any point in passing it around from each function to the next. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-20-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Drop sys_table_arg from printk routines | Ard Biesheuvel
As a first step towards getting rid of the need to pass around a function parameter 'sys_table_arg' pointing to the EFI system table, remove the references to it in the printing code, which represents the majority of the use cases. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-19-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub/x86: Drop __efi_early() export and efi_config struct | Ard Biesheuvel
The various pointers we stash in the efi_config struct which we retrieve using __efi_early() are simply copies of the ones in the EFI system table, which we have started accessing directly in the previous patch. So drop all the __efi_early() related plumbing, as well as all the assembly code dealing with efi_config, which allows us to move the PE/COFF entry point to C code as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-18-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Unify the efi_char16_printk implementations | Ard Biesheuvel
Use a single implementation for efi_char16_printk() across all architectures. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-17-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Get rid of 'sys_table_arg' macro parameter | Ard Biesheuvel
The efi_call macros on ARM have a dependency on a variable 'sys_table_arg' existing in the scope of the macro instantiation. Since this variable always points to the same data structure, let's create a global getter for it and use that instead. Note that the use of a global variable with external linkage is avoided, given the problems we had in the past with early processing of the GOT tables. While at it, drop the redundant casts in the efi_table_attr and efi_call_proto macros. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-16-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub/x86: Avoid thunking for native firmware calls | Ard Biesheuvel
We use special wrapper routines to invoke firmware services in the native case as well as the mixed mode case. For mixed mode, the need is obvious, but for the native cases, we can simply rely on the compiler to generate the indirect call, given that GCC now has support for the MS calling convention (and has had it for quite some time now). Note that on i386, the decompressor and the EFI stub are not built with -mregparm=3 like the rest of the i386 kernel, so we can safely allow the compiler to emit the indirect calls here as well. So drop all the wrappers and indirection, and switch to either native calls, or direct calls into the thunk routine for mixed mode. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-14-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Annotate firmware routines as __efiapi | Ard Biesheuvel
Annotate all the firmware routines (boot services, runtime services and protocol methods) called in the boot context as __efiapi, and make it expand to __attribute__((ms_abi)) on 64-bit x86. This allows us to use the compiler to generate the calls into firmware that use the MS calling convention instead of the SysV one. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-13-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
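Per that description, the annotation reduces to a calling-convention attribute; an abridged sketch (32-bit x86 uses its own convention, elided here):

  #ifdef CONFIG_X86_64
  #define __efiapi        __attribute__((ms_abi))
  #else
  #define __efiapi
  #endif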
2019-12-25 | efi/libstub: Use stricter typing for firmware function pointers | Ard Biesheuvel
We will soon remove another level of pointer casting, so let's make sure all type handling involving firmware calls at boot time is correct. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-12-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Drop explicit 32/64-bit protocol definitions | Ard Biesheuvel
Now that we have incorporated the mixed mode protocol definitions into the native ones using unions, we no longer need the separate 32/64 bit struct definitions, with the exception of the EFI system table definition and the boot services, runtime services and configuration table definitions. So drop the unused ones. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-11-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Distinguish between native/mixed not 32/64 bit | Ard Biesheuvel
Currently, we support mixed mode by casting all boot time firmware calls to 64-bit explicitly on native 64-bit systems, and to 32-bit on 32-bit systems or 64-bit systems running with 32-bit firmware. Due to this explicit awareness of the bitness in the code, we do a lot of casting even on generic code that is shared with other architectures, where mixed mode does not even exist. This casting leads to loss of coverage of type checking by the compiler, which we should try to avoid. So instead of distinguishing between 32-bit vs 64-bit, distinguish between native vs mixed, and limit all the nasty casting and pointer mangling to the code that actually deals with mixed mode. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-10-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 | efi/libstub: Extend native protocol definitions with mixed_mode aliases | Ard Biesheuvel
In preparation of moving to a native vs. mixed mode split rather than a 32 vs. 64 bit split when it comes to invoking EFI firmware services, update all the native protocol definitions and redefine them as unions containing an anonymous struct for the native view and a struct called 'mixed_mode' describing the 32-bit view of the protocol when called from 64-bit code. While at it, flesh out some PCI I/O member definitions that we will be needing shortly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191224151025.32482-9-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
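The shape of such a union, sketched for a simple protocol with an abridged member list:

  typedef union efi_simple_text_output_protocol efi_simple_text_output_protocol_t;

  union efi_simple_text_output_protocol {
          struct {
                  void *reset;
                  efi_status_t (__efiapi *output_string)
                          (efi_simple_text_output_protocol_t *, efi_char16_t *);
                  void *test_string;
          };
          /* 32-bit view of the same slots, for mixed mode. */
          struct {
                  u32 reset;
                  u32 output_string;
                  u32 test_string;
          } mixed_mode;
  };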