path: root/arch
2020-05-27  KVM: nSVM: leave ASID aside in copy_vmcb_control_area  (Paolo Bonzini)
Restoring the ASID from the hsave area on VMEXIT is wrong, because its value depends on the handling of TLB flushes. Just skipping the field in copy_vmcb_control_area will do.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
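A minimal sketch of the idea described above, assuming a pared-down vmcb_control_area (the real structure has many more fields, all still copied):

    static void copy_vmcb_control_area(struct vmcb_control_area *dst,
                                       struct vmcb_control_area *from)
    {
            dst->int_ctl    = from->int_ctl;
            dst->int_vector = from->int_vector;
            /* dst->asid is deliberately left alone: its current value
             * reflects how TLB flushes were handled while L2 ran, so
             * restoring the stale hsave copy would be wrong. */
            dst->tlb_ctl    = from->tlb_ctl;
    }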
2020-05-27  KVM: nSVM: fix condition for filtering async PF  (Paolo Bonzini)
Async page faults have to be trapped in the host (L1 in this case), since the APF reason was passed from L0 to L1 and stored in the L1 APF data page. The condition was completely reversed: the page faults were being passed to the guest, an L2 hypervisor.
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
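A hedged sketch of the corrected direction of the filter (simplified; field and constant names follow the 5.7-era SVM code, but the exact diff may differ):

    case SVM_EXIT_EXCP_BASE + PF_VECTOR:
            /* An async PF reason written by L0 into L1's APF data page
             * must be consumed by L1, the host here. */
            if (svm->vcpu.arch.apf.host_apf_reason)
                    return NESTED_EXIT_HOST;
            /* An ordinary #PF follows the usual nested rules and may be
             * reflected into the L2 guest. */
            break;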
2020-05-27  kvm/x86: Remove redundant function implementations  (Peng Hao (Richard))
pic_in_kernel(), ioapic_in_kernel() and irqchip_kernel() have the same implementation.
Signed-off-by: Peng Hao <richard.peng@oppo.com>
Message-Id: <HKAPR02MB4291D5926EA10B8BFE9EA0D3E0B70@HKAPR02MB4291.apcprd02.prod.outlook.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
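A sketch of the de-duplication, assuming all three helpers tested the same irqchip mode (the memory-barrier comment mirrors the existing code):

    static inline int irqchip_kernel(struct kvm *kvm)
    {
            int mode = kvm->arch.irqchip_mode;

            /* Matches smp_wmb() when setting irqchip_mode */
            smp_rmb();
            return mode == KVM_IRQCHIP_KERNEL;
    }

    /* pic_in_kernel() and ioapic_in_kernel() duplicated that body;
     * after the cleanup they simply call irqchip_kernel(). */
    static inline int pic_in_kernel(struct kvm *kvm)
    {
            return irqchip_kernel(kvm);
    }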
2020-05-27  KVM: Fix the indentation to match coding style  (Haiwei Li)
There is bad indentation in the next&queue branch. The patch looks as if it fixes nothing, but it fixes the indentation. (The exact whitespace was lost in extraction; the reconstruction below shows the over-indented "before" and the aligned "after".)

Before fixing:

	if (!handle_fastpath_set_x2apic_icr_irqoff(vcpu, data)) {
			kvm_skip_emulated_instruction(vcpu);
			ret = EXIT_FASTPATH_EXIT_HANDLED;
	}
	break;
case MSR_IA32_TSCDEADLINE:

After fixing:

	if (!handle_fastpath_set_x2apic_icr_irqoff(vcpu, data)) {
		kvm_skip_emulated_instruction(vcpu);
		ret = EXIT_FASTPATH_EXIT_HANDLED;
	}
	break;
case MSR_IA32_TSCDEADLINE:

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <2f78457e-f3a7-3bc9-e237-3132ee87f71e@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27  KVM: VMX: replace "fall through" with "return" to indicate different case  (Miaohe Lin)
The second "/* fall through */" in rmode_exception() makes the code harder to read. Replace it with "return" to indicate they are different cases: only the #DB and #BP cases check vcpu->guest_debug, while the others don't care. This also improves readability.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Message-Id: <1582080348-20827-1-git-send-email-linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27  KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s index  (Sean Christopherson)
Take a u32 for the index in has_emulated_msr() to match hardware, which treats MSR indices as unsigned 32-bit values. Functionally, taking a signed int doesn't cause problems with the current code base, but could theoretically cause problems with 32-bit KVM, e.g. if the index were checked via a less-than statement, which would evaluate incorrectly for MSR indices with bit 31 set.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
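A small illustration of the signedness hazard (hypothetical check, not actual KVM code):

    /* MSR 0xc0000080 (EFER) has bit 31 set */
    int s_idx = (int)0xc0000080;    /* negative: -1073741696 */
    u32 u_idx = 0xc0000080;

    if (s_idx < 0x2000)     /* true: a high MSR compares below the low range */
            ;
    if (u_idx < 0x2000)     /* false, as intended */
            ;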
2020-05-27  KVM: x86: Remove superfluous brackets from case statement  (Sean Christopherson)
Remove unnecessary brackets from a case statement that unintentionally encapsulates unrelated case statements in the same switch statement. While technically legal and functionally correct syntax, the brackets are visually confusing and potentially dangerous, e.g. the last of the encapsulated case statements has an undocumented fall-through that isn't flagged by compilers due to the encapsulation.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200218234012.7110-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27  KVM: x86: allow KVM_STATE_NESTED_MTF_PENDING in kvm_state flags  (Paolo Bonzini)
The migration functionality was left incomplete in commit 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation", 2020-02-23); fix it.
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Cc: stable@vger.kernel.org
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27  Merge branch 'kvm-master' into HEAD  (Paolo Bonzini)
Merge AMD fixes before doing more development work.
2020-05-27  Merge tag 'kvm-s390-next-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD  (Paolo Bonzini)
KVM: s390: Cleanups for 5.8
- vsie (nesting) cleanups
- remove unneeded semicolon
2020-05-27  KVM: x86: simplify is_mmio_spte  (Paolo Bonzini)
We can simply look at bits 52-53 to identify MMIO entries in KVM's page tables. Therefore, there is no need to pass a mask to kvm_mmu_set_mmio_spte_mask.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
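A sketch of the simplified check; the exact bit encoding is assumed from the message (bits 52-53 as a special-SPTE tag):

    #define SPTE_SPECIAL_MASK   (3ULL << 52)    /* bits 52-53 */
    #define SPTE_MMIO_VALUE     (3ULL << 52)    /* assumed MMIO tag value */

    static bool is_mmio_spte(u64 spte)
    {
            return (spte & SPTE_SPECIAL_MASK) == SPTE_MMIO_VALUE;
    }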
2020-05-27  KVM: x86: don't expose MSR_IA32_UMWAIT_CONTROL unconditionally  (Maxim Levitsky)
This MSR is only available when the host supports the WAITPKG feature. Exposing it unconditionally breaks a nested guest if the L1 hypervisor is set to ignore unknown MSRs, because the only other safety check the kernel does is to attempt to read the MSR and reject it if that raises an exception.
Cc: stable@vger.kernel.org
Fixes: 6e3ba4abce ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200523161455.3940-3-mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
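A hedged sketch of the gating added when building the list of MSRs to expose (the helper name follows the 5.7 cpu-caps API; the placement is assumed):

    case MSR_IA32_UMWAIT_CONTROL:
            if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
                    continue;   /* leave the MSR out of the saved list */
            break;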
2020-05-27  KVM: VMX: enable X86_FEATURE_WAITPKG in KVM capabilities  (Maxim Levitsky)
Even though we might not allow the guest to use WAITPKG's new instructions, we should tell KVM that the feature is supported by the host CPU. Note that vmx_waitpkg_supported checks that WAITPKG _can_ be set in the secondary execution controls as specified by the VMX capability MSR, not that we actually enable it for a guest.
Cc: stable@vger.kernel.org
Fixes: e69e72faa3a0 ("KVM: x86: Add support for user wait instructions")
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200523161455.3940-2-mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-27  KVM: x86/mmu: Set mmio_value to '0' if reserved #PF can't be generated  (Sean Christopherson)
Set the mmio_value to '0' instead of simply clearing the present bit to squash a benign warning in kvm_mmu_set_mmio_spte_mask() that complains about the mmio_value overlapping the lower GFN mask on systems with 52 bits of PA space. Opportunistically clean up the code and comments.
Cc: stable@vger.kernel.org
Fixes: d43e2675e96fc ("KVM: x86: only do L1TF workaround on affected processors")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200527084909.23492-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
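A sketch of the guard, assuming the existing mask names in kvm_mmu_set_mmio_spte_mask():

    /* With 52 bits of PA space the MMIO value would overlap the lower
     * GFN bits, so no reserved-bit #PF can be generated; disable MMIO
     * caching entirely instead of warning. */
    if (mmio_value & shadow_nonpresent_or_rsvd_lower_gfn_mask)
            mmio_value = 0;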
2020-05-27  x86: Hide the archdata.iommu field behind generic IOMMU_API  (Krzysztof Kozlowski)
There is a generic, kernel-wide configuration symbol for enabling the IOMMU-specific bits: CONFIG_IOMMU_API. Implementations (including the INTEL_IOMMU and AMD_IOMMU drivers) select it, so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes compile-test builds of other IOMMU drivers when INTEL_IOMMU or AMD_IOMMU is not selected. For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but neither INTEL_IOMMU nor AMD_IOMMU is.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
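The shape of the change, sketched with the struct abridged to the relevant field:

    struct dev_archdata {
    #ifdef CONFIG_IOMMU_API     /* was: INTEL_IOMMU || AMD_IOMMU */
            void *iommu;        /* hook for IOMMU-specific data */
    #endif
    };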
2020-05-27  ia64: Hide the archdata.iommu field behind generic IOMMU_API  (Krzysztof Kozlowski)
There is a generic, kernel-wide configuration symbol for enabling the IOMMU-specific bits: CONFIG_IOMMU_API. Implementations (including the INTEL_IOMMU driver) select it, so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes compile-test builds of other IOMMU drivers when INTEL_IOMMU is not selected. For the case when INTEL_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but INTEL_IOMMU is not.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: e93a1695d7fb ("iommu: Enable compile testing for some of drivers")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200518120855.27822-1-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27  MIPS: Loongson64: select NO_EXCEPT_FILL  (Jiaxun Yang)
Loongson64 loads the kernel at 0x82000000 and allocates exception vectors via ebase, so we don't need to reserve space for exception vectors at the head of the kernel.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()  (Anshuman Khandual)
There is no way to proceed when the requested register cannot be found in arm64_ftr_reg[]. Requesting a non-present register would be an error as well. Hence let's just WARN_ON() when the search fails in get_arm64_ftr_reg(), rather than checking the return value and doing a BUG_ON() in some individual callers. But there are also caller instances that don't error out when the register search fails. Add a new helper get_arm64_ftr_reg_nowarn() for such cases.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1590573876-19120-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
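A sketch of the resulting pair of helpers, with the lookup abridged (entry-table and comparator names assumed from the existing cpufeature code):

    static struct arm64_ftr_reg *get_arm64_ftr_reg_nowarn(u32 sys_id)
    {
            const struct __ftr_reg_entry *ret;

            ret = bsearch((const void *)(unsigned long)sys_id,
                          arm64_ftr_regs, ARRAY_SIZE(arm64_ftr_regs),
                          sizeof(arm64_ftr_regs[0]), search_cmp_ftr_reg);
            return ret ? ret->reg : NULL;
    }

    static struct arm64_ftr_reg *get_arm64_ftr_reg(u32 sys_id)
    {
            struct arm64_ftr_reg *reg = get_arm64_ftr_reg_nowarn(sys_id);

            WARN_ON(!reg);  /* a missing register is a caller bug */
            return reg;
    }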
2020-05-27  x86/apb_timer: Drop unused declaration and macro  (Johan Hovold)
Drop an extern declaration that has never been used and a no longer needed macro.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200513100944.9171-2-johan@kernel.org
2020-05-27  MIPS: Fix IRQ tracing when calling handle_fpe() and handle_msa_fpe()  (YuanJunQing)
Register "a1" is not saved in this function. When CONFIG_TRACE_IRQFLAGS is enabled, the TRACE_IRQS_OFF macro calls trace_hardirqs_off(), which may clobber register "a1". The clobbered "a1" would then be passed as an argument to do_fpe() and do_msa_fpe().
Signed-off-by: YuanJunQing <yuanjunqing66@163.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  MIPS: mm: add page valid judgement in function pte_modify  (Bibo Mao)
If the original PTE has the _PAGE_ACCESSED bit set and the new PTE does not have the _PAGE_NO_READ bit set, we can set the _PAGE_SILENT_READ bit to enable the page valid bit.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
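A sketch of the added check, using the flag names from the message (the _PAGE_CHG_MASK handling is abridged):

    static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
    {
            pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) |
                           (pgprot_val(newprot) & ~_PAGE_CHG_MASK);
            /* new: an already-accessed, readable page can be made valid
             * immediately instead of waiting for another read fault */
            if ((pte_val(pte) & _PAGE_ACCESSED) &&
                !(pte_val(pte) & _PAGE_NO_READ))
                    pte_val(pte) |= _PAGE_SILENT_READ;
            return pte;
    }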
2020-05-27  mm/memory.c: Add memory read privilege on page fault handling  (Bibo Mao)
Add a pte_sw_mkyoung() function to make a page readable on the MIPS platform during page fault handling. This patch improves page fault latency by about 10% on my MIPS machine with the lmbench lat_pagefault case. It is a no-op on other architectures, so there is no negative influence on them.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
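A sketch of the hook: a generic no-op that MIPS overrides (the MIPS side reuses its software-managed young/read logic):

    /* generic fallback (sketch): architectures without software-managed
     * accessed bits have nothing to do here */
    #ifndef pte_sw_mkyoung
    static inline pte_t pte_sw_mkyoung(pte_t pte)
    {
            return pte;
    }
    #endif

    /* MIPS (sketch): marking the PTE young also sets the read/valid bits */
    #define pte_sw_mkyoung  pte_mkyoung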
2020-05-27  mm/memory.c: Update local TLB if PTE entry exists  (Bibo Mao)
If two threads concurrently fault at the same page, the thread that won the race updates the PTE and its local TLB. For now, the other thread gives up, simply does nothing, and continues. It can happen that this second thread triggers another fault, whereby it only updates its local TLB while handling the fault. Instead of triggering another fault, let's directly update the local TLB of the second thread. The update_mmu_tlb() function is used here to update the local TLB on the second thread; it is defined as empty on other architectures.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
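A sketch of the fallback definition and of the losing thread's path in the fault handler:

    /* generic fallback: no-op unless the architecture overrides it */
    #ifndef update_mmu_tlb
    static inline void update_mmu_tlb(struct vm_area_struct *vma,
                                      unsigned long address, pte_t *ptep)
    {
    }
    #endif

    /* fault path (sketch): the thread that lost the race refreshes its
     * local TLB instead of bailing out and faulting again */
    if (!pte_none(*vmf->pte)) {
            update_mmu_tlb(vma, vmf->address, vmf->pte);
            goto unlock;
    }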
2020-05-27  x86/apb_timer: Drop unused TSC calibration  (Johan Hovold)
Drop the APB-timer TSC calibration, which hasn't been used since the removal of Moorestown support by commit 1a8359e411eb ("x86/mid: Remove Intel Moorestown").
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200513100944.9171-1-johan@kernel.org
2020-05-27  MIPS: Do not flush tlb page when updating PTE entry  (Bibo Mao)
It is not necessary to flush a TLB page on all CPUs if a suitable PTE entry already exists during page fault handling; just updating the local TLB is fine. Redefine flush_tlb_fix_spurious_fault as empty on MIPS systems.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
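The override, sketched; the local TLB was already refreshed in the fault path, so the cross-CPU flush is pure overhead on MIPS:

    /* arch/mips/include/asm/pgtable.h (sketch) */
    #define flush_tlb_fix_spurious_fault(vma, address)  do { } while (0)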
2020-05-27  MIPS: ingenic: Default to a generic board  (Paul Cercueil)
Having a generic board option makes it possible to create a kernel that will run on various Ingenic SoCs, as long as the right devicetree is provided.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  MIPS: ingenic: Add support for GCW Zero prototype  (Paul Cercueil)
Add support for the GCW Zero prototype. The only (?) difference is that it only has 256 MiB of RAM, compared to the 512 MiB of RAM of the retail device.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  MIPS: ingenic: DTS: Add memory info of GCW Zero  (Paul Cercueil)
Add the memory info of the GCW Zero to its devicetree. The bootloader generally provides this information, but since it is fixed at 512 MiB, it doesn't hurt to have it in the devicetree. It allows the kernel to boot without any parameters passed as arguments.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  MIPS: Loongson64: Switch to generic PCI driver  (Jiaxun Yang)
We can now enable the generic PCI driver in Kconfig and remove the legacy PCI driver code. The Radeon vBIOS quirk is moved to the platform folder to fit the new structure.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  MIPS: DTS: Loongson64: Add PCI Controller Node  (Jiaxun Yang)
Add a PCI host controller node for Loongson64 to the RS780E PCH dts. Note that PCI interrupts are probed the legacy way; as different machines have different interrupt arrangements, we can't cover all of them in the DT.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-27  MIPS: BCM63xx: fix 6328 boot selection bit  (Álvaro Fernández Rojas)
MISC_STRAP_BUS_BOOT_SEL_SHIFT is 18 according to Broadcom's GPL source code.
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-05-26  ARM: dts: imx6qdl-sabresd: enable fec wake-on-lan  (Fugang Duan)
Enable the Ethernet wake-on-LAN feature for the imx6q/dl/qp sabresd boards, since the PHY clock is supplied by an external oscillator.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26  ARM: dts: imx: add ethernet stop mode property  (Fugang Duan)
- Update the imx6qdl gpr property to define the gpr register offset and bit in DT.
- Add the imx6sx/imx6ul/imx7d ethernet stop mode property.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27  KVM: PPC: Book3S HV: Relax check on H_SVM_INIT_ABORT  (Laurent Dufour)
Commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of the secure bit of SRR1 to filter out the Hcalls reserved for the Ultravisor. However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor, so the secure bit of SRR1 is not set in that particular case. If a regular VM calls H_SVM_INIT_ABORT, the hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case.
Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-05-27  KVM: PPC: Book3S: Fix some RCU-list locks  (Qian Cai)
It is unsafe to traverse kvm->arch.spapr_tce_tables and stt->iommu_tables without the RCU read lock held. Also, add cond_resched_rcu() in places with the RCU read lock held that could take a while to finish.

 arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 no locks held by qemu-kvm/4265.
 stack backtrace:
 CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
 Call Trace:
 [c000201a8690f720] [c000000000715948] dump_stack+0xfc/0x174 (unreliable)
 [c000201a8690f770] [c0000000001d9470] lockdep_rcu_suspicious+0x140/0x164
 [c000201a8690f7f0] [c008000010b9fb48] kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
 [c000201a8690f870] [c008000010b8462c] kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
 [c000201a8690f8a0] [c008000010b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
 [c000201a8690f8f0] [c008000010b7d488] kvm_put_kvm+0x370/0x600 [kvm]
 [c000201a8690f990] [c008000010b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
 [c000201a8690f9c0] [c0000000005223f4] __fput+0x124/0x330
 [c000201a8690fa20] [c000000000151cd8] task_work_run+0xb8/0x130
 [c000201a8690fa70] [c0000000001197e8] do_exit+0x4e8/0xfa0
 [c000201a8690fb70] [c00000000011a374] do_group_exit+0x64/0xd0
 [c000201a8690fbb0] [c000000000132c90] get_signal+0x1f0/0x1200
 [c000201a8690fcc0] [c000000000020690] do_notify_resume+0x130/0x3c0
 [c000201a8690fda0] [c000000000038d64] syscall_exit_prepare+0x1a4/0x280
 [c000201a8690fe20] [c00000000000c8f8] system_call_common+0xf8/0x278

 ====
 arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4264:
  #0: c000201ae2d000d8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c000200c9ed0c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]

 ====
 arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by qemu-kvm/4257:
  #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]

 ====
 arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader section!!
 other info that might help us debug this:
 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by qemu-kvm/4257:
  #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
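A sketch of the locking pattern the fix applies around these list walks:

    rcu_read_lock();
    list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
            /* ... inspect stt and its iommu_tables ... */
            cond_resched_rcu();     /* long walks: briefly drop RCU */
    }
    rcu_read_unlock();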
2020-05-27  KVM: PPC: Book3S HV: Ignore kmemleak false positives  (Qian Cai)
kvmppc_pmd_alloc() and kvmppc_pte_alloc() allocate some memory, but then pud_populate() and pmd_populate() use __pa() to reference the newly allocated memory. Since kmemleak is unable to track physical memory, this results in false positives; silence those by using kmemleak_ignore().

 unreferenced object 0xc000201c382a1000 (size 4096):
   comm "qemu-kvm", pid 124828, jiffies 4295733767 (age 341.250s)
   hex dump (first 32 bytes):
     c0 00 20 09 f4 60 03 87 c0 00 20 10 72 a0 03 87  .. ..`.... .r...
     c0 00 20 0e 13 a0 03 87 c0 00 20 1b dc c0 03 87  .. ....... .....
   backtrace:
     [<000000004cc2790f>] kvmppc_create_pte+0x838/0xd20 [kvm_hv]
       kvmppc_pmd_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:366
       (inlined by) kvmppc_create_pte at arch/powerpc/kvm/book3s_64_mmu_radix.c:590
     [<00000000d123c49a>] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
     [<00000000bb549087>] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
     [<0000000086dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
     [<000000005ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
     [<00000000d22162ff>] kvmppc_vcpu_run+0x34/0x48 [kvm]
     [<00000000d6953bc4>] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
     [<000000002543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
     [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
     [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
     [<000000004afc4310>] system_call_exception+0x114/0x1e0
     [<00000000fb70a873>] system_call_common+0xf0/0x278
 unreferenced object 0xc0002001f0c03900 (size 256):
   comm "qemu-kvm", pid 124830, jiffies 4295735235 (age 326.570s)
   hex dump (first 32 bytes):
     c0 00 20 10 fa a0 03 87 c0 00 20 10 fa a1 03 87  .. ....... .....
     c0 00 20 10 fa a2 03 87 c0 00 20 10 fa a3 03 87  .. ....... .....
   backtrace:
     [<0000000023f675b8>] kvmppc_create_pte+0x854/0xd20 [kvm_hv]
       kvmppc_pte_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:356
       (inlined by) kvmppc_create_pte at arch/powerpc/kvm/book3s_64_mmu_radix.c:593
     [<00000000d123c49a>] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
     [<00000000bb549087>] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
     [<0000000086dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
     [<000000005ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
     [<00000000d22162ff>] kvmppc_vcpu_run+0x34/0x48 [kvm]
     [<00000000d6953bc4>] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
     [<000000002543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
     [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
     [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
     [<000000004afc4310>] system_call_exception+0x114/0x1e0
     [<00000000fb70a873>] system_call_common+0xf0/0x278

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
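A sketch of the silencing at the allocation site; the cache name is assumed, and the point is simply that only the physical address is ever stored, which kmemleak cannot track:

    ptep = kmem_cache_alloc(kvm_pte_cache, GFP_KERNEL);
    if (ptep)
            /* pmd_populate() will store __pa(ptep); suppress the
             * resulting kmemleak false positive */
            kmemleak_ignore(ptep);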
2020-05-27  KVM: PPC: Clean up redundant 'kvm_run' parameters  (Tianjia Zhang)
In the current kvm version, 'kvm_run' is already a member of the 'kvm_vcpu' structure. For historical reasons, many kvm-related functions still take both a 'kvm_run' and a 'kvm_vcpu' parameter. This patch does a unified cleanup of these remaining redundant parameters.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
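The shape of the cleanup, sketched on a representative signature:

    /* before: both parameters reach the same data */
    int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu);

    /* after: the run structure is reached through the vcpu */
    int kvmppc_vcpu_run(struct kvm_vcpu *vcpu)
    {
            struct kvm_run *run = vcpu->run;
            /* ... */
    }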
2020-05-27  KVM: PPC: Remove redundant kvm_run from vcpu_arch  (Tianjia Zhang)
A 'kvm_run' field already exists in the 'vcpu' structure and refers to the same structure as the 'kvm_run' in 'vcpu_arch', so the latter should be deleted.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-05-27  KVM: PPC: Book3S HV: Read ibm,secure-memory nodes  (Laurent Dufour)
The newly introduced ibm,secure-memory nodes supersede the ibm,uv-firmware node's secure-memory-ranges property. Firmware will no longer expose the secure-memory-ranges property, so first read the new nodes and, if they are not found, fall back to the older property.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
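A hedged sketch of the probing order; the node and property names come from the message, but the parsing details and the compatible-based lookup are assumptions:

    struct device_node *np;
    bool found = false;

    for_each_compatible_node(np, NULL, "ibm,secure-memory") {
            found = true;
            /* read this node's reg ranges */
    }
    if (!found)
            /* older firmware: fall back to the flat ranges property */
            prop = of_get_property(fw_node, "secure-memory-ranges", &len);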
2020-05-27  KVM: PPC: Book3S HV: Remove redundant NULL check  (Chen Zhou)
kfree() already checks for NULL, so the additional check is unnecessary; just remove it.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
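The pattern, sketched:

    /* before */
    if (ptr)
            kfree(ptr);

    /* after: kfree(NULL) is defined to be a no-op */
    kfree(ptr);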
2020-05-26  x86/io_apic: Remove unused function mp_init_irq_at_boot()  (YueHaibing)
There are no callers in-tree anymore since ef9e56d894ea ("x86/ioapic: Remove obsolete post hotplug update"), so remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200508140808.49428-1-yuehaibing@huawei.com
2020-05-26  x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"  (Andy Lutomirski)
Revert 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") and add a comment to discourage someone else from making the same mistake again. It turns out that some user code fails to compile if __X32_SYSCALL_BIT is unsigned long. See, for example, [1] below.
[ bp: Massage and do the same thing in the respective tools/ header. ]
Fixes: 45e29d119e99 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")
Reported-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@kernel.org
Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294
Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
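An illustration of why the constant's type matters (hypothetical user-space arithmetic, not the code from the bug report): masking with the complement of an unsigned long constant silently promotes int operands.

    #define BIT_I   0x40000000      /* int, as after the revert */
    #define BIT_UL  0x40000000UL    /* unsigned long, as reverted away */

    int nr = -1;
    /* nr & ~BIT_I  -> int 0xBFFFFFFF (negative, stays in int range) */
    /* nr & ~BIT_UL -> unsigned long 0xFFFFFFFFBFFFFFFF (huge positive
     * value); subsequent signed comparisons change meaning */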
2020-05-26  powerpc/xive: Clear the page tables for the ESB IO mapping  (Cédric Le Goater)
Commit 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler") fixed an issue in the FW-assisted dump of machines using the hash MMU and the XIVE interrupt mode under the POWER hypervisor. It forced the mapping of the ESB page of interrupts being mapped in the Linux IRQ number space, to make sure the 'crash kexec' sequence worked during such an event. But it didn't handle the un-mapping. This mapping is now blocking the removal of a passthrough IO adapter under the POWER hypervisor, because the hypervisor expects the guest OS to have cleared all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations". Remove these mappings in the IRQ data cleanup routine. Under KVM, this cleanup is not required because the ESB pages for the adapter interrupts are un-mapped from the guest by the hypervisor in the KVM XIVE native device. This is now redundant but it's harmless.
Fixes: 1ca3dec2b2df ("powerpc/xive: Prevent page fault issues in the machine crash handler")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org
2020-05-26  powerpc: Add ppc_inst_as_u64()  (Michael Ellerman)
The code patching code wants to get the value of a struct ppc_inst as a u64 when the instruction is prefixed, so we can pass the u64 down to __put_user_asm() and write it with a single store. The optprobes code wants to load a struct ppc_inst as an immediate into a register, so it is useful to have it as a u64 to use the existing helper function. Currently this is a bit awkward because the value differs based on the CPU endianness, so add a helper to do the conversion. This fixes the usage in arch_prepare_optimized_kprobe(), which was previously incorrect on big endian.
Fixes: 650b55b707fd ("powerpc: Add prefixed instructions to instruction data type")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Jordan Niethe <jniethe5@gmail.com>
Link: https://lore.kernel.org/r/20200526072630.2487363-1-mpe@ellerman.id.au
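A sketch of the helper, assuming the existing ppc_inst_val()/ppc_inst_suffix() accessors; which word lands in the high half depends on endianness, as the message says:

    static inline u64 ppc_inst_as_u64(struct ppc_inst x)
    {
    #ifdef CONFIG_CPU_LITTLE_ENDIAN
            return (u64)ppc_inst_suffix(x) << 32 | ppc_inst_val(x);
    #else
            return (u64)ppc_inst_val(x) << 32 | ppc_inst_suffix(x);
    #endif
    }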
2020-05-26  powerpc: Add ppc_inst_next()  (Michael Ellerman)
In a few places we want to calculate the address of the next instruction. Previously that was simple: we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated: we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Although it would seem natural for the instruction to be passed by value, that makes it too easy to write a loop that will read off the end of a page, e.g.:

	for (; src < end;
	     src = ppc_inst_next(src, *src),
	     dest = ppc_inst_next(dest, *dest))

As noticed by Christophe and Jordan, if end is the exact end of a page and the next page is not mapped, this will fault, because *dest will read 8 bytes, 4 bytes into the next page. So the value is passed by reference, and the helper can be careful to use ppc_inst_read() on it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
Link: https://lore.kernel.org/r/20200522133318.1681406-1-mpe@ellerman.id.au
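A sketch of the helper; the pass-by-reference design lets it go through ppc_inst_read(), which only touches the suffix word when the first word is actually a prefix (ppc_inst_len() is assumed from the same series):

	#define ppc_inst_next(location, value)				\
	({								\
		struct ppc_inst _tmp;					\
		_tmp = ppc_inst_read((struct ppc_inst *)(value));	\
		(void *)(location) + ppc_inst_len(_tmp);		\
	})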
2020-05-26  Merge branch 'fixes' into next  (Michael Ellerman)
Merge our fixes branch from this cycle. It contains several important fixes we need in next for testing purposes, and also some that will conflict with upcoming changes.
2020-05-26Merge "Use hugepages to map kernel mem on 8xx" into nextMichael Ellerman
Merge Christophe's large series to use huge pages for the linear mapping on 8xx. From his cover letter: The main purpose of this big series is to: - reorganise huge page handling to avoid using mm_slices. - use huge pages to map kernel memory on the 8xx. The 8xx supports 4 page sizes: 4k, 16k, 512k and 8M. It uses 2 Level page tables, PGD having 1024 entries, each entry covering 4M address space. Then each page table has 1024 entries. At the time being, page sizes are managed in PGD entries, implying the use of mm_slices as it can't mix several pages of the same size in one page table. The first purpose of this series is to reorganise things so that standard page tables can also handle 512k pages. This is done by adding a new _PAGE_HUGE flag which will be copied into the Level 1 entry in the TLB miss handler. That done, we have 2 types of pages: - PGD entries to regular page tables handling 4k/16k and 512k pages - PGD entries to hugepd tables handling 8M pages. There is no need to mix 8M pages with other sizes, because a 8M page will use more than what a single PGD covers. Then comes the second purpose of this series. At the time being, the 8xx has implemented special handling in the TLB miss handlers in order to transparently map kernel linear address space and the IMMR using huge pages by building the TLB entries in assembly at the time of the exception. As mm_slices is only for user space pages, and also because it would anyway not be convenient to slice kernel address space, it was not possible to use huge pages for kernel address space. But after step one of the series, it is now more flexible to use huge pages. This series drop all assembly 'just in time' handling of huge pages and use huge pages in page tables instead. Once the above is done, then comes icing on the cake: - Use huge pages for KASAN shadow mapping - Allow pinned TLBs with strict kernel rwx - Allow pinned TLBs with debug pagealloc Then, last but not least, those modifications for the 8xx allows the following improvement on book3s/32: - Mapping KASAN shadow with BATs - Allowing BATs with debug pagealloc All this allows to considerably simplify TLB miss handlers and associated initialisation. The overhead of reading page tables is negligible compared to the reduction of the miss handlers. While we were at touching pte_update(), some cleanup was done there too. Tested widely on 8xx and 832x. Boot tested on QEMU MAC99.
2020-05-26  powerpc/32s: Implement dedicated kasan_init_region()  (Christophe Leroy)
Implement a kasan_init_region() dedicated to book3s/32 that allocates KASAN regions using BATs.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/709e821602b48a1d7c211a9b156da26db98c3e9d.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC  (Christophe Leroy)
DEBUG_PAGEALLOC only manages RW data; text and RO data can still be mapped with BATs. In order to map with BATs, also enforce data alignment. It is set by default to 256M, which is a good compromise for keeping enough BATs for KASAN and the IMMR as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fd29c1718ee44d82115d0e835ced808eb4ccbf51.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Implement dedicated kasan_init_region()  (Christophe Leroy)
Implement a kasan_init_region() dedicated to 8xx that allocates KASAN regions using huge pages.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2d60202a8821dc81cffe6ff59cc13c15b7e4bb6.1589866984.git.christophe.leroy@csgroup.eu