summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-18PCI: tegra194: Fix MCFG quirk build regressionsJon Hunter
7f100744749e ("PCI: tegra: Add Tegra194 MCFG quirks for ECAM errata") caused a few build regressions: - 7f100744749e removed the Makefile rule for CONFIG_PCIE_TEGRA194, so pcie-tegra.c can no longer be built as a module. Restore that rule. - 7f100744749e added "#ifdef CONFIG_PCIE_TEGRA194" around the native driver, but that's only set when the driver is built-in (for a module, CONFIG_PCIE_TEGRA194_MODULE is defined). The ACPI quirk is completely independent of the rest of the native driver, so move the quirk to its own file and remove the #ifdef in the native driver. - 7f100744749e added symbols that are always defined but used only when CONFIG_PCIEASPM, which causes warnings when CONFIG_PCIEASPM is not set: drivers/pci/controller/dwc/pcie-tegra194.c:259:18: warning: ‘event_cntr_data_offset’ defined but not used [-Wunused-const-variable=] drivers/pci/controller/dwc/pcie-tegra194.c:250:18: warning: ‘event_cntr_ctrl_offset’ defined but not used [-Wunused-const-variable=] drivers/pci/controller/dwc/pcie-tegra194.c:243:27: warning: ‘pcie_gen_freq’ defined but not used [-Wunused-const-variable=] Fixes: 7f100744749e ("PCI: tegra: Add Tegra194 MCFG quirks for ECAM errata") Link: https://lore.kernel.org/r/20210610064134.336781-1-jonathanh@nvidia.com Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thierry Reding <treding@nvidia.com>
2021-06-18PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GBPunit Agrawal
Alexandru and Qu reported this resource allocation failure on ROCKPro64 v2 and ROCK Pi 4B, both based on the RK3399: pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit] pci 0000:00:00.0: PCI bridge to [bus 01] pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit] "BAR 14" is the PCI bridge's 32-bit non-prefetchable window, and our PCI allocation code isn't smart enough to allocate it in a host bridge window marked as 64-bit, even though this should work fine. A DT host bridge description includes the windows from the CPU address space to the PCI bus space. On a few architectures (microblaze, powerpc, sparc), the DT may also describe PCI devices themselves, including their BARs. Before 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses"), of_bus_pci_get_flags() ignored the fact that some DT addresses described 64-bit windows and BARs. That was a problem because the virtio virtual NIC has a 32-bit BAR and a 64-bit BAR, and the driver couldn't distinguish them. 9d57e61bf723 set IORESOURCE_MEM_64 for those 64-bit DT ranges, which fixed the virtio driver. But it also set IORESOURCE_MEM_64 for host bridge windows, which exposed the fact that the PCI allocator isn't smart enough to put 32-bit resources in those 64-bit windows. Clear IORESOURCE_MEM_64 from host bridge windows since we don't need that information. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Fixes: 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses") Link: https://lore.kernel.org/r/20210614230457.752811-1-punitagrawal@gmail.com Reported-at: https://lore.kernel.org/lkml/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com/ Reported-at: https://lore.kernel.org/lkml/YMyTUv7Jsd89PGci@m4/T/#u Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Reported-by: Qu Wenruo <wqu@suse.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Domenico Andreoli <domenico.andreoli@linux.com> Signed-off-by: Punit Agrawal <punitagrawal@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org>
2021-06-18Merge branch kvm-arm64/pmu-fixes into kvmarm-master/nextMarc Zyngier
Fixes for the PMUv3 emulation of PMCR_EL0: - Don't spuriously reset the cycle counter when resetting other counters - Force PMCR_EL0 to become effective after having restored it * kvm-arm64/pmu-fixes: KVM: arm64: Restore PMU configuration on first run KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
2021-06-18KVM: arm64: Restore PMU configuration on first runMarc Zyngier
Restoring a guest with an active virtual PMU results in no perf counters being instanciated on the host side. Not quite what you'd expect from a restore. In order to fix this, force a writeback of PMCR_EL0 on the first run of a vcpu (using a new request so that it happens once the vcpu has been loaded). This will in turn create all the host-side counters that were missing. Reported-by: Jinank Jain <jinankj@amazon.de> Tested-by: Jinank Jain <jinankj@amazon.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/87wnrbylxv.wl-maz@kernel.org Link: https://lore.kernel.org/r/b53dfcf9bbc4db7f96154b1cd5188d72b9766358.camel@amazon.de
2021-06-18tracing: Do no increment trace_clock_global() by oneSteven Rostedt (VMware)
The trace_clock_global() tries to make sure the events between CPUs is somewhat in order. A global value is used and updated by the latest read of a clock. If one CPU is ahead by a little, and is read by another CPU, a lock is taken, and if the timestamp of the other CPU is behind, it will simply use the other CPUs timestamp. The lock is also only taken with a "trylock" due to tracing, and strange recursions can happen. The lock is not taken at all in NMI context. In the case where the lock is not able to be taken, the non synced timestamp is returned. But it will not be less than the saved global timestamp. The problem arises because when the time goes "backwards" the time returned is the saved timestamp plus 1. If the lock is not taken, and the plus one to the timestamp is returned, there's a small race that can cause the time to go backwards! CPU0 CPU1 ---- ---- trace_clock_global() { ts = clock() [ 1000 ] trylock(clock_lock) [ success ] global_ts = ts; [ 1000 ] <interrupted by NMI> trace_clock_global() { ts = clock() [ 999 ] if (ts < global_ts) ts = global_ts + 1 [ 1001 ] trylock(clock_lock) [ fail ] return ts [ 1001] } unlock(clock_lock); return ts; [ 1000 ] } trace_clock_global() { ts = clock() [ 1000 ] if (ts < global_ts) [ false 1000 == 1000 ] trylock(clock_lock) [ success ] global_ts = ts; [ 1000 ] unlock(clock_lock) return ts; [ 1000 ] } The above case shows to reads of trace_clock_global() on the same CPU, but the second read returns one less than the first read. That is, time when backwards, and this is not what is allowed by trace_clock_global(). This was triggered by heavy tracing and the ring buffer checker that tests for the clock going backwards: Ring buffer clock went backwards: 20613921464 -> 20613921463 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 0 at kernel/trace/ring_buffer.c:3412 check_buffer+0x1b9/0x1c0 Modules linked in: [..] [CPU: 2]TIME DOES NOT MATCH expected:20620711698 actual:20620711697 delta:6790234 before:20613921463 after:20613921463 [20613915818] PAGE TIME STAMP [20613915818] delta:0 [20613915819] delta:1 [20613916035] delta:216 [20613916465] delta:430 [20613916575] delta:110 [20613916749] delta:174 [20613917248] delta:499 [20613917333] delta:85 [20613917775] delta:442 [20613917921] delta:146 [20613918321] delta:400 [20613918568] delta:247 [20613918768] delta:200 [20613919306] delta:538 [20613919353] delta:47 [20613919980] delta:627 [20613920296] delta:316 [20613920571] delta:275 [20613920862] delta:291 [20613921152] delta:290 [20613921464] delta:312 [20613921464] delta:0 TIME EXTEND [20613921464] delta:0 This happened more than once, and always for an off by one result. It also started happening after commit aafe104aa9096 was added. Cc: stable@vger.kernel.org Fixes: aafe104aa9096 ("tracing: Restructure trace_clock_global() to never block") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-18tracing: Do not stop recording comms if the trace file is being readSteven Rostedt (VMware)
A while ago, when the "trace" file was opened, tracing was stopped, and code was added to stop recording the comms to saved_cmdlines, for mapping of the pids to the task name. Code has been added that only records the comm if a trace event occurred, and there's no reason to not trace it if the trace file is opened. Cc: stable@vger.kernel.org Fixes: 7ffbd48d5cab2 ("tracing: Cache comms only after an event occurred") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-18tracing: Do not stop recording cmdlines when tracing is offSteven Rostedt (VMware)
The saved_cmdlines is used to map pids to the task name, such that the output of the tracing does not just show pids, but also gives a human readable name for the task. If the name is not mapped, the output looks like this: <...>-1316 [005] ...2 132.044039: ... Instead of this: gnome-shell-1316 [005] ...2 132.044039: ... The names are updated when tracing is running, but are skipped if tracing is stopped. Unfortunately, this stops the recording of the names if the top level tracer is stopped, and not if there's other tracers active. The recording of a name only happens when a new event is written into a ring buffer, so there is no need to test if tracing is on or not. If tracing is off, then no event is written and no need to test if tracing is off or not. Remove the check, as it hides the names of tasks for events in the instance buffers. Cc: stable@vger.kernel.org Fixes: 7ffbd48d5cab2 ("tracing: Cache comms only after an event occurred") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-18recordmcount: Correct st_shndx handlingPeter Zijlstra
One should only use st_shndx when >SHN_UNDEF and <SHN_LORESERVE. When SHN_XINDEX, then use .symtab_shndx. Otherwise use 0. This handles the case: st_shndx >= SHN_LORESERVE && st_shndx != SHN_XINDEX. Link: https://lore.kernel.org/lkml/20210607023839.26387-1-mark-pk.tsai@mediatek.com/ Link: https://lkml.kernel.org/r/20210616154126.2794-1-mark-pk.tsai@mediatek.com Reported-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [handle endianness of sym->st_shndx] Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-18KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is setAlexandru Elisei
According to ARM DDI 0487G.a, page D13-3895, setting the PMCR_EL0.P bit to 1 has the following effect: "Reset all event counters accessible in the current Exception level, not including PMCCNTR_EL0, to zero." Similar behaviour is described for AArch32 on page G8-7022. Make it so. Fixes: c01d6a18023b ("KVM: arm64: pmu: Only handle supported event counters") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210618105139.83795-1-alexandru.elisei@arm.com
2021-06-18Merge branch kvm-arm64/mmu/stage2-cmos into kvmarm-master/nextMarc Zyngier
Cache maintenance updates from Yanan Wang, moving the CMOs down into the page-table code. This ensures that we only issue them when actually performing a mapping rather than upfront. * kvm-arm64/mmu/stage2-cmos: KVM: arm64: Move guest CMOs to the fault handlers KVM: arm64: Tweak parameters of guest cache maintenance functions KVM: arm64: Introduce mm_ops member for structure stage2_attr_data KVM: arm64: Introduce two cache maintenance callbacks
2021-06-18KVM: arm64: Move guest CMOs to the fault handlersYanan Wang
We currently uniformly perform CMOs of D-cache and I-cache in function user_mem_abort before calling the fault handlers. If we get concurrent guest faults(e.g. translation faults, permission faults) or some really unnecessary guest faults caused by BBM, CMOs for the first vcpu are necessary while the others later are not. By moving CMOs to the fault handlers, we can easily identify conditions where they are really needed and avoid the unnecessary ones. As it's a time consuming process to perform CMOs especially when flushing a block range, so this solution reduces much load of kvm and improve efficiency of the stage-2 page table code. We can imagine two specific scenarios which will gain much benefit: 1) In a normal VM startup, this solution will improve the efficiency of handling guest page faults incurred by vCPUs, when initially populating stage-2 page tables. 2) After live migration, the heavy workload will be resumed on the destination VM, however all the stage-2 page tables need to be rebuilt at the moment. So this solution will ease the performance drop during resuming stage. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210617105824.31752-5-wangyanan55@huawei.com
2021-06-18KVM: arm64: Tweak parameters of guest cache maintenance functionsYanan Wang
Adjust the parameter "kvm_pfn_t pfn" of __clean_dcache_guest_page and __invalidate_icache_guest_page to "void *va", which paves the way for converting these two guest CMO functions into callbacks in structure kvm_pgtable_mm_ops. No functional change. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210617105824.31752-4-wangyanan55@huawei.com
2021-06-18KVM: arm64: Introduce mm_ops member for structure stage2_attr_dataYanan Wang
Also add a mm_ops member for structure stage2_attr_data, since we will move I-cache maintenance for guest stage-2 to the permission path and as a result will need mm_ops for some callbacks. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210617105824.31752-3-wangyanan55@huawei.com
2021-06-18KVM: arm64: Introduce two cache maintenance callbacksYanan Wang
To prepare for performing CMOs for guest stage-2 in the fault handlers in pgtable.c, here introduce two cache maintenance callbacks in struct kvm_pgtable_mm_ops. We also adjust the comment alignment for the existing part but make no real content change at all. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> [maz: fixed up comments and renamed callbacks] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210617105824.31752-2-wangyanan55@huawei.com
2021-06-18mac80211: handle various extensible elements correctlyJohannes Berg
Various elements are parsed with a requirement to have an exact size, when really we should only check that they have the minimum size that we need. Check only that and therefore ignore any additional data that they might carry. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.cd101f8040a4.Iadf0e9b37b100c6c6e79c7b298cc657c2be9151a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-18mac80211: reset profile_periodicity/ema_apJohannes Berg
Apparently we never clear these values, so they'll remain set since the setting of them is conditional. Clear the values in the relevant other cases. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.316e32d136a9.I2a12e51814258e1e1b526103894f4b9f19a91c8d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-18cfg80211: avoid double free of PMSR requestAvraham Stern
If cfg80211_pmsr_process_abort() moves all the PMSR requests that need to be freed into a local list before aborting and freeing them. As a result, it is possible that cfg80211_pmsr_complete() will run in parallel and free the same PMSR request. Fix it by freeing the request in cfg80211_pmsr_complete() only if it is still in the original pmsr list. Cc: stable@vger.kernel.org Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.1fbef57e269a.I00294bebdb0680b892f8d1d5c871fd9dbe785a5e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-18cfg80211: make certificate generation more robustJohannes Berg
If all net/wireless/certs/*.hex files are deleted, the build will hang at this point since the 'cat' command will have no arguments. Do "echo | cat - ..." so that even if the "..." part is empty, the whole thing won't hang. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c989056c3664.Ic3b77531d00b30b26dcd69c64e55ae2f60c3f31e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-18KVM: x86/mmu: Remove redundant root_hpa checksDavid Matlack
The root_hpa checks below the top-level check in kvm_mmu_page_fault are theoretically redundant since there is no longer a way for the root_hpa to be reset during a page fault. The details of why are described in commit ddce6208217c ("KVM: x86/mmu: Move root_hpa validity checks to top of page fault handler") __direct_map, kvm_tdp_mmu_map, and get_mmio_spte are all only reachable through kvm_mmu_page_fault, therefore their root_hpa checks are redundant. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: x86/mmu: Refactor is_tdp_mmu_root into is_tdp_mmuDavid Matlack
This change simplifies the call sites slightly and also abstracts away the implementation detail of looking at root_hpa as the mechanism for determining if the mmu is the TDP MMU. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: x86/mmu: Remove redundant is_tdp_mmu_enabled checkDavid Matlack
This check is redundant because the root shadow page will only be a TDP MMU page if is_tdp_mmu_enabled() returns true, and is_tdp_mmu_enabled() never changes for the lifetime of a VM. It's possible that this check was added for performance reasons but it is unlikely that it is useful in practice since to_shadow_page() is cheap. That being said, this patch also caches the return value of is_tdp_mmu_root() in direct_page_fault() since there's no reason to duplicate the call so many times, so performance is not a concern. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: x86/mmu: Remove redundant is_tdp_mmu_root checkDavid Matlack
The check for is_tdp_mmu_root in kvm_tdp_mmu_map is redundant because kvm_tdp_mmu_map's only caller (direct_page_fault) already checks is_tdp_mmu_root. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210617231948.2591431-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: x86: Stub out is_tdp_mmu_root on 32-bit hostsPaolo Bonzini
If is_tdp_mmu_root is not inlined, the elimination of TDP MMU calls as dead code might not work out. To avoid this, explicitly declare the stubbed is_tdp_mmu_root on 32-bit hosts. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: x86: WARN and reject loading KVM if NX is supported but not enabledSean Christopherson
WARN if NX is reported as supported but not enabled in EFER. All flavors of the kernel, including non-PAE 32-bit kernels, set EFER.NX=1 if NX is supported, even if NX usage is disable via kernel command line. KVM relies on NX being enabled if it's supported, e.g. KVM will generate illegal NPT entries if nx_huge_pages is enabled and NX is supported but not enabled. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: SVM: Refuse to load kvm_amd if NX support is not availableSean Christopherson
Refuse to load KVM if NX support is not available. Shadow paging has assumed NX support since commit 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active"), and NPT has assumed NX support since commit b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation"). While the NX huge pages mitigation should not be enabled by default for AMD CPUs, it can be turned on by userspace at will. Unlike Intel CPUs, AMD does not provide a way for firmware to disable NX support, and Linux always sets EFER.NX=1 if it is supported. Given that it's extremely unlikely that a CPU supports NPT but not NX, making NX a formal requirement is far simpler than adding requirements to the mitigation flow. Fixes: 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active") Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18KVM: VMX: Refuse to load kvm_intel if EPT and NX are disabledSean Christopherson
Refuse to load KVM if NX support is not available and EPT is not enabled. Shadow paging has assumed NX support since commit 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active"), so for all intents and purposes this has been a de facto requirement for over a year. Do not require NX support if EPT is enabled purely because Intel CPUs let firmware disable NX support via MSR_IA32_MISC_ENABLES. If not for that, VMX (and KVM as a whole) could require NX support with minimal risk to breaking userspace. Fixes: 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-18mac80211: minstrel_ht: fix sample time checkFelix Fietkau
We need to skip sampling if the next sample time is after jiffies, not before. This patch fixes an issue where in some cases only very little sampling (or none at all) is performed, leading to really bad data rates Fixes: 80d55154b2f8 ("mac80211: minstrel_ht: significantly redesign the rate probing strategy") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210617103854.61875-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-06-18powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not setAthira Rajeev
On systems without any specific PMU driver support registered, running perf record causes Oops. The relevant portion from call trace: BUG: Kernel NULL pointer dereference on read at 0x00000040 Faulting instruction address: 0xc0021f0c Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K PREEMPT CMPCPRO SAF3000 DIE NOTIFICATION CPU: 0 PID: 442 Comm: null_syscall Not tainted 5.13.0-rc6-s3k-dev-01645-g7649ee3d2957 #5164 NIP: c0021f0c LR: c00e8ad8 CTR: c00d8a5c NIP perf_instruction_pointer+0x10/0x60 LR perf_prepare_sample+0x344/0x674 Call Trace: perf_prepare_sample+0x7c/0x674 (unreliable) perf_event_output_forward+0x3c/0x94 __perf_event_overflow+0x74/0x14c perf_swevent_hrtimer+0xf8/0x170 __hrtimer_run_queues.constprop.0+0x160/0x318 hrtimer_interrupt+0x148/0x3b0 timer_interrupt+0xc4/0x22c Decrementer_virt+0xb8/0xbc During perf record session, perf_instruction_pointer() is called to capture the sample IP. This function in core-book3s accesses ppmu->flags. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix this crash by checking if ppmu is set. Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1623952506-1431-1-git-send-email-atrajeev@linux.vnet.ibm.com
2021-06-18Merge tag 'amd-drm-fixes-5.13-2021-06-16' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.13-2021-06-16: amdgpu: - GFX9 and 10 powergating fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210616204913.4368-1-alexander.deucher@amd.com
2021-06-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Miscellaneous bugfixes. The main interesting one is a NULL pointer dereference reported by syzkaller ("KVM: x86: Immediately reset the MMU context when the SMM flag is cleared")" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Fix kvm_check_cap() assertion KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU KVM: X86: Fix x86_emulator slab cache leak KVM: SVM: Call SEV Guest Decommission if ASID binding fails KVM: x86: Immediately reset the MMU context when the SMM flag is cleared KVM: x86: Fix fall-through warnings for Clang KVM: SVM: fix doc warnings KVM: selftests: Fix compiling errors when initializing the static structure kvm: LAPIC: Restore guard to prevent illegal APIC register access
2021-06-17net: qed: Fix memcpy() overflow of qed_dcbx_params()Kees Cook
The source (&dcbx_info->operational.params) and dest (&p_hwfn->p_dcbx_info->set.config.params) are both struct qed_dcbx_params (560 bytes), not struct qed_dcbx_admin_params (564 bytes), which is used as the memcpy() size. However it seems that struct qed_dcbx_operational_params (dcbx_info->operational)'s layout matches struct qed_dcbx_admin_params (p_hwfn->p_dcbx_info->set.config)'s 4 byte difference (3 padding, 1 byte for "valid"). On the assumption that the size is wrong (rather than the source structure type), adjust the memcpy() size argument to be 4 bytes smaller and add a BUILD_BUG_ON() to validate any changes to the structure sizes. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17net: cdc_eem: fix tx fixup skb leakLinyu Yuan
when usbnet transmit a skb, eem fixup it in eem_tx_fixup(), if skb_copy_expand() failed, it return NULL, usbnet_start_xmit() will have no chance to free original skb. fix it by free orginal skb in eem_tx_fixup() first, then check skb clone status, if failed, return NULL to usbnet. Fixes: 9f722c0978b0 ("usbnet: CDC EEM support (v5)") Signed-off-by: Linyu Yuan <linyyuan@codeaurora.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17KVM: x86/mmu: Fix TDP MMU page table levelKai Huang
TDP MMU iterator's level is identical to page table's actual level. For instance, for the last level page table (whose entry points to one 4K page), iter->level is 1 (PG_LEVEL_4K), and in case of 5 level paging, the iter->level is mmu->shadow_root_level, which is 5. However, struct kvm_mmu_page's level currently is not set correctly when it is allocated in kvm_tdp_mmu_map(). When iterator hits non-present SPTE and needs to allocate a new child page table, currently iter->level, which is the level of the page table where the non-present SPTE belongs to, is used. This results in struct kvm_mmu_page's level always having its parent's level (excpet root table's level, which is initialized explicitly using mmu->shadow_root_level). This is kinda wrong, and not consistent with existing non TDP MMU code. Fortuantely sp->role.level is only used in handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave correctly. However to make it consistent with legacy MMU code (and fix the issue that both root page table and its child page table have shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <bcb6569b6e96cb78aaa7b50640e6e6b53291a74e.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86/mmu: Fix pf_fixed count in tdp_mmu_map_handle_target_level()Kai Huang
Currently pf_fixed is not increased when prefault is true. This is not correct, since prefault here really means "async page fault completed". In that case, the original page fault from the guest was morphed into as async page fault and pf_fixed was not increased. So when prefault indicates async page fault is completed, pf_fixed should be increased. Additionally, currently pf_fixed is also increased even when page fault is spurious, while legacy MMU increases pf_fixed when page fault returns RET_PF_EMULATE or RET_PF_FIXED. To fix above two issues, change to increase pf_fixed when return value is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by reaching here). More information: https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5 Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <2ea8b7f5d4f03c99b32bc56fc982e1e4e3d3fc6b.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level()Kai Huang
Currently tdp_mmu_map_handle_target_level() returns 0, which is RET_PF_RETRY, when page fault is actually fixed. This makes kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of RET_PF_FIXED. Fix by initializing ret to RET_PF_FIXED. Note that kvm_mmu_page_fault() resumes guest on both RET_PF_RETRY and RET_PF_FIXED, which means in practice returning the two won't make difference, so this fix alone won't be necessary for stable tree. Fixes: bb18842e2111 ("kvm: x86/mmu: Add TDP MMU PF handler") Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <f9e8956223a586cd28c090879a8ff40f5eb6d609.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17Merge tag 'mlx5-fixes-2021-06-16' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-06-16 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17KVM: LAPIC: Keep stored TMCCT register value 0 after KVM_SET_LAPICWanpeng Li
KVM_GET_LAPIC stores the current value of TMCCT and KVM_SET_LAPIC's memcpy stores it in vcpu->arch.apic->regs, KVM_SET_LAPIC could store zero in vcpu->arch.apic->regs after it uses it, and then the stored value would always be zero. In addition, the TMCCT is always computed on-demand and never directly readable. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1623223000-18116-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercallAshish Kalra
This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: switch per-VM stats to u64Paolo Bonzini
Make them the same type as vCPU stats. There is no reason to limit the counters to unsigned long. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17net: hamradio: fix memory leak in mkiss_closePavel Skripkin
My local syzbot instance hit memory leak in mkiss_open()[1]. The problem was in missing free_netdev() in mkiss_close(). In mkiss_open() netdevice is allocated and then registered, but in mkiss_close() netdevice was only unregistered, but not freed. Fail log: BUG: memory leak unreferenced object 0xffff8880281ba000 (size 4096): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 61 78 30 00 00 00 00 00 00 00 00 00 00 00 00 00 ax0............. 00 27 fa 2a 80 88 ff ff 00 00 00 00 00 00 00 00 .'.*............ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706e7e8>] alloc_netdev_mqs+0x98/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff8880141a9a00 (size 96): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): e8 a2 1b 28 80 88 ff ff e8 a2 1b 28 80 88 ff ff ...(.......(.... 98 92 9c aa b0 40 02 00 00 00 00 00 00 00 00 00 .....@.......... backtrace: [<ffffffff8709f68b>] __hw_addr_create_ex+0x5b/0x310 [<ffffffff8709fb38>] __hw_addr_add_ex+0x1f8/0x2b0 [<ffffffff870a0c7b>] dev_addr_init+0x10b/0x1f0 [<ffffffff8706e88b>] alloc_netdev_mqs+0x13b/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff8880219bfc00 (size 512): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 00 a0 1b 28 80 88 ff ff 80 8f b1 8d ff ff ff ff ...(............ 80 8f b1 8d ff ff ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706eec7>] alloc_netdev_mqs+0x777/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff888029b2b200 (size 256): comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a27201>] kvmalloc_node+0x61/0xf0 [<ffffffff8706f062>] alloc_netdev_mqs+0x912/0xe80 [<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1] [<ffffffff842355db>] tty_ldisc_open+0x9b/0x110 [<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670 [<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440 [<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200 [<ffffffff8911263a>] do_syscall_64+0x3a/0xb0 [<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 815f62bf7427 ("[PATCH] SMP rewrite of mkiss") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17be2net: Fix an error handling path in 'be_probe()'Christophe JAILLET
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function. Fixes: d6b6d9877878 ("be2net: use PCIe AER capability") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-17KVM: x86/mmu: Grab nx_lpage_splits as an unsigned long before divisionSean Christopherson
Snapshot kvm->stats.nx_lpage_splits into a local unsigned long to avoid 64-bit division on 32-bit kernels. Casting to an unsigned long is safe because the maximum number of shadow pages, n_max_mmu_pages, is also an unsigned long, i.e. KVM will start recycling shadow pages before the number of splits can exceed a 32-bit value. ERROR: modpost: "__udivdi3" [arch/x86/kvm/kvm.ko] undefined! Fixes: 7ee093d4f3f5 ("KVM: switch per-VM stats to u64") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210615162905.2132937-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Check for pending interrupts when APICv is getting disabledVitaly Kuznetsov
When APICv is active, interrupt injection doesn't raise KVM_REQ_EVENT request (see __apic_accept_irq()) as the required work is done by hardware. In case KVM_REQ_APICV_UPDATE collides with such injection, the interrupt may never get delivered. Currently, the described situation is hardly possible: all kvm_request_apicv_update() calls normally happen upon VM creation when no interrupts are pending. We are, however, going to move unconditional kvm_request_apicv_update() call from kvm_hv_activate_synic() to synic_update_vector() and without this fix 'hyperv_connections' test from kvm-unit-tests gets stuck on IPI delivery attempt right after configuring a SynIC route which triggers APICv disablement. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: nVMX: Drop redundant checks on vmcs12 in EPTP switching emulationSean Christopherson
Drop the explicit check on EPTP switching being enabled. The EPTP switching check is handled in the generic VMFUNC function check, while the underlying VMFUNC enablement check is done by hardware and redone by generic VMFUNC emulation. The vmcs12 EPT check is handled by KVM at VM-Enter in the form of a consistency check, keep it but add a WARN. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: nVMX: WARN if subtly-impossible VMFUNC conditions occurSean Christopherson
WARN and inject #UD when emulating VMFUNC for L2 if the function is out-of-bounds or if VMFUNC is not enabled in vmcs12. Neither condition should occur in practice, as the CPU is supposed to prioritize the #UD over VM-Exit for out-of-bounds input and KVM is supposed to enable VMFUNC in vmcs02 if and only if it's enabled in vmcs12, but neither of those dependencies is obvious. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Drop pointless @reset_roots from kvm_init_mmu()Sean Christopherson
Remove the @reset_roots param from kvm_init_mmu(), the one user, kvm_mmu_reset_context() has already unloaded the MMU and thus freed and invalidated all roots. This also happens to be why the reset_roots=true paths doesn't leak roots; they're already invalid. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Defer MMU sync on PCID invalidationSean Christopherson
Defer the MMU sync on PCID invalidation so that multiple sync requests in a single VM-Exit are batched. This is a very minor optimization as checking for unsync'd children is quite cheap. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: nVMX: Use fast PGD switch when emulating VMFUNC[EPTP_SWITCH]Sean Christopherson
Use __kvm_mmu_new_pgd() via kvm_init_shadow_ept_mmu() to emulate VMFUNC[EPTP_SWITCH] instead of nuking all MMUs. EPTP_SWITCH is the EPT equivalent of MOV to CR3, i.e. is a perfect fit for the common PGD flow, the only hiccup being that A/D enabling is buried in the EPTP. But, that is easily handled by bouncing through kvm_init_shadow_ept_mmu(). Explicitly request a guest TLB flush if VPID is disabled. Per Intel's SDM, if VPID is disabled, "an EPTP-switching VMFUNC invalidates combined mappings associated with VPID 0000H (for all PCIDs and for all EP4TA values, where EP4TA is the value of bits 51:12 of EPTP)". Note, this technically is a very bizarre bug fix of sorts if L2 is using PAE paging, as avoiding the full MMU reload also avoids incorrectly reloading the PDPTEs, which the SDM explicitly states are not touched: If PAE paging is in use, an EPTP-switching VMFUNC does not load the four page-directory-pointer-table entries (PDPTEs) from the guest-physical address in CR3. The logical processor continues to use the four guest-physical addresses already present in the PDPTEs. The guest-physical address in CR3 is not translated through the new EPT paging structures (until some operation that would load the PDPTEs). In addition to optimizing L2's MMU shenanigans, avoiding the full reload also optimizes L1's MMU as KVM_REQ_MMU_RELOAD wipes out all roots in both root_mmu and guest_mmu. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Use KVM_REQ_TLB_FLUSH_GUEST to handle INVPCID(ALL) emulationSean Christopherson
Use KVM_REQ_TLB_FLUSH_GUEST instead of KVM_REQ_MMU_RELOAD when emulating INVPCID of all contexts. In the current code, this is a glorified nop as TLB_FLUSH_GUEST becomes kvm_mmu_unload(), same as MMU_RELOAD, when TDP is disabled, which is the only time INVPCID is only intercepted+emulated. In the future, reusing TLB_FLUSH_GUEST will simplify optimizing paths that emulate a guest TLB flush, e.g. by synchronizing as needed instead of completely unloading all MMUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: nVMX: Free only guest_mode (L2) roots on INVVPID w/o EPTSean Christopherson
When emulating INVVPID for L1, free only L2+ roots, using the guest_mode tag in the MMU role to identify L2+ roots. From L1's perspective, its own TLB entries use VPID=0, and INVVPID is not requied to invalidate such entries. Per Intel's SDM, INVVPID _may_ invalidate entries with VPID=0, but it is not required to do so. Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>