2021-10-22  block: remove support for cryptoloop and the xor transfer  (Christoph Hellwig)

Support for cryptoloop has been officially marked broken and deprecated in favor of dm-crypt (which supports the same broken algorithms if needed) in Linux 2.6.4 (released in March 2004), and support for it has been entirely removed from losetup in util-linux 2.23 (released in April 2013). The XOR transfer has never been more than a toy to demonstrate the transfer in the bad old times of crypto export restrictions. Remove them, as they have some nasty interactions with loop device lifetimes due to the iteration over all loop devices in loop_unregister_transfer.

Suggested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019075639.2333969-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-10-22  block: remove QUEUE_FLAG_SCSI_PASSTHROUGH  (Christoph Hellwig)

Export scsi_device_from_queue for use with pktcdvd and use that instead of the otherwise unused QUEUE_FLAG_SCSI_PASSTHROUGH queue flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-10-22  block: remove the initialize_rq_fn blk_mq_ops method  (Christoph Hellwig)

Entirely unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-10-22  scsi: add a scsi_alloc_request helper  (Christoph Hellwig)

Add a new helper that calls blk_get_request and initializes the scsi_request to avoid the indirect call through ->initialize_rq_fn. Note that this makes the pktcdvd driver depend on the SCSI core, but given that only SCSI devices support SCSI passthrough requests, that is not a functional change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
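
A minimal sketch of what such a helper can look like, inferred from the description above; the exact body and the direct scsi_initialize_rq() call are assumptions, not a quote of the patch:

    #include <linux/blkdev.h>
    #include <scsi/scsi_cmnd.h>

    struct request *scsi_alloc_request(struct request_queue *q,
            unsigned int op, blk_mq_req_flags_t flags)
    {
        struct request *rq;

        rq = blk_get_request(q, op, flags);
        if (!IS_ERR(rq))
            scsi_initialize_rq(rq); /* direct call, no ->initialize_rq_fn indirection */
        return rq;
    }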

2021-10-22  bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fn  (Christoph Hellwig)

Directly initialize the bsg_job structure instead of relying on the ->initialize_rq_fn indirection. This also removes the superfluous initialization of the second request used for BIDI requests.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-10-22  nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands  (Christoph Hellwig)

Call the ->get_unique_id method to query the SCSI identifiers. This can use the cached VPD page in the sd driver instead of sending a command on every LAYOUTGET. It will also allow supporting NVMe-based volumes if the draft for that ever takes off.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-10-22  sd: implement ->get_unique_id  (Christoph Hellwig)

Add the method to query for a unique ID of a given type by looking it up in the cached device identification VPD page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-10-22  block: add a ->get_unique_id method  (Christoph Hellwig)

Add a method to query unique IDs from block devices. It will be used to remove code that deeply pokes into SCSI internals in the NFS server. The implementation in the sd driver itself is also much nicer as it can use the cached VPD page instead of always sending a command as the current NFS code does. For now the interface is kept very minimal but could be easily extended when other users like a block-layer sysfs interface for unique IDs show up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
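
A hedged sketch of the interface this describes; the exact prototype and enum values are assumptions patterned on the commit text:

    /* a type selector for the kind of identifier being queried */
    enum blk_unique_id {
        BLK_UID_T10   = 1,
        BLK_UID_EUI64 = 2,
        BLK_UID_NAA   = 3,
    };

    struct block_device_operations {
        /* ... existing methods ... */
        /* returns the length of the ID copied into @id on success,
         * or a negative errno if that id_type is not available */
        int (*get_unique_id)(struct gendisk *disk, u8 id[16],
                             enum blk_unique_id id_type);
        /* ... */
    };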

2021-10-22  KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed  (Paolo Bonzini)

The PIO scratch buffer is larger than a single page, and therefore it is not possible to copy it in a single step to vcpu->arch.pio_data. Bound each call to emulator_pio_in/out to a single page; keep track of how many I/O operations are left in vcpu->arch.sev_pio_count, so that the operation can be restarted in the complete_userspace_io callback.

For OUT, this means that the previous kvm_sev_es_outs implementation becomes an iterator of the loop, and we can consume the sev_pio_data buffer before leaving to userspace. For IN, instead, consuming the buffer and decreasing sev_pio_count is always done in the complete_userspace_io callback, because that is when the memcpy is done into sev_pio_data.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: SEV-ES: keep INS functions together  (Paolo Bonzini)

Make the diff a little nicer when we actually get to fixing the bug. No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86: remove unnecessary arguments from complete_emulator_pio_in  (Paolo Bonzini)

complete_emulator_pio_in can expect that vcpu->arch.pio has been filled in, and therefore does not need the size and count arguments. This makes things nicer when the function is called directly from a complete_userspace_io callback. No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86: split the two parts of emulator_pio_in  (Paolo Bonzini)

emulator_pio_in handles both the case where the data is pending in vcpu->arch.pio.count, and the case where I/O has to be done via either an in-kernel device or a userspace exit. For SEV-ES we would like to split these, to identify clearly the moment at which the sev_pio_data is consumed. To this end, create two different functions: __emulator_pio_in fills in vcpu->arch.pio.count, while complete_emulator_pio_in clears it and releases vcpu->arch.pio.data.

Because this patch has to be backported, things are left a bit messy. kernel_pio() operates on vcpu->arch.pio, which leads to emulator_pio_in() having two calls to complete_emulator_pio_in(). It will be fixed in the next release.

While at it, remove the unused void* val argument of emulator_pio_in_out. The function currently hardcodes vcpu->arch.pio_data as the source/destination buffer, which sucks but will be fixed after the more severe SEV-ES buffer overflow.

No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
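
A schematic sketch of the split; the function bodies below are illustrative, inferred from the description, and not the actual patch:

    /* step 1: start the PIO; on a userspace exit this leaves
     * vcpu->arch.pio.count set so the operation can be completed later */
    static int __emulator_pio_in(struct kvm_vcpu *vcpu, int size,
                                 unsigned short port, unsigned int count)
    {
        return emulator_pio_in_out(vcpu, size, port, count, true);
    }

    /* step 2: the one place where the pending data is consumed */
    static void complete_emulator_pio_in(struct kvm_vcpu *vcpu, void *val)
    {
        int size = vcpu->arch.pio.size;
        unsigned int count = vcpu->arch.pio.count;

        memcpy(val, vcpu->arch.pio_data, size * count);
        vcpu->arch.pio.count = 0;   /* release the buffer */
    }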

2021-10-22  KVM: SEV-ES: clean up kvm_sev_es_ins/outs  (Paolo Bonzini)

A few very small cleanups to the functions, smushed together because the patch is already very small like this:

- inline emulator_pio_in_emulated and emulator_pio_out_emulated, since we already have the vCPU
- remove the data argument and pull setting vcpu->arch.sev_pio_data into the caller
- remove unnecessary clearing of vcpu->arch.pio.count when emulation is done by the kernel (and therefore vcpu->arch.pio.count is already clear on exit from emulator_pio_in and emulator_pio_out).

No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out  (Paolo Bonzini)

Currently emulator_pio_in clears vcpu->arch.pio.count twice if emulator_pio_in_out performs kernel PIO. Move the clear into emulator_pio_out where it is actually necessary.

No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: SEV-ES: rename guest_ins_data to sev_pio_data  (Paolo Bonzini)

We will be using this field for OUTS emulation as well, in case the data that is pushed via OUTS spans more than one page. In that case, there will be a need to save the data pointer across exits to userspace.

So, change the name to something that refers to any kind of PIO. Also spell out what it is used for, namely SEV-ES.

No functional change intended.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  sched/core: Remove rq_relock()  (Peng Wang)

After the removal of migrate_tasks(), there is no user of rq_relock() left, so remove it.

Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/449948fdf9be4764b3929c52572917dd25eef758.1634611953.git.rocking@linux.alibaba.com

2021-10-22  sched: Improve wake_up_all_idle_cpus() take #2  (Peter Zijlstra)

As reported by syzbot and experienced by Pavel, using cpus_read_lock() in wake_up_all_idle_cpus() generates lock inversion (against mmap_sem and possibly others).

Instead, shrink the preempt disable region by iterating all CPUs and checking the online status for each individual CPU while having preemption disabled.

Fixes: 8850cb663b5c ("sched: Simplify wake_up_*idle*()")
Reported-by: syzbot+d5b23b18d2f4feae8a67@syzkaller.appspotmail.com
Reported-by: Pavel Machek <pavel@ucw.cz>
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
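
The reworked loop plausibly looks like this; a sketch built from the description above, not a verbatim copy of the patch:

    void wake_up_all_idle_cpus(void)
    {
        int cpu;

        for_each_possible_cpu(cpu) {
            /* preemption is disabled only around each per-CPU
             * online check, instead of holding cpus_read_lock()
             * across the whole walk */
            preempt_disable();
            if (cpu != smp_processor_id() && cpu_online(cpu))
                wake_up_if_idle(cpu);
            preempt_enable();
        }
    }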

2021-10-22  ice: Build fix.  (David S. Miller)

Merge issues...

Signed-off-by: David S. Miller <davem@davemloft.net>

2021-10-22  crypto: tcrypt - fix skcipher multi-buffer tests for 1420B blocks  (Horia Geantă)

Commit ad6d66bcac77e ("crypto: tcrypt - include 1420 byte blocks in aead and skcipher benchmarks") mentions:

> power-of-2 block size. So let's add 1420 bytes explicitly, and round
> it up to the next blocksize multiple of the algo in question if it
> does not support 1420 byte blocks.

but misses updating skcipher multi-buffer tests. Fix this by using the proper (rounded) input size.

Fixes: ad6d66bcac77e ("crypto: tcrypt - include 1420 byte blocks in aead and skcipher benchmarks")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-10-22  hwrng: s390 - replace snprintf in show functions with sysfs_emit  (Qing Wang)

coccicheck complains about the use of snprintf() in sysfs show functions. Fix the following coccicheck warnings:

drivers/char/hw_random/s390-trng.c:114:8-16: WARNING: use scnprintf or sprintf.
drivers/char/hw_random/s390-trng.c:122:8-16: WARNING: use scnprintf or sprintf.

Using sysfs_emit instead of scnprintf or sprintf makes more sense.

Signed-off-by: Qing Wang <wangqing@vivo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
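
The shape of the conversion in a sysfs show function (illustrative; val stands in for the driver's actual state):

    /* before: open-coded buffer length */
    return snprintf(buf, PAGE_SIZE, "%llu\n", val);

    /* after: sysfs_emit() knows show() buffers are PAGE_SIZE
     * and guards against misuse */
    return sysfs_emit(buf, "%llu\n", val);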

2021-10-22  crypto: x86/sm4 - Fix invalid section entry size  (Tianjia Zhang)

This fixes the following warning:

vmlinux.o: warning: objtool: elf_update: invalid section entry size

The size of the rodata section is 164 bytes; directly using an entry_size of 164 bytes will cause errors in some versions of the gcc compiler, while using 16 bytes directly will cause errors in the clang compiler. This patch corrects it by padding the size of the rodata section to a 16-byte boundary.

Fixes: a7ee22ee1445 ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation")
Fixes: 5b2efa2bb865 ("crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Heyuan Shi <heyuan@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-10-22  netfilter: ebtables: use array_size() helper in copy_{from,to}_user()  (Gustavo A. R. Silva)

Use array_size() helper instead of the open-coded version in copy_{from,to}_user(). These sorts of multiplication factors need to be wrapped in array_size().

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
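
The pattern of the change (variable names are illustrative, not from the patch):

    /* before: open-coded multiplication can overflow silently */
    if (copy_to_user(user, entries, nentries * sizeof(struct ebt_entry)))
        return -EFAULT;

    /* after: array_size() from <linux/overflow.h> saturates to
     * SIZE_MAX on overflow, turning it into a failed copy instead
     * of a short one */
    if (copy_to_user(user, entries, array_size(nentries, sizeof(struct ebt_entry))))
        return -EFAULT;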

2021-10-22  vduse: Fix race condition between resetting and irq injecting  (Xie Yongji)

The interrupt might be triggered after a reset since there is no synchronization between resetting and irq injecting. And it might break something if the interrupt is delayed until a new round of device initialization.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210929083050.88-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

2021-10-22  vduse: Disallow injecting interrupt before DRIVER_OK is set  (Xie Yongji)

The interrupt callback should not be triggered before DRIVER_OK is set. Otherwise, it might break the virtio device driver. So let's add a check to avoid the unexpected behavior.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210923075722.98-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
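
A hedged sketch of such a guard in the injection path; the field access and return value are assumptions, only VIRTIO_CONFIG_S_DRIVER_OK is the standard virtio status bit:

    /* no interrupt callbacks until the driver has set DRIVER_OK */
    if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
        return -EPERM;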

2021-10-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (David S. Miller)

Lots of simple overlapping additions. With a build fix from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>

2021-10-22  riscv: do not select non-existing config ANON_INODES  (Lukas Bulwahn)

Commit 99cdc6c18c2d ("RISC-V: Add initial skeletal KVM support") selects the config ANON_INODES in config KVM, but the config ANON_INODES has been removed since commit 5dd50aaeb185 ("Make anon_inodes unconditional") in 2018.

Hence, ./scripts/checkkconfigsymbols.py warns on non-existing symbols:

ANON_INODES
Referencing files: arch/riscv/kvm/Kconfig

Remove selecting the non-existing config ANON_INODES.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Message-Id: <20211022061514.25946-1-lukas.bulwahn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86/mmu: Extract zapping of rmaps for gfn range to separate helper  (Sean Christopherson)

Extract the zapping of rmaps, a.k.a. legacy MMU, for a gfn range to a separate helper to clean up the unholy mess that kvm_zap_gfn_range() has become. In addition to deep nesting, the rmaps zapping spreads out the declaration of several variables and is generally a mess. Clean up the mess now so that future work to improve the memslots implementation doesn't need to deal with it.

Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()  (Sean Christopherson)

Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that said function holds mmu_lock for write for its entire duration. The flush was added by the now-reverted commit to allow TDP MMU to flush while holding mmu_lock for read, as the transition from write=>read required dropping the lock and thus a pending flush needed to be serviced.

Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86/mmu: Drop a redundant, broken remote TLB flush  (Sean Christopherson)

A recent commit to fix the calls to kvm_flush_remote_tlbs_with_address() in kvm_zap_gfn_range() inadvertently added yet another flush instead of fixing the existing flush. Drop the redundant flush, and fix the params for the existing flush.

Cc: stable@vger.kernel.org
Fixes: 2822da446640 ("KVM: x86/mmu: fix parameters to kvm_flush_remote_tlbs_with_address")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: X86: Don't unload MMU in kvm_vcpu_flush_tlb_guest()  (Lai Jiangshan)

kvm_mmu_unload() destroys all the PGD caches. Use the lighter kvm_mmu_sync_roots() and kvm_mmu_sync_prev_roots() instead.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-5-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: X86: pair smp_wmb() of mmu_try_to_unsync_pages() with smp_rmb()  (Lai Jiangshan)

The commit 578e1c4db2213 ("kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is needed") added smp_wmb() in mmu_try_to_unsync_pages(), but the corresponding smp_load_acquire() isn't used on the load of SPTE.W. smp_load_acquire() orders _subsequent_ loads after sp->is_unsync; it does not order _earlier_ loads before the load of sp->is_unsync.

This has no functional change; smp_rmb() is a NOP on x86, and no compiler barrier is required because there is a VMEXIT between the load of SPTE.W and kvm_mmu_sync_roots.

Cc: Junaid Shahid <junaids@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-4-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
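
The pairing in the abstract; a sketch of the ordering argument only, not the patch itself:

    /* writer, mmu_try_to_unsync_pages(): publish sp->unsync before
     * the SPTE becomes writable */
    sp->unsync = true;
    smp_wmb();
    /* ... make the SPTE writable ... */

    /* reader, sync path: if a writable SPTE was observed, the
     * unsync flag written before it must also be visible */
    /* ... load SPTE and observe SPTE.W set ... */
    smp_rmb();
    if (sp->unsync)
        /* ... sync the shadow page ... */;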

2021-10-22  KVM: X86: Cache CR3 in prev_roots when PCID is disabled  (Lai Jiangshan)

The commit 21823fbda5522 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush") invalidates all PGDs for the specific PCID and, in the case where PCID is disabled, it includes all PGDs in the prev_roots; the commit made prev_roots totally unused in this case.

Not using prev_roots fixes a problem when CR4.PCIDE is changed 0 -> 1 before the said commit:

(CR4.PCIDE=0, CR4.PGE=1; CR3=cr3_a; the page for the guest RIP is global; cr3_b is cached in prev_roots)

1. modify the page tables under cr3_b; the shadow root of cr3_b is unsync in kvm
2. INVPCID single context; the guest expects the TLB is clean for PCID=0
3. change CR4.PCIDE 0 -> 1
4. switch to cr3_b with PCID=0, NOFLUSH=1; no sync in kvm, cr3_b is still unsync in kvm
5. jump to the page that was modified in step 1; the shadow page tables point to the wrong page

It is a very unlikely case, but it shows that stale prev_roots can be a problem after CR4.PCIDE changes from 0 to 1. However, to fix this case, the commit disabled caching CR3 in prev_roots altogether when PCID is disabled. Not all CPUs have PCID; in particular, PCID support on AMD CPUs is relatively recent.

To restore the prev_roots optimization for CR4.PCIDE=0, flush the whole MMU (including all prev_roots) when CR4.PCIDE changes.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid()  (Lai Jiangshan)

KVM doesn't know whether any TLB entries for a specific PCID are cached in the CPU when TDP is enabled. So it is better to flush all the guest TLB entries when invalidating any single PCID context.

The case is very rare or even impossible, since KVM generally doesn't intercept CR3 writes or INVPCID instructions when TDP is enabled, so the fix is mostly for the sake of overall robustness.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE  (Lai Jiangshan)

X86_CR4_PGE doesn't participate in kvm_mmu_role, so the mmu context doesn't need to be reset. It is only required to flush all the guest tlb.

It is also inconsistent that X86_CR4_PGE is in KVM_MMU_CR4_ROLE_BITS while kvm_mmu_role doesn't use X86_CR4_PGE. So X86_CR4_PGE is also removed from KVM_MMU_CR4_ROLE_BITS.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210919024246.89230-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: X86: Don't reset mmu context when X86_CR4_PCIDE 1->0  (Lai Jiangshan)

X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the mmu context doesn't need to be reset. It is only required to flush all the guest tlb.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210919024246.89230-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: selftests: set CPUID before setting sregs in vcpu creation  (Michael Roth)

Recent kernels have checks to ensure the GPA values in special-purpose registers like CR3 are within the maximum physical address range and don't overlap with anything in the upper/reserved range. In the case of SEV kselftest guests booting directly into 64-bit mode, CR3 needs to be initialized to the GPA of the page table root, with the encryption bit set.

The kernel accounts for this encryption bit by removing it from the reserved bit range when the guest advertises the bit position via KVM_SET_CPUID*, but kselftests currently call KVM_SET_SREGS as part of vm_vcpu_add_default(), before KVM_SET_CPUID*. As a result, KVM_SET_SREGS will return an error in these cases.

Address this by moving vcpu_set_cpuid() (which calls KVM_SET_CPUID*) ahead of vcpu_setup() (which calls KVM_SET_SREGS). While there, address a typo in the assertion that triggers when KVM_SET_SREGS fails.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20211006203617.13045-1-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nathan Tempelman <natet@google.com>
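
A hedged sketch of the reordering inside vm_vcpu_add_default(); the helper names follow tools/testing/selftests/kvm conventions, but treat the exact signatures as assumptions:

    vm_vcpu_add(vm, vcpuid);

    /* advertise CPUID (including the C-bit position for SEV guests)
     * first, so the KVM_SET_SREGS below accepts an encrypted CR3 */
    vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());

    /* now KVM_SET_SREGS passes the reserved-GPA checks */
    vcpu_setup(vm, vcpuid, 0, 0);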

2021-10-22  KVM: emulate: Comment on difference between RDPMC implementation and manual  (Wanpeng Li)

The SDM specifies RDPMC as:

    IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0))
        and (ECX indicates a supported counter))
    THEN
        EAX := counter[31:0];
        EDX := ZeroExtend(counter[MSCB:32]);
    ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
        #GP(0);
    FI;

Let's add a comment explaining why CR0.PE isn't tested: it's impossible for CPL to be > 0 if CR0.PE = 0.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634724836-73721-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86: Add vendor name to kvm_x86_ops, use it for error messages  (Sean Christopherson)

Paul pointed out that the error messages when KVM fails to load are unhelpful in understanding exactly what went wrong if userspace probes the "wrong" module.

Add a mandatory kvm_x86_ops field to track the vendor module names, kvm_intel and kvm_amd, and use the name for the relevant error messages when KVM fails to load, so that the user knows which module failed to load.

Opportunistically tweak the "disabled by bios" error message to clarify that _support_ was disabled, not that the module itself was magically disabled by BIOS.

Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  kvm: x86: mmu: Make NX huge page recovery period configurable  (Junaid Shahid)

Currently, the NX huge page recovery thread wakes up every minute and zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX huge pages at a time. This is intended to ensure that only a relatively small number of pages get zapped at a time. But for very large VMs (or more specifically, VMs with a large number of executable pages), a period of 1 minute could still result in this number being too high (unless the ratio is changed significantly, but that can result in split pages lingering on for too long).

This change makes the period configurable instead of fixing it at 1 minute. Users of large VMs can then adjust the period and/or the ratio to reduce the number of pages zapped at one time while still maintaining the same overall duration for cycling through the entire list. By default, KVM derives a period from the ratio such that a page will remain on the list for 1 hour on average.

Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20211020010627.305925-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
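
One way the knob and the 1-hour default can fit together; a sketch in which the helper name is hypothetical and the parameter name is taken from the description above:

    static uint __read_mostly nx_huge_pages_recovery_period_ms;
    module_param(nx_huge_pages_recovery_period_ms, uint, 0644);

    /* if no period is given, derive one so that cycling through the
     * whole list at 1/ratio pages per wakeup takes about an hour */
    static uint nx_recovery_period_ms(void)   /* hypothetical helper */
    {
        uint ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
        uint period = READ_ONCE(nx_huge_pages_recovery_period_ms);

        if (!period && ratio)
            period = 60 * 60 * 1000 / ratio;
        return period;
    }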

2021-10-22  KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0  (Wanpeng Li)

SDM section 18.2.3 mentions that:

"IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of any general-purpose or fixed-function counters via a single WRMSR."

The SDM describes it as R/W; reading this MSR on bare metal during perf testing, the value is always 0 on the ICX/SKX boxes at hand. Let's fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0 to match hardware behavior and drop the global_ovf_ctrl variable.

Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
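
In the vPMU's get_msr path this plausibly reduces to the following; the placement in the handler's switch statement is an assumption:

    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
        msr_info->data = 0;   /* reads return 0, matching bare metal */
        return 0;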

2021-10-22  KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k  (David Matlack)

slot_handle_leaf is a misnomer because it only operates on 4K SPTEs whereas "leaf" is used to describe any valid terminal SPTE (4K or large page). Rename slot_handle_leaf to slot_handle_level_4k to avoid confusion.

Making this change makes it more obvious there is a benign discrepancy between the legacy MMU and the TDP MMU when it comes to dirty logging. The legacy MMU only iterates through 4K SPTEs when zapping for collapsing and when clearing D-bits. The TDP MMU, on the other hand, iterates through SPTEs on all levels. The TDP MMU behavior of zapping SPTEs at all levels is technically overkill for its current dirty logging implementation, which always demotes to 4K SPTEs, but both the TDP MMU and legacy MMU zap if and only if the SPTE can be replaced by a larger page, i.e. will not spuriously zap 2M (or larger) SPTEs.

Opportunistically add comments to explain this discrepancy in the code.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211019162223.3935109-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit  (Xiaoyao Li)

Per the Intel SDM, the RTIT_CTL_BRANCH_EN bit has no dependency on any CPUID bit in leaf 0x14.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: VMX: Rename pt_desc.addr_range to pt_desc.num_address_ranges  (Xiaoyao Li)

To better explain the meaning of this field and match the PT_CAP_num_address_ranges constant.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: VMX: Use precomputed vmx->pt_desc.addr_range  (Xiaoyao Li)

The number of valid PT ADDR MSRs for the guest is precomputed in vmx->pt_desc.addr_range. Use it instead of calculating again.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero  (Xiaoyao Li)

A minor optimization to WRMSR MSR_IA32_RTIT_CTL when necessary.

Opportunistically refine the comment to call out that KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: x86/mmu: clean up prefetch/prefault/speculative naming  (Paolo Bonzini)

"prefetch", "prefault" and "speculative" are used throughout KVM to mean the same thing. Use a single name, standardizing on "prefetch" which is already used by various functions such as direct_pte_prefetch, FNAME(prefetch_gpte), FNAME(pte_prefetch), etc.

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  KVM: cleanup allocation of rmaps and page tracking data  (David Stevens)

Unify the flags for rmaps and page tracking data, using a single flag in struct kvm_arch and a single loop to go over all the address spaces and memslots. This avoids code duplication between alloc_all_memslots_rmaps and kvm_page_track_enable_mmu_write_tracking.

Signed-off-by: David Stevens <stevensd@chromium.org>
[This patch is the delta between David's v2 and v3, with conflicts fixed and my own commit message. - Paolo]
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2021-10-22  x86/fpu/xstate: Move remaining xfeature helpers to core  (Thomas Gleixner)

Now that everything is mopped up, move all the helpers and prototypes into the core header. They are not required by the outside.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.514095101@linutronix.de

2021-10-22  drm/i915/selftests: Properly reset mock object properties for each test  (Daniel Vetter)

I forgot to do this properly in

    commit 6f11f37459d8f9f74ff1c299c0bedd50b458057a
    Author: Daniel Vetter <daniel.vetter@ffwll.ch>
    Date:   Fri Jul 23 10:34:55 2021 +0200

        drm/plane: remove drm_helper_get_plane_damage_clips

intel-gfx CI didn't spot this because we run each selftest in its own invocation, which means reloading i915.ko. But if you just run all the selftests in one go at boot-up, then it falls apart and eventually we cross the hardcoded limit on how many properties can be attached to a single object.

Fix this by resetting the property count. Nothing else to clean up since it's all static storage anyway.

Reported-and-tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: 6f11f37459d8 ("drm/plane: remove drm_helper_get_plane_damage_clips")
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211021202048.2638668-1-daniel.vetter@ffwll.ch

2021-10-22  x86/fpu: Rework restore_regs_from_fpstate()  (Thomas Gleixner)

xfeatures_mask_fpstate() is no longer valid when dynamically enabled features come into play. Rework restore_regs_from_fpstate() so it takes a constant mask which will then be applied against the maximum feature set, so that the restore operation brings all features which are not in the xsave buffer's xfeature bitmap into init state.

This ensures that if the previous task used a dynamically enabled feature, the task which restores has all unused components properly initialized.

Clean up the last user of xfeatures_mask_fpstate() as well and remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211014230739.461348278@linutronix.de