git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message	Author
2022-01-14	vdpa/mlx5: Fix wrong configuration of virtio_version_1_0	Eli Cohen
	Remove overriding of virtio_version_1_0 which forced the virtqueue object to version 1. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211230142024.142979-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-01-14	virtio/virtio_pci_legacy_dev: ensure the correct return value	Peng Hao
	When pci_iomap() returns NULL, the return value is zero, so the failure is reported as success. Return an error code instead. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222112014.87394-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14	virtio/virtio_mem: handle a possible NULL as a memcpy parameter	Peng Hao
	There is a check for vm->sbm.sb_states before, and it should check it here as well. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222011225.40573-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug") Cc: stable@vger.kernel.org # v5.8+
2022-01-14	virtio: fix a typo in function "vp_modern_remove" comments.	Dapeng Mi
	The function name "vp_modern_remove" is incorrectly written as "vp_modern_probe" in the comments. Change it. Signed-off-by: Dapeng Mi <dapeng1.mi@intel.com> Link: https://lore.kernel.org/r/20211210073546.700783-1-dapeng1.mi@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14	virtio-pci: fix the confusing error message	王贇
	The error message on failure of the pfn check should say virtio-pci rather than virtio-mmio. Fix it. Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/ae5e154e-ac59-f0fa-a7c7-091a2201f581@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	firmware: qemu_fw_cfg: remove sysfs entries explicitly	Johan Hovold
	Explicitly remove the file entries from sysfs before dropping the final reference for symmetry reasons and for consistency with the rest of the driver. Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-5-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	firmware: qemu_fw_cfg: fix sysfs information leak	Johan Hovold
	Make sure to always NUL-terminate file names retrieved from the firmware to avoid accessing data beyond the entry slab buffer and exposing it through sysfs in case the firmware data is corrupt. Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device") Cc: stable@vger.kernel.org # 4.6 Cc: Gabriel Somlo <somlo@cmu.edu> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-4-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
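	The NUL-termination idea described above can be illustrated with a small userspace sketch. This is not the driver's code: `struct fw_file`, `fw_file_copy_safe()` and the 56-byte name field are assumptions chosen for illustration, loosely modelled on a firmware directory entry.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: size of the fixed file-name field in a firmware entry. */
#define FW_FILE_NAME_LEN 56

/* Hypothetical record, loosely modelled on a firmware directory entry. */
struct fw_file {
    unsigned int size;
    unsigned short select;
    unsigned short reserved;
    char name[FW_FILE_NAME_LEN];
};

/*
 * Copy an entry that came from untrusted firmware data and force the
 * name to be NUL-terminated, so later string operations (strlen,
 * printing the name) can never run past the end of the buffer.
 */
static void fw_file_copy_safe(struct fw_file *dst, const struct fw_file *src)
{
    memcpy(dst, src, sizeof(*dst));
    dst->name[FW_FILE_NAME_LEN - 1] = '\0';
}
```

	With corrupt input whose name field contains no NUL byte at all, the copied name is truncated to at most 55 characters instead of being read beyond the buffer.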
2022-01-14	firmware: qemu_fw_cfg: fix kobject leak in probe error path	Johan Hovold
	An initialised kobject must be freed using kobject_put() to avoid leaking associated resources (e.g. the object name). Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed" the leak in the first error path of the file registration helper but left the second one unchanged. This "fix" would however result in a NULL pointer dereference due to the release function also removing the never added entry from the fw_cfg_entry_cache list. This has now been addressed. Fix the remaining kobject leak by restoring the common error path and adding the missing kobject_put(). Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device") Cc: stable@vger.kernel.org # 4.6 Cc: Gabriel Somlo <somlo@cmu.edu> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-3-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	firmware: qemu_fw_cfg: fix NULL-pointer deref on duplicate entries	Johan Hovold
	Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed" a kobject leak in the file registration helper by properly calling kobject_put() for the entry in case registration of the object fails (e.g. due to a name collision). This would however result in a NULL pointer dereference when the release function tries to remove the never added entry from the fw_cfg_entry_cache list. Fix this by moving the list-removal out of the release function. Note that the offending commit was one of the benign looking umn.edu fixes which was reviewed but not reverted. [1][2] [1] https://lore.kernel.org/r/202105051005.49BFABCE@keescook [2] https://lore.kernel.org/all/YIg7ZOZvS3a8LjSv@kroah.com Fixes: fe3c60684377 ("firmware: Fix a reference count leak.") Cc: stable@vger.kernel.org # 5.8 Cc: Qiushi Wu <wu000273@umn.edu> Cc: Kees Cook <keescook@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211201132528.30025-2-johan@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	vdpa: Mark vdpa_config_ops.get_vq_notification as optional	Eugenio Pérez
	Since vhost_vdpa_mmap checks for its existence before calling it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20211104195248.2088904-1-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14	vdpa: Avoid duplicate call to vp_vdpa get_status	Eugenio Pérez
	It makes no sense to call get_status twice, since we already have a variable for that. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20211104195833.2089796-1-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14	eni_vdpa: Simplify 'eni_vdpa_probe()'	Christophe JAILLET
	When 'pcim_enable_device()' is used, some resources become automagically managed. There is no need to call 'pci_free_irq_vectors()' when the driver is removed. The same will already be done by 'pcim_release()'. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/02045bdcbbb25f79bae4827f66029cfcddc90381.1636301587.git.christophe.jaillet@wanadoo.fr Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14	net/mlx5_vdpa: Offer VIRTIO_NET_F_MTU when setting MTU	Eli Cohen
	Make sure to offer VIRTIO_NET_F_MTU since we configure the MTU based on what was queried from the device. This allows the virtio driver to allocate large enough buffers based on the reported MTU. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211124170949.51725-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-01-14	virtio-mem: prepare fake page onlining code for granularity smaller than MAX_ORDER - 1	David Hildenbrand
	Let's prepare our fake page onlining code for subblock size smaller than MAX_ORDER - 1: we might get called for ranges not covering properly aligned MAX_ORDER - 1 pages. We have to detect the order to use dynamically. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>
2022-01-14	virtio-mem: prepare page onlining code for granularity smaller than MAX_ORDER - 1	David Hildenbrand
	Let's prepare our page onlining code for subblock size smaller than MAX_ORDER - 1: we'll get called for a MAX_ORDER - 1 page but might have some subblocks in the range plugged and some unplugged. In that case, fall back to subblock granularity to properly only expose the plugged parts to the buddy. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>
2022-01-14	vdpa: add driver_override support	Stefano Garzarella
	`driver_override` allows controlling which of the vDPA bus drivers binds to a vDPA device. If `driver_override` is not set, the previous behaviour is followed: devices use the first vDPA bus driver loaded (unless auto binding is disabled).
	Tested on Fedora 34 with driverctl(8):
	  $ modprobe virtio-vdpa
	  $ modprobe vhost-vdpa
	  $ modprobe vdpa-sim-net
	  $ vdpa dev add mgmtdev vdpasim_net name dev1
	  # dev1 is attached to the first vDPA bus driver loaded
	  $ driverctl -b vdpa list-devices
	    dev1 virtio_vdpa
	  $ driverctl -b vdpa set-override dev1 vhost_vdpa
	  $ driverctl -b vdpa list-devices
	    dev1 vhost_vdpa [*]
	Note: driverctl(8) integrates with udev so the binding is preserved.
	Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211126164753.181829-3-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	docs: document sysfs ABI for vDPA bus	Stefano Garzarella
	Add missing documentation of sysfs ABI for vDPA bus in the new Documentation/ABI/testing/sysfs-bus-vdpa file. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211126164753.181829-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14	ifcvf/vDPA: fix misuse virtio-net device config size for blk dev	Zhu Lingshan
	This commit fixes the misuse of the virtio-net device config size for virtio-block devices. A new member, config_size, is introduced in struct ifcvf_hw and is initialized through vdpa_dev_add() to record the correct device config size. To be more generic, rename ifcvf_hw.net_config to ifcvf_hw.dev_config, and the helpers ifcvf_read/write_net_config() to ifcvf_read/write_dev_config(). Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Reported-and-suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: 6ad31d162a4e ("vDPA/ifcvf: enable Intel C5000X-PL virtio-block for vDPA") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211201081255.60187-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	vduse: moving kvfree into caller	Guanjun
	This free action should be moved into caller 'vduse_ioctl' in concert with the allocation. No functional change. Signed-off-by: Guanjun <guanjun@linux.alibaba.com> Link: https://lore.kernel.org/r/1638780498-55571-1-git-send-email-guanjun@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	hwrng: virtio - unregister device before reset	Michael S. Tsirkin
	unregister after reset is clearly wrong - device can be used while it's reset. There's an attempt to protect against that using hwrng_removed but it seems racy since access can be in progress when the flag is set. Just unregister, then reset seems simpler and cleaner. NB: we might be able to drop hwrng_removed in a follow-up patch. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	virtio: wrap config->reset calls	Michael S. Tsirkin
	This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14	drm/amd/display: Revert W/A for hard hangs on DCN20/DCN21	Mario Limonciello
	The WA from commit 2a50edbf10c8 ("drm/amd/display: Apply w/a for hard hang on HPD") and commit 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard hang on HPD to dcn20") causes a regression in s0ix where the system will fail to resume properly on many laptops. Pull the workarounds out to avoid that s0ix regression in the common case. This HPD hang happens with an external device in special circumstances and a new W/A will need to be developed for this in the future. Cc: stable@vger.kernel.org Cc: Qingqing Zhuo <qingqing.zhuo@amd.com> Reported-by: Scott Bruce <smbruce@gmail.com> Reported-by: Chris Hixon <linux-kernel-bugs@hixontech.com> Reported-by: spasswolf@web.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=215436 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1821 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1852 Fixes: 2a50edbf10c8 ("drm/amd/display: Apply w/a for hard hang on HPD") Fixes: 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard hang on HPD to dcn20") Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14	drm/amdgpu: drop flags check for CHIP_IP_DISCOVERY	Alex Deucher
	Support for IP based discovery is in place now so this check is no longer required. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14	drm/amdgpu: Fix rejecting Tahiti GPUs	Lukas Fink
	eb4fd29afd4a ("drm/amdgpu: bind to any 0x1002 PCI diplay class device") added generic bindings to amdgpu so that it binds to all display class devices with VID 0x1002 and then rejects those in amdgpu_pci_probe. Unfortunately it reuses a driver_data value of 0 to detect those new bindings, which is already used to denote CHIP_TAHITI ASICs. The driver_data value given to those new bindings was changed in dd0761fd24ea1 ("drm/amdgpu: set CHIP_IP_DISCOVERY as the asic type by default") to CHIP_IP_DISCOVERY (=36), but it seems that the check in amdgpu_pci_probe was not updated accordingly. Therefore, it still rejects Tahiti GPUs. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1860 Fixes: eb4fd29afd4a ("drm/amdgpu: bind to any 0x1002 PCI diplay class device") Cc: stable@vger.kernel.org Signed-off-by: Lukas Fink <lukas.fink1@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14	drm/amdgpu: don't do resets on APUs which don't support it	Alex Deucher
	It can cause a hang. This is normally not enabled for GPU hangs on these asics, but was recently enabled for handling aborted suspends. This causes hangs on some platforms on suspend. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Cc: stable@vger.kernel.org Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1858 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14	drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu()	Alex Deucher
	Rather than opting into GPU recovery support, default to on, and opt out if it's not working on a particular GPU. This avoids the need to add new asics to this list since this is a core feature. Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14	drm/amdgpu: Enable recovery on yellow carp	CHANDAN VURDIGERE NATARAJ
	Add yellow carp to devices which support recovery Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14	MAINTAINERS: Add Helge as fbdev maintainer	Helge Deller
	The fbdev layer is orphaned, but seems to need some care. So I'd like to step up as new maintainer. Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-01-14	x86/fpu: Fix inline prefix warnings	Yang Zhong
	Fix sparse warnings in xstate and remove inline prefix. Fixes: 980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220113180825.322333-1-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	selftest: kvm: Add amx selftest	Yang Zhong
	This selftest covers two aspects of AMX. The first is triggering #NM exception and checking the MSR XFD_ERR value. The second case is loading tile config and tile data into guest registers and trapping to the host side for a complete save/load of the guest state. TMM0 is also checked against memory data after save/restore. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20211223145322.2914028-4-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	selftest: kvm: Move struct kvm_x86_state to header	Yang Zhong
	This change avoids a pointer-dereference compile error when amx_test.c references state->xsave. Move the struct kvm_x86_state definition to processor.h. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20211223145322.2914028-3-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	selftest: kvm: Reorder vcpu_load_state steps for AMX	Paolo Bonzini
	For AMX support it is recommended to load XCR0 after XFD, so that KVM does not see XFD=0, XCR=1 for a save state that will eventually be disabled (which would lead to premature allocation of the space required for that save state). It is also required to load XSAVE data after XCR0 and XFD, so that KVM can trigger allocation of the extra space required to store AMX state. Adjust vcpu_load_state to obey these new requirements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20211223145322.2914028-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Disable interception for IA32_XFD on demand	Kevin Tian
	Always intercepting IA32_XFD causes non-negligible overhead when this register is updated frequently in the guest. Disable r/w emulation after intercepting the first WRMSR(IA32_XFD) with a non-zero value. Disabling WRMSR emulation implies that IA32_XFD becomes out-of-sync with the software states in fpstate and the per-cpu xfd cache. This leads to two additional changes accordingly: - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring software states back in-sync with the MSR, before handle_exit_irqoff() is called. - Always trap #NM once write interception is disabled for IA32_XFD. The #NM exception is rare if the guest doesn't use dynamic features. Otherwise, there is at most one exception per guest task given a dynamic feature. p.s. We have confirmed that SDM is being revised to say that when setting IA32_XFD[18] the AMX register state is not guaranteed to be preserved. This clarification avoids adding mess for a creative guest which sets IA32_XFD[18]=1 before saving active AMX state to its own storage. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-22-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()	Thomas Gleixner
	KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate is already correctly sized to reduce the overhead. When write emulation is disabled the XFD MSR state after a VMEXIT is unknown and therefore not in sync with the software states in fpstate and the per CPU XFD cache. Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a VMEXIT before enabling interrupts when write emulation is disabled for the XFD MSR. It could be invoked unconditionally even when write emulation is enabled for the price of a pointless MSR read. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-21-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: selftests: Add support for KVM_CAP_XSAVE2	Wei Wang
	When KVM_CAP_XSAVE2 is supported, userspace is expected to allocate buffer for KVM_GET_XSAVE2 and KVM_SET_XSAVE using the size returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Guang Zeng <guang.zeng@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-20-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Add support for getting/setting expanded xstate buffer	Guang Zeng
	With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set xstate data from/to KVM. This doesn't work when dynamic xfeatures (e.g. AMX) are exposed to the guest as they require a larger buffer size. Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). KVM_SET_XSAVE is extended to work with both legacy and new capabilities by doing properly-sized memdup_user() based on the guest fpu container. KVM_GET_XSAVE is kept for backward compatibility. Instead, KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred interface for getting xstate buffer (4KB or larger size) from KVM (Link: https://lkml.org/lkml/2021/12/15/510). Also, update the api doc with the new KVM_GET_XSAVE2 ioctl. Signed-off-by: Guang Zeng <guang.zeng@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-19-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
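	A userspace VMM might size its xstate buffer along the following lines. This is a minimal sketch, not code from the series: the helper name `xsave_buffer_size()` and the 4096-byte fallback constant are illustrative, the fallback definition of `KVM_CAP_XSAVE2` is only there for older headers, and `vm_fd` is assumed to be a VM file descriptor obtained elsewhere.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallback for older uapi headers; assumed to match the kernel's value. */
#ifndef KVM_CAP_XSAVE2
#define KVM_CAP_XSAVE2 208
#endif

/* Size of the legacy, fixed KVM_GET_XSAVE/KVM_SET_XSAVE buffer. */
#define LEGACY_XSAVE_SIZE 4096

/*
 * Ask KVM how large the xstate buffer must be for this VM.
 * If the capability is missing (ioctl returns 0) or the call fails
 * (returns -1), fall back to the legacy 4KB size.
 */
static size_t xsave_buffer_size(int vm_fd)
{
    int ret = ioctl(vm_fd, KVM_CHECK_EXTENSION, (unsigned long)KVM_CAP_XSAVE2);

    if (ret < LEGACY_XSAVE_SIZE)
        return LEGACY_XSAVE_SIZE;
    return (size_t)ret;
}
```

	The returned size would then be used to allocate the buffer passed to KVM_GET_XSAVE2 and KVM_SET_XSAVE.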
2022-01-14	x86/fpu: Add uabi_size to guest_fpu	Thomas Gleixner
	Userspace needs to inquire KVM about the buffer size to work with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info to guest_fpu for KVM to access. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-18-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Add CPUID support for Intel AMX	Jing Liu
	Extend CPUID emulation to support XFD, AMX_TILE, AMX_INT8 and AMX_BF16. Adding those bits into kvm_cpu_caps finally activates all previous logics in this series. Hide XFD on 32bit host kernels. Otherwise it leads to a weird situation where KVM tells userspace to migrate MSR_IA32_XFD and then rejects attempts to read/write the MSR. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-17-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Add XCR0 support for Intel AMX	Jing Liu
	Two XCR0 bits are defined for AMX to support the XSAVE mechanism. Bit 17 is for tilecfg and bit 18 is for tiledata. The value of XCR0[18:17] is always either 00b or 11b. Also, SDM recommends that only 64-bit operating systems enable Intel AMX by setting XCR0[18:17]. 32-bit host kernel never sets the tile bits in vcpu->arch.guest_supported_xcr0. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-16-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
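	The "00b or 11b" rule for the two tile bits can be expressed as a simple mask check. The sketch below is illustrative only; the macro and function names are assumptions, not identifiers taken from KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described above: 17 = tilecfg, 18 = tiledata. */
#define XCR0_TILECFG  (UINT64_C(1) << 17)
#define XCR0_TILEDATA (UINT64_C(1) << 18)
#define XCR0_AMX_MASK (XCR0_TILECFG | XCR0_TILEDATA)

/*
 * An XCR0 value is acceptable with respect to AMX only if both tile
 * bits are clear or both are set; setting just one of them is invalid.
 */
static bool xcr0_amx_bits_valid(uint64_t xcr0)
{
    uint64_t amx = xcr0 & XCR0_AMX_MASK;

    return amx == 0 || amx == XCR0_AMX_MASK;
}
```

	For example, a value with only bit 17 set is rejected, while a value with bits 17 and 18 both set passes this particular check.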
2022-01-14	kvm: x86: Disable RDMSR interception of IA32_XFD_ERR	Jing Liu
	This saves one unnecessary VM-exit in guest #NM handler, given that the MSR is already restored with the guest value before the guest is resumed. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-15-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Emulate IA32_XFD_ERR for guest	Jing Liu
	Emulate read/write to IA32_XFD_ERR MSR. Only the saved value in the guest_fpu container is touched in the emulation handler. Actual MSR update is handled right before entering the guest (with preemption disabled) Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Zeng Guang <guang.zeng@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-14-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Intercept #NM for saving IA32_XFD_ERR	Jing Liu
	Guest IA32_XFD_ERR is generally modified in two places: - Set by CPU when #NM is triggered; - Cleared by guest in its #NM handler; Intercept #NM for the first case when a nonzero value is written to IA32_XFD. Nonzero indicates that the guest is willing to do dynamic fpstate expansion for certain xfeatures, thus KVM needs to manage and virtualize guest XFD_ERR properly. The vcpu exception bitmap is updated in XFD write emulation according to guest_fpu::xfd. Save the current XFD_ERR value to the guest_fpu container in the #NM VM-exit handler. This must be done with interrupts disabled, otherwise the unsaved MSR value may be clobbered by host activity. The saving operation is conducted conditionally only when guest_fpu::xfd includes a non-zero value. Doing so also avoids a misread on a platform which doesn't support XFD but #NM is triggered due to L1 interception. Queueing #NM to the guest is postponed to handle_exception_nmi(). This goes through the nested_vmx check so a virtual vmexit is queued instead when #NM is triggered in L2 but L1 wants to intercept it. Restore the host value (always ZERO outside of the host #NM handler) before enabling interrupts. Restore the guest value from the guest_fpu container right before entering the guest (with interrupts disabled). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-13-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	x86/fpu: Prepare xfd_err in struct fpu_guest	Jing Liu
	When XFD causes an instruction to generate #NM, IA32_XFD_ERR contains information about which disabled state components are being accessed. The #NM handler is expected to check this information and then enable the state components by clearing IA32_XFD for the faulting task (if it has permission). If the XFD_ERR value generated in the guest is consumed/clobbered by the host before the guest itself does so, it may lead to a non-XFD-related #NM being treated as an XFD #NM in the host (due to a non-zero value in XFD_ERR), or an XFD-related #NM being treated as a non-XFD #NM in the guest (XFD_ERR cleared by the host #NM handler). Introduce a new field in fpu_guest to save the guest xfd_err value. KVM is expected to save guest xfd_err before interrupts are enabled and restore it right before entering the guest (with interrupts disabled). Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-12-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Add emulation for IA32_XFD	Jing Liu
	Intel's eXtended Feature Disable (XFD) feature allows the software to dynamically adjust fpstate buffer size for XSAVE features which have large state. Because guest fpstate has been expanded for all possible dynamic xstates at KVM_SET_CPUID2, emulation of the IA32_XFD MSR is straightforward. For write just call fpu_update_guest_xfd() to update the guest fpu container once all the sanity checks are passed. For read simply return the cached value in the container. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Zeng Guang <guang.zeng@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-11-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation	Kevin Tian
	Guest XFD can be updated either in the emulation path or in the restore path. Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest fpstate is currently in-use, also update the per-cpu xfd cache and the actual MSR. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-10-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2	Jing Liu
	KVM can request fpstate expansion in two approaches: 1) When intercepting guest updates to XCR0 and XFD MSR; 2) Before vcpu runs (e.g. at KVM_SET_CPUID2); The first option doesn't waste memory for a legacy guest if it doesn't support XFD. However, doing so introduces more complexity and also imposes an order requirement in the restoring path, i.e. XCR0/XFD must be restored before XSTATE. Given that, the agreement is to take the static approach. This is considered a better tradeoff, though it does waste 8K of memory for a legacy guest if its CPUID includes dynamically-enabled xfeatures. Successful fpstate expansion requires userspace VMM to acquire guest xstate permissions before calling KVM_SET_CPUID2. Also take the chance to adjust the indent in kvm_set_cpuid(). Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-9-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM	Sean Christopherson
	Provide a wrapper for expanding the guest fpstate buffer according to requested xfeatures. KVM wants to call this wrapper to manage any dynamic xstate used by the guest. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220105123532.12586-8-yang.zhong@intel.com> [Remove unnecessary 32-bit check. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	x86/fpu: Add guest support to xfd_enable_feature()	Thomas Gleixner
	Guest support for dynamically enabled FPU features requires a few modifications to the enablement function which is currently invoked from the #NM handler: 1) Use guest permissions and sizes for the update 2) Update fpu_guest state accordingly 3) Take into account that the enabling can be triggered either from a running guest via XSETBV and MSR_IA32_XFD write emulation or from a guest restore. In the latter case the guests fpstate is not the current tasks active fpstate. Split the function and implement the guest mechanics throughout the callchain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-7-yang.zhong@intel.com> [Add 32-bit stub for __xfd_enable_feature. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	x86/fpu: Make XFD initialization in __fpstate_reset() a function argument	Jing Liu
	vCPU threads are different from native tasks with regard to the initial XFD value. While all native tasks follow a fixed value (init_fpstate::xfd) established by the FPU core at boot, vCPU threads need to obey the reset value (i.e. ZERO) defined by the specification, to meet the expectation of the guest. Let the caller supply an argument and adjust the host and guest related invocations accordingly. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14	module: fix signature check failures when using in-kernel decompression	Dmitry Torokhov
	The new flag MODULE_INIT_COMPRESSED_FILE unintentionally trips the check in module_sig_check(). The check was supposed to catch the case where version info or magic was removed from a signed module, making the signature invalid, but it was coded too broadly and was catching this new flag as well. Change the check to only test the 2 particular flags affecting signature validity. Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
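	The difference between the overly broad check and the narrowed one can be sketched in plain C. This is a simplified illustration rather than the kernel's code: the helper names `mangled_broad()` and `mangled_narrow()` are hypothetical, and the flag values are assumed to match the kernel's uapi header.

```c
#include <stdbool.h>

/* Flag values assumed to match include/uapi/linux/module.h. */
#define MODULE_INIT_IGNORE_MODVERSIONS 1
#define MODULE_INIT_IGNORE_VERMAGIC    2
#define MODULE_INIT_COMPRESSED_FILE    4

/* Too broad: any flag at all marks the module as mangled. */
static bool mangled_broad(int flags)
{
    return flags != 0;
}

/*
 * Narrowed: only the two flags that strip version info or vermagic,
 * and therefore invalidate the signature, mark the module as mangled.
 */
static bool mangled_narrow(int flags)
{
    return flags & (MODULE_INIT_IGNORE_MODVERSIONS |
                    MODULE_INIT_IGNORE_VERMAGIC);
}
```

	With only MODULE_INIT_COMPRESSED_FILE set, the broad check flags the module while the narrowed one does not, which is the behaviour the commit message describes.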