summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/fpu
AgeCommit message (Collapse)Author
2022-03-30x86/fpu/xstate: Consolidate size calculationsThomas Gleixner
Use the offset calculation to do the size calculation which avoids yet another series of CPUID instructions for each invocation. [ Fix the FP/SSE only case which missed to take the xstate header into account, as Reported-by: kernel test robot <oliver.sang@intel.com> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/87o81pgbp2.ffs@tglx
2022-03-30x86/fpu/xstate: Handle supervisor states in XSTATE permissionsThomas Gleixner
The size calculation in __xstate_request_perm() fails to take supervisor states into account because the permission bitmap is only relevant for user states. Up to 5.17 this does not matter because there are no supervisor states supported, but the (re-)enabling of ENQCMD makes them available. Fixes: 7c1ef59145f1 ("x86/cpufeatures: Re-enable ENQCMD") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.681768598@linutronix.de
2022-03-30x86/fpu/xsave: Handle compacted offsets correctly with supervisor statesThomas Gleixner
So far the cached fixed compacted offsets worked, but with (re-)enabling of ENQCMD this does no longer work with KVM fpstate. KVM does not have supervisor features enabled for the guest FPU, which means that KVM has then a different XSAVE area layout than the host FPU state. This in turn breaks the copy from/to UABI functions when invoked for a guest state. Remove the pre-calculated compacted offsets and calculate the offset of each component at runtime based on the XCOMP_BV field in the XSAVE header. The runtime overhead is not interesting because these copy from/to UABI functions are not used in critical fast paths. KVM uses them to save and restore FPU state during migration. The host uses them for ptrace and for the slow path of 32bit signal handling. Fixes: 7c1ef59145f1 ("x86/cpufeatures: Re-enable ENQCMD") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.627636809@linutronix.de
2022-03-30x86/fpu: Cache xfeature flags from CPUIDThomas Gleixner
In preparation for runtime calculation of XSAVE offsets cache the feature flags for each XSTATE component during feature enumeration via CPUID(0xD). EDX has two relevant bits: 0 Supervisor component 1 Feature storage must be 64 byte aligned These bits are currently only evaluated during init, but the alignment bit must be cached to make runtime calculation of XSAVE offsets efficient. Cache the full EDX content and use it for the existing alignment and supervisor checks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.573656209@linutronix.de
2022-03-30x86/fpu/xsave: Initialize offset/size cache earlyThomas Gleixner
Reading XSTATE feature information from CPUID over and over does not make sense. The information has to be cached anyway, so it can be done early. Prepare for runtime calculation of XSTATE offsets and allow consolidation of the size calculation functions in a later step. Rename the function while at it as it does not setup any features. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.519411939@linutronix.de
2022-03-30x86/fpu: Remove unused supervisor only offsetsThomas Gleixner
No users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.465066249@linutronix.de
2022-03-29x86/fpu: Remove redundant XCOMP_BV initializationThomas Gleixner
fpu_copy_uabi_to_guest_fpstate() initializes the XCOMP_BV field in the XSAVE header. That's a leftover from the old KVM FPU buffer handling code. Since d69c1382e1b7 ("x86/kvm: Convert FPU handling to a single swap buffer") KVM uses the FPU core allocation code, which initializes the XCOMP_BV field already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324134623.408932232@linutronix.de
2022-03-23x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementationYang Zhong
ARCH_REQ_XCOMP_PERM is supposed to add the requested feature to the permission bitmap of thread_group_leader()->fpu. But the code overwrites the bitmap with the requested feature bit only rather than adding it. Fix the code to add the requested feature bit to the master bitmask. Fixes: db8268df0983 ("x86/arch_prctl: Add controls for dynamic XSTATE components") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <bonzini@gnu.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220129173647.27981-2-chang.seok.bae@intel.com
2022-03-21Merge tag 'x86-pasid-2022-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PASID support from Thomas Gleixner: "Reenable ENQCMD/PASID support: - Simplify the PASID handling to allocate the PASID once, associate it to the mm of a process and free it on mm_exit(). The previous attempt of refcounted PASIDs and dynamic alloc()/free() turned out to be error prone and too complex. The PASID space is 20bits, so the case of resource exhaustion is a pure academic concern. - Populate the PASID MSR on demand via #GP to avoid racy updates via IPIs. - Reenable ENQCMD and let objtool check for the forbidden usage of ENQCMD in the kernel. - Update the documentation for Shared Virtual Addressing accordingly" * tag 'x86-pasid-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Update documentation for SVA (Shared Virtual Addressing) tools/objtool: Check for use of the ENQCMD instruction in the kernel x86/cpufeatures: Re-enable ENQCMD x86/traps: Demand-populate PASID MSR via #GP sched: Define and initialize a flag to identify valid PASID in the task x86/fpu: Clear PASID when copying fpstate iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit kernel/fork: Initialize mm's PASID iommu/ioasid: Introduce a helper to check for valid PASIDs mm: Change CONFIG option for mm->pasid field iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
2022-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "x86 host: - Expose KVM_CAP_ENABLE_CAP since it is supported - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode - Ensure async page fault token is nonzero - Fix lockdep false negative - Fix FPU migration regression from the AMX changes x86 guest: - Don't use PV TLB/IPI/yield on uniprocessor guests PPC: - reserve capability id (topic branch for ppc/kvm)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled KVM: x86/mmu: make apf token non-zero to fix bug KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3 x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU x86/kvm: Fix compilation warning in non-x86_64 builds x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0 x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode KVM: Fix lockdep false negative during host resume KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
2022-02-18x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearingAndy Lutomirski
xfpregs_set() handles 32-bit REGSET_XFP and 64-bit REGSET_FP. The actual code treats these regsets as modern FX state (i.e. the beginning part of XSTATE). The declarations of the regsets thought they were the legacy i387 format. The code thought they were the 32-bit (no xmm8..15) variant of XSTATE and, for good measure, made the high bits disappear by zeroing the wrong part of the buffer. The latter broke ptrace, and everything else confused anyone trying to understand the code. In particular, the nonsense definitions of the regsets confused me when I wrote this code. Clean this all up. Change the declarations to match reality (which shouldn't change the generated code, let alone the ABI) and fix xfpregs_set() to clear the correct bits and to only do so for 32-bit callers. Fixes: 6164331d15f7 ("x86/fpu: Rewrite xfpregs_set()") Reported-by: Luís Ferreira <contact@lsferreira.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215524 Link: https://lore.kernel.org/r/YgpFnZpF01WwR8wU@zn.tnic
2022-02-17x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0Leonardo Bras
During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate(). When xsave feature is available, the fpu swap is done by: - xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used to store the current state of the fpu registers to a buffer. - xrstor(s) instruction, with (fpu_kernel_cfg.max_features & XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs. For xsave(s) the mask is used to limit what parts of the fpu regs will be copied to the buffer. Likewise on xrstor(s), the mask is used to limit what parts of the fpu regs will be changed. The mask for xsave(s), the guest's fpstate->xfeatures, is defined on kvm_arch_vcpu_create(), which (in summary) sets it to all features supported by the cpu which are enabled on kernel config. This means that xsave(s) will save to guest buffer all the fpu regs contents the cpu has enabled when the guest is paused, even if they are not used. This would not be an issue, if xrstor(s) would also do that. xrstor(s)'s mask for host/guest swap is basically every valid feature contained in kernel config, except XFEATURE_MASK_PKRU. Accordingto kernel src, it is instead switched in switch_to() and flush_thread(). Then, the following happens with a host supporting PKRU starts a guest that does not support it: 1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest, 2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU) 3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU) 4 - guest runs, then switch back to host, 5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU) 6 - xrstor(s) host fpstate to fpu regs. 7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with XFEATURE_MASK_PKRU, which should not be supported by guest vcpu) On 5, even though the guest does not support PKRU, it does have the flag set on guest fpstate, which is transferred to userspace via vcpu ioctl KVM_GET_XSAVE. This becomes a problem when the user decides on migrating the above guest to another machine that does not support PKRU: the new host restores guest's fpu regs to as they were before (xrstor(s)), but since the new host don't support PKRU, a general-protection exception ocurs in xrstor(s) and that crashes the guest. This can be solved by making the guest's fpstate->user_xfeatures hold a copy of guest_supported_xcr0. This way, on 7 the only flags copied to userspace will be the ones compatible to guest requirements, and thus there will be no issue during migration. As a bonus, it will also fail if userspace tries to set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to the guest configuration. Such features will never be returned by KVM_GET_XSAVE or KVM_GET_XSAVE2. Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures, there is not need to set it in kvm_check_cpuid(). So, change fpstate_realloc() so it does not touch fpstate->user_xfeatures if a non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid() calls it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20220217053028.96432-2-leobras@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-15x86/fpu: Clear PASID when copying fpstateFenghua Yu
The kernel must allocate a Process Address Space ID (PASID) on behalf of each process which will use ENQCMD and program it into the new MSR to communicate the process identity to platform hardware. ENQCMD uses the PASID stored in this MSR to tag requests from this process. The PASID state must be cleared on fork() since fork creates a new address space. For clone(), it would be functionally OK to copy the PASID. However, clearing it is _also_ functionally OK since any PASID use will trigger the #GP handler to populate the MSR. Copying the PASID state has two main downsides: * It requires differentiating fork() and clone() in the code, both in the FPU code and keeping tsk->pasid_activated consistent. * It guarantees that the PASID is out of its init state, which incurs small but non-zero cost on every XSAVE/XRSTOR. The main downside of clearing the PASID at fpstate copy is the future, one-time #GP for the thread. Use the simplest approach: clear the PASID state both on clone() and fork(). Rely on the #GP handler for MSR population in children. Also, just clear the PASID bit from xfeatures if XSAVE is supported. This will have no effect on systems that do not have PASID support. It is virtually zero overhead because 'dst_fpu' was just written and the whole thing is cache hot. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220207230254.3342514-7-fenghua.yu@intel.com
2022-01-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...
2022-01-14x86/fpu: Fix inline prefix warningsYang Zhong
Fix sparse warnings in xstate and remove inline prefix. Fixes: 980fe2fddcff ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220113180825.322333-1-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()Thomas Gleixner
KVM can disable the write emulation for the XFD MSR when the vCPU's fpstate is already correctly sized to reduce the overhead. When write emulation is disabled the XFD MSR state after a VMEXIT is unknown and therefore not in sync with the software states in fpstate and the per CPU XFD cache. Provide fpu_sync_guest_vmexit_xfd_state() which has to be invoked after a VMEXIT before enabling interrupts when write emulation is disabled for the XFD MSR. It could be invoked unconditionally even when write emulation is enabled for the price of a pointless MSR read. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-21-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Add uabi_size to guest_fpuThomas Gleixner
Userspace needs to inquire KVM about the buffer size to work with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info to guest_fpu for KVM to access. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-18-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulationKevin Tian
Guest XFD can be updated either in the emulation path or in the restore path. Provide a wrapper to update guest_fpu::fpstate::xfd. If the guest fpstate is currently in-use, also update the per-cpu xfd cache and the actual MSR. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-10-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Provide fpu_enable_guest_xfd_features() for KVMSean Christopherson
Provide a wrapper for expanding the guest fpstate buffer according to requested xfeatures. KVM wants to call this wrapper to manage any dynamic xstate used by the guest. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220105123532.12586-8-yang.zhong@intel.com> [Remove unnecessary 32-bit check. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Add guest support to xfd_enable_feature()Thomas Gleixner
Guest support for dynamically enabled FPU features requires a few modifications to the enablement function which is currently invoked from the #NM handler: 1) Use guest permissions and sizes for the update 2) Update fpu_guest state accordingly 3) Take into account that the enabling can be triggered either from a running guest via XSETBV and MSR_IA32_XFD write emulation or from a guest restore. In the latter case the guests fpstate is not the current tasks active fpstate. Split the function and implement the guest mechanics throughout the callchain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-7-yang.zhong@intel.com> [Add 32-bit stub for __xfd_enable_feature. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-14x86/fpu: Make XFD initialization in __fpstate_reset() a function argumentJing Liu
vCPU threads are different from native tasks regarding to the initial XFD value. While all native tasks follow a fixed value (init_fpstate::xfd) established by the FPU core at boot, vCPU threads need to obey the reset value (i.e. ZERO) defined by the specification, to meet the expectation of the guest. Let the caller supply an argument and adjust the host and guest related invocations accordingly. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-6-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-12Merge tag 'x86_core_for_v5.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Get rid of all the .fixup sections because this generates misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and LIVEPATCH as the backtrace misses the function which is being fixed up. - Add Straight Line Speculation mitigation support which uses a new compiler switch -mharden-sls= which sticks an INT3 after a RET or an indirect branch in order to block speculation after them. Reportedly, CPUs do speculate behind such insns. - The usual set of cleanups and improvements * tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) x86/entry_32: Fix segment exceptions objtool: Remove .fixup handling x86: Remove .fixup section x86/word-at-a-time: Remove .fixup usage x86/usercopy: Remove .fixup usage x86/usercopy_32: Simplify __copy_user_intel_nocache() x86/sgx: Remove .fixup usage x86/checksum_32: Remove .fixup usage x86/vmx: Remove .fixup usage x86/kvm: Remove .fixup usage x86/segment: Remove .fixup usage x86/fpu: Remove .fixup usage x86/xen: Remove .fixup usage x86/uaccess: Remove .fixup usage x86/futex: Remove .fixup usage x86/msr: Remove .fixup usage x86/extable: Extend extable functionality x86/entry_32: Remove .fixup usage x86/entry_64: Remove .fixup usage x86/copy_mc_64: Remove .fixup usage ...
2022-01-10Merge tag 'x86_fpu_for_v5.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu update from Borislav Petkov: "A single x86/fpu update for 5.17: - Exclude AVX opmask registers use from AVX512 state tracking as they don't contribute to frequency throttling" * tag 'x86_fpu_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Correct AVX512 state tracking
2022-01-07x86/fpu: Prepare guest FPU for dynamically enabled FPU featuresThomas Gleixner
To support dynamically enabled FPU features for guests prepare the guest pseudo FPU container to keep track of the currently enabled xfeatures and the guest permissions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-3-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/fpu: Extend fpu_xstate_prctl() with guest permissionsThomas Gleixner
KVM requires a clear separation of host user space and guest permissions for dynamic XSTATE components. Add a guest permissions member to struct fpu and a separate set of prctl() arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM. The semantics are equivalent to the host user space permission control except for the following constraints: 1) Permissions have to be requested before the first vCPU is created 2) Permissions are frozen when the first vCPU is created to ensure consistency. Any attempt to expand permissions via the prctl() after that point is rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-11x86/fpu: Remove .fixup usagePeter Zijlstra
Employ EX_TYPE_EFAULT_REG to store '-EFAULT' into the %[err] register on exception. All the callers only ever test for 0, so the change from -1 to -EFAULT is immaterial. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20211110101325.604494664@infradead.org
2021-11-30x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()Marco Elver
save_sw_bytes() did not fully initialize sw_bytes, which caused KMSAN to report an infoleak (see below). Initialize sw_bytes explicitly to avoid this. KMSAN report follows: ===================================================== BUG: KMSAN: kernel-infoleak in instrument_copy_to_user ./include/linux/instrumented.h:121 BUG: KMSAN: kernel-infoleak in __copy_to_user ./include/linux/uaccess.h:154 BUG: KMSAN: kernel-infoleak in save_xstate_epilog+0x2df/0x510 arch/x86/kernel/fpu/signal.c:127 instrument_copy_to_user ./include/linux/instrumented.h:121 __copy_to_user ./include/linux/uaccess.h:154 save_xstate_epilog+0x2df/0x510 arch/x86/kernel/fpu/signal.c:127 copy_fpstate_to_sigframe+0x861/0xb60 arch/x86/kernel/fpu/signal.c:245 get_sigframe+0x656/0x7e0 arch/x86/kernel/signal.c:296 __setup_rt_frame+0x14d/0x2a60 arch/x86/kernel/signal.c:471 setup_rt_frame arch/x86/kernel/signal.c:781 handle_signal arch/x86/kernel/signal.c:825 arch_do_signal_or_restart+0x417/0xdd0 arch/x86/kernel/signal.c:870 handle_signal_work kernel/entry/common.c:149 exit_to_user_mode_loop+0x1f6/0x490 kernel/entry/common.c:173 exit_to_user_mode_prepare kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 syscall_exit_to_user_mode+0x7e/0xc0 kernel/entry/common.c:302 do_syscall_64+0x60/0xd0 arch/x86/entry/common.c:88 entry_SYSCALL_64_after_hwframe+0x44/0xae ??:? Local variable sw_bytes created at: save_xstate_epilog+0x80/0x510 arch/x86/kernel/fpu/signal.c:121 copy_fpstate_to_sigframe+0x861/0xb60 arch/x86/kernel/fpu/signal.c:245 Bytes 20-47 of 48 are uninitialized Memory access of size 48 starts at ffff8880801d3a18 Data copied to user address 00007ffd90e2ef50 ===================================================== Link: https://lore.kernel.org/all/CAG_fn=V9T6OKPonSjsi9PmWB0hMHFC=yawozdft8i1-MSxrv=w@mail.gmail.com/ Fixes: 53599b4d54b9b8dd ("x86/fpu/signal: Prepare for variable sigframe length") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lkml.kernel.org/r/20211126124746.761278-1-glider@google.com
2021-11-16x86/fpu: Correct AVX512 state trackingNoah Goldstein
Add a separate, local mask for tracking AVX512 usage which does not include the opmask xfeature set. Opmask registers usage does not cause frequency throttling so it is a completely unnecessary false positive. While at it, carve it out into a separate function to keep that abomination extracted out. [ bp: Rediff and cleanup ontop of 5.16-rc1. ] Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20210920053951.4093668-1-goldstein.w.n@gmail.com
2021-11-14Merge tag 'x86_urgent_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add the model number of a new, Raptor Lake CPU, to intel-family.h - Do not log spurious corrected MCEs on SKL too, due to an erratum - Clarify the path of paravirt ops patches upstream - Add an optimization to avoid writing out AMX components to sigframes when former are in init state * tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Raptor Lake to Intel family x86/mce: Add errata workaround for Skylake SKX37 MAINTAINERS: Add some information to PARAVIRT_OPS entry x86/fpu: Optimize out sigframe xfeatures when in init state
2021-11-03x86/fpu: Optimize out sigframe xfeatures when in init stateDave Hansen
tl;dr: AMX state is ~8k. Signal frames can have space for this ~8k and each signal entry writes out all 8k even if it is zeros. Skip writing zeros for AMX to speed up signal delivery by about 4% overall when AMX is in its init state. This is a user-visible change to the sigframe ABI. == Hardware XSAVE Background == XSAVE state components may be tracked by the processor as being in their initial configuration. Software can detect which features are in this configuration by looking at the XSTATE_BV field in an XSAVE buffer or with the XGETBV(1) instruction. Both the XSAVE and XSAVEOPT instructions enumerate features s being in the initial configuration via the XSTATE_BV field in the XSAVE header, However, XSAVEOPT declines to actually write features in their initial configuration to the buffer. XSAVE writes the feature unconditionally, regardless of whether it is in the initial configuration or not. Basically, XSAVE users never need to inspect XSTATE_BV to determine if the feature has been written to the buffer. XSAVEOPT users *do* need to inspect XSTATE_BV. They might also need to clear out the buffer if they want to make an isolated change to the state, like modifying one register. == Software Signal / XSAVE Background == Signal frames have historically been written with XSAVE itself. Each state is written in its entirety, regardless of being in its initial configuration. In other words, the signal frame ABI uses the XSAVE behavior, not the XSAVEOPT behavior. == Problem == This means that any application which has acquired permission to use AMX via ARCH_REQ_XCOMP_PERM will write 8k of state to the signal frame. This 8k write will occur even when AMX was in its initial configuration and software *knows* this because of XSTATE_BV. This problem also exists to a lesser degree with AVX-512 and its 2k of state. However, AVX-512 use does not require ARCH_REQ_XCOMP_PERM and is more likely to have existing users which would be impacted by any change in behavior. == Solution == Stop writing out AMX xfeatures which are in their initial state to the signal frame. This effectively makes the signal frame XSAVE buffer look as if it were written with a combination of XSAVEOPT and XSAVE behavior. Userspace which handles XSAVEOPT- style buffers should be able to handle this naturally. For now, include only the AMX xfeatures: XTILE and XTILEDATA in this new behavior. These require new ABI to use anyway, which makes their users very unlikely to be broken. This XSAVEOPT-like behavior should be expected for all future dynamic xfeatures. It may also be extended to legacy features like AVX-512 in the future. Only attempt this optimization on systems with dynamic features. Disable dynamic feature support (XFD) if XGETBV1 is unavailable by adding a CPUID dependency. This has been measured to reduce the *overall* cycle cost of signal delivery by about 4%. Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: "Chang S. Bae" <chang.seok.bae@intel.com> Link: https://lore.kernel.org/r/20211102224750.FA412E26@davehans-spike.ostc.intel.com
2021-11-02Merge tag 'gfs2-v5.15-rc5-mmap-fault' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher: "Functions gfs2_file_read_iter and gfs2_file_write_iter are both accessing the user buffer to write to or read from while holding the inode glock. In the most basic deadlock scenario, that buffer will not be resident and it will be mapped to the same file. Accessing the buffer will trigger a page fault, and gfs2 will deadlock trying to take the same inode glock again while trying to handle that fault. Fix that and similar, more complex scenarios by disabling page faults while accessing user buffers. To make this work, introduce a small amount of new infrastructure and fix some bugs that didn't trigger so far, with page faults enabled" * tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix mmap + page fault deadlocks for direct I/O iov_iter: Introduce nofault flag to disable page faults gup: Introduce FOLL_NOFAULT flag to disable page faults iomap: Add done_before argument to iomap_dio_rw iomap: Support partial direct I/O on user copy failures iomap: Fix iomap_dio_rw return value for user copies gfs2: Fix mmap + page fault deadlocks for buffered I/O gfs2: Eliminate ip->i_gh gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Introduce flag for glock holder auto-demotion gfs2: Clean up function may_grant gfs2: Add wrapper for iomap_file_buffered_write iov_iter: Introduce fault_in_iov_iter_writeable iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} powerpc/kvm: Fix kvm_use_magic_page iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
2021-10-28x86/fpu: Include vmalloc.h for vzalloc()Stephen Rothwell
Explicitly include that header to avoid build errors when vzalloc() becomes "invisible" to the compiler due to header reorganizations. This is not a problem in the tip tree but occurred when integrating linux-next. [ bp: Commit message. ] Link: https://lore.kernel.org/r/20211025151144.552c60ca@canb.auug.org.au Fixes: 69f6ed1d14c6 ("x86/fpu: Provide infrastructure for KVM FPU cleanup") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Borislav Petkov <bp@suse.de>
2021-10-26x86/fpu/amx: Enable the AMX feature in 64-bit modeChang S. Bae
Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the TILE_DATA component to the dynamic states and update the permission check table accordingly. This is only effective on 64 bit kernels as for 32bit kernels XFEATURE_MASK_TILE is defined as 0. TILE_DATA is caller-saved state and the only dynamic state. Add build time sanity check to ensure the assumption that every dynamic feature is caller- saved. Make AMX state depend on XFD as it is dynamic feature. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-24-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add XFD handling for dynamic statesChang S. Bae
To handle the dynamic sizing of buffers on first use the XFD MSR has to be armed. Store the delta between the maximum available and the default feature bits in init_fpstate where it can be retrieved for task creation. If the delta is non zero then dynamic features are enabled. This needs also to enable the static key which guards the XFD updates. This is delayed to an initcall because the FPU setup runs before jump labels are initialized. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-23-chang.seok.bae@intel.com
2021-10-26x86/fpu: Calculate the default sizes independentlyChang S. Bae
When dynamically enabled states are supported the maximum and default sizes for the kernel buffers and user space interfaces are not longer identical. Put the necessary calculations in place which only take the default enabled features into account. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-22-chang.seok.bae@intel.com
2021-10-26x86/fpu/amx: Define AMX state components and have it used for boot-time checksChang S. Bae
The XSTATE initialization uses check_xstate_against_struct() to sanity check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature, and its size is not hard-coded but discoverable at run-time via CPUID. The AMX state is composed of state components 17 and 18, which are all user state components. The first component is the XTILECFG state of a 64-byte tile-related control register. The state component 18, called XTILEDATA, contains the actual tile data, and the state size varies on implementations. The architectural maximum, as defined in the CPUID(0x1d, 1): EAX[15:0], is a byte less than 64KB. The first implementation supports 8KB. Check the XTILEDATA state size dynamically. The feature introduces the new tile register, TMM. Define one register struct only and read the number of registers from CPUID. Cross-check the overall size with CPUID again. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-21-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbersChang S. Bae
The kernel checks at boot time which features are available by walking a XSAVE feature table which contains the CPUID feature bit numbers which need to be checked whether a feature is available on a CPU or not. So far the feature numbers have been linear, but AMX will create a gap which the current code cannot handle. Make the table entries explicitly indexed and adjust the loop code accordingly to prepare for that. No functional change. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-20-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Add fpstate_realloc()/free()Chang S. Bae
The fpstate embedded in struct fpu is the default state for storing the FPU registers. It's sized so that the default supported features can be stored. For dynamically enabled features the register buffer is too small. The #NM handler detects first use of a feature which is disabled in the XFD MSR. After handling permission checks it recalculates the size for kernel space and user space state and invokes fpstate_realloc() which tries to reallocate fpstate and install it. Provide the allocator function which checks whether the current buffer size is sufficient and if not allocates one. If allocation is successful the new fpstate is initialized with the new features and sizes and the now enabled features is removed from the task's XFD mask. realloc_fpstate() uses vzalloc(). If use of this mechanism grows to re-allocate buffers larger than 64KB, a more sophisticated allocation scheme that includes purpose-built reclaim capability might be justified. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-19-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Add XFD #NM handlerChang S. Bae
If the XFD MSR has feature bits set then #NM will be raised when user space attempts to use an instruction related to one of these features. When the task has no permissions to use that feature, raise SIGILL, which is the same behavior as #UD. If the task has permissions, calculate the new buffer size for the extended feature set and allocate a larger fpstate. In the unlikely case that vzalloc() fails, SIGSEGV is raised. The allocation function will be added in the next step. Provide a stub which fails for now. [ tglx: Updated serialization ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
2021-10-26x86/fpu: Update XFD state where requiredChang S. Bae
The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are enabled in XCR0. The register has to be restored before the tasks XSTATE is restored. The life time rules are the same as for FPU state. XFD is updated on return to userspace only when the FPU state of the task is not up to date in the registers. It's updated before the XRSTORS so that eventually enabled dynamic features are restored as well and not brought into init state. Also in signal handling for restoring FPU state from user space the correctness of the XFD state has to be ensured. Add it to CPU initialization and resume as well. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-17-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add sanity checks for XFDThomas Gleixner
Add debug functionality to ensure that the XFD MSR is up to date for XSAVE* and XRSTOR* operations. [ tglx: Improve comment. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-16-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add XFD state to fpstateChang S. Bae
Add storage for XFD register state to struct fpstate. This will be used to store the XFD MSR state. This will be used for switching the XFD MSR when FPU content is restored. Add a per-CPU variable to cache the current MSR value so the MSR has only to be written when the values are different. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com
2021-10-26x86/fpu: Reset permission and fpstate on exec()Chang S. Bae
On exec(), extended register states saved in the buffer is cleared. With dynamic features, each task carries variables besides the register states. The struct fpu has permission information and struct fpstate contains buffer size and feature masks. They are all dynamically updated with dynamic features. Reset the current task's entire FPU data before an exec() so that the new task starts with default permission and fpstate. Rename the register state reset function because the old naming confuses as it does not reset struct fpstate. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-12-chang.seok.bae@intel.com
2021-10-26x86/fpu: Prepare fpu_clone() for dynamically enabled featuresThomas Gleixner
The default portion of the parent's FPU state is saved in a child task. With dynamic features enabled, the non-default portion is not saved in a child's fpstate because these register states are defined to be caller-saved. The new task's fpstate is therefore the default buffer. Fork inherits the permission of the parent. Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is invalid when the parent has dynamic features. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-11-chang.seok.bae@intel.com
2021-10-26x86/fpu/signal: Prepare for variable sigframe lengthChang S. Bae
The software reserved portion of the fxsave frame in the signal frame is copied from structures which have been set up at boot time. With dynamically enabled features the content of these structures is no longer correct because the xfeatures and size can be different per task. Calculate the software reserved portion at runtime and fill in the xfeatures and size values from the tasks active fpstate. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-10-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add basic helpers for dynamically enabled featuresThomas Gleixner
To allow building up the infrastructure required to support dynamically enabled FPU features, add: - XFEATURES_MASK_DYNAMIC This constant will hold xfeatures which can be dynamically enabled. - fpu_state_size_dynamic() A static branch for 64-bit and a simple 'return false' for 32-bit. This helper allows to add dynamic-feature-specific changes to common code which is shared between 32-bit and 64-bit without #ifdeffery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-8-chang.seok.bae@intel.com
2021-10-26x86/arch_prctl: Add controls for dynamic XSTATE componentsChang S. Bae
Dynamically enabled XSTATE features are by default disabled for all processes. A process has to request permission to use such a feature. To support this implement a architecture specific prctl() with the options: - ARCH_GET_XCOMP_SUPP Copies the supported feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_GET_XCOMP_PERM Copies the process wide permitted feature bitmap into the user space provided u64 storage. The pointer is handed in via arg2 - ARCH_REQ_XCOMP_PERM Request permission for a feature set. A feature set can be mapped to a facility, e.g. AMX, and can require one or more XSTATE components to be enabled. The feature argument is the number of the highest XSTATE component which is required for a facility to work. The request argument is not a user supplied bitmap because that makes filtering harder (think seccomp) and even impossible because to support 32bit tasks the argument would have to be a pointer. The permission mechanism works this way: Task asks for permission for a facility and kernel checks whether that's supported. If supported it does: 1) Check whether permission has already been granted 2) Compute the size of the required kernel and user space buffer (sigframe) size. 3) Validate that no task has a sigaltstack installed which is smaller than the resulting sigframe size 4) Add the requested feature bit(s) to the permission bitmap of current->group_leader->fpu and store the sizes in the group leaders fpu struct as well. If that is successful then the feature is still not enabled for any of the tasks. The first usage of a related instruction will result in a #NM trap. The trap handler validates the permission bit of the tasks group leader and if permitted it installs a larger kernel buffer and transfers the permission and size info to the new fpstate container which makes all the FPU functions which require per task information aware of the extended feature set. [ tglx: Adopted to new base code, added missing serialization, massaged namings, comments and changelog ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-7-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add fpu_state_config::legacy_featuresThomas Gleixner
The upcoming prctl() which is required to request the permission for a dynamically enabled feature will also provide an option to retrieve the supported features. If the CPU does not support XSAVE, the supported features would be 0 even when the CPU supports FP and SSE. Provide separate storage for the legacy feature set to avoid that and fill in the bits in the legacy init function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-6-chang.seok.bae@intel.com
2021-10-26x86/fpu: Add members to struct fpu to cache permission informationThomas Gleixner
Dynamically enabled features can be requested by any thread of a running process at any time. The request does neither enable the feature nor allocate larger buffers. It just stores the permission to use the feature by adding the features to the permission bitmap and by calculating the required sizes for kernel and user space. The reallocation of the kernel buffer happens when the feature is used for the first time which is caught by an exception. The permission bitmap is then checked and if the feature is permitted, then it becomes fully enabled. If not, the task dies similarly to a task which uses an undefined instruction. The size information is precomputed to allow proper sigaltstack size checks once the feature is permitted, but not yet in use because otherwise this would open race windows where too small stacks could be installed causing a later fail on signal delivery. Initialize them to the default feature set and sizes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-5-chang.seok.bae@intel.com
2021-10-26x86/fpu/xstate: Provide xstate_calculate_size()Chang S. Bae
Split out the size calculation from the paranoia check so it can be used for recalculating buffer sizes when dynamically enabled features are supported. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> [ tglx: Adopted to changed base code ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-4-chang.seok.bae@intel.com