author:    Kai Huang <kai.huang@intel.com>          2025-09-01 18:09:25 +0200
committer: Dave Hansen <dave.hansen@linux.intel.com> 2025-09-05 10:40:40 -0700
commit:    83214a775f33bc9d61c2c284f2ace3f854a4cddb
tree:      eae6b964afeb3ae7fa6d69521ad4e0bd16562a2d
parent:    744b02f62634b64345d05a8a3f145d56469313b4
x86/sme: Use percpu boolean to control WBINVD during kexec
TL;DR:
Prepare to unify how TDX and SME do cache flushing during kexec by
making a percpu boolean control whether to do the WBINVD.
-- Background --
On SME platforms, dirty cacheline aliases with and without the encryption
bit can coexist, and the CPU can flush them back to memory in random
order. During kexec, the caches must be flushed before jumping to the
new kernel; otherwise the dirty cachelines could silently corrupt the
memory used by the new kernel due to the different encryption properties.
TDX also needs a cache flush during kexec for the same reason. It would
be good to have a generic way to flush the cache instead of scattering
checks for each feature all around.
When SME is enabled, the kernel basically encrypts all memory, including
the kernel itself, and a simple memory write from the kernel could dirty
cachelines. Currently, the kernel uses WBINVD to flush the cache for
SME during kexec in two places:
1) in stop_this_cpu(), for all remote CPUs, when the kexec-ing CPU
stops them;
2) in relocate_kernel(), where the kexec-ing CPU jumps to the new
kernel.
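For illustration only, the status quo looks conceptually like the sketch
below (simplified C; the helpers shown are hypothetical stand-ins, and
relocate_kernel() is really assembly):

  /* Simplified sketch of the status quo; helper names are hypothetical. */

  /* 1) CPU-stop path (stop_this_cpu()): flush when the platform supports SME. */
  if (platform_supports_sme())          /* hypothetical check */
          native_wbinvd();

  /* 2) relocate_kernel() (really assembly): flush when SME is activated. */
  if (sme_is_activated())               /* hypothetical check */
          wbinvd();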
-- Solution --
Unlike SME, TDX can only dirty cachelines when it is used (i.e., when
SEAMCALLs are performed). Since there are no more SEAMCALLs after the
aforementioned WBINVDs, leverage this for TDX.
To unify the approach for SME and TDX, use a percpu boolean to indicate
the cache may be in an incoherent state and needs flushing during kexec,
and set the boolean for SME. TDX can then leverage it.
While SME could use a global flag (since it is enabled at early boot and
on all CPUs), the percpu flag fits TDX better: it can be set when a CPU
makes a SEAMCALL, and cleared when a later WBINVD on that CPU obviates
the need for a kexec-time WBINVD.
Saving the kexec-time WBINVD is valuable, because there is an existing
race[*] where kexec could proceed while another CPU is active. WBINVD
could make that race worse, so it is worth skipping it when possible.
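As a rough illustration of the idea (not the actual patch; the flag and
helper names below are made up for this sketch), the percpu flag could be
wired up roughly like this:

  /* Illustrative sketch only -- names are hypothetical, not from the patch. */
  #include <linux/percpu.h>
  #include <asm/special_insns.h>        /* wbinvd() */

  /* Set when this CPU's caches may hold incoherent dirty lines. */
  static DEFINE_PER_CPU(bool, cache_incoherent);

  /* TDX path: mark this CPU before issuing a SEAMCALL. */
  static inline void mark_cache_incoherent(void)
  {
          this_cpu_write(cache_incoherent, true);
  }

  /* Any full cache flush on this CPU removes the need for a kexec-time WBINVD. */
  static inline void clear_cache_incoherent(void)
  {
          this_cpu_write(cache_incoherent, false);
  }

  /* Called on each CPU on the way into kexec (CPU-stop path / relocate_kernel()). */
  static inline void kexec_wbinvd_if_needed(void)
  {
          if (this_cpu_read(cache_incoherent)) {
                  wbinvd();
                  this_cpu_write(cache_incoherent, false);
          }
  }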
-- Side effect on SME --
Today the first WBINVD, in stop_this_cpu(), is performed when SME is
*supported* by the platform, and the second WBINVD is done in
relocate_kernel() when SME is *activated* by the kernel. Simplify things
by also doing the second WBINVD whenever the platform supports
SME. This allows the kernel to simply turn on this percpu boolean when
bringing up a CPU, by checking whether the platform supports SME.
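In other words (again just a sketch with a made-up flag name; the real
feature check may differ), CPU bringup could then do something like:

  /* Sketch: on CPU bringup, if the platform supports SME, assume this CPU's
   * caches may become incoherent and must be flushed during kexec. */
  if (cpu_feature_enabled(X86_FEATURE_SME))
          this_cpu_write(cache_incoherent, true);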
No other functional change intended.
[*] The aforementioned race:
During kexec, native_stop_other_cpus() is called to stop all remote CPUs
before jumping to the new kernel. native_stop_other_cpus() first sends
normal REBOOT vector IPIs to stop the remote CPUs and waits for them to
stop. If that times out, it sends NMIs to stop the CPUs that are still
alive. The race happens when native_stop_other_cpus() has to send NMIs
and can potentially result in a system hang (for more information please
see [1]).
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/ [1]
Link: https://lore.kernel.org/all/20250901160930.1785244-3-pbonzini%40redhat.com