author:    Kai Huang <kai.huang@intel.com>          2025-09-01 18:09:25 +0200
committer: Dave Hansen <dave.hansen@linux.intel.com> 2025-09-05 10:40:40 -0700
commit:    83214a775f33bc9d61c2c284f2ace3f854a4cddb
tree:      eae6b964afeb3ae7fa6d69521ad4e0bd16562a2d
parent:    744b02f62634b64345d05a8a3f145d56469313b4
x86/sme: Use percpu boolean to control WBINVD during kexec
TL;DR:
Prepare to unify how TDX and SME do cache flushing during kexec by
making a percpu boolean control whether to do the WBINVD.
-- Background --
On SME platforms, dirty cacheline aliases with and without the encryption
bit can coexist, and the CPU can flush them back to memory in random
order. During kexec, the caches must be flushed before jumping to the
new kernel; otherwise the dirty cachelines could silently corrupt the
memory used by the new kernel due to the different encryption properties.
TDX also needs a cache flush during kexec for the same reason. It would
be good to have a generic way to flush the cache instead of scattering
checks for each feature all around.
When SME is enabled, the kernel basically encrypts all memory, including
the kernel itself, and a simple memory write from the kernel could dirty
cachelines. Currently, the kernel uses WBINVD to flush the cache for
SME during kexec in two places:
1) in stop_this_cpu(), for all remote CPUs, when the kexec-ing CPU
stops them;
2) in relocate_kernel(), where the kexec-ing CPU jumps to the new
kernel.
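For illustration only, the status quo looks conceptually like the sketch
below (simplified C; the helpers shown are hypothetical stand-ins, and
relocate_kernel() is really assembly):

  /* Simplified sketch of the status quo; helper names are hypothetical. */

  /* 1) CPU-stop path (stop_this_cpu()): flush when the platform supports SME. */
  if (platform_supports_sme())          /* hypothetical check */
          native_wbinvd();

  /* 2) relocate_kernel() (really assembly): flush when SME is activated. */
  if (sme_is_activated())               /* hypothetical check */
          wbinvd();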
-- Solution --
Unlike SME, TDX can only dirty cachelines when it is used (i.e., when
SEAMCALLs are performed). Since there are no more SEAMCALLs after the
aforementioned WBINVDs, leverage this for TDX.
To unify the approach for SME and TDX, use a percpu boolean to indicate
the cache may be in an incoherent state and needs flushing during kexec,
and set the boolean for SME. TDX can then leverage it.
While SME could use a global flag (since it is enabled at early boot and
on all CPUs), the percpu flag fits TDX better: it can be set when a CPU
makes a SEAMCALL, and cleared when a later WBINVD on that CPU obviates
the need for a kexec-time WBINVD.
Saving the kexec-time WBINVD is valuable, because there is an existing
race[*] where kexec could proceed while another CPU is active. WBINVD
could make that race worse, so it is worth skipping it when possible.
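As a rough illustration of the idea (not the actual patch; the flag and
helper names below are made up for this sketch), the percpu flag could be
wired up roughly like this:

  /* Illustrative sketch only -- names are hypothetical, not from the patch. */
  #include <linux/percpu.h>
  #include <asm/special_insns.h>        /* wbinvd() */

  /* Set when this CPU's caches may hold incoherent dirty lines. */
  static DEFINE_PER_CPU(bool, cache_incoherent);

  /* TDX path: mark this CPU before issuing a SEAMCALL. */
  static inline void mark_cache_incoherent(void)
  {
          this_cpu_write(cache_incoherent, true);
  }

  /* Any full cache flush on this CPU removes the need for a kexec-time WBINVD. */
  static inline void clear_cache_incoherent(void)
  {
          this_cpu_write(cache_incoherent, false);
  }

  /* Called on each CPU on the way into kexec (CPU-stop path / relocate_kernel()). */
  static inline void kexec_wbinvd_if_needed(void)
  {
          if (this_cpu_read(cache_incoherent)) {
                  wbinvd();
                  this_cpu_write(cache_incoherent, false);
          }
  }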
-- Side effect on SME --
Today the first WBINVD, in stop_this_cpu(), is performed when SME is
*supported* by the platform, and the second WBINVD is done in
relocate_kernel() when SME is *activated* by the kernel. Simplify things
by also doing the second WBINVD whenever the platform supports
SME. This allows the kernel to simply turn on this percpu boolean when
bringing up a CPU, by checking whether the platform supports SME.
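In other words (again just a sketch with a made-up flag name; the real
feature check may differ), CPU bringup could then do something like:

  /* Sketch: on CPU bringup, if the platform supports SME, assume this CPU's
   * caches may become incoherent and must be flushed during kexec. */
  if (cpu_feature_enabled(X86_FEATURE_SME))
          this_cpu_write(cache_incoherent, true);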
No other functional change intended.
[*] The aforementioned race:
During kexec, native_stop_other_cpus() is called to stop all remote CPUs
before jumping to the new kernel. native_stop_other_cpus() first sends
normal REBOOT vector IPIs to stop the remote CPUs and waits for them to
stop. If that times out, it sends NMIs to stop the CPUs that are still
alive. The race happens when native_stop_other_cpus() has to send NMIs
and can potentially result in a system hang (for more information please
see [1]).
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/ [1]
Link: https://lore.kernel.org/all/20250901160930.1785244-3-pbonzini%40redhat.com