Randomized slab caches for kmalloc()

When exploiting memory vulnerabilities, "heap spraying" is a common technique for targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in successful exploitation. Basically, it overwrites the memory area of a vulnerable object by triggering allocations in other subsystems or modules, thereby obtaining a reference to the targeted memory location. It is usable against various types of vulnerability, including use-after-free (UAF) and heap out-of-bounds write.

There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently neither factor can be prevented at a low cost: the first is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill.

To efficiently prevent heap spraying, we propose the following approach: create multiple copies of the generic slab caches that will never be merged, and use a random one of them at allocation. The random selection is based on the address of the code that calls `kmalloc()`, which means it is static at runtime (rather than determined dynamically at each allocation, which could be bypassed by brute-force repeated spraying). In other words, the randomness of cache selection is with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although `kmalloc()` at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed.
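
As a rough userspace illustration of this selection scheme (not the kernel's actual code), the sketch below hashes the caller's address, mixed with a seed chosen once per boot, into one of 16 cache indices. The names `hash_u64()` and `pick_cache_copy()`, the multiplier constant, and the use of XOR to mix in the seed are assumptions made for this example. Under these assumptions a given call site always maps to the same index within one boot, while a different seed on the next boot reshuffles the mapping.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CACHE_COPIES  16	/* the "16 copies" named in the Kconfig help text */
#define CACHE_INDEX_BITS 4	/* log2(NR_CACHE_COPIES) */

/*
 * Hypothetical 64-bit multiplicative hash: multiply by a large odd
 * constant and keep the top `bits` bits of the product.
 */
static unsigned int hash_u64(uint64_t val, unsigned int bits)
{
	assert(bits > 0 && bits < 64);
	return (unsigned int)((val * 0x61C8864680B583EBULL) >> (64 - bits));
}

/*
 * Pick a cache copy for one kmalloc() call site.
 *   caller: address of the calling code, fixed for a given call site
 *   seed:   random value chosen once per boot (assumed to be mixed in by XOR)
 * The result depends only on (caller, seed), never on time, so repeated
 * allocations from the same call site land in the same cache copy.
 */
static unsigned int pick_cache_copy(uint64_t caller, uint64_t seed)
{
	return hash_u64(caller ^ seed, CACHE_INDEX_BITS);
}
```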

Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from analyzing the open source code to find a usable kmalloc that happens to pick the same cache as the vulnerable subsystem/module. In other words, with the per-boot seed, the random selection is static for as long as the system stays up, but differs across system startups.

The performance overhead has been tested on a 40-core x86 server by comparing the results of `perf bench all` between kernels with and without this patch, based on the latest linux-next kernel; the results show only minor differences. A subset of the benchmarks is listed below:

                sched/  sched/  syscall/       mem/       mem/
             messaging    pipe     basic     memcpy     memset
                 (sec)   (sec)     (sec)   (GB/sec)   (GB/sec)

control1         0.019   5.459     0.733  15.258789  51.398026
control2         0.019   5.439     0.730  16.009221  48.828125
control3         0.019   5.282     0.735  16.009221  48.828125
control_avg      0.019   5.393     0.733  15.759077  49.684759

experiment1      0.019   5.374     0.741  15.500992  46.502976
experiment2      0.019   5.440     0.746  16.276042  51.398026
experiment3      0.019   5.242     0.752  15.258789  51.398026
experiment_avg   0.019   5.352     0.746  15.678608  49.766343

The memory overhead was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it is positively correlated with the number of cache copies:

           control  4 copies  8 copies  16 copies

total       969.8M    968.2M    968.2M     968.2M
used         20.0M     21.9M     24.1M      26.7M
free        936.9M    933.6M    931.4M     928.6M
available   932.2M    928.8M    926.6M     923.9M

Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
author: GONG, Ruiqi <gongruiqi@huaweicloud.com> 2023-07-14 14:44:22 +0800
committer: Vlastimil Babka <vbabka@suse.cz> 2023-07-18 10:07:47 +0200
commit: 3c6152940584290668b35fa0800026f6a1ae05fe (patch)
tree: 7b9b7ff782dfe1b1e353e466ab19c3c9d4277040 /mm/Kconfig
parent: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5 (diff)
1 files changed, 17 insertions, 0 deletions
diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..4bf7dc5ae5ef 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -337,6 +337,23 @@ config SLUB_CPU_PARTIAL
 	  which requires the taking of locks that may cause latency spikes.
 	  Typically one would choose no for a realtime system.
 
+config RANDOM_KMALLOC_CACHES
+	default n
+	depends on SLUB && !SLUB_TINY
+	bool "Randomize slab caches for normal kmalloc"
+	help
+	  A hardening feature that creates multiple copies of slab caches for
+	  normal kmalloc allocation and makes kmalloc randomly pick one based
+	  on code address, which makes the attackers more difficult to spray
+	  vulnerable memory objects on the heap for the purpose of exploiting
+	  memory vulnerabilities.
+
+	  Currently the number of copies is set to 16, a reasonably large value
+	  that effectively diverges the memory objects allocated for different
+	  subsystems or modules into different caches, at the expense of a
+	  limited degree of memory and CPU overhead that relates to hardware and
+	  system workload.
+
 endmenu # SLAB allocator options
 
 config SHUFFLE_PAGE_ALLOCATOR