diff options
| author | Saravana Kannan <saravanak@google.com> | 2020-06-22 18:18:02 -0700 | 
|---|---|---|
| committer | Catalin Marinas <catalin.marinas@arm.com> | 2020-07-02 12:17:13 +0100 | 
| commit | d4e0340919fb9190a57e879fb3125c4acce0d9b2 (patch) | |
| tree | ae5cc48336b9f07f6f4a9ba1b6aef868c5c5ff2a /lib/mpi/mpi-internal.h | |
| parent | 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68 (diff) | |
arm64/module: Optimize module load time by optimizing PLT counting
When loading a module, module_frob_arch_sections() tries to figure out
the number of PLTs that'll be needed to handle all the RELAs. While
doing this, it tries to dedupe PLT allocations for multiple
R_AARCH64_CALL26 relocations to the same symbol. It does the same for
R_AARCH64_JUMP26 relocations.
To make checks for duplicates easier/faster, it sorts the relocation
list by type, symbol and addend. That way, to check for a duplicate
relocation, it just needs to compare with the previous entry.
However, sorting the entire relocation array is unnecessary and
expensive (O(n log n)) because there are a lot of other relocation types
that don't need deduping or can't be deduped.
So this commit partitions the array into entries that need deduping and
those that don't. And then sorts just the part that needs deduping. And
when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
if it's disabled.
This gives significant reduction in module load time for modules with
large number of relocations with no measurable impact on modules with a
small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
enabled, these were the results for a few downstream modules:
Module		Size (MB)
wlan		14
video codec	3.8
drm		1.8
IPA		2.5
audio		1.2
gpu		1.8
Without this patch:
Module		Number of entries sorted	Module load time (ms)
wlan		243739				283
video codec	74029				138
drm		53837				67
IPA		42800				90
audio		21326				27
gpu		20967				32
Total time to load all these module: 637 ms
With this patch:
Module		Number of entries sorted	Module load time (ms)
wlan		22454				61
video codec	10150				47
drm		13014				40
IPA		8097				63
audio		4606				16
gpu		6527				20
Total time to load all these modules: 247
Time saved during boot for just these 6 modules: 390 ms
Signed-off-by: Saravana Kannan <saravanak@google.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Link: https://lore.kernel.org/r/20200623011803.91232-1-saravanak@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Diffstat (limited to 'lib/mpi/mpi-internal.h')
0 files changed, 0 insertions, 0 deletions
